(Image only. Try it below.)
This is a proof-of-concept implementation of a Neural Network algorithm using JavaScript.
Draw multiple digits in the canvas and a Neural Network algorithm recognizes them.
Try it:
Not compatible with Internet Explorer.
Design
- Interface: a canvas on which you draw digits, and a second canvas where the results are drawn.
- Model: a Neural Network consisting of 3 layers (784-300-10). For details, see my post "Training a Neural Network model to recognize handwritten digits".
Cool things about it
- It works!
- You can draw several digits...
- With different sizes...
- And even one digit inside another one.
What to improve
- When training the neural network, the accuracy of the model on the test data was better than in this implementation. This is because the JavaScript function that scales down each digit does not produce an image with smooth edges like those in the training data. A function that gives better scaled-down results would improve accuracy.
- To separate each individual digit, I'm using a series of for loops with array comparison and concatenation, which is computationally expensive when there are many digits. In a real application, an alternative would be to use a sliding window method or another machine learning algorithm to detect each individual digit (although it might be costly too).
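As an illustration of the first point, one possible way to get smoother edges when scaling down is area averaging (a box filter): each destination pixel becomes the mean of the source pixels it covers, so edge pixels get intermediate values instead of being fully on or off. This is only a sketch of the idea, not the scaling used in this project, and its effect on accuracy is untested. The function name downscaleAverage and its parameters are hypothetical.

```javascript
// Hypothetical helper (not part of the project): shrink a single-channel image
// by averaging the block of source pixels that maps onto each destination pixel.
// src is a flat array of srcW * srcH values (for example alpha values, 0-255).
function downscaleAverage(src, srcW, srcH, dstW, dstH) {
  var dst = new Array(dstW * dstH).fill(0);
  for (var dy = 0; dy < dstH; dy++) {
    // Range of source rows covered by this destination row (at least one row)
    var y0 = Math.floor(dy * srcH / dstH);
    var y1 = Math.max(y0 + 1, Math.floor((dy + 1) * srcH / dstH));
    for (var dx = 0; dx < dstW; dx++) {
      // Range of source columns covered by this destination column
      var x0 = Math.floor(dx * srcW / dstW);
      var x1 = Math.max(x0 + 1, Math.floor((dx + 1) * srcW / dstW));
      var sum = 0;
      for (var y = y0; y < y1; y++) {
        for (var x = x0; x < x1; x++) {
          sum += src[y * srcW + x];
        }
      }
      dst[dy * dstW + dx] = sum / ((y1 - y0) * (x1 - x0));
    }
  }
  return dst;
}

// A 4 x 4 image reduced to 2 x 2: partially filled blocks become gray values
var small = downscaleAverage([
  255, 255,   0,   0,
  255,   0,   0,   0,
  255, 255, 255, 255,
    0,   0, 255, 255
], 4, 4, 2, 2);
// small is [191.25, 0, 127.5, 255]
```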
JavaScript code
In this section, I'll show the code that I wrote to process the image and get a prediction with a Neural Network. Note: this is the JavaScript code for the proof-of-concept interface. It is not the code for training the Machine Learning model.
The main steps are:
- Get Image Data: get what's drawn in the canvas and transform it into an array of pixel values.
- Get Individual Digits: separate the full image array into individual digit arrays.
- Loop for each individual digit:
  - Process the image: scale down, center the digit into a 28x28 pixel image and get the array of pixels.
  - Neural Network: run the Neural Network algorithm and draw the answer.
Here you can see the main functions of the code. For the full version, check it on my GitHub.
1. Get Image Data
- The full canvas size is 400 x 400 pixels, which gives a total of 160,000 pixels.
- Each pixel has 4 values that represent the RGBA (Red, Green, Blue and Alpha) channels.
- I use the canvas context method getImageData, which transforms the image into an array of pixel values.
- Then, I iterate through the array and get only the alpha value of each pixel.
/*********************
*** GET IMAGE DATA ***
**********************/
// Get ImageData to transform the image to array of pixels
var imageDataOriginal = context.getImageData(0, 0, canvas.width, canvas.height);
// The image contains 160,000 (400 x 400) pixels.
// imageDataOriginal.data contains 160,000 * 4 rows.
// Each 4 rows is one pixel, row 0 is pixel 1 R channel (Red)
//                           row 1 is pixel 1 G channel (Green)
//                           row 2 is pixel 1 B channel (Blue)
//                           row 3 is pixel 1 A channel (Alpha, transparency)
// Iterate through the array and get only the Alpha channel, which is 255 for black and 0 for white
var imageArrayOriginal = [];
for (var i = 0; i < 160000; i++){
    imageArrayOriginal[i] = imageDataOriginal.data[(i*4)+3];
}
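To make the indexing concrete, here is a minimal sketch of the same alpha extraction as a standalone function that can be run without a canvas. The name extractAlpha is hypothetical, and the two-pixel sample array stands in for the data that getImageData would return.

```javascript
// Hypothetical helper: keep only the alpha value of each pixel in a flat RGBA array.
function extractAlpha(rgba) {
  var alpha = new Array(rgba.length / 4);
  for (var i = 0; i < alpha.length; i++) {
    // Pixel i occupies entries 4*i (R), 4*i+1 (G), 4*i+2 (B) and 4*i+3 (A)
    alpha[i] = rgba[i * 4 + 3];
  }
  return alpha;
}

// Two pixels: an opaque black stroke pixel followed by a fully transparent one
var sample = new Uint8ClampedArray([0, 0, 0, 255,   0, 0, 0, 0]);
var alphaOnly = extractAlpha(sample);
// alphaOnly is [255, 0]
```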
2. Get Individual Digits
So far we have an array with everything from the canvas and we need to separate the digits. I'm quite proud of this function because it worked pretty well. The drawbacks are 1) it gets slow if there are many digits and 2) it treats strokes as one single digit only if they are continuous (no 'blank space' allowed between the pixels).
The steps are:
- For each pixel, it checks whether the pixel has a stroke; if so, it checks whether the next pixel in the row also has one. All the consecutive pixels that have a stroke are pushed into an array.
- Then it checks whether this new array is adjacent to a previous one from the row above. If so, it merges both.
/****************************
*** GET INDIVIDUAL DIGITS ***
*****************************/
// Separate different digits by grouping adjacent pixels
var k = 1;
window['arrayN' + k] = [];
for (var row = 0; row < 400; row++){
    for (var column = 0; column < 400; column++){
        if (imageDataOriginal.data[(row*400 + column)*4+3] > 0){
            // Get the adjacent pixels if stroke is continuous and assign it to an array
            var nextColumn = 1;
            window['arrayN' + k].push(row*400 + column);
            // Stop at the end of the row, so the run neither wraps into the next row
            // nor reads past the end of the data array
            while (column + nextColumn < 400 && imageDataOriginal.data[(row*400 + column + nextColumn)*4+3] > 0){
                window['arrayN' + k].push(row*400 + column + nextColumn);
                nextColumn++;
            }
            // Check if this array is adjacent to another array in a previous row
            // k is the number of continuous arrays that have been identified
            var arrayAdjusted = [];
            var arrayToMerge = [];
            for (var l = 1; l <= k; l++){
                for (var element = 0; element < window['arrayN' + k].length; element++){
                    // Subtracting 400 gives the index of the pixel directly above
                    arrayAdjusted[element] = window['arrayN' + k][element] - 400;
                    if (window['arrayN' + l].includes(arrayAdjusted[element])){
                        arrayToMerge.push(l);
                        arrayToMerge.push(k);
                        break;
                    }
                }
            }
            // Remove duplicates from array; decrease count k
            if (arrayToMerge.length > 1) {
                arrayToMerge = Array.from(new Set(arrayToMerge));
                k--;
            }
            // Merge adjacent arrays into the same digit; and clear that array
            for (var f = 1; f < arrayToMerge.length; f++){
                window['arrayN' + arrayToMerge[0]] = window['arrayN' + arrayToMerge[0]].concat(window['arrayN' + arrayToMerge[f]]);
                window['arrayN' + arrayToMerge[f]] = [];
            }
            // Increase count k, initiate arrayNk, move to the correct column
            k++;
            window['arrayN' + k] = [];
            column = column + nextColumn - 1;
        }
    }
}
// Check which arrays are valid digits (length > 0); those that are length 0 are temporary arrays
var validaArrays = [];
for (var i = 1; i < k; i++){
    if (window['arrayN' + i].length > 0){
        validaArrays.push(i);
    }
}
// Process Neural Network for each digit
for (var i = 0; i < validaArrays.length; i++){
    processIndividualImage(window['arrayN' + validaArrays[i]]);
}
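For comparison, the same grouping can also be done with a flood fill (connected-component labeling), which visits every pixel once and avoids the repeated array comparison and concatenation. This is a different technique from the function above, shown only as a sketch. The name findDigits is hypothetical, and it assumes the same adjacency rule as above: pixels belong together when they touch horizontally or vertically.

```javascript
// Hypothetical alternative: group stroke pixels into connected components.
// alpha is a flat array of width * height alpha values (0 means no stroke).
// Returns an array of digits, each an array of pixel indices.
function findDigits(alpha, width, height) {
  var visited = new Uint8Array(width * height);
  var digits = [];
  for (var start = 0; start < width * height; start++) {
    if (!(alpha[start] > 0) || visited[start]) continue;
    // New unvisited stroke pixel: collect everything connected to it
    var component = [];
    var stack = [start];
    visited[start] = 1;
    while (stack.length > 0) {
      var p = stack.pop();
      component.push(p);
      var col = p % width;
      var row = (p - col) / width;
      // Neighbours above, below, left and right, staying inside the image
      var neighbours = [];
      if (row > 0) neighbours.push(p - width);
      if (row < height - 1) neighbours.push(p + width);
      if (col > 0) neighbours.push(p - 1);
      if (col < width - 1) neighbours.push(p + 1);
      for (var n = 0; n < neighbours.length; n++) {
        var q = neighbours[n];
        if (alpha[q] > 0 && !visited[q]) {
          visited[q] = 1;
          stack.push(q);
        }
      }
    }
    digits.push(component);
  }
  return digits;
}

// A 5 x 3 image with two separate strokes
var found = findDigits([
  255, 255, 0, 0, 255,
    0, 255, 0, 0, 255,
    0,   0, 0, 0, 255
], 5, 3);
// found.length is 2
```

Each returned array has the same format as the arrayN arrays above (a list of pixel indices), so it could be passed to processIndividualImage in the same way.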
3. Process Image
Now that we have each individual digit, we need to pre-process the image. This follows the procedure described by Yann LeCun (the author of the training data set).
The steps are:
- Draw the individual digit into a hidden canvas.
- Calculate the digit's dimensions (width and height).
- Scale the image to a maximum width or height of 18 pixels (keeping the original aspect ratio).
- Center the image in a new hidden 28 x 28 pixel canvas.
- Get the image data from this new scaled and centered image.
- Normalize the values by dividing them by 255.
- Now we have the processed image data for one single digit.
function processIndividualImage(arrayToProcess){
    /*********************
    *** PROCESS IMAGE ***
    **********************/
    // Use hidden canvas to put individual digit
    var canvasIndImage = document.getElementById("canvasCont2");
    var contextIndImg = canvasIndImage.getContext("2d");
    contextIndImg.clearRect(0, 0, canvasIndImage.width, canvasIndImage.height);
    // Insert array digit into the image data; get columns and rows; put image on canvas
    // Note: columnArray holds the y coordinate (row index) of each pixel
    // and rowArray holds the x coordinate (column index)
    var imageDataCopy = contextIndImg.getImageData(0, 0, canvasIndImage.width, canvasIndImage.height);
    var columnArray = [];
    var rowArray = [];
    for (var j = 0; j < arrayToProcess.length; j++){
        imageDataCopy.data[(arrayToProcess[j])*4+3] = 255;
        columnArray.push(Math.floor(arrayToProcess[j]/400));
        rowArray.push(arrayToProcess[j]%400);
    }
    contextIndImg.putImageData(imageDataCopy, 0, 0);
    // Get the image min and max x and y; Calculate the width and height
    var minX = Math.min.apply(null, rowArray);
    var maxX = Math.max.apply(null, rowArray);
    var minY = Math.min.apply(null, columnArray);
    var maxY = Math.max.apply(null, columnArray);
    var originalWidth = maxX - minX;
    var originalHeight = maxY - minY;
    // To normalize the image and make it similar to the training dataset:
    // Scale the image to fit an 18 x 18 pixel box and center it into a 28 x 28 canvas
    // The largest between the width and height will be scaled to 18 pixels
    // The other will be reduced by the same scale, to preserve original aspect ratio
    var scaleRed;
    if (originalHeight > originalWidth){
        scaleRed = originalHeight/18;
    }
    else {
        scaleRed = originalWidth/18;
    }
    // Calculate a new Width and Height and new X and Y start positions, to center the image in a 28 x 28 pixel
    var newWidth = originalWidth/scaleRed;
    var newHeight = originalHeight/scaleRed;
    var newXstart = (28 - newWidth)/2;
    var newYstart = (28 - newHeight)/2;
    // Draw the scaled and centered image to a new canvas
    var canvasHidden = document.createElement("canvas");
    canvasHidden.width = 28;
    canvasHidden.height = 28;
    var contextHidden = canvasHidden.getContext("2d");
    contextHidden.clearRect(0, 0, canvasHidden.width, canvasHidden.height);
    contextHidden.drawImage(canvasIndImage, minX, minY, originalWidth, originalHeight, newXstart, newYstart, newWidth, newHeight);
    // Get the Image Data from the new scaled, centered, 28 x 28 pixel image
    // Again, get the Alpha Channel only, but this time also normalize it by dividing it by the maximum value of 255
    var imageData2 = contextHidden.getImageData(0, 0, 28, 28);
    processedImage = [];
    for (var i = 0; i < 784; i++){
        processedImage[i] = parseFloat((imageData2.data[(i*4)+3]/255).toFixed(10));
    }
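The scale and center arithmetic above can be checked on its own, without a canvas. The sketch below repeats the same formulas in a hypothetical helper called fitDigit and works through one example: a digit that is 90 pixels wide and 180 pixels high.

```javascript
// Hypothetical helper that mirrors the arithmetic in processIndividualImage:
// the larger side is scaled to 18 pixels and the result is centered in 28 x 28.
function fitDigit(originalWidth, originalHeight) {
  var scaleRed = Math.max(originalWidth, originalHeight) / 18;
  var newWidth = originalWidth / scaleRed;
  var newHeight = originalHeight / scaleRed;
  return {
    newWidth: newWidth,
    newHeight: newHeight,
    newXstart: (28 - newWidth) / 2,
    newYstart: (28 - newHeight) / 2
  };
}

var fit = fitDigit(90, 180);
// scaleRed is 180 / 18 = 10, so the digit becomes 9 x 18 pixels
// fit.newXstart is (28 - 9) / 2 = 9.5 and fit.newYstart is (28 - 18) / 2 = 5
```

Note that a digit whose width and height are both 0 (a single pixel) would give a scale of 0 and a division by zero, so a real implementation may want to guard against that case.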
4. Neural Network
The final step is to run the image data through the Neural Network function.
First Layer:
- Add a value of 1 to the image data, which will be used to calculate the bias.
- Multiply the image data by the weights (theta 1) that came from the model training.
- Apply the sigmoid function as an activation function.
Second Layer:
- Add a value of 1 to the result of the activation function from layer 1.
- Multiply it by the weights (theta 2).
- Apply the sigmoid function.
Getting the predicted value:
- Get the index of the maximum value of the resulting array. This index is the predicted value.
- Finally, draw the predicted value in the results canvas.
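In matrix notation, the steps above can be written as follows. Here x is the row vector of 784 normalized pixel values and g is the sigmoid function. The matrix shapes are not stated in the post; they are inferred from the 784-300-10 architecture and from how the multiplication function indexes its arguments, so treat them as an assumption.

```latex
\begin{aligned}
z^{(2)} &= \begin{bmatrix} 1 & x \end{bmatrix} \Theta^{(1)}, & \Theta^{(1)} &\in \mathbb{R}^{785 \times 300}, & a^{(2)} &= g\left(z^{(2)}\right) \\
z^{(3)} &= \begin{bmatrix} 1 & a^{(2)} \end{bmatrix} \Theta^{(2)}, & \Theta^{(2)} &\in \mathbb{R}^{301 \times 10}, & a^{(3)} &= g\left(z^{(3)}\right) \\
g(z) &= \frac{1}{1 + e^{-z}}, & \hat{y} &= \arg\max_{k \in \{0, \dots, 9\}} a^{(3)}_k
\end{aligned}
```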
    /*********************
    *** NEURAL NETWORK ***
    **********************/
    // Hidden layer
    // Add value 1 to the beginning of the array, to calculate the bias
    processedImage.unshift(1);
    // Multiply the image data values by weights theta1
    z2 = matrixMult(processedImage, theta1);
    // Apply sigmoid as activation function
    a2 = sigmoid(z2);
    // Output layer
    // Add bias
    a2.unshift(1);
    // Multiply a2 by weights theta2
    z3 = matrixMult(a2, theta2);
    // Activation with sigmoid
    a3 = sigmoid(z3);
    // Answer is the index of the max value of a3
    var answer = a3.indexOf(Math.max.apply(null, a3));
    // Draw answer
    contextAnswer.clearRect(minX, minY, originalWidth, originalHeight);
    contextAnswer.font = originalHeight + "pt Times New Roman";
    contextAnswer.fillText(answer, minX, maxY);
}
Here are the sigmoid function and the vector/matrix multiplication function.
function sigmoid(z){
    var g = [];
    for (var i = 0; i < z.length; i++){
        g[i] = 1 / (1 + Math.exp(-z[i]));
    }
    return g;
}
// Calculate vector/matrix multiplication
function matrixMult(matrixA, matrixB){
    var result = [];
    for (var column = 0; column < matrixB[0].length; column++){
        var m = 0;
        for (var row = 0; row < matrixA.length; row++){
            m = m + matrixA[row] * matrixB[row][column];
        }
        result[column] = m;
    }
    return result;
}
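To see the whole forward pass working end to end, here is a small self-contained sketch that wraps the two functions above in a hypothetical predict function and runs it on a toy 2-2-2 network. The weights are made up for the example and have nothing to do with the trained model; the real theta1 and theta2 would simply be larger matrices in the same layout (one row per input, bias row first).

```javascript
function sigmoid(z) {
  var g = [];
  for (var i = 0; i < z.length; i++) {
    g[i] = 1 / (1 + Math.exp(-z[i]));
  }
  return g;
}

// Vector (matrixA) times matrix (matrixB, one row per vector entry)
function matrixMult(matrixA, matrixB) {
  var result = [];
  for (var column = 0; column < matrixB[0].length; column++) {
    var m = 0;
    for (var row = 0; row < matrixA.length; row++) {
      m = m + matrixA[row] * matrixB[row][column];
    }
    result[column] = m;
  }
  return result;
}

// Hypothetical wrapper: returns the index of the most activated output unit
function predict(pixels, theta1, theta2) {
  var a1 = [1].concat(pixels);                          // add bias input
  var a2 = sigmoid(matrixMult(a1, theta1));             // hidden layer
  var a3 = sigmoid(matrixMult([1].concat(a2), theta2)); // output layer
  return a3.indexOf(Math.max.apply(null, a3));
}

// Made-up weights: row 0 is the bias row, rows 1 and 2 belong to the two inputs
var toyTheta1 = [[0, 0], [5, -5], [-5, 5]];
var toyTheta2 = [[0, 0], [5, -5], [-5, 5]];

var firstAnswer = predict([1, 0], toyTheta1, toyTheta2);  // 0
var secondAnswer = predict([0, 1], toyTheta1, toyTheta2); // 1
```

Unlike the code above, this version builds new arrays with concat instead of calling unshift, so the input array is not modified.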