Introduction


In this project, I'll train a Neural Network model to recognize handwritten digits using TensorFlow.

After the model is trained, I'll build a proof-of-concept application using JavaScript.

The Data Set

The data set is the MNIST Database of Handwritten Digits. It is one of the best-known data sets in machine learning and is usually considered one of the first steps in learning about computer vision.

The data consists of individual grayscale images of handwritten digits, each 28x28 pixels. There are 55,000 digits for training, 5,000 for cross-validation, and 10,000 for testing. Every digit is labeled with its true value, which makes this a supervised machine learning problem.

To train the model I'll use TensorFlow, with the MNIST data that ships with the TensorFlow package. For more information, you can check the documentation.


Importing the Data Analysis and TensorFlow packages


In [1]:
# Data Manipulation and Visualization
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

# TensorFlow
import tensorflow as tf

# Get the MNIST data. It is available from the TensorFlow package
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)
Extracting /tmp/data/train-images-idx3-ubyte.gz
Extracting /tmp/data/train-labels-idx1-ubyte.gz
Extracting /tmp/data/t10k-images-idx3-ubyte.gz
Extracting /tmp/data/t10k-labels-idx1-ubyte.gz

Data Format

Each individual digit is an array of 784 (28x28) values. Each value represents the grayscale intensity of one pixel.

If we reshape it to 28x28 and plot it, we can see the original digit:

In [2]:
plt.imshow(mnist.train.images[5].reshape(28,28), cmap="Greys");

Now let's check the label for this digit.

In [3]:
mnist.train.labels[5]
Out[3]:
array([ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.])

The labels use one-hot encoding: the label is an array of 10 values, and the index of the 1 corresponds to the digit. In the label above, the 1 is at index 8, so this digit is an 8.

Example:
Digit:   [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
One-hot: [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]
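To recover the digit from a one-hot label, take the index of the maximum value. A quick sketch using NumPy (already imported above):

# Decode a one-hot label back to its digit
digit = np.argmax(mnist.train.labels[5])
print(digit)  # 8 for this example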

Neural Network


After trying different models, I've decided to implement a Neural Network with one hidden layer of 300 neurons.

In [4]:
# Network Parameters (784-300-10)
n_input = 784
hidden_layer_neurons = 300
n_classes = 10

# Training Parameters
learning_rate = 0.005
training_epochs = 30000  # each "epoch" here is one batch iteration, not a full pass over the data
batch_size = 50

Creating TensorFlow Variables and Model

In [5]:
# x and y placeholders (None allows any batch size)
x = tf.placeholder("float", [None, n_input])
y = tf.placeholder("float", [None, n_classes])
In [6]:
# Create weights and biases that will be used in the neural network
w1 = tf.Variable(tf.random_normal([n_input, hidden_layer_neurons]))
w2 = tf.Variable(tf.random_normal([hidden_layer_neurons, n_classes]))
b1 = tf.Variable(tf.random_normal([hidden_layer_neurons]))
b2 = tf.Variable(tf.random_normal([n_classes]))
In [7]:
# The multilayer perceptron model: sigmoid hidden layer, linear output (logits)
hidden_layer = tf.nn.sigmoid(tf.add(tf.matmul(x, w1), b1))
output_layer = tf.add(tf.matmul(hidden_layer, w2), b2)
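Since the proof-of-concept app will re-implement this forward pass outside TensorFlow, it's worth noting that it reduces to two matrix operations. A minimal NumPy sketch (the function name and arguments are illustrative, not part of this notebook):

def predict(pixels, w1, b1, w2, b2):
    # pixels: array of 784 grayscale values; w/b: trained weights and biases
    hidden = 1.0 / (1.0 + np.exp(-(pixels.dot(w1) + b1)))  # sigmoid hidden layer
    logits = hidden.dot(w2) + b2                           # linear output layer
    return np.argmax(logits)                               # predicted digit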

Cost function and Optimizer

The cost is defined with the softmax cross-entropy function, and I'm using the Adam optimizer to minimize it.

In [8]:
# Cost function and optimizer
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=output_layer, labels=y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)
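For intuition, here is roughly what that op computes for a single example, sketched in NumPy (TensorFlow's fused version is equivalent but more numerically stable and operates on whole batches):

def softmax_cross_entropy(logits, one_hot_label):
    exp = np.exp(logits - np.max(logits))          # shift logits for numerical stability
    probs = exp / np.sum(exp)                      # softmax probabilities
    return -np.sum(one_hot_label * np.log(probs))  # negative log-probability of the true class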

Model Evaluation and Accuracy

In [9]:
# Define the Test model and accuracy
correct_prediction = tf.equal(tf.argmax(output_layer, 1), tf.argmax(y, 1))
correct_prediction = tf.cast(correct_prediction, "float")
accuracy = tf.reduce_mean(correct_prediction)

TensorFlow Session

In [10]:
# Launch the session
sess = tf.InteractiveSession()

# Initialize variables
init = tf.global_variables_initializer()

# Run the variable initializer
sess.run(init)
In [11]:
# Accuracy arrays for plotting
train_accuracies = []
validation_accuracies = []
epoc_iteration = []

# Run the training loop and record the accuracies
for epoch in range(training_epochs):    
    batch_x, batch_y = mnist.train.next_batch(batch_size)
    if (epoch+1) < 100 or (epoch+1) % 100 == 0:
        train_ac = accuracy.eval({x: batch_x, y: batch_y})
        validation_ac = accuracy.eval({x: mnist.validation.images, 
                                       y: mnist.validation.labels})
        epoc_iteration.append(epoch+1)
        train_accuracies.append(train_ac)
        validation_accuracies.append(validation_ac)
    sess.run([optimizer, cost], feed_dict={x: batch_x, y: batch_y})
In [12]:
# Plot the training and validation accuracies
# Create a blank canvas
fig = plt.figure(figsize=(10,7))
axes1 = fig.add_axes([0.1, 0.1, 0.8, 0.8])
axes2 = fig.add_axes([0.36, 0.25, 0.53, 0.5])

# Plot full graph
axes1.plot(epoc_iteration, train_accuracies,'-b', label='Training')
axes1.plot(epoc_iteration, validation_accuracies,'-g', label='Validation')
axes1.legend()
axes1.set_xlabel('Epoch')
axes1.set_ylabel('Accuracy')
axes1.set_title('Training and Validation accuracy')

# Plot the zoomed-in graph
axes2.set_ylim(0.95, 1.001)
axes2.plot(epoc_iteration[198:], train_accuracies[198:],'-b', label='Training')
axes2.plot(epoc_iteration[198:], validation_accuracies[198:],'-g', label='Validation')
axes2.set_title('Zoom in');
In [13]:
# Print final accuracies
print("Validation Accuracy:", accuracy.eval({x: mnist.validation.images, y: mnist.validation.labels}))
print("Test Accuracy:", accuracy.eval({x: mnist.test.images, y: mnist.test.labels}))
Validation Accuracy: 0.9732
Test Accuracy: 0.9718

The final accuracy of this model is about 97%. State-of-the-art models for this data set reach around 99.7%.

In a production application, for example, postal code recognition where millions of digits are processed, this 2.7% gap would be significant. For my purposes, though, 97% accuracy is good enough: it comes from a simple model that is easy to replicate in a proof-of-concept application, and with just one hidden layer it doesn't require many calculations.

So the final step is to save the weights and biases so that I can use them in my application.

In [14]:
# Save bias and weights as theta1 (row 0 is b1, the remaining rows are w1)
theta1 = np.concatenate((b1.eval().reshape(1,300), w1.eval()), axis=0)
np.savetxt("theta1.csv", theta1, delimiter=",")
In [15]:
# Save bias and weights as theta2 (row 0 is b2, the remaining rows are w2)
theta2 = np.concatenate((b2.eval().reshape(1,10), w2.eval()), axis=0)
np.savetxt("theta2.csv", theta2, delimiter=",")
