This started as an experiment to see whether an LSTM network, which is usually used to classify time series data, could be used to classify 2D data, for example, images of handwritten letters.
The MNIST dataset consists of 60,000 images of handwritten digits (0 through 9), each drawn on a 28 x 28-pixel grid, used for training, and 10,000 images used for testing. Each digit is therefore represented by 784 pieces of information. In this test, I wanted to see whether the LSTM network would still be able to learn if I broke each image up into 28 steps with 28 features each, effectively turning a 784-pixel image into a 28 time-step ‘movie’ (each frame containing 28 pieces of information).
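As a minimal sketch of that idea, assuming NumPy and a made-up array standing in for one real MNIST row (the names `flat_image` and `image_as_sequence` are hypothetical, chosen only for this illustration), the 784 values can be reshaped so that each of the 28 image rows becomes one time step:

```python
import numpy as np

# stand-in for one flattened image: 784 values
# (a real row would hold pixel intensities, not 0..783)
flat_image = np.arange(784)

# 28 time steps with 28 features each;
# row r holds the values at positions 28*r .. 28*r + 27
image_as_sequence = flat_image.reshape(28, 28)

first_step = image_as_sequence[0]    # top row of the image
last_step = image_as_sequence[27]    # bottom row of the image
```

The reshape does not change or reorder any values; it only changes how they are grouped, so the LSTM sees the image one row at a time from top to bottom.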
I was surprised that on the first attempt, shown below, with a completely non-optimised network (I guessed the network shape), I was able to achieve 98.89% accuracy. I will do further experiments with this network and this data to learn more about how LSTMs work, essentially playing with hyperparameters to see what happens.
# James's implementation of the MNIST database
# import required to process csv files with pandas
import pandas as pd
# for array manipulation
import numpy as np
# for normalizing the data
from sklearn.preprocessing import MinMaxScaler
# allows onehotencoding
from sklearn.preprocessing import OneHotEncoder
# read the training csv file
# data from 60000 images originally represented as a 28 x 28 pixel grid
# originally obtained from http://yann.lecun.com/exdb/mnist/
mnist_training_data = pd.read_csv(r'C:\Users\james\Anaconda3JamesData\mnist_train.csv', header=None)
# each row of data consists of:
# the first element [0] is the label of the actual number (0 through 9)
# the following 784 elements [1:785] are the pixel values of the number, originally represented as a 28 x 28 pixel grid
# we want to create training data and label data for our LSTM to train from
mnist_training_data_values = mnist_training_data.iloc[:, 1:785].values
mnist_training_data_labels = mnist_training_data.iloc[:, 0].values
# normalise the training data to the range 0 to 1 using MinMaxScaler
# (MinMaxScaler scales each pixel column independently)
scaler = MinMaxScaler(feature_range = (0.0, 1.0))
mnist_training_data_values = scaler.fit_transform(mnist_training_data_values.astype('float64'))
# the output of the network values (the labels) are 0, 1, 2, 3, 4, 5, 6, 7, 8, 9
# Onehotencoding expects a 2D array
mnist_training_data_labels = mnist_training_data_labels.reshape (60000,1)
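# create the encoder (dense array output rather than a sparse matrix)
# note: scikit-learn 1.2 renamed `sparse` to `sparse_output`, and 1.4 removed `sparse`
encoder = OneHotEncoder(sparse=False, categories='auto')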
# encode the training labels
mnist_training_data_labels = encoder.fit_transform(mnist_training_data_labels)
# reshape mnist_training_data_values into 3d array
mnist_training_data_values = mnist_training_data_values.reshape(60000 , 28, 28)
# I will feed each image into the LSTM as 28 rows of data, one row per step,
# so conceptually the image becomes a
# 28 step time sequence (with 28 features per step)
#print("mnist_training_data_values shape:", mnist_training_data_values.shape)
#print ("mnist_training_data_labels shape: ",mnist_training_data_labels.shape)
#print (mnist_training_data_labels )
from keras.models import Sequential
from keras.layers import LSTM
from keras.layers import Dense
from keras.optimizers import Adam
#from keras.layers import Dropout
# each sequence (representing one number from the MNIST database) has 28 steps and 28 features
model = Sequential()
model.add(LSTM(50,return_sequences=True,input_shape=(28, 28)))
model.add(LSTM(50))
model.add(Dense(10, activation='softmax'))
# note: newer Keras versions name this argument `learning_rate` instead of `lr`
model.compile(Adam(lr=0.001), loss='categorical_crossentropy', metrics=['acc'])
history = model.fit(mnist_training_data_values, mnist_training_data_labels, validation_split=0.1, shuffle=True, batch_size=28, epochs=500, verbose=2)
# display diagnostics of the training
import matplotlib.pyplot as plt
# Jupyter/IPython magic; remove this line when running as a plain Python script
%matplotlib inline
plt.figure(figsize=(13,8))
plt.plot(history.history['loss'], color='blue')
plt.plot(history.history['val_loss'], color='orange')
plt.plot(history.history['acc'], color='red')
plt.plot(history.history['val_acc'], color='green')
plt.title('model loss and accuracy during training')
plt.ylabel('loss / accuracy')
plt.xlabel('epoch')
plt.legend(['loss', 'val_loss', 'acc', 'val_acc'], loc='upper left')
plt.show()
# prediction model
mnist_testing_data = pd.read_csv(r'C:\Users\james\Anaconda3JamesData\mnist_test.csv', header=None)
# process the data as before
mnist_testing_data_values = mnist_testing_data.iloc[:, 1:785].values
mnist_testing_data_labels = mnist_testing_data.iloc[:, 0].values
# using the same normalisation as I used on the training data
# hence using 'transform' and not 'fit_transform'
mnist_testing_data_values = scaler.transform(mnist_testing_data_values.astype('float64'))
length_of_testing_data = len(mnist_testing_data_values)
# reshape the testing data values to the same shape as the training values
mnist_testing_data_values = mnist_testing_data_values.reshape(length_of_testing_data, 28, 28)
predict = model.predict(mnist_testing_data_values)
# decode the onehotencoded prediction results
one_hot_decoded_data = encoder.inverse_transform(predict)
#length = len(one_hot_decoded_data) #the decoded predict model
one_hot_decoded_data = np.array(one_hot_decoded_data)
mnist_testing_data_labels = np.array(mnist_testing_data_labels)
#print (one_hot_decoded_data.shape)
#print (mnist_testing_data_labels.shape)
# reshape the one_hot_decoded_data
one_hot_decoded_data = one_hot_decoded_data.reshape(1,-1)
mnist_testing_data_labels = mnist_testing_data_labels.reshape(1,10000)
#print (one_hot_decoded_data.shape)
#print (mnist_testing_data_labels.shape)
# accuracy analytics
correct= 0
for i in range(0,10000):
    if (one_hot_decoded_data[0,i] == mnist_testing_data_labels[0,i]):
        print ("sample number: ",i," test: ",mnist_testing_data_labels[0,i], " predicted: ",one_hot_decoded_data[0,i], " CORRECT")
        correct+=1
    else:
        print ("sample number: ",i," test: ",mnist_testing_data_labels[0,i], " predicted: ",one_hot_decoded_data[0,i], " FALSE")
print ('Accuracy: %f' % ((correct/10000)*100))
98.89% accuracy
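The accuracy count can also be done without a loop. The following is only a sketch, not part of the original run: it assumes the class labels are the integers 0, 1, 2, ... in order, so that the index of the largest softmax output is the predicted digit. The function name `accuracy_percent` and the toy numbers are made up for illustration.

```python
import numpy as np

def accuracy_percent(probabilities, true_labels):
    """Percentage of rows whose highest-probability class equals the true label."""
    # index of the largest value in each row = predicted class
    predicted_labels = np.argmax(probabilities, axis=1)
    matches = predicted_labels == np.asarray(true_labels)
    return float(np.mean(matches) * 100.0)

# toy example: 3 samples, 3 classes (made-up numbers, not MNIST output)
toy_probabilities = np.array([[0.1, 0.8, 0.1],
                              [0.7, 0.2, 0.1],
                              [0.2, 0.3, 0.5]])
toy_labels = np.array([1, 0, 1])

# the first two rows are predicted correctly, the third is not
toy_accuracy = accuracy_percent(toy_probabilities, toy_labels)
```

With the variables from the listing above, the equivalent call would take the raw `predict` array and the original one-dimensional test labels, before either is reshaped.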
Thank you so much for providing this code.
I want to implement this for Gujarati character recognition from scanned images.
Please give me more suggestions.
Sorry to say, but this code takes too much time to execute. Is there any solution for that?
I don’t know of a solution to speed it up. This was my first attempt at playing with LSTM networks so I can’t help you unfortunately. Best wishes James