MNIST experiments

Please understand, this post is currently just a series of notes to myself as I feel and learn my way through neural network programming. It is not a stand-alone tutorial. In the future I may write a tutorial series explaining, in clear and simple terms, how to create neural networks.

I am finding that much of the information online just isn't clear enough for total beginners, so I am taking notes on my journey, with the intention of eventually producing a very clear series of tutorials to help total beginners start on this exciting computer science path.

Standard data sets are a great way to learn about neural network methodology. If you run experiments on your own data, looking for patterns and so on, you can't really know whether your programming or your understanding of the neural network is sound; you might just be putting garbage in and getting garbage out, without knowing you were putting garbage in in the first place.

This post is a series of experiments that I've done to learn about LSTM networks. The ultimate goal is to get a good prediction result on the MNIST dataset using an LSTM created in Keras with a TensorFlow backend.

Note: for the early experiments I am using a truncated version of the MNIST database, only 100 records, to speed up processing. When I finally have a model that works well, I'll try the whole data set on it.
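(As an aside, here is a minimal sketch of how such a truncated file could be produced with pandas, assuming the full training file is named mnist_train.csv in the same folder; that filename is my assumption.)

# sketch: save the first 100 records of the full MNIST training csv to a smaller file
# (assumes the full file is named mnist_train.csv - adjust to suit)
import pandas as pd
full_data = pd.read_csv(r'C:\Users\james\Anaconda3JamesData\mnist_train.csv', header=None)
full_data.head(100).to_csv(r'C:\Users\james\Anaconda3JamesData\mnist_train_100.csv', header=False, index=False)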

Experiment 1

# James' implementation of the MNIST experiments
# pandas is used to read the csv files
import pandas as pd
# numpy for array manipulation
import numpy as np
# MinMaxScaler for normalising the data
from sklearn.preprocessing import MinMaxScaler
# OneHotEncoder for one-hot encoding the labels
from sklearn.preprocessing import OneHotEncoder

# read the training csv file
mnist_training_data = pd.read_csv(r'C:\Users\james\Anaconda3JamesData\mnist_train_100.csv', header=None)

# each row of data consists of:
# the first element [0]: the label of the actual number
# the following 784 elements [1:785]: the actual number, originally represented as a 28 x 28 pixel grid

# we want to create training data, and label data for our LSTM to train from
mnist_training_data_values = mnist_training_data.iloc[:, 1:785].values
mnist_training_data_labels = mnist_training_data.iloc[:, 0].values
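# quick sanity check (a sketch, commented out like the prints below):
# each row of values should reshape back into the original 28 x 28 pixel grid
# import matplotlib.pyplot as plt
# plt.imshow(mnist_training_data_values[0].reshape(28, 28), cmap='gray')
# plt.show()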

# print("mnist_training_data_values shape:", mnist_training_data_values.shape)
# print("mnist_training_data_labels:", mnist_training_data_labels.shape)

# print (mnist_training_data_labels)

# we will normalise the training data & one-hot encode the outputs
# normalise the training data to the range 0.0 - 1.0
scaler = MinMaxScaler(feature_range=(0.0, 1.0))
mnist_training_data_values = scaler.fit_transform(mnist_training_data_values.astype('float64'))
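# sanity check (sketch): after scaling, all values should lie in the range 0.0 - 1.0
# print(mnist_training_data_values.min(), mnist_training_data_values.max())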

# one-hot encode the output values (the mnist_training_data_labels)
# OneHotEncoder expects a 2D array and our label data is 1D,
# so reshape it into a (100, 1) column first
mnist_training_data_labels = mnist_training_data_labels.reshape(100, 1)
# print (mnist_training_data_labels)
encoder = OneHotEncoder(sparse=False, categories='auto')
# One hot Encode the data
mnist_training_data_labels = encoder.fit_transform(mnist_training_data_labels)
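# quick check (sketch): the 1 in each one-hot row sits at the index of the
# original label, e.g. label 5 becomes [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]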
# print (mnist_training_data_labels)
# print (mnist_training_data_labels.shape)

# reshape mnist_training_data_values into the 3D array Keras LSTMs expect:
# (samples, time steps, features)
mnist_training_data_values = mnist_training_data_values.reshape(100, 784, 1)

print("mnist_training_data_values shape:", mnist_training_data_values.shape)
print ("mnist_training_data_labels shape: ",mnist_training_data_labels.shape)

from keras.models import Sequential 
from keras.layers import LSTM 
from keras.layers import Dense
from keras.optimizers import Adam

model = Sequential()
model.add(LSTM(300, return_sequences=True, input_shape=(784, 1)))
model.add(LSTM(300, return_sequences=True))
model.add(LSTM(100))
model.add(Dense(10, activation='softmax'))
# model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['acc'])
model.compile(Adam(lr=0.001), loss='categorical_crossentropy', metrics=['acc'])
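# optional check (sketch): model.summary() prints the layer output shapes
# and parameter counts, handy for verifying the architecture
# model.summary()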

history = model.fit(mnist_training_data_values, mnist_training_data_labels, validation_split=0.1, shuffle=True, batch_size=32, epochs=100, verbose=2)

import matplotlib.pyplot as plt  
%matplotlib inline
# summarize history for loss
plt.figure(figsize=(13,8))  
plt.plot(history.history['loss'], color='blue')
plt.plot(history.history['val_loss'], color='orange')
plt.title('model loss during training')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['loss', 'val_loss'], loc='upper left')
plt.show()
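
Since the model was compiled with metrics=['acc'], the same kind of plot can show accuracy. A sketch, using the 'acc' / 'val_acc' keys this version of Keras stores in the history:

# summarize history for accuracy
plt.figure(figsize=(13,8))
plt.plot(history.history['acc'], color='blue')
plt.plot(history.history['val_acc'], color='orange')
plt.title('model accuracy during training')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['acc', 'val_acc'], loc='upper left')
plt.show()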

Conclusion: The network is just not learning anything here. Working out why has led me to a better understanding of what the LSTM network is being given as input. I am feeding it a 3D array of shape (100, 784, 1), that is, 100 samples of 784 time steps with 1 feature each. In effect I have unrolled each 28 x 28 pixel image into a single line, although perhaps some good results could still be obtained with a different model.
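
One idea for the next experiment (a sketch only, not something I have verified here): instead of feeding a 784-step line, treat each image as 28 time steps of 28 features, one pixel row per step, which is a common way of presenting MNIST to an LSTM:

# sketch: one image row per time step -> shape (samples, 28, 28)
row_values = mnist_training_data_values.reshape(100, 28, 28)
model = Sequential()
model.add(LSTM(100, input_shape=(28, 28)))
model.add(Dense(10, activation='softmax'))
model.compile(Adam(lr=0.001), loss='categorical_crossentropy', metrics=['acc'])
history = model.fit(row_values, mnist_training_data_labels, validation_split=0.1, shuffle=True, batch_size=32, epochs=100, verbose=2)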
