Please understand, this post is currently just a series of notes to myself as I feel and learn my way through neural network programming. It is not a stand-alone tutorial. In the future I may write a tutorial series to explain, in clear and simple terms, how to create neural networks.
I am finding that much of the information online just isn't clear enough for total beginners, so I am taking notes on my journey with the intention of producing, in the future, a very clear series of tutorials to help total beginners start on this exciting computer science path.
Standard data sets are a great way to learn about new network architectures.
This post is a series of experiments that I've done to learn about LSTM networks. The ultimate goal is to get a good prediction result on the MNIST dataset using an LSTM created in Keras with a TensorFlow backend.
Note: for the early experiments I am using a truncated version of the MNIST database, only 100 records, to speed up processing. When I finally have a good model that works, I'll try the whole data set on it.
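For reference, a truncated file like this could be produced from the full training CSV. A minimal sketch, assuming a full mnist_train.csv sits in the same folder and uses the same layout (label first, then 784 pixel values):
# sketch: take the first 100 records of the full MNIST training CSV
import pandas as pd
full_data = pd.read_csv(r'C:\Users\james\Anaconda3JamesData\mnist_train.csv', header=None)
full_data.head(100).to_csv(r'C:\Users\james\Anaconda3JamesData\mnist_train_100.csv', header=False, index=False)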
Experiment 1
# James' implementation of an LSTM for the MNIST dataset
# import required to process csv files with pandas
import pandas as pd
# for array manipulation
import numpy as np
# for normalizing the data
from sklearn.preprocessing import MinMaxScaler
# allows onehotencoding
from sklearn.preprocessing import OneHotEncoder
# read the training csv file
mnist_training_data = pd.read_csv(r'C:\Users\james\Anaconda3JamesData\mnist_train_100.csv', header=None)
# each row of data consists of
# the first element [0]: is the label of the actual number
# the following 784 elements [1:785]: are the actual number, originally represented as a 28 x 28 pixel grid
# we want to create training data, and label data for our LSTM to train from
mnist_training_data_values = mnist_training_data.iloc[:, 1:785].values
mnist_training_data_labels = mnist_training_data.iloc[:, 0].values
# print("mnist_training_data_values shape:", mnist_training_data_values.shape)
# print("mnist_training_data_labels:", mnist_training_data_labels.shape)
# print (mnist_training_data_labels)
# we will normalize the training data & onehotencode the outputs
# normalise the training data
scaler = MinMaxScaler(feature_range = (0.0, 1.0))
mnist_training_data_values = scaler.fit_transform(mnist_training_data_values.astype('float64'))
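# note: MinMaxScaler scales each pixel column by its own min/max across these samples,
# rather than simply dividing every value by 255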
# onehotencode the output values (the mnist_training_data_labels)
# Onehotencoding expects a 2D array and our label data is currently a 1D array, so reshape it
mnist_training_data_labels = mnist_training_data_labels.reshape(100, 1)
# print (mnist_training_data_labels)
encoder = OneHotEncoder(sparse=False, categories='auto')
# One hot Encode the data
mnist_training_data_labels = encoder.fit_transform(mnist_training_data_labels)
# print (mnist_training_data_labels)
# print (mnist_training_data_labels.shape)
# reshape mnist_training_data_values into a 3D array of (samples, time steps, features), as Keras LSTM layers expect
mnist_training_data_values = mnist_training_data_values.reshape(100, 784, 1)
print("mnist_training_data_values shape:", mnist_training_data_values.shape)
print ("mnist_training_data_labels shape: ",mnist_training_data_labels.shape)
from keras.models import Sequential
from keras.layers import LSTM
from keras.layers import Dense
from keras.optimizers import Adam
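# build a stacked LSTM: three LSTM layers followed by a softmax Dense layer over the 10 digit classes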
model = Sequential()
model.add(LSTM(300, return_sequences=True, input_shape=(784, 1)))
model.add(LSTM(300, return_sequences=True))
model.add(LSTM(100))
model.add(Dense(10, activation='softmax'))
# model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['acc'])
model.compile(Adam(lr=0.001), loss='categorical_crossentropy', metrics=['acc'])
# history = model.fit(training_set_values , training_set_labels, validation_split=0.1, batch_size =20, epochs = 100, shuffle=True, verbose=2)
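# train for 100 epochs; validation_split=0.1 holds back 10% of the samples for validation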
history = model.fit(mnist_training_data_values, mnist_training_data_labels, validation_split=0.1, shuffle=True, batch_size=32, epochs=100, verbose=2)
import matplotlib.pyplot as plt
%matplotlib inline
# summarize history for loss
plt.figure(figsize=(13,8))
plt.plot(history.history['loss'], color='blue')
plt.plot(history.history['val_loss'], color='orange')
plt.title('model loss during training')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['loss', 'val_loss'], loc='upper left')
plt.show()

Conclusion: The network is just not learning anything here. Working out why has helped me understand the input the LSTM network expects. I am feeding the LSTM network a 3D array of (100, 784, 1): that's 100 samples of 784 time steps with 1 feature each. I have broken the 28×28 pixel image down into what is effectively a single long line, although perhaps some good results could be obtained if the model itself was different.
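One thing to try in a later experiment would be keeping the rows of each image as separate time steps instead. A minimal sketch of that alternative input shape, untested and reusing the variable names from above:
# sketch (untested): present each 28x28 image as 28 time steps of 28 features
mnist_training_data_values = mnist_training_data_values.reshape(100, 28, 28)
# the first LSTM layer's input_shape would then be (28, 28) instead of (784, 1)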