Please understand that this post is currently just a series of notes to myself as I feel and learn my way through neural network programming. It is not a standalone tutorial. In the future I may write a tutorial series to explain, in clear and simple terms, how to create neural networks.
I am finding that much of the information online just isn’t clear enough for total beginners, so I am taking notes on my journey with the intention of eventually producing a very clear series of tutorials to help total beginners get started on this exciting computer science path.
Standard data sets are a great way to learn about
This post is a series of experiments that I’ve done to learn about LSTM networks. The ultimate goal is to get a good prediction result on the MNIST dataset using an LSTM created in Keras with a TensorFlow backend.
Note: for early experiments I am using a truncated version of the MNIST database, only 100 records, to speed up processing. When I finally have a model that works well, I’ll try it on the whole data set.
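As a rough sketch of how a truncated data set like this can be read, pandas can stop after the first 100 rows with `nrows=100`. The in-memory CSV below is synthetic random data that I am using as a stand-in for the real file, so the example runs on its own; it is not the actual `mnist_train_100.csv`, and this is not necessarily how that file was produced.

```python
import io

import numpy as np
import pandas as pd

# Synthetic stand-in for a full MNIST-style CSV: 500 rows, each row is
# [label, 784 pixel values]. Random data, used only to keep the sketch self-contained.
rng = np.random.default_rng(0)
labels = rng.integers(0, 10, size=(500, 1))
pixels = rng.integers(0, 256, size=(500, 784))

full_csv = io.StringIO()
pd.DataFrame(np.hstack([labels, pixels])).to_csv(full_csv, header=False, index=False)
full_csv.seek(0)

# nrows=100 reads only the first 100 records and ignores the rest
subset = pd.read_csv(full_csv, header=None, nrows=100)
print(subset.shape)  # (100, 785)
```

With a real file, the `io.StringIO` object would simply be replaced by the path to the CSV.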
    # James' implementation of an LSTM on the MNIST database

    # required to process csv files
    import pandas as pd
    # for array manipulation
    import numpy as np
    # for normalizing the data
    from sklearn.preprocessing import MinMaxScaler
    # for one-hot encoding the labels
    from sklearn.preprocessing import OneHotEncoder

    # read the training csv file
    mnist_training_data = pd.read_csv(r'C:\Users\james\Anaconda3JamesData\mnist_train_100.csv', header=None)

    # each row of data consists of:
    #   the first element: the label of the actual number
    #   the following 784 elements [1:785]: the number itself,
    #   originally represented as a 28 x 28 pixel grid
    # we want to create training data and label data for our LSTM to train from
    mnist_training_data_values = mnist_training_data.iloc[:, 1:785].values
    mnist_training_data_labels = mnist_training_data.iloc[:, 0].values
    # print("mnist_training_data_values shape:", mnist_training_data_values.shape)
    # print("mnist_training_data_labels shape:", mnist_training_data_labels.shape)
    # print(mnist_training_data_labels)

    # we will normalize the training data and one-hot encode the outputs

    # normalize the training data
    scaler = MinMaxScaler(feature_range=(0.0, 1.0))
    mnist_training_data_values = scaler.fit_transform(mnist_training_data_values.astype('float64'))

    # one-hot encode the output values (the mnist_training_data_labels)
    # OneHotEncoder expects a 2D array and our label data is 1D, so reshape it to (samples, 1)
    mnist_training_data_labels = mnist_training_data_labels.reshape(-1, 1)
    # print(mnist_training_data_labels)
    # note: this argument was called sparse before scikit-learn 1.2
    encoder = OneHotEncoder(sparse_output=False, categories='auto')
    mnist_training_data_labels = encoder.fit_transform(mnist_training_data_labels)
    # print(mnist_training_data_labels)
    # print(mnist_training_data_labels.shape)

    # reshape mnist_training_data_values into a 3D array: (samples, time steps, features)
    mnist_training_data_values = mnist_training_data_values.reshape(-1, 784, 1)
    print("mnist_training_data_values shape:", mnist_training_data_values.shape)
    print("mnist_training_data_labels shape:", mnist_training_data_labels.shape)

    from keras.models import Sequential
    from keras.layers import LSTM
    from keras.layers import Dense
    from keras.optimizers import Adam

    model = Sequential()
    model.add(LSTM(300, return_sequences=True, input_shape=(784, 1)))
    model.add(LSTM(300, return_sequences=True))
    model.add(LSTM(100))
    model.add(Dense(10, activation='softmax'))

    # model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['acc'])
    # note: older Keras versions call this argument lr
    model.compile(Adam(learning_rate=0.001), loss='categorical_crossentropy', metrics=['acc'])

    # history = model.fit(training_set_values, training_set_labels, validation_split=0.1, batch_size=20, epochs=100, shuffle=True, verbose=2)
    history = model.fit(mnist_training_data_values, mnist_training_data_labels, validation_split=0.1, shuffle=True, batch_size=32, epochs=100, verbose=2)

    import matplotlib.pyplot as plt
    %matplotlib inline

    # summarize history for loss
    plt.figure(figsize=(13, 8))
    plt.plot(history.history['loss'], color='blue')
    plt.plot(history.history['val_loss'], color='orange')
    plt.title('model loss during training')
    plt.ylabel('loss')
    plt.xlabel('epoch')
    plt.legend(['loss', 'val_loss'], loc='upper left')
    plt.show()
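To make the one-hot encoding step more concrete, here is a small NumPy-only sketch of what it does to the labels. This is an illustration, not the scikit-learn `OneHotEncoder` used in the code above, and the five example labels are made up.

```python
import numpy as np

# Five made-up digit labels, standing in for the first column of the csv
labels = np.array([5, 0, 4, 1, 9])

# Row i of the 10x10 identity matrix is the one-hot vector for digit i,
# so indexing the identity matrix with the labels encodes all of them at once
one_hot = np.eye(10)[labels]

print(one_hot.shape)           # (5, 10)
print(one_hot.argmax(axis=1))  # [5 0 4 1 9]
```

One difference worth keeping in mind: `np.eye(10)` always gives 10 columns, whereas `OneHotEncoder` with `categories='auto'` builds its columns from the labels it actually sees, so a small sample that happened to be missing a digit would produce fewer than 10 columns.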
Conclusion: The network is just not learning anything here. I believe I can see why, and working it out has helped me understand the input to an LSTM network. I am feeding the LSTM a 3D array of shape (100, 784, 1): that’s 100 samples of 784 time steps with 1 feature each. I have broken each 28×28 pixel image down into what is effectively a single line of 784 pixels, although perhaps some good results could still be obtained if the model itself were different.
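A small NumPy sketch of what that flattening means for a single image. The "image" here is fake (its pixel values are just 0 to 783, so each pixel's position is easy to track), and the row-by-row shape at the end is only a hypothetical alternative that I have not tried in this post.

```python
import numpy as np

# One fake 28x28 "image" whose pixel values are simply 0..783
image = np.arange(28 * 28).reshape(28, 28)

# What the code in this post feeds the LSTM: 784 time steps, 1 feature per step
as_line = image.reshape(1, 784, 1)
print(as_line.shape)  # (1, 784, 1)

# Two pixels that sit directly above and below each other in the image
# end up 28 time steps apart once the image is flattened into a line
above = image[10, 5]
below = image[11, 5]
print(below - above)  # 28

# Hypothetical alternative (not tried here): 28 time steps,
# where each step is a whole row of 28 pixels
as_rows = image.reshape(1, 28, 28)
print(as_rows.shape)  # (1, 28, 28)
```

To use the row-by-row form with the model above, the first layer's `input_shape` would have to change from `(784, 1)` to `(28, 28)`. Whether that actually trains any better is something I would need to test.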