With this example I’ve included a csv file that can be downloaded here.
Opened in Notepad++, the file looks like the following.
We have 5 points in time that we want to train our LSTM on, to hopefully make predictions on the following unit of time. (Obviously we would need our training data to be much bigger than this, but I’m just using a simple example to understand the ‘shaping’ of data.)
To put this data into context, it could be considered like this. Imagine we’re training a robot to walk, and the values represent various stepper motors on that robot.
What we are trying to do is predict the best motor settings for time 0.
i.e. the sequence is 5->4->3->2->1->??
We can see the file has 8 columns. The first column just holds the ‘step number’ and won’t be used in training the LSTM network; the remaining columns represent the 7 motors on the robot.
The very first thing we want to do to our data is remove the first column.
To do this easily, we can use the pandas iloc indexer, which selects rows and columns by position.
To explain this, let’s do some code.
# import required to process csv files with pandas
import pandas as pd
# import numpy to create arrays
import numpy as np

# import the multi feature csv
multifeature_csv = pd.read_csv(r'C:\Users\james\Anaconda3JamesData\AI_Multifeature_LSTM_series.csv', header=None)

# display the contents of the csv file with NO processing
myData_processed = multifeature_csv.iloc[:, :].values
print(myData_processed)

# process the data, take all data except the first column
myData_processed = multifeature_csv.iloc[:, 1:8].values

# this is added simply to put a space
# in-between the two print outputs for clarity
print(" ")
print(myData_processed)
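So now we’ve managed to remove the step number from each row (the first column) by using ‘myData_processed = multifeature_csv.iloc[:, 1:8].values’ in the example above. Inside the brackets, the ‘:’ before the comma keeps every row, and ‘1:8’ after the comma keeps columns 1 to 7 (columns are counted from 0, and the end of the slice is not included).
An important point here: with pandas read_csv, we must add the ‘, header=None’ argument as above. This tells pandas that row 1 of the dataset is not a series of names for the columns (as is our case here). If we do not add this argument, pandas will treat the first row as column names, so that row will be missing from the data.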
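As a quick check of the column slicing, here is a minimal sketch that does not need the csv file. The frame below is a made-up stand-in (one step-number column plus 7 feature columns), not the real motor data, and the names steps, features and frame are just for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the csv: 5 rows, 1 step-number column + 7 features
steps = np.arange(5, 0, -1).reshape(5, 1)             # 5, 4, 3, 2, 1
features = np.arange(35, dtype=float).reshape(5, 7)   # made-up feature values
frame = pd.DataFrame(np.hstack([steps, features]))

# iloc[rows, columns]: ':' keeps every row, '1:8' keeps columns 1 to 7
by_range = frame.iloc[:, 1:8].values

# '1:' means "from column 1 to the end", so the column count is not hard-coded
open_ended = frame.iloc[:, 1:].values

print(by_range.shape)
print(np.array_equal(by_range, open_ended))
```

Both slices give the same array here. The open-ended form ‘1:’ is handy if the number of feature columns might change later.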
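To see the effect of header=None without touching the real file, here is a small sketch that reads the same text twice from memory. The csv text is invented for the illustration (a step number followed by two features).

```python
import io
import pandas as pd

# Hypothetical csv text with no header row: step number, then two features
csv_text = "5,1.0,2.0\n4,1.5,2.5\n3,2.0,3.0\n"

# header=None: every line is treated as data
with_header_none = pd.read_csv(io.StringIO(csv_text), header=None)

# default behaviour: the first line is used as the column names
default_header = pd.read_csv(io.StringIO(csv_text))

print(with_header_none.shape)
print(default_header.shape)
print(list(default_header.columns))
```

The second frame has one row fewer, because its first row has been turned into column names.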
Let’s now take a look at the ‘shape’ of the data at the moment.
# myData_processed is referring to the data that has been
# processed (the 1st column has been removed) as above
print(myData_processed.shape)
# x-axis
print(myData_processed.shape[0])
# y-axis
print(myData_processed.shape[1])
We can see our data currently has:
– an x-axis of 5 which is the number of rows
– a y-axis of 7, which is the number of features in each row
Finally we see the shape is (x, y) = (5, 7)
The (5, 7) shows us we have a 2-dimensional array at the moment: one dimension for the rows and one for the features.
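The number of dimensions can also be read directly from the array. The sketch below uses an array of zeros as a stand-in with the same shape as myData_processed; the name stand_in is only for illustration.

```python
import numpy as np

# Hypothetical stand-in with the same shape as myData_processed
stand_in = np.zeros((5, 7))

# ndim is the number of axes, i.e. the length of the shape tuple
print(stand_in.ndim)

# the shape tuple can be unpacked into named values
rows, features = stand_in.shape
print(rows, features)
```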
So how do we create a 3-dimensional array?
By using the reshape() function we can change the dimensions of the array above with the following code.
# the data is 1 sample, 5 time steps, and 7 features
data = myData_processed.reshape(1, 5, 7)
print(data)
[[[124.709 124.926 124.598 124.693 11.64208106 20.94393565 18.99666547]
  [124.694 124.829 124.651 124.717 10.79624784 19.99622329 18.13706702]
  [124.707 124.854 124.62 124.658 9.98039535 19.43243205 17.84106688]
  [124.657 124.794 124.582 124.789 8.90443606 18.34251187 17.84574804]
  [124.79 124.888 124.371 124.516 8.58611711 18.32177354 20.80970792]]]
This has now created a
(1, 5, 7)
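A quick way to convince yourself that reshape() only adds an outer dimension, and does not reorder any values, is sketched below. It uses a made-up (5, 7) array rather than the motor data, and the names stand_in and data_generic are just for illustration.

```python
import numpy as np

# Hypothetical stand-in for myData_processed: 5 time steps, 7 features
stand_in = np.arange(35, dtype=float).reshape(5, 7)

# 1 sample, 5 time steps, 7 features, as in the example above
data = stand_in.reshape(1, 5, 7)

# same result without hard-coding the 5 and the 7
data_generic = stand_in.reshape((1,) + stand_in.shape)

print(data.shape)
# the single sample holds exactly the original 2-dimensional array
print(np.array_equal(data[0], stand_in))
print(np.array_equal(data, data_generic))
```

The second form may be more convenient when the number of rows or features is not known in advance.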