Given a csv file with the values:
1,10,20,30,40,50,60,70,80,90,100
2,210,220,230,240,250,260,270,280,290,300
3,310,330,340,350,360,370,380,390,400,410
training_data_file = open("Anaconda3JamesData/james_test_3.csv", "r")
training_data_list = training_data_file.readlines()
training_data_file.close()
count = 0
for record in training_data_list:
    # split the record on commas to get a list of strings
    all_values = record.split(',')
    print(all_values)
    count += 1
    pass
print(count)
['1', '10', '20', '30', '40', '50', '60', '70', '80', '90', '100\n']
['2', '210', '220', '230', '240', '250', '260', '270', '280', '290', '300\n']
['3', '310', '330', '340', '350', '360', '370', '380', '390', '400', '410\n']
3
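The code so far splits each row (record) of the CSV file into a list of strings, one per comma-separated value. Note that the last element of each list still carries the trailing newline character (\n).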
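The trailing newline on the last element can be removed before splitting. The following is a minimal sketch, assuming a hypothetical record string shaped like one line returned by readlines(); the names raw_values and clean_values are illustrative only.

```python
# A hypothetical record, shaped like one line returned by readlines()
record = "1,10,20,30,40,50,60,70,80,90,100\n"

# Without stripping, the last field keeps the newline character
raw_values = record.split(',')

# strip() removes leading and trailing whitespace (including "\n")
# before the record is split on commas
clean_values = record.strip().split(',')

print(raw_values[-1:])    # ['100\n']
print(clean_values[-1:])  # ['100']
```

Stripping is optional for the conversions that follow, because float() ignores surrounding whitespace, but it keeps the printed lists tidy.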
Let’s say we want to add some of the values in that list together. The following code won’t achieve that.
training_data_file = open("Anaconda3JamesData/james_test_3.csv", "r")
training_data_list = training_data_file.readlines()
training_data_file.close()
count = 0
for record in training_data_list:
    # split the record on commas to get a list of strings
    all_values = record.split(',')
    print(all_values)
    # both operands are strings, so + joins them instead of adding
    print(all_values[1] + all_values[2])
    count += 1
    pass
print(count)
['1', '10', '20', '30', '40', '50', '60', '70', '80', '90', '100\n']
1020
['2', '210', '220', '230', '240', '250', '260', '270', '280', '290', '300\n']
210220
['3', '310', '330', '340', '350', '360', '370', '380', '390', '400', '410\n']
310330
3
As the output shows, instead of adding the 2nd and 3rd values of each list together, the code concatenates them. It does this because they’re strings, not numerical values, and + applied to two strings joins them.
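The difference can be seen in isolation with two values taken from the first row. This is only an illustrative sketch; the variable names are made up for the example.

```python
# Two fields as they come out of split(): both are strings
a = "10"
b = "20"

# + on two strings concatenates them
concatenated = a + b
print(concatenated)        # 1020
print(type(concatenated))  # <class 'str'>

# + on two numbers adds them
added = 10 + 20
print(added)               # 30
```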
To convert a string to a floating point number we can use the built-in float() function.
training_data_file = open("Anaconda3JamesData/james_test_3.csv", "r")
training_data_list = training_data_file.readlines()
training_data_file.close()
count = 0
for record in training_data_list:
    # split the record on commas to get a list of strings
    all_values = record.split(',')
    print(all_values)
    # convert each string to a float before adding
    print(float(all_values[1]) + float(all_values[2]))
    count += 1
    pass
print(count)
['1', '10', '20', '30', '40', '50', '60', '70', '80', '90', '100\n']
30.0
['2', '210', '220', '230', '240', '250', '260', '270', '280', '290', '300\n']
430.0
['3', '310', '330', '340', '350', '360', '370', '380', '390', '400', '410\n']
640.0
3
Now we can see instead of concatenating two strings together, the output is indeed summing the 2nd and 3rd values of each row together. That’s good.
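If more than two values are needed, calling float() on each one by hand gets repetitive. One possible approach, sketched here with a hypothetical record string and an illustrative name float_values, is to convert the whole row at once with a list comprehension.

```python
# A hypothetical record, shaped like the second line of the CSV file
record = "2,210,220,230,240,250,260,270,280,290,300\n"

# Convert every field to a float in one pass
float_values = [float(value) for value in record.strip().split(',')]

# The 2nd and 3rd values can now be added numerically
pair_sum = float_values[1] + float_values[2]
print(pair_sum)    # 430.0

# Any slice of the row can be summed as well (here, everything after the first field)
row_total = sum(float_values[1:])
print(row_total)   # 2550.0
```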
Using numpy
We can use numpy to convert the whole record to an array of floating point numbers in one step. Once we do this, we can easily sum any two numbers in the array.
# import numpy
import numpy
training_data_file = open("Anaconda3JamesData/james_test_3.csv", "r")
training_data_list = training_data_file.readlines()
training_data_file.close()
count = 0
for record in training_data_list:
    # split the record on commas to get a list of strings
    all_values = record.split(',')
    print(all_values)
    # convert all the values in the list to floating point numbers
    # with numpy (numpy.asfarray was removed in NumPy 2.0;
    # numpy.asarray with dtype=float does the same conversion)
    newFloatingPointArray = numpy.asarray(all_values, dtype=float)
    # now we can reference the new array directly and add
    # the values together
    print(newFloatingPointArray[1] + newFloatingPointArray[2])
    count += 1
    pass
print(count)
['1', '10', '20', '30', '40', '50', '60', '70', '80', '90', '100\n']
30.0
['2', '210', '220', '230', '240', '250', '260', '270', '280', '290', '300\n']
430.0
['3', '310', '330', '340', '350', '360', '370', '380', '390', '400', '410\n']
640.0
3
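numpy can also read and convert the whole file in one call with numpy.loadtxt. The sketch below is an alternative to the loop above, not a replacement for it. To stay self-contained it reads the same three rows from an in-memory string (csv_text, a stand-in for the real file) rather than from disk.

```python
import io
import numpy

# Stand-in for the CSV file on disk: the same three rows as above
csv_text = """1,10,20,30,40,50,60,70,80,90,100
2,210,220,230,240,250,260,270,280,290,300
3,310,330,340,350,360,370,380,390,400,410
"""

# loadtxt parses every field as a float and returns a 2-D array
data = numpy.loadtxt(io.StringIO(csv_text), delimiter=',')
print(data.shape)             # (3, 11)

# Add the 2nd and 3rd columns for all rows at once
column_sums = data[:, 1] + data[:, 2]
print(column_sums.tolist())   # [30.0, 430.0, 640.0]
```

With a real file, the path can be passed to numpy.loadtxt in place of the io.StringIO object.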