Returning normalised data from a csv file in Python

Given a csv file with the contents

1,10,20,30,40,50,60,70,80,90,100
2,210,220,230,240,250,260,270,280,290,300
3,310,330,340,350,360,370,380,390,400,410

I want to read all the data except the first value of each row (in effect, ignoring the first column). Before output, I want to normalise that data to the range 0 to 1.

import numpy

array = numpy.genfromtxt('Anaconda3JamesData/james_test_3.csv', delimiter=',')

# Get the minimum and maximum values.
# array[:, 1:] selects every row (:) and every column except the first (1:).
maximum = array[:, 1:].max()
minimum = array[:, 1:].min()

print(minimum)
print(maximum)

print(array[:, 1:])  # display every value except the first of each row

x = (array[:, 1:] - minimum) / (maximum - minimum)

print(x)

This returns the output

10.0
410.0

[[ 10.  20.  30.  40.  50.  60.  70.  80.  90. 100.]
 [210. 220. 230. 240. 250. 260. 270. 280. 290. 300.]
 [310. 330. 340. 350. 360. 370. 380. 390. 400. 410.]]

[[0.    0.025 0.05  0.075 0.1   0.125 0.15  0.175 0.2   0.225]
 [0.5   0.525 0.55  0.575 0.6   0.625 0.65  0.675 0.7   0.725]
 [0.75  0.8   0.825 0.85  0.875 0.9   0.925 0.95  0.975 1.   ]]

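The code normalises the values to the range 0 to 1, using a single minimum and maximum taken over all the selected values rather than one pair per column.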
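If you do not have the file to hand, the same steps can be tried with the sample rows held in memory. This is only a minimal sketch: `io.StringIO` stands in for the csv file, and the `usecols` argument of `genfromtxt` is used to skip the first column while loading, as an alternative to slicing with `[:, 1:]` afterwards. The names `csv_text`, `data` and `normalised` are my own and not part of the original code.

```python
import io
import numpy

# The three sample rows from above, held in a string instead of a file.
csv_text = """1,10,20,30,40,50,60,70,80,90,100
2,210,220,230,240,250,260,270,280,290,300
3,310,330,340,350,360,370,380,390,400,410
"""

# usecols=range(1, 11) loads columns 1 to 10 and leaves out column 0.
data = numpy.genfromtxt(io.StringIO(csv_text), delimiter=',', usecols=range(1, 11))

# One minimum and one maximum over the whole array.
minimum = data.min()
maximum = data.max()

# Min-max normalisation to the range 0 to 1.
normalised = (data - minimum) / (maximum - minimum)

print(normalised.min(), normalised.max())
```

Skipping the column at load time means the rest of the code never has to repeat the `[:, 1:]` slice, but either approach should give the same normalised values.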

To normalise the data to a different range, for example 0.001 to 1, we would use the code

x = 0.001 + ((array[:,1:] - minimum)/(maximum - minimum)) * 0.999

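This is because the width of the new range is

1 - 0.001 = 0.999

The values already scaled to 0 to 1 are multiplied by that width and then shifted up by the new minimum, 0.001.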

Likewise, to normalise the data between 0.01 and 1, the width of the range is

1 - 0.01 = 0.99

so we would use

x = 0.01 + ((array[:,1:] - minimum)/(maximum - minimum)) * 0.99

This gives us normalised data in the range 0.01 to 1.
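The two cases above follow the same pattern, so they can be wrapped in one helper. The sketch below is my own generalisation, not code from the original answer: the function name `rescale` and its parameters `new_min` and `new_max` are hypothetical, and it assumes the data contains at least two different values (otherwise the division would be by zero, so it raises an error instead).

```python
import numpy

def rescale(values, new_min=0.0, new_max=1.0):
    """Min-max normalise values into the range new_min to new_max."""
    values = numpy.asarray(values, dtype=float)
    old_min = values.min()
    old_max = values.max()
    if old_max == old_min:
        # All values are identical, so there is no range to scale.
        raise ValueError("cannot rescale data whose values are all equal")
    # Scale to 0..1, stretch by the width of the new range, then shift.
    unit = (values - old_min) / (old_max - old_min)
    return new_min + unit * (new_max - new_min)

# A small made-up array, used only to show the call.
sample = numpy.array([[10.0, 100.0], [210.0, 410.0]])
scaled = rescale(sample, 0.01, 1.0)
print(scaled)
```

With the array from the csv file, the call would be `rescale(array[:, 1:], 0.01, 1.0)`.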

Some or all of the information on this page was learnt from Stack Overflow. Published under a Creative Commons License.
