As part of a bigger problem I am working on, I have to read in a set of .csv files, manipulate them, and generate a new set of .csv files. Everything is smooth EXCEPT for one file: voltvalues.csv. The content of the file looks like this:
...
13986513,6,6/1/2014 12:00:00 AM,248.7
13986513,6,6/1/2014 12:00:05 AM,248.4
13986513,6,6/1/2014 12:00:10 AM,249
13986513,6,6/1/2014 12:00:15 AM,249.3
13986513,6,6/1/2014 12:00:20 AM,249.3
13986513,6,6/1/2014 12:00:25 AM,249.3
...
13986513,6,6/30/2014 11:55:00 PM,249.3
13986534,6,6/1/2014 12:00:00 AM,249
13986534,6,6/1/2014 12:00:05 AM,249
13986534,6,6/1/2014 12:00:10 AM,249.3
13986534,6,6/1/2014 12:00:15 AM,249.6
...
13986534,6,6/30/2014 11:55:00 PM,249.7
...
I am trying to spit out another .csv file, newvolt.csv, that has the data in the following format:
timestamp,13986513,13986534,...
2014-06-01 12:00:00 PDT,248.7,249.3,...
2014-06-01 12:00:05 PDT,248.4,249,...
...
2014-06-30 23:55:00 PDT,249.3,249.7,...
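Conceptually this is just a pivot from long to wide format: one column per meter ID, one row per timestamp. If memory were not an issue, a naive sketch (column order taken from the sample above, details simplified) would be something like this:

import csv
from collections import defaultdict

# naive pivot: collect every reading keyed first by timestamp, then by meter ID
values = defaultdict(dict)   # timestamp -> {meter_id: voltage}
meters = set()
with open("voltvalues.csv", 'rb') as voltfile:
    voltread = csv.reader(voltfile)
    next(voltread)  # skip header
    for meter_id, _, timestamp, voltage in voltread:
        meters.add(meter_id)
        values[timestamp][meter_id] = voltage
# afterwards: write one "timestamp,<meter1>,<meter2>,..." row per timestamp

The catch is that this keeps the entire file in memory at once, which is exactly what I cannot afford.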
The problem with this file is its size: voltvalues.csv is 6 GB (about 1 billion rows and 4 columns). So the way I am reading it is something like this:
import csv

#meters = []
real_recorder = open("newvolt.csv", 'w')
with open("voltvalues.csv", 'rb') as voltfile:
    voltread = csv.reader(voltfile)
    next(voltread)  # skip header
    for line in voltread:
        # convert the data of voltvalues.csv into the format I desire
        # BEST WAY to do it?
        real_recorder.writelines([...])
        #meters.append(line[0])
#print len(meters)
#print len(set(meters))
I know Python's datetime module has methods to convert from one datetime format to another, but in this case it is very expensive in terms of memory. Any suggestions on the best way to make the whole conversion?
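For reference, the per-row conversion I have in mind would be something like the sketch below (the format strings are my guess from the sample above; the source file carries no timezone info, so "PDT" is just appended as a literal):

from datetime import datetime

raw = "6/30/2014 11:55:00 PM"                         # timestamp as it appears in voltvalues.csv
ts = datetime.strptime(raw, "%m/%d/%Y %I:%M:%S %p")   # parse the source format
out = ts.strftime("%Y-%m-%d %H:%M:%S") + " PDT"       # -> "2014-06-30 23:55:00 PDT"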
if i>0? If you want to skip the header, just use next(voltread) instead of i and enumerate(); those are costing you some time. With next(voltread) you can forget the if.