I find myself parsing lots of data files (usually .csv or similar) using the csv reader and a for loop to iterate over every line. The data is usually a table of floats, so for example:
import csv

reader = csv.reader(open('somefile.csv'))
header = next(reader)
res_list = [list() for _ in header]
for line in reader:
    for i in range(len(line)):
        res_list[i].append(float(line[i]))
result_dict = dict(zip(header, res_list))  # so we can refer to each column by its title
This is an OK way to populate the data, and I get each column as a separate list. However, I would prefer that the default container for lists of items (and nested lists) be numpy arrays, since 99 times out of 100 the numbers get pumped into various processing scripts/functions, and having the power of numpy arrays makes my life easier.
numpy's append(arr, item) doesn't append in place and therefore would require re-creating the array for every point in the table (which is slow and unnecessary). I could also iterate over the list of data columns and wrap each one into an array after I'm done (which is what I've been doing), but sometimes it isn't so clear-cut when I'm done parsing the file, and I may need to append stuff to the lists later down the line anyway.
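To make the copying issue concrete, and to show what I mean by wrapping at the end, here is a quick sketch (it reuses header and res_list from the snippet above):

import numpy as np

# np.append returns a brand-new array each call; nothing is modified in place,
# so growing an array one value at a time re-copies the whole thing every append.
a = np.array([1.0, 2.0])
b = np.append(a, 3.0)   # b is array([1., 2., 3.]); a is still array([1., 2.])

# What I've been doing instead: wrap each finished column once at the end.
result_dict = {name: np.asarray(col) for name, col in zip(header, res_list)}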
I was wondering if there is some less boilerplate-heavy (to use the overused word, more "pythonic") way to process tables of data like this, or to populate arrays (where the underlying container is a list) dynamically and without copying arrays all the time.
(On another note: it's kind of annoying that people generally use columns to organize data, but csv reads in rows. If the reader incorporated a read_column argument (yes, I know it wouldn't be super efficient), I think many people would avoid having boilerplate code like the above to parse a csv data file.)
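For what it's worth, the closest I've come to "reading by column" is transposing the rows after the fact; a rough sketch, again assuming the same somefile.csv with a header row and all-float columns:

import csv
import numpy as np

with open('somefile.csv') as f:
    reader = csv.reader(f)
    header = next(reader)
    columns = list(zip(*reader))   # transpose the remaining rows into per-column tuples

# Each column comes out as a tuple of strings; convert once per column.
result_dict = {name: np.array(col, dtype=float)
               for name, col in zip(header, columns)}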