I have a bunch of files (almost 100) which contain data of the format: (number of people) \t (average age)
These files were generated from a random walk conducted on a population of a certain demographic. Each file has 100,000 lines, corresponding to the average age of populations of sizes from 1 to 100,000. Each file corresponds to a different locality in a third-world country. We will be comparing these values to the average ages of similarly sized localities in a developed country.
What I want to do is:

for each i (where i ranges from 1 to 100,000):
    read in the first i values of average age
    perform some statistics on these values

That is, for each run i, read in the first i values of average age, add them to a list, and run a few tests (such as Kolmogorov-Smirnov or chi-square).
In order to open all these files in parallel, I figured the best way would be a dictionary of file objects, but I am stuck on how to perform the operations above.
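A minimal sketch of the dictionary-of-file-objects idea. The file name pattern `locality_*.txt` is an assumption for illustration; the sketch creates two tiny sample files so it runs standalone, whereas in practice the ~100 real data files would already exist:

```python
import glob
import os
import tempfile

# Build two tiny sample files so the sketch is runnable on its own;
# each line has the format "(population size)\t(average age)".
tmpdir = tempfile.mkdtemp()
for name, rows in [("locality_a.txt", [(1, 24.0), (2, 25.5)]),
                   ("locality_b.txt", [(1, 30.0), (2, 29.5)])]:
    with open(os.path.join(tmpdir, name), "w") as f:
        for size, avg in rows:
            f.write(f"{size}\t{avg}\n")

# Map each file path to its open file object so every file can be
# advanced in lockstep, one line per iteration.
files = {path: open(path)
         for path in glob.glob(os.path.join(tmpdir, "locality_*.txt"))}

# Read one line from every file in parallel: here, the first
# average-age value of each locality.
first_ages = {path: float(f.readline().split("\t")[1])
              for path, f in files.items()}

for f in files.values():
    f.close()
```

With roughly 100 files this stays well under typical open-file limits, so holding them all open at once is reasonable.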
Is my method the best possible one (complexity-wise)?
Is there a better method?
Is the idea simply `for i in range(100): read i lines from the file`? If so, please suggest how to improve this algorithm.
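One complexity note: re-reading the first i lines for every i costs O(n^2) line reads per file. Since run i's data is just run i-1's data plus one new value, each file can be read exactly once, growing a running list, which is O(n) reads overall. A sketch of that incremental approach, using a running mean as a placeholder where the real test (e.g. `scipy.stats.ks_2samp` against the developed-country data) would go:

```python
import os
import tempfile

def incremental_stats(path, n_runs):
    """Read `path` once, growing the list of average ages by one
    value per run instead of re-reading the first i lines each time."""
    ages = []
    results = []
    with open(path) as f:
        for i in range(1, n_runs + 1):
            size, avg = f.readline().split("\t")
            ages.append(float(avg))
            # Placeholder statistic: the running mean. In the real
            # analysis, a Kolmogorov-Smirnov or chi-square test on
            # `ages` would run here instead.
            results.append(sum(ages) / len(ages))
    return results

# Tiny demo file with three runs of "(size)\t(average age)".
demo = os.path.join(tempfile.mkdtemp(), "locality_demo.txt")
with open(demo, "w") as f:
    f.write("1\t20.0\n2\t22.0\n3\t24.0\n")

means = incremental_stats(demo, 3)
```

The same loop extends naturally to the dictionary of open files: advance every file by one line per run and update each locality's list in place.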