How to read and process multiple files simultaneously in Python

Question

I have multiple files and I want to read them simultaneously, extract a number from each row, and average those numbers. For a small number of files I did this using izip from the itertools module. Here is my code:

from itertools import izip
import math

g=open("MSDpara_ave_nvt.dat",'w')

with open("sample1/err_msdCECfortran_nvt.dat",'r') as f1, \
     open("sample2/err_msdCECfortran_nvt.dat",'r') as f2, \
     open("sample3/err_msdCECfortran_nvt.dat",'r') as f3, \
     open("err_msdCECfortran_nvt.dat",'r') as f4:

     for x,y,z,bg in izip(f1,f2,f3,f4):
         args1=x.split()
         i1 = float(args1[0])
         msd1 = float(args1[1])


         args2=y.split()
         i2 = float(args2[0])
         msd2 = float(args2[1])


         args3=z.split()
         i3 = float(args3[0])
         msd3 = float(args3[1])

         args4=bg.split()
         i4 = float(args4[0])
         msd4 = float(args4[1])


         msdave = (msd1 + msd2 + msd3 + msd4)/4.0

         print>>g, "%e  %e" %(i1, msdave)

f1.close()
f2.close()
f3.close()
f4.close()
g.close()

This code works OK, but if I want to handle 100 files simultaneously, it becomes very lengthy written this way. Is there a simpler way of doing this? It seems that the fileinput module can also handle multiple files, but I don't know if it can read them simultaneously.

Thanks.

You don't need to explicitly close files opened in a with statement. – Lev Levitsky, Jun 8, 2014 at 17:32

otus · Accepted Answer · 2014-06-09 05:54:12Z

The with open pattern is good, but in this case it gets in your way. You can open a list of files, then use that list inside izip:

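from itertools import izip  # Python 2; in Python 3 use the built-in zip

filenames = ["sample1/err_msdCECfortran_nvt.dat",...]
files = [open(i, "r") for i in filenames]
for rows in izip(*files):
    # rows is now a tuple containing one row from each file
    pass
for f in files:
    f.close()  # not opened in a with block, so close explicitly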
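
To make the loop body concrete, here is a minimal sketch of how each rows tuple could be reduced to one averaged value, assuming every line holds an index followed by an MSD value as in the question. The helper name average_row is hypothetical, the numbers are made up, and in-memory io.StringIO objects stand in for the real files. It is written for Python 3, where the built-in zip is lazy like izip.

```python
import io


def average_row(rows):
    """Return (index, mean of the second column) for one tuple of lines.

    Hypothetical helper: assumes each line looks like "<index> <msd>".
    """
    fields = [line.split() for line in rows]
    index = float(fields[0][0])              # index taken from the first file
    values = [float(f[1]) for f in fields]   # second column of every file
    return index, sum(values) / len(values)


# In-memory stand-ins for three open data files (made-up numbers).
files = [
    io.StringIO("0 1.0\n1 2.0\n"),
    io.StringIO("0 3.0\n1 4.0\n"),
    io.StringIO("0 5.0\n1 6.0\n"),
]

averages = [average_row(rows) for rows in zip(*files)]
for index, mean in averages:
    print("%e  %e" % (index, mean))
```

Note that zip (like izip) stops as soon as the shortest file runs out, so extra lines in longer files are silently ignored.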

In Python 3.3+ you can also use ExitStack in a with block:

from contextlib import ExitStack

filenames = ["sample1/err_msdCECfortran_nvt.dat",...]
with ExitStack() as stack:
    files = [stack.enter_context(open(i, "r")) for i in filenames]
    for rows in zip(*files):
        # rows is now a tuple containing one row from each file
        pass
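
As an end-to-end illustration, the sketch below registers the output file on the same ExitStack, so the inputs and the output are all closed together when the block exits. The function name average_files, the temporary file names, and the sample numbers are assumptions made for the demo, not the asker's data.

```python
import os
import tempfile
from contextlib import ExitStack


def average_files(filenames, out_path):
    """Average the second column of all input files, line by line."""
    with ExitStack() as stack:
        files = [stack.enter_context(open(name, "r")) for name in filenames]
        out = stack.enter_context(open(out_path, "w"))
        for rows in zip(*files):
            fields = [line.split() for line in rows]
            mean = sum(float(f[1]) for f in fields) / len(fields)
            out.write("%e  %e\n" % (float(fields[0][0]), mean))
    # every input file and the output file are closed at this point


# Demo with throwaway files holding made-up numbers.
with tempfile.TemporaryDirectory() as tmpdir:
    inputs = []
    for n, text in enumerate(["0 1.0\n1 2.0\n", "0 3.0\n1 6.0\n"]):
        path = os.path.join(tmpdir, "sample%d.dat" % n)
        with open(path, "w") as handle:
            handle.write(text)
        inputs.append(path)

    result_path = os.path.join(tmpdir, "average.dat")
    average_files(inputs, result_path)
    with open(result_path) as handle:
        result_lines = handle.read().splitlines()

print(result_lines)
```

Because the list of names is just data, going from 4 files to 100 only changes the contents of filenames, not the loop.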

In Python < 3.3, to use the with statement with all its advantages (e.g. timely closing no matter how you exit the block), you would need to create your own context manager:

class FileListReader(object):
    """Context manager that opens a list of files and closes them all on exit."""

    def __init__(self, filenames):
        self.files = [open(i, "r") for i in filenames]

    def __enter__(self):
        for i in self.files:
            i.__enter__()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        for i in self.files:
            i.__exit__(exc_type, exc_value, traceback)

Then you could do:

from itertools import izip

filenames = ["sample1/err_msdCECfortran_nvt.dat",...]
with FileListReader(filenames) as f:
    for rows in izip(*f.files):
        #...
        pass

In this case the last approach might be considered over-engineering, though.
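
If a custom class feels too heavy, a plain try/finally around the list of open files gives the same "always closed" guarantee. The following is only a sketch: the function name first_columns is hypothetical, and io.StringIO objects stand in for real files. The izip fallback is there so the same lines run on both Python 2 and Python 3.

```python
import io

try:
    from itertools import izip  # Python 2
except ImportError:
    izip = zip                  # Python 3: the built-in zip is already lazy


def first_columns(files):
    """Collect the first field of every line, then close all the files."""
    try:
        return [[line.split()[0] for line in rows] for rows in izip(*files)]
    finally:
        # Runs on normal return and on exceptions alike.
        for f in files:
            f.close()


# In-memory stand-ins for open files (u"" literals keep Python 2 happy).
files = [io.StringIO(u"0 1.0\n1 2.0\n"), io.StringIO(u"0 3.0\n1 4.0\n")]
columns = first_columns(files)
print(columns)
print(all(f.closed for f in files))
```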

Instead of creating a new one, the OP could upgrade to modern Python and use an ExitStack.
@DSM, thanks for the link. I didn't know about that one (I use 2.7). That's certainly less code when used only once. I'll integrate it into the answer.
Thanks a lot, @otus. That's very helpful. So if I do 'files = [open(i, "r") for i in filenames] for rows in izip(files):' as you said, how can I read lines from each tuple "rows"? Apparently I cannot use readline().
@otus, it seems that the tuple 'rows' is not a tuple of strings. If I print the contents of 'rows', I only get something like "<open file 'err_msdCECfortran_new.dat', mode 'r' at 0x7ff7182d16f0>". And if I check its size using 'len(rows)', the length is one. I'm confused about why 'rows' does not contain a line from each of my data files, as you mentioned.
@user2226358, sorry, I forgot the star * inside zip. Answer updated. (It passes the list as multiple arguments instead of one, so that zip will indeed zip them.)
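
The difference the star makes can be reproduced in a few lines. This is a small Python 3 sketch with io.StringIO objects as hypothetical stand-ins for the open files; on Python 2, izip behaves the same way.

```python
import io

files = [io.StringIO("a1\na2\n"), io.StringIO("b1\nb2\n")]

# Without the star, zip receives ONE iterable (the list itself), so each
# result is a 1-tuple wrapping a file object, not a line of text.
wrapped = list(zip(files))
print(len(wrapped[0]), wrapped[0][0] is files[0])

# With the star, each file becomes its own argument, and zip pairs up
# the lines of the files.
rows = list(zip(*files))
print(rows[0])
```

This matches the symptom reported above: a tuple of length one whose only element is the file object.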
