
One of the answers to this question says that the following is a good way to read a large binary file without reading the whole thing into memory first:

    with open(image_filename, 'rb') as content:
        for line in content:
            pass  # do anything you want

I thought the whole point of specifying 'rb' was that line endings are ignored, so how can for line in content work?

Is this the most "Pythonic" way to read a large binary file or is there a better way?
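
For what it's worth, a minimal sketch (the filename is hypothetical) shows what actually happens: iterating in 'rb' mode still splits on the 0x0A byte; it just hands back raw bytes with no newline translation or decoding:

    # 'image.png' is a hypothetical name; any binary file works
    with open('image.png', 'rb') as f:
        line = f.readline()   # reads up to and including the next 0x0A byte
        print(type(line))     # <class 'bytes'>, not str
        print(line[-1:])      # b'\n', unless the file contains no 0x0A byte

So the loop runs, but each "line" is an arbitrary-length run of bytes that happens to end in 0x0A, which is why the answers below read fixed-size chunks instead.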

6 Comments

  • I just posted your question as a comment below the answer in that question. That seems better than asking a new question.
  • Ah, thanks. What should I do with this question?
  • Well, it's too late to delete it, since someone answered.
  • Possibly a duplicate.
  • Well, all the answers are helpful. I can't accept an answer for 4 more minutes, though; my apologies if this should have been a comment.

3 Answers


I would write a simple helper function to read in the chunks you want:

def read_in_chunks(infile, chunk_size=1024):
    while True:
        chunk = infile.read(chunk_size)
        if chunk:
            yield chunk
        else:
            # The chunk was empty, which means we're at the end
            # of the file
            return

Then use it as you would for line in file, like so:

with open(fn, 'rb') as f:
    for chunk in read_in_chunks(f):
        pass  # do your stuff with each chunk...
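
As a concrete (and hypothetical) use of the helper, here's a chunked checksum; hashlib and the function name are my additions, not part of the original answer:

    import hashlib

    def sha256_of_file(fn, chunk_size=1024):
        # hash a large file without ever holding it fully in memory
        digest = hashlib.sha256()
        with open(fn, 'rb') as f:
            for chunk in read_in_chunks(f, chunk_size):
                digest.update(chunk)   # each chunk is a bytes object
        return digest.hexdigest()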

BTW: I asked THIS question 5 years ago and this is a variant of an answer at that time...


You can also do:

from functools import partial

with open(fn, 'rb') as f:
    for chunk in iter(partial(f.read, numBytes), b''):
        pass  # process the chunk

Note that partial lives in functools, not collections, and in binary mode read returns bytes, so the end-of-file sentinel is b'', not ''.

4 Comments

I am reading that question now. I guess this is kind of a duplicate of that (sorry, I didn't see it). As a follow-up, how do you determine the right chunk_size?
What are the characteristics of each chunk? How will you process it? Is the file too big to read in one go? When you have for record in file: there is usually some record-like relationship between each record and the whole file. You need to say more.
5 years ago you were a "Python newbie"?
Indeed I was. Perl was my weapon before that and C before that.

for line in fh will split on newlines regardless of how you open the file.

Often with binary files you consume them in chunks:

CHUNK_SIZE = 1024
for chunk in iter(lambda: fh.read(CHUNK_SIZE), b''):
    do_something(chunk)

(Note the b'' sentinel: in Python 3 a binary read returns bytes, so a plain '' would never match and the loop would never end.)



Binary mode means that the line endings aren’t converted and that bytes objects are read (in Python 3); the file will still be read by “line” when using for line in f. I’d use read to read in consistent chunks instead, though.

with open(image_filename, 'rb') as f:
    # iter(callable, sentinel) – yield f.read(4096) until b'' appears
    for chunk in iter(lambda: f.read(4096), b''):
        ...  # process the chunk

3 Comments

Why the size of 4096?
Because you have to pick a size... it doesn't matter which. (Great answer, minitech.)
Well, which does matter, just not by much: you must have that much memory free to use all at once. Otherwise, why chunk? Just slurp up the whole file. The problem with minitech or Joran trying to tell you how big it should be is that they don't know your system requirements, environment, or use case. When in doubt, try it out. Powers of 2 are popular because they're easy for the system to manage.
