python iterate through binary file without lines

Question

I've got some data in a binary file that I need to parse. The data is separated into chunks of 22 bytes, so I'm trying to generate a list of tuples, each tuple containing 22 values. The file isn't separated into lines though, so I'm having problems figuring out how to iterate through the file and grab the data.

If I do this it works just fine:

nextList = f.read(22)
newList = struct.unpack("BBBBBBBBBBBBBBBBBBBBBB", nextList)

where newList contains a tuple of 22 values. However, if I apply the same logic in a function that iterates through the whole file, it breaks down:

def getAllData():
    listOfAll = []
    nextList = f.read(22)
    while nextList != "":
        listOfAll.append(struct.unpack("BBBBBBBBBBBBBBBBBBBBBB", nextList))
        nextList = f.read(22)
    return listOfAll

data = getAllData()

gives me this error:

Traceback (most recent call last):
  File "<pyshell#27>", line 1, in <module>
    data = getAllData()
  File "<pyshell#26>", line 5, in getAllData
    listOfAll.append(struct.unpack("BBBBBBBBBBBBBBBBBBBBBB", nextList))
struct.error: unpack requires a bytes object of length 22

I'm fairly new to python so I'm not too sure where I'm going wrong here. I know for sure that the data in the file breaks down evenly into sections of 22 bytes, so it's not a problem there.

People know lots of things "for sure" that aren't so. :-) Why not add a print(len(nextList), repr(nextList)) before the append line, just in case? Another possibility is that it's the comparison that's failing: doesn't read return a bytes object? Are you sure that b"" == ""? — DSM, Dec 17, 2014 at 18:57
I did the first thing you suggested, and while it did break down evenly as I thought, for some reason it was running again at the end when len(nextList) == 0, so I added an if statement at the beginning of the while loop to break if len(nextList) != 22, and it worked exactly as intended. Thanks a ton, that was a silly mistake for me to make :) — lunadiviner, Dec 17, 2014 at 19:10
I suggest using struct.unpack("22B", nextList) instead of typing out 22 Bs. — Eryk Sun, Dec 17, 2014 at 19:59
I didn't know this was something I could do (like I said, novice), but it does make it way easier, thanks! — lunadiviner, Dec 17, 2014 at 20:51
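A short check of Eryk Sun's suggestion: in a struct format string, a count before a format character repeats it, so "22B" describes the same layout as 22 separate "B" codes (22 unsigned bytes). The record below is made-up sample data.

```python
import struct

record = bytes(range(22))  # hypothetical 22-byte record: 0, 1, ..., 21

# "22B" is shorthand for "BBBBBBBBBBBBBBBBBBBBBB"
assert struct.calcsize("22B") == 22
assert struct.unpack("22B", record) == struct.unpack("B" * 22, record)

print(struct.unpack("22B", record)[:3])  # (0, 1, 2)
```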

DSM · Accepted Answer · 2014-12-17 19:22:16Z

Since you reported that it was running when len(nextList) == 0, this is probably because nextList (which isn't a list, despite its name) is an empty bytes object, which isn't equal to an empty str object:

>>> b"" == ""
False

and so the condition in your line

while nextList != "":

is always true, even when nextList is empty, so the loop never stops on its own and unpack eventually gets called on b"". That's why using len(nextList) != 22 as a break condition worked, and even

while nextList:

should suffice.
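A minimal sketch of the question's loop with that fix applied. Two changes are assumptions for the sake of a self-contained example: the file is passed in as a parameter instead of being the global f, and io.BytesIO stands in for a real file opened with "rb". It also uses the "22B" format from the comments.

```python
import io
import struct

def getAllData(f):
    listOfAll = []
    nextList = f.read(22)
    while nextList:  # b"" at end of file is falsy, so the loop stops there
        listOfAll.append(struct.unpack("22B", nextList))
        nextList = f.read(22)
    return listOfAll

# In-memory "file" holding exactly two 22-byte records
data = getAllData(io.BytesIO(bytes(range(44))))
print(len(data), data[1][0])  # 2 22
```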

I changed it to while nextList: and it works perfectly without the if statement, thanks so much! — lunadiviner

Dunes · Accepted Answer · 2014-12-17 19:50:46Z

read(22) isn't guaranteed to return 22 bytes. Its contract is to return anywhere from 0 to 22 bytes (inclusive), and a result of length zero indicates there is no more data to be read. In Python 3, a file opened in binary mode returns bytes objects instead of str, and str and bytes will never compare equal.
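Both points are easy to see by reading past the end of a small binary "file". This sketch uses io.BytesIO as an in-memory stand-in for a file opened with "rb"; the 30-byte contents are arbitrary sample data.

```python
import io

f = io.BytesIO(bytes(range(30)))  # one full 22-byte chunk plus 8 bytes left over

first = f.read(22)   # a full chunk
second = f.read(22)  # short read: only 8 bytes remain
third = f.read(22)   # end of file: empty bytes object

print(len(first), len(second), len(third))  # 22 8 0
print(third == b"", third == "")            # True False
```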

If your file is small-ish, you'd be better off reading the entire file into memory and then splitting it into chunks, e.g.:

import struct

listOfAll = []
data = f.read()  # the whole file as one bytes object
for i in range(0, len(data), 22):
    # unpack each 22-byte slice into a tuple of 22 unsigned ints
    t = struct.unpack("22B", data[i:i+22])
    listOfAll.append(t)
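For the read-it-all case, Python 3.4+ also offers struct.Struct.iter_unpack, which walks a buffer one record at a time. This is a swapped-in alternative, not part of the answer; read_all_records is a hypothetical helper name, and the explicit length check just gives a clearer error than the struct.error that iter_unpack raises when the buffer isn't a whole number of records.

```python
import io
import struct

RECORD = struct.Struct("22B")  # precompiled format: 22 unsigned bytes per record

def read_all_records(f):
    data = f.read()  # whole file as one bytes object
    if len(data) % RECORD.size:
        raise ValueError(f"{len(data)} bytes is not a multiple of {RECORD.size}")
    # iter_unpack yields one tuple per RECORD.size bytes
    return list(RECORD.iter_unpack(data))

records = read_all_records(io.BytesIO(bytes(range(44))))
print(len(records), records[0][:3])  # 2 (0, 1, 2)
```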

Otherwise, you will need something more involved that checks how much data each read actually returns:

def dataiter(f, chunksize=22, buffersize=4096):
    """Yield successive chunksize-byte pieces of the binary file f."""
    data = b''
    while True:
        newdata = f.read(buffersize)
        if not newdata:  # end of file
            if data:
                # 0 < len(data) < chunksize: a trailing partial chunk.
                # Yield it as-is, or raise an error, or pad with zeros to chunksize.
                yield data
            return

        data += newdata
        i = 0
        while len(data) - i >= chunksize:
            yield data[i:i+chunksize]
            i += chunksize

        # keep the unused remainder; slicing never raises, it gives b'' if all data was used
        data = data[i:]
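The manual buffering above mainly matters for raw or unbuffered streams. For a regular file on disk opened with open(path, "rb"), Python returns a buffered reader whose read(n) keeps reading until it has n bytes or hits end of file, so the two-argument form of iter() is a compact alternative. This is a swapped-in technique, not the answer's code; iter_records is a hypothetical name, and raising on a partial record is one of the options listed in dataiter's comments. io.BytesIO stands in for the file.

```python
import functools
import io
import struct

def iter_records(f, size=22):
    """Yield one tuple of `size` unsigned bytes per record in binary file f."""
    # iter(callable, sentinel) calls f.read(size) until it returns b"" (EOF)
    for chunk in iter(functools.partial(f.read, size), b""):
        if len(chunk) != size:
            # on a buffered regular file, a short read only happens at EOF
            raise ValueError(f"trailing partial record of {len(chunk)} bytes")
        yield struct.unpack(f"{size}B", chunk)

records = list(iter_records(io.BytesIO(bytes(range(44)))))
print(len(records), records[1][-1])  # 2 43
```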
