Using numpy.fromfile to read scattered binary data

Question

A binary file contains several blocks that I want to read with a single call to numpy.fromfile. Each block has the following format:

OES=[
('EKEY','i4',1), 
('FD1','f4',1),
('EX1','f4',1),
('EY1','f4',1),
('EXY1','f4',1),
('EA1','f4',1),
('EMJRP1','f4',1),
('EMNRP1','f4',1),
('EMAX1','f4',1),
('FD2','f4',1),
('EX2','f4',1),
('EY2','f4',1),
('EXY2','f4',1),
('EA2','f4',1),
('EMJRP2','f4',1),
('EMNRP2','f4',1),
('EMAX2','f4',1)]

Here is the format of the binary:

 Data I want (OES format repeating n times)
 ------------------------
 Useless Data
 ------------------------
 Data I want (OES format repeating m times)
 ------------------------
 etc..

I know the byte increment between the data I want and the useless data. I also know the size of each data block I want.

So far, I have accomplished my goal by seeking on the file object f and then calling:

nparr = np.fromfile(f,dtype=OES,count=size)

This gives me a separate nparr for each data block I want, and I then concatenate all of those numpy arrays into one new array.

My goal is to have a single array with all the blocks I want without concatenating (for memory purposes). That is, I want to call nparr = np.fromfile(f,dtype=OES) only once. Is there a way to accomplish this goal?

Warren Weckesser · Accepted Answer · 2016-08-06 16:41:17Z

"That is, I want to call nparr = np.fromfile(f,dtype=OES) only once. Is there a way to accomplish this goal?"

No, not with a single call to fromfile().

But if you know the complete layout of the file in advance, you can preallocate the array, and then use fromfile and seek to read the OES blocks directly into the preallocated array. Suppose, for example, that you know the file positions of each OES block, and you know the number of records in each block. That is, you know:

file_positions = [position1, position2, ...]
numrecords = [n1, n2, ...]

Then you could do something like this (assuming f is the already opened file):

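# Allocate the final array once, sized for all the wanted records.
total = sum(numrecords)
nparr = np.empty(total, dtype=OES)
current_index = 0
for pos, n in zip(file_positions, numrecords):
    # Jump to the start of the next OES block, skipping the unwanted data.
    f.seek(pos)
    # Read n records and copy them into the next free slice of nparr.
    nparr[current_index:current_index+n] = np.fromfile(f, count=n, dtype=OES)
    current_index += n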
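
As a rough, self-contained illustration of the loop above, the sketch below builds a small throwaway file and reads two scattered blocks into one preallocated array. The three-field `REC` dtype, the helper names `read_blocks` and `demo`, the 10 junk bytes, and the short-read check are all assumptions made for the example; they are not part of the original question or answer.

```python
import os
import tempfile

import numpy as np

# Hypothetical, shortened stand-in for the OES dtype in the question.
REC = np.dtype([('EKEY', '<i4'), ('FD1', '<f4'), ('EX1', '<f4')])


def read_blocks(f, file_positions, numrecords, dtype):
    """Read scattered record blocks from f into one preallocated array."""
    dtype = np.dtype(dtype)
    out = np.empty(sum(numrecords), dtype=dtype)
    current_index = 0
    for pos, n in zip(file_positions, numrecords):
        f.seek(pos)
        block = np.fromfile(f, dtype=dtype, count=n)
        # fromfile returns fewer items if the file ends early.
        if block.size != n:
            raise EOFError(f"expected {n} records at offset {pos}, got {block.size}")
        out[current_index:current_index + n] = block
        current_index += n
    return out


def demo():
    """Write block / junk / block to a temp file and read the blocks back."""
    a = np.zeros(3, dtype=REC)
    a['EKEY'] = [1, 2, 3]
    b = np.zeros(2, dtype=REC)
    b['EKEY'] = [4, 5]
    junk = b'\xff' * 10  # the "useless data" between the blocks

    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(a.tobytes())
        tmp.write(junk)
        tmp.write(b.tobytes())
        path = tmp.name
    try:
        with open(path, 'rb') as f:
            # Second block starts after the first block plus the junk bytes.
            positions = [0, a.nbytes + len(junk)]
            return read_blocks(f, positions, [3, 2], REC)
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo()['EKEY'])
```

Here the file positions are derived from the known block sizes and the known gap, which matches what the question says is available in advance.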


4 Comments

snowleopard Over a year ago

Thank you very much! I was considering this as well, but I am a bit fuzzy on how the memory is managed. Wouldn't this result in duplicate memory from np.fromfile and nparr? Would the nparr slice be a view of the np.fromfile result, or a copy? Based on my tests, item assignment like that seems to make a copy, but I'm probably wrong. Thank you for the awesome suggestion.

Warren Weckesser Over a year ago

For each block, fromfile(f, count=n, dtype=OES) will create an array with length n. Then that array will be copied into the appropriate range in nparr. The array created by fromfile is not assigned anywhere else, so its memory is available to be reused by Python.

snowleopard Over a year ago

Did you mean unavailable? That would make perfect sense if that's what you meant, based on my reading about Python's garbage collector.

Warren Weckesser Over a year ago

I meant available, as in the garbage collector can take over and reuse that memory.
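
If even the short-lived per-block temporary discussed in these comments is a concern, one possible variation (not the method from the answer) is to skip fromfile and let the file object fill the preallocated slice directly through its standard `readinto` method. The sketch below is a minimal illustration under stated assumptions: the two-field `REC` dtype and the names `read_blocks_inplace` and `demo_inplace` are hypothetical, and an in-memory `io.BytesIO` stands in for the real file.

```python
import io

import numpy as np

# Hypothetical, shortened stand-in for the OES dtype in the question.
REC = np.dtype([('EKEY', '<i4'), ('FD1', '<f4')])


def read_blocks_inplace(f, file_positions, numrecords, dtype):
    """Fill one preallocated array block by block without a temporary array."""
    dtype = np.dtype(dtype)
    out = np.empty(sum(numrecords), dtype=dtype)
    start = 0
    for pos, n in zip(file_positions, numrecords):
        f.seek(pos)
        # Basic slicing returns a view, so writes land directly in `out`.
        target = out[start:start + n].view(np.uint8)
        nread = f.readinto(target)
        if nread != target.nbytes:
            raise EOFError(f"short read at offset {pos}: {nread} of {target.nbytes} bytes")
        start += n
    return out


def demo_inplace():
    """Build block / junk / block in memory and read the blocks back."""
    a = np.array([(1, 0.5), (2, 1.5)], dtype=REC)
    b = np.array([(3, 2.5)], dtype=REC)
    gap = 7  # number of unwanted bytes between the blocks
    buf = io.BytesIO(a.tobytes() + b'\x00' * gap + b.tobytes())
    return read_blocks_inplace(buf, [0, a.nbytes + gap], [2, 1], REC)


if __name__ == "__main__":
    print(demo_inplace())
```

The trade-off is that this bypasses fromfile entirely, so it only suits raw binary data whose byte layout matches the dtype exactly; with a file opened via open(path, 'rb') the same function should work unchanged.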
