
There are several blocks in a binary file that I want to read with a single call to numpy.fromfile. Each block consists of records with the following format:

OES=[
('EKEY','i4'),
('FD1','f4'),
('EX1','f4'),
('EY1','f4'),
('EXY1','f4'),
('EA1','f4'),
('EMJRP1','f4'),
('EMNRP1','f4'),
('EMAX1','f4'),
('FD2','f4'),
('EX2','f4'),
('EY2','f4'),
('EXY2','f4'),
('EA2','f4'),
('EMJRP2','f4'),
('EMNRP2','f4'),
('EMAX2','f4')]
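
As a quick sanity check, the size of one record can be read off the dtype. The sketch below rebuilds the same 17 fields with a list comprehension (an illustrative alternative to writing out each tuple) and assumes the file uses the machine's native byte order; for a big-endian file the codes would be '>i4' and '>f4' instead.

```python
import numpy as np

# Same fields, same order as OES: EKEY, then FD..EMAX for layer 1, then for layer 2
names = ["FD", "EX", "EY", "EXY", "EA", "EMJRP", "EMNRP", "EMAX"]
OES = [("EKEY", "i4")] + [(f"{n}{layer}", "f4") for layer in (1, 2) for n in names]

oes_dtype = np.dtype(OES)
print(oes_dtype.itemsize)  # 68: one 4-byte int plus sixteen 4-byte floats
```

So a block of n records occupies n * 68 bytes in the file.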

Here is the layout of the binary file:

 Data I want (OES records repeated n times)
 ------------------------
 Useless Data
 ------------------------
 Data I want (OES records repeated m times)
 ------------------------
 etc.

I know the byte offsets separating the data I want from the useless data. I also know the size of each block I want.
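
With those sizes known, the start offset of every wanted block is simple arithmetic. A minimal sketch, assuming (hypothetically) that the first wanted block starts at byte `start` and that `gaps[i]` bytes of useless data follow block `i`; `block_positions`, `start`, and `gaps` are illustrative names, not from the question.

```python
import numpy as np

names = ["FD", "EX", "EY", "EXY", "EA", "EMJRP", "EMNRP", "EMAX"]
OES = [("EKEY", "i4")] + [(f"{n}{k}", "f4") for k in (1, 2) for n in names]
RECORD_SIZE = np.dtype(OES).itemsize  # 68 bytes per record

def block_positions(start, counts, gaps):
    """Hypothetical helper: byte offset of each wanted block.

    start  -- byte offset of the first wanted block
    counts -- number of OES records in each wanted block (n, m, ...)
    gaps   -- bytes of useless data after each block (one per block; the last may be 0)
    """
    positions = []
    pos = start
    for count, gap in zip(counts, gaps):
        positions.append(pos)
        pos += count * RECORD_SIZE + gap  # skip this block and the junk after it
    return positions

print(block_positions(0, [3, 5, 2], [100, 40, 0]))  # [0, 304, 684]
```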

So far, I have achieved this by seeking the file object f to the start of each block and then calling:

nparr = np.fromfile(f, dtype=OES, count=size)

This gives me a separate nparr for each block I want, and I then concatenate all of these arrays into one new array.
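
For reference, the per-block approach looks roughly like this. The test file, block sizes, and 100-byte junk gap are made up for illustration. The memory issue is visible at the `np.concatenate` line: the new array is allocated while the list of per-block arrays is still alive, so peak usage is about twice the size of the wanted data.

```python
import os
import tempfile
import numpy as np

names = ["FD", "EX", "EY", "EXY", "EA", "EMJRP", "EMNRP", "EMAX"]
OES = [("EKEY", "i4")] + [(f"{n}{k}", "f4") for k in (1, 2) for n in names]
rec = np.dtype(OES).itemsize  # 68 bytes per record

# Hypothetical test file: 3 records, 100 junk bytes, 5 records
first = np.zeros(3, dtype=OES)
first["EKEY"] = [1, 2, 3]
second = np.zeros(5, dtype=OES)
second["EKEY"] = [4, 5, 6, 7, 8]
path = os.path.join(tempfile.mkdtemp(), "oes.bin")
with open(path, "wb") as out:
    out.write(first.tobytes() + b"\x00" * 100 + second.tobytes())

positions, counts = [0, 3 * rec + 100], [3, 5]

# Per-block approach: seek, read each block into its own array, then concatenate
with open(path, "rb") as f:
    parts = []
    for pos, n in zip(positions, counts):
        f.seek(pos)
        parts.append(np.fromfile(f, dtype=OES, count=n))
nparr = np.concatenate(parts)  # new array; `parts` still holds a full second copy here
print(nparr["EKEY"])  # [1 2 3 4 5 6 7 8]
```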

My goal is to have a single array containing all the blocks I want without concatenating (to save memory). That is, I want to call nparr = np.fromfile(f,dtype=OES) only once. Is there a way to accomplish this?

Answer

That is, I want to call nparr = np.fromfile(f,dtype=OES) only once. Is there a way to accomplish this goal?

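No, not with a single call to fromfile(). It reads one contiguous run of records, and its offset argument (added in NumPy 1.17) only skips bytes before that run, so it cannot skip the useless data between blocks.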

But if you know the complete layout of the file in advance, you can preallocate the array, then use seek and fromfile to read each OES block and copy it into its slice of the preallocated array. Suppose, for example, that you know the file position of each OES block and the number of records in each block. That is, you know:

file_positions = [position1, position2, ...]
numrecords = [n1, n2, ...]

Then you could do something like this (assuming f is the file, already opened in binary mode):

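import numpy as np

total = sum(numrecords)
nparr = np.empty(total, dtype=OES)  # preallocate room for every record
current_index = 0
for pos, n in zip(file_positions, numrecords):
    f.seek(pos)  # jump to the start of this OES block
    # fromfile returns a temporary n-record array, which is copied into the slice
    nparr[current_index:current_index+n] = np.fromfile(f, count=n, dtype=OES)
    current_index += n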
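
Here is a self-contained version of that loop that can be run end to end. It wraps the code in a hypothetical `read_blocks` helper, opens the file in binary mode, and adds a check for short reads: `np.fromfile` can return fewer records than `count` near the end of the file, which would otherwise cause a broadcasting error in the slice assignment or, if exactly one record came back, silently repeat it. The test file, with junk bytes before, between, and after two blocks, is made up.

```python
import os
import tempfile
import numpy as np

names = ["FD", "EX", "EY", "EXY", "EA", "EMJRP", "EMNRP", "EMAX"]
OES = [("EKEY", "i4")] + [(f"{n}{k}", "f4") for k in (1, 2) for n in names]

def read_blocks(path, file_positions, numrecords, dtype=OES):
    """Read separated blocks of records into one preallocated array."""
    nparr = np.empty(sum(numrecords), dtype=dtype)
    current_index = 0
    with open(path, "rb") as f:
        for pos, n in zip(file_positions, numrecords):
            f.seek(pos)  # jump past any useless data to the block start
            block = np.fromfile(f, dtype=dtype, count=n)  # temporary array of n records
            if len(block) != n:  # fromfile can return fewer records near EOF
                raise ValueError(f"expected {n} records at byte {pos}, got {len(block)}")
            nparr[current_index:current_index + n] = block  # copy into the slice
            current_index += n
    return nparr

# Hypothetical file: 12 junk bytes, 2 records, 50 junk bytes, 3 records, 7 junk bytes
rec = np.dtype(OES).itemsize
blk1 = np.zeros(2, dtype=OES)
blk1["EKEY"] = [10, 11]
blk2 = np.zeros(3, dtype=OES)
blk2["EKEY"] = [20, 21, 22]
blk2["EX1"] = 1.5
path = os.path.join(tempfile.mkdtemp(), "oes.bin")
with open(path, "wb") as out:
    out.write(b"\xff" * 12 + blk1.tobytes() + b"\xff" * 50 + blk2.tobytes() + b"\xff" * 7)

result = read_blocks(path, [12, 12 + 2 * rec + 50], [2, 3])
print(result["EKEY"])  # [10 11 20 21 22]
```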

Comments

Thank you very much! I was considering this as well. I am a bit fuzzy on how the memory is managed. Wouldn't this result in duplicate memory from np.fromfile and nparr? Would the nparr slice be a view of the array returned by np.fromfile, or a copy? Based on my tests, item assignments like that seem to make a copy, but I'm probably wrong. Thank you for the awesome suggestion.
For each block, fromfile(f, count=n, dtype=OES) will create an array with length n. Then that array will be copied into the appropriate range in nparr. The array created by fromfile is not assigned anywhere else, so its memory is available to be reused by Python.
You meant unavailable? That would make perfect sense if that's what you meant, based on my reading about Python's garbage collector.
I meant available, as in the garbage collector can take over and reuse that memory.
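
Following up on the memory question: the temporary array from fromfile is short-lived, but it can be avoided entirely. One option, which swaps in Python's standard file method readinto in place of np.fromfile (this is not what the answer above uses), is to take a uint8 view of each destination slice and let the file object write its bytes straight into it. A minimal sketch, assuming the same native-byte-order layout; `read_blocks_into` and the test file are hypothetical.

```python
import os
import tempfile
import numpy as np

names = ["FD", "EX", "EY", "EXY", "EA", "EMJRP", "EMNRP", "EMAX"]
OES = np.dtype([("EKEY", "i4")] + [(f"{n}{k}", "f4") for k in (1, 2) for n in names])

def read_blocks_into(path, file_positions, numrecords, dtype=OES):
    """Fill one preallocated array directly from the file, with no temporary arrays."""
    nparr = np.empty(sum(numrecords), dtype=dtype)
    current_index = 0
    with open(path, "rb") as f:
        for pos, n in zip(file_positions, numrecords):
            f.seek(pos)
            # uint8 view of the destination slice; it shares memory with nparr
            dest = nparr[current_index:current_index + n].view(np.uint8)
            nread = f.readinto(dest)  # writes the raw bytes into nparr's buffer
            if nread != dest.nbytes:
                raise ValueError(f"short read at byte {pos}: {nread} of {dest.nbytes} bytes")
            current_index += n
    return nparr

# Hypothetical file: 2 records, 40 junk bytes, 1 record
blk1 = np.zeros(2, dtype=OES)
blk1["EKEY"] = [1, 2]
blk2 = np.zeros(1, dtype=OES)
blk2["EKEY"] = [3]
blk2["EMAX2"] = 2.5
path = os.path.join(tempfile.mkdtemp(), "oes.bin")
with open(path, "wb") as out:
    out.write(blk1.tobytes() + b"\x00" * 40 + blk2.tobytes())

result = read_blocks_into(path, [0, 2 * OES.itemsize + 40], [2, 1])
print(result["EKEY"])  # [1 2 3]
```

Because dest is a view, readinto fills nparr's own memory, so peak usage is just the final array plus the file object's small internal buffer.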
