
I have a binary file which contains records of the position of a plane. Each record looks like:

0x00: Time, float32
0x04: X, float32 // X axis position
0x08: Y, float32 // Y axis position
0x0C: Elevation, float32
0x10: float32*4 = Quaternion (x,y,z axis and w scalar)
0x20: Distance, float32 (unused)

So each record is 36 bytes long.

I would like to get a NumPy array.

At offset 1859 there is an unsigned 32-bit integer (4 bytes) which indicates the number of elements in the array: 12019 in my case.

I don't care (for now) about the header data (before offset 1859).

The array only starts at offset 1863 (= 1859 + 4).

I defined my own NumPy dtype like this:

dtype = np.dtype([
    ("time", np.float32),
    ("PosX", np.float32),
    ("PosY", np.float32),
    ("Alt", np.float32),
    ("Qx", np.float32),
    ("Qy", np.float32),
    ("Qz", np.float32),
    ("Qw", np.float32),
    ("dist", np.float32),
])

And I'm reading the file using fromfile:

a_bytes = np.fromfile(filename, dtype=dtype)

But I don't see any parameter of fromfile for passing an offset.

3 Answers


You can open the file with a standard Python open, seek past the header, and then pass the file object to fromfile. Something like this:

import numpy as np
import os

dtype = np.dtype([
    ("time", np.float32),
    ("PosX", np.float32),
    ("PosY", np.float32),
    ("Alt", np.float32),
    ("Qx", np.float32),
    ("Qy", np.float32),
    ("Qz", np.float32),
    ("Qw", np.float32),
    ("dist", np.float32),
])

with open("myfile", "rb") as f:
    f.seek(1863, os.SEEK_SET)  # skip the header; the array starts at offset 1863
    data = np.fromfile(f, dtype=dtype)

print(data)
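As a side note, on NumPy 1.17 and later you can skip the explicit seek entirely, since np.fromfile accepts offset and count keywords. A minimal sketch using the dtype defined above, with the record count (12019) taken from the question:

import numpy as np

# NumPy >= 1.17: skip the 1863-byte header and read exactly 12019 records
data = np.fromfile("myfile", dtype=dtype, count=12019, offset=1863)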

2 Comments

Thanks. It solved my problem. I also noticed data = np.memmap(filename, dtype=dtype, mode='r', offset=offset_array, shape=N)
Right on. If it's a large file, then memmap may be the way to go.
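A minimal sketch of that memmap approach, assuming the layout from the question (little-endian uint32 record count at offset 1859, records starting at 1863) and the dtype defined in the answer above:

import numpy as np

with open("myfile", "rb") as f:
    f.seek(1859)
    n = int(np.frombuffer(f.read(4), dtype="<u4")[0])  # record count from the header

# Nothing is loaded into memory until elements are actually accessed
data = np.memmap("myfile", dtype=dtype, mode="r", offset=1863, shape=(n,))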

I faced a similar problem, but none of the answers above satisfied me. I needed to implement something like a virtual table with a very large number of binary records, which could potentially occupy more memory than I can afford in a single numpy array. So my question was how to read and write a small set of integers from/to a binary file: a subset of the file into a subset of a numpy array.

This is a solution that worked for me:

import numpy as np

recordLen = 10                # number of int64's per record
recordSize = recordLen * 8    # size of a record in bytes
memArray = np.zeros(recordLen, dtype=np.int64)  # a buffer for 1 record

# Create a binary file and open it for write+read
with open('BinaryFile.dat', 'w+b') as file:
    # Writing the buffer into the file as record recordNo:
    recordNo = 200  # the index of the target record in the file
    file.seek(recordSize * recordNo)
    file.write(memArray.tobytes())

    # Reading record recordNo from the file into memArray:
    file.seek(recordSize * recordNo)
    buf = file.read(recordSize)
    memArray = np.frombuffer(buf, dtype=np.int64).copy()
    # copy() makes the array writable; frombuffer alone returns a read-only view
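The same pattern can be wrapped in two small helpers (read_record and write_record are hypothetical names, not library functions), which makes the virtual-table idea easier to reuse:

import numpy as np

RECORD_LEN = 10               # int64's per record
RECORD_SIZE = RECORD_LEN * 8  # bytes per record

def write_record(f, record_no, arr):
    # Write a one-record array at slot record_no
    f.seek(RECORD_SIZE * record_no)
    f.write(arr.tobytes())

def read_record(f, record_no):
    # Read slot record_no into a fresh, writable array
    f.seek(RECORD_SIZE * record_no)
    return np.frombuffer(f.read(RECORD_SIZE), dtype=np.int64).copy()

with open('BinaryFile.dat', 'w+b') as f:
    write_record(f, 200, np.arange(RECORD_LEN, dtype=np.int64))
    print(read_record(f, 200))  # [0 1 2 3 4 5 6 7 8 9]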


I suggest using numpy's frombuffer:

with open(file_path, 'rb') as file_obj:
    file_obj.seek(seek_to_position)
    data_ro = np.frombuffer(file_obj.read(total_num_bytes), dtype=your_dtype_here)
    data_rw = data_ro.copy()  # without copy(), the result is read-only
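Applied to the file from the question, a sketch assuming the count at offset 1859 is a little-endian uint32 and the 36-byte records start at 1863:

import numpy as np

# Record layout from the question: 9 float32 fields, 36 bytes per record
dtype = np.dtype([("time", "<f4"), ("PosX", "<f4"), ("PosY", "<f4"),
                  ("Alt", "<f4"), ("Qx", "<f4"), ("Qy", "<f4"),
                  ("Qz", "<f4"), ("Qw", "<f4"), ("dist", "<f4")])

with open(file_path, 'rb') as file_obj:
    file_obj.seek(1859)  # record count lives here
    count = int(np.frombuffer(file_obj.read(4), dtype="<u4")[0])
    data = np.frombuffer(file_obj.read(count * dtype.itemsize), dtype=dtype).copy()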
