How do you create a numpy array from a mixed type byte array where you know the offsets to each type with zero copy?

Question

I have mixed data stored in a byte array in the following format:

[ np.int64(1) np.int64(2) np.int64(3) np.float32(1.0) np.float32(2.0) np.float32(3.0) np.float64(1) np.float64(2) np.float64(3) ]

I know the offset to each new type, but I am not sure how to create a numpy array from this. Normally you could use structured arrays, but given the layout of the data I don't see how to apply them here. Any help would be appreciated.

The array holds 9 values (3 int64, 3 float32, 3 float64). I don't have it in a structured array yet; it is just a byte array coming from some other location. Think of the 3 as just a count: if I have 10 int64 values, I will also have 10 float32 and 10 float64 values. The layout is like 3 concatenated arrays, each of a different type but all of the same length.

For Example:

input (byte array)

bytearray(b'\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00\x03?\x80\x00\x00@\x00\x00\x00@@\x00\x00@\x18\x00\x00\x00\x00\x00\x00@\x1c\x00\x00\x00\x00\x00\x00@ \x00\x00\x00\x00\x00\x00')

output (numpy array)

I want to fit it into a numpy array with mixed types (no copy):

[1 2 3 1.0 2.0 3.0 1.0 2.0 3.0]

This was generated using:

np.array([1,2,3,np.float32(1),np.float32(2),np.float32(3),np.float64(1),np.float64(2),np.float64(3)], dtype='object')

1, 2, 3, 1.0, 2.0, 3.0, 1, 2, 3: are these example values, or do they somehow define the format of your data?
– anatolyg, Commented Jun 6, 2022 at 13:52
According to the documentation of structured arrays, the corresponding code looks straightforward: just take an example from the documentation and change the types to match your use case. Is there something specific which went wrong?
– anatolyg, Commented Jun 6, 2022 at 13:56
These are just example values that would make up the byte array. For structured arrays to work, wouldn't the values need to be in repeated tuples like [ (int64 float32 float64) (int64 float32 float64) (int64 float32 float64) ]? I currently only have a byte array with [ int64 int64 int64 float32 float32 float32 float64 float64 float64 ]. I don't think I can just reshape: if I try to create this from a buffer, I have no way to specify the dtype with the given format.
– Ghastone, Commented Jun 6, 2022 at 14:01
It looks like your array contains 3 records with 3 members each, and not just one record. Is this correct?
– anatolyg, Commented Jun 6, 2022 at 14:05
n = len(bytearr)//(8+4+8); np.ndarray(len(bytearr), np.byte, bytearr).view('>i8,'*n + '>f4,'*n + '>f8,'*n) creates a structured array (array([(1, 2, 3, 1., 2., 3., 6., 7., 8.)])) as a view of the same buffer as bytearr (changes to one are reflected in the other). Appending .astype(object) gives the expected result but is necessarily a copy.
– Michael Szczesny, Commented Jun 6, 2022 at 16:43

hpaulj · Accepted Answer · 2022-06-06 17:24:14Z

Constructing a byte array that I think looks like what you have (though that's something of a guess):

In [347]: x,y,z = np.arange(3, dtype='int64'), np.arange(1,4,dtype='float32'),np.arange(2,5,dtype='int32')

In [349]: barr = x.tobytes()+y.tobytes()+z.tobytes()    
In [350]: barr
Out[350]: b'\x00\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00\x00\x00\x80?\x00\x00\x00@\x00\x00@@\x02\x00\x00\x00\x03\x00\x00\x00\x04\x00\x00\x00'

Creating arrays that use barr and different offsets:

In [352]: x1 = np.ndarray(3,'int64',barr,0)
In [353]: x1
Out[353]: array([0, 1, 2], dtype=int64)

In [354]: y1 = np.ndarray(3,'float32',barr,offset=3*8)
In [355]: y1
Out[355]: array([1., 2., 3.], dtype=float32)

In [356]: z1 = np.ndarray(3,'int32',barr,offset=3*8+3*4)
In [357]: z1
Out[357]: array([2, 3, 4])
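
The same idea can be applied to the bytes posted in the question. This is only a sketch, assuming the layout is n int64, then n float32, then n float64, all big-endian (hence the '>' prefix). The names buf, ints, floats and doubles are mine, and I use np.frombuffer here instead of the np.ndarray constructor; both create views on the buffer.

```python
import numpy as np

# Bytes from the question: 3 int64, 3 float32, 3 float64 (big-endian).
buf = bytearray(
    b'\x00\x00\x00\x00\x00\x00\x00\x01'
    b'\x00\x00\x00\x00\x00\x00\x00\x02'
    b'\x00\x00\x00\x00\x00\x00\x00\x03'
    b'?\x80\x00\x00@\x00\x00\x00@@\x00\x00'
    b'@\x18\x00\x00\x00\x00\x00\x00'
    b'@\x1c\x00\x00\x00\x00\x00\x00'
    b'@ \x00\x00\x00\x00\x00\x00'
)

# One int64 + one float32 + one float64 per "row" gives the count n.
n = len(buf) // (8 + 4 + 8)

# Each call creates a view on buf starting at the given byte offset.
ints = np.frombuffer(buf, dtype='>i8', count=n, offset=0)
floats = np.frombuffer(buf, dtype='>f4', count=n, offset=8 * n)
doubles = np.frombuffer(buf, dtype='>f8', count=n, offset=8 * n + 4 * n)

print(ints)     # [1 2 3]
print(floats)   # [1. 2. 3.]
print(doubles)  # [6. 7. 8.]

# No copy was made: writing through the view changes the bytearray.
ints[0] = 10
print(buf[:8])  # last byte of the first int64 is now 0x0a
```

Note that the float64 part of the posted bytes decodes to 6, 7, 8, which matches the output quoted in the comments under the question.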

flat structured array

Defining a compound dtype - use string repeats as needed:

In [362]: dt = np.dtype('i8,i8,i8,f4,f4,f4,i4,i4,i4')

In [363]: dt
Out[363]: dtype([('f0', '<i8'), ('f1', '<i8'), ('f2', '<i8'), ('f3', '<f4'), ('f4', '<f4'), ('f5', '<f4'), ('f6', '<i4'), ('f7', '<i4'), ('f8', '<i4')])

In [364]: xyz = np.ndarray(1,dt,barr)

In [365]: xyz
Out[365]: 
array([(0, 1, 2, 1., 2., 3., 2, 3, 4)],
      dtype=[('f0', '<i8'), ('f1', '<i8'), ('f2', '<i8'), ('f3', '<f4'), ('f4', '<f4'), ('f5', '<f4'), ('f6', '<i4'), ('f7', '<i4'), ('f8', '<i4')])

In [366]: xyz['f4']
Out[366]: array([2.], dtype=float32)

But it is hard to do 'math' across fields.
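
To illustrate what I mean, here is a small sketch that rebuilds barr as above and sums the three float fields. The name y_sum is mine; the point is only that every field has to be addressed by name, because each one is a separate 1-element array.

```python
import numpy as np

# Rebuild the example buffer: 3 int64, 3 float32, 3 int32 in native byte order.
x = np.arange(3, dtype='int64')
y = np.arange(1, 4, dtype='float32')
z = np.arange(2, 5, dtype='int32')
barr = x.tobytes() + y.tobytes() + z.tobytes()

# Flat compound dtype: one scalar field per value.
dt = np.dtype('i8,i8,i8,f4,f4,f4,i4,i4,i4')
xyz = np.ndarray(1, dt, barr)

# A reduction over the float values has to name the fields one at a time.
y_sum = sum(xyz[name] for name in ('f3', 'f4', 'f5'))
print(y_sum)  # [6.]
```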

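Using the dtype suggested in the comments: n=3; dt1 = np.dtype('>i8,'*n + '>f4,'*n + '>i4,'*n). Note that '>' marks big-endian data, as in the question's bytes. The barr above was built with tobytes() in native byte order (little-endian here, as the '<' in the dtype output shows), so for barr use '<' or no prefix instead.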

better dtype

In [367]: dt = np.dtype([('x','i8',3),('y','f4',3),('z','i4',3)])

In [368]: dt
Out[368]: dtype([('x', '<i8', (3,)), ('y', '<f4', (3,)), ('z', '<i4', (3,))])

In [369]: xyz = np.ndarray(1,dt,barr)

In [370]: xyz
Out[370]: 
array([([0, 1, 2], [1., 2., 3.], [2, 3, 4])],
      dtype=[('x', '<i8', (3,)), ('y', '<f4', (3,)), ('z', '<i4', (3,))])

In [371]: xyz['y']
Out[371]: array([[1., 2., 3.]], dtype=float32)

With just 3 fields, this is much closer in character to my first solution (three separate arrays at different offsets).
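
For the bytes in the question, the same kind of dtype might look like the sketch below. I am assuming big-endian data with n int64, n float32 and n float64 values; the names buf, rec and n are mine, and the field names x, y, z are arbitrary.

```python
import numpy as np

# Bytes from the question: 3 int64, 3 float32, 3 float64 (big-endian).
buf = bytearray(
    b'\x00\x00\x00\x00\x00\x00\x00\x01'
    b'\x00\x00\x00\x00\x00\x00\x00\x02'
    b'\x00\x00\x00\x00\x00\x00\x00\x03'
    b'?\x80\x00\x00@\x00\x00\x00@@\x00\x00'
    b'@\x18\x00\x00\x00\x00\x00\x00'
    b'@\x1c\x00\x00\x00\x00\x00\x00'
    b'@ \x00\x00\x00\x00\x00\x00'
)
n = len(buf) // (8 + 4 + 8)

# One record with three subarray fields, each of length n.
dt = np.dtype([('x', '>i8', (n,)), ('y', '>f4', (n,)), ('z', '>f8', (n,))])
rec = np.ndarray(1, dt, buf)  # a view on buf, no copy

print(rec['x'][0])  # [1 2 3]
print(rec['z'][0])  # [6. 7. 8.]

# Whole-field math works, and in-place updates write through to buf.
rec['y'][0] *= 2
print(rec['y'][0])  # [2. 4. 6.]
```

This is still a single array object over the original buffer, but each field keeps its own dtype, so vectorized operations work per field without going through object dtype.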

Thanks for the reply! I am able to separate them out, but I was looking for a single array. It looks like Michael Szczesny's comment is closer to what I'm looking for.
@Ghastone - It's hard to think of a use case where this solution isn't superior to a single array of type object.
@MichaelSzczesny I understand that; I was just trying to figure out whether this was possible. I already had the solution with multiple separate numpy arrays. If for any reason I needed a single numpy array, I was trying to avoid copies.
