I have mixed data stored in a byte array in the following format:

[ np.int64(1) np.int64(2) np.int64(3) np.float32(1.0) np.float32(2.0) np.float32(3.0) np.float64(1) np.float64(2) np.float64(3) ]

I know the offset to each new type, but am not sure how to create a numpy array from this. Normally you could use structured arrays, but given the format of the data, I am not sure how to do this. Any help would be appreciated.

The array holds 9 values (3 int64, 3 float32, 3 float64). It is not in a structured array yet; it is just a byte array coming from some other location. Treat the 3 as an arbitrary count: if I have 10 int64 values, I will also have 10 float32 and 10 float64 values. This is like 3 concatenated arrays, each of a different type but all of the same length.

For example:

input (byte array)

bytearray(b'\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00\x03?\x80\x00\x00@\x00\x00\x00@@\x00\x00@\x18\x00\x00\x00\x00\x00\x00@\x1c\x00\x00\x00\x00\x00\x00@ \x00\x00\x00\x00\x00\x00')

output (numpy array)

I want to fit it into a numpy array with mixed types (no copy)

[1 2 3 1.0 2.0 3.0 1.0 2.0 3.0]

This was generated using:

np.array([1,2,3,np.float32(1),np.float32(2),np.float32(3),np.float64(1),np.float64(2),np.float64(3)], dtype='object')

  • 1, 2, 3, 1.0, 2.0, 3.0, 1, 2, 3 — are these example values, or do they somehow define the format of your data? Commented Jun 6, 2022 at 13:52
  • According to the documentation of structured arrays, the corresponding code looks straightforward — just use an example in the documentation and change the types to match your use-case. Is there something specific which went wrong? Commented Jun 6, 2022 at 13:56
  • These are just example values that would be made up in the byte array. For structured arrays to work, wouldn't the values need to be in repeated tuples like [ (int64 float32 float64) (int64 float32 float64) (int64 float32 float64) ]? I currently only have a byte array with [ int64 int64 int64 float32 float32 float32 float64 float64 float64 ]. I don't think I can just reshape: if I try to create this from a buffer, I have no way to specify the dtype with the given format. Commented Jun 6, 2022 at 14:01
  • It looks like your array contains 3 records with 3 members each, and not just one record. Is this correct? Commented Jun 6, 2022 at 14:05
  • n = len(bytearr)//(8+4+8); np.ndarray(len(bytearr), np.byte, bytearr).view('>i8,'*n + '>f4,'*n + '>f8,'*n) creates a structured array (array([(1, 2, 3, 1., 2., 3., 6., 7., 8.)])) as a view of the same buffer as bytearr (changes to one are reflected in the other). Appending .astype(object) gives the expected result but is necessarily a copy. Commented Jun 6, 2022 at 16:43

1 Answer

Constructing a byte array that I think looks like what you have (though that's something of a guess):

In [347]: x,y,z = np.arange(3, dtype='int64'), np.arange(1,4,dtype='float32'),np.arange(2,5,dtype='int32')

In [349]: barr = x.tobytes()+y.tobytes()+z.tobytes()    
In [350]: barr
Out[350]: b'\x00\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00\x00\x00\x80?\x00\x00\x00@\x00\x00@@\x02\x00\x00\x00\x03\x00\x00\x00\x04\x00\x00\x00'

Creating arrays that use barr and different offsets:

In [352]: x1 = np.ndarray(3,'int64',barr,0)
In [353]: x1
Out[353]: array([0, 1, 2], dtype=int64)

In [354]: y1 = np.ndarray(3,'float32',barr,offset=3*8)
In [355]: y1
Out[355]: array([1., 2., 3.], dtype=float32)

In [356]: z1 = np.ndarray(3,'int32',barr,offset=3*8+3*4)
In [357]: z1
Out[357]: array([2, 3, 4])
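
The same offset idea can be applied to the layout described in the question. What follows is only a sketch under assumptions: the buffer is a mutable bytearray, the values are big-endian (as the question's bytes suggest), and the block count n is known. It uses np.frombuffer rather than the np.ndarray constructor shown above; the names buf, ints, f32s and f64s are made up for illustration.

```python
import numpy as np

n = 3  # assumed number of values per block

# Rebuild a buffer in the question's layout:
# n big-endian int64, then n float32, then n float64.
buf = bytearray(
    np.array([1, 2, 3], dtype='>i8').tobytes()
    + np.array([1, 2, 3], dtype='>f4').tobytes()
    + np.array([1, 2, 3], dtype='>f8').tobytes()
)

# Each view starts at the byte where the previous block ends.
ints = np.frombuffer(buf, dtype='>i8', count=n, offset=0)
f32s = np.frombuffer(buf, dtype='>f4', count=n, offset=n * 8)
f64s = np.frombuffer(buf, dtype='>f8', count=n, offset=n * 8 + n * 4)

# The views share memory with buf, so this write changes the bytes in buf.
ints[0] = 42
```

Because buf is a bytearray, the three arrays should be writable views rather than copies; with an immutable bytes object such as barr above they would be read-only.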

flat structured array

Defining a compound dtype - use string repeats as needed:

In [362]: dt = np.dtype('i8,i8,i8,f4,f4,f4,i4,i4,i4')

In [363]: dt
Out[363]: dtype([('f0', '<i8'), ('f1', '<i8'), ('f2', '<i8'), ('f3', '<f4'), ('f4', '<f4'), ('f5', '<f4'), ('f6', '<i4'), ('f7', '<i4'), ('f8', '<i4')])

In [364]: xyz = np.ndarray(1,dt,barr)

In [365]: xyz
Out[365]: 
array([(0, 1, 2, 1., 2., 3., 2, 3, 4)],
      dtype=[('f0', '<i8'), ('f1', '<i8'), ('f2', '<i8'), ('f3', '<f4'), ('f4', '<f4'), ('f5', '<f4'), ('f6', '<i4'), ('f7', '<i4'), ('f8', '<i4')])

In [366]: xyz['f4']
Out[366]: array([2.], dtype=float32)

But it is hard to do 'math' across fields.
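
One possible way around this, shown only as a sketch, is numpy.lib.recfunctions.structured_to_unstructured, which converts the record into an ordinary array with a common dtype. With fields of different types the values have to be cast, so the result is a copy and not a view of barr. The variable names flat and total are illustrative.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# Same data and flat dtype as in the session above.
dt = np.dtype('i8,i8,i8,f4,f4,f4,i4,i4,i4')
barr = (np.arange(3, dtype='int64').tobytes()
        + np.arange(1, 4, dtype='float32').tobytes()
        + np.arange(2, 5, dtype='int32').tobytes())
xyz = np.ndarray(1, dt, barr)

# Convert the nine fields into one plain 2-D array with a common dtype
# (float64 for this mix of int64, float32 and int32), then reduce normally.
flat = rfn.structured_to_unstructured(xyz)
total = flat.sum()
```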

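The dtype from the comment on the question can be built the same way with string repeats (note the '>' big-endian codes, which match the question's data rather than barr above): n=3; dt1 = np.dtype('>i8,'*n + '>f4,'*n + '>i4,'*n)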

better dtype

In [367]: dt = np.dtype([('x','i8',3),('y','f4',3),('z','i4',3)])

In [368]: dt
Out[368]: dtype([('x', '<i8', (3,)), ('y', '<f4', (3,)), ('z', '<i4', (3,))])

In [369]: xyz = np.ndarray(1,dt,barr)

In [370]: xyz
Out[370]: 
array([([0, 1, 2], [1., 2., 3.], [2, 3, 4])],
      dtype=[('x', '<i8', (3,)), ('y', '<f4', (3,)), ('z', '<i4', (3,))])

In [371]: xyz['y']
Out[371]: array([[1., 2., 3.]], dtype=float32)

With just 3 fields, this is much closer in character to my first solution.
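
As a sketch of how this could look for the exact bytes in the question: the byte order is assumed to be big-endian, the float64 block is assumed to come last, and the names raw, rec, x, y and z are illustrative. Note that the last block of that byte string decodes to 6., 7., 8., as in the comment on the question, not 1., 2., 3.

```python
import numpy as np

# The byte string from the question, split across lines for readability.
raw = bytearray(
    b'\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x02'
    b'\x00\x00\x00\x00\x00\x00\x00\x03?\x80\x00\x00@\x00\x00\x00@@\x00\x00'
    b'@\x18\x00\x00\x00\x00\x00\x00@\x1c\x00\x00\x00\x00\x00\x00'
    b'@ \x00\x00\x00\x00\x00\x00'
)

# One int64, one float32 and one float64 per "column", so 20 bytes per n.
n = len(raw) // (8 + 4 + 8)

# Three array-valued fields, each of length n, all big-endian.
dt = np.dtype([('x', '>i8', (n,)), ('y', '>f4', (n,)), ('z', '>f8', (n,))])

rec = np.frombuffer(raw, dtype=dt)   # a single record, viewing raw in place
x, y, z = rec['x'][0], rec['y'][0], rec['z'][0]
```

Each of x, y and z is then a plain 1-D array backed by raw, so this stays copy-free while still keeping everything reachable from the single array rec.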

3 Comments

Thanks for the reply! I am able to separate them out; however, I was looking for a single array. It looks like Michael Szczesny's comment is closer to what I'm looking for.
@Ghastone - It's hard to think of a use case where this solution isn't superior to a single array of type object.
@MichaelSzczesny I understand that; I was just trying to figure out whether this was possible. I already had the solution with multiple separate numpy arrays. If for any reason I needed a single numpy array, I was trying to avoid copies.