1

While migrating some old python 2 code to python 3, I ran into some problems populating structured numpy arrays from bytes objects.

I have a parser that defines a specific dtype for each type of data structure I might encounter. Since, in general, a given data structure may have variable-length or variable-type fields, these have been represented in the numpy array as fields of object dtype (np.object #alternatively np.dtype('O')).

The array is obtained from bytes (or a bytearray) by first populating the fixed-dtype fields. After this, the dtype of any sub-arrays (contained in 'object' fields) can be built using information from the fixed fields that precede it.

Here is a partial example of this process (dealing only with the fixed-dtype fields) that works in python 2. Note that we have a field named 'nSamples', which will presumably tell us the length of the array pointed to by the 'samples' field of the array, which would be interpreted as a numpy array with shape (2,) and dtype sampleDtype:

fancyDtype = np.dtype([('blah',     '<u4'),
                       ('bleh',      'S5'),
                       ('nSamples', '<u8'),
                       ('samples',    'O')])

sampleDtype = np.dtype([('sampleId', '<u2'),
                        ('val',      '<f4')])

bytesFromFile = bytearray(
    b'*\x00\x00\x00hello\x02\x00\x00\x00\x00\x00\x00\x00\xd0\xb5'
    b'\x14_\xa1\x7f\x00\x00"\x00\x00\x00\x80?]\x00\x00\x00\xa0@')

arr = np.zeros((1,), dtype=fancyDtype)
numBytesFixedPortion = 17

# Start out by just reading the fixed-type portion of the array
arr.data[:numBytesFixedPortion] = bytesFromFile[:numBytesFixedPortion]
memoryview(arr.data)[:numBytesFixedPortion] = bytesFromFile[:numBytesFixedPortion]

Both of the last two statements here that work in python 2.7.

Of note is that if I type

arr.data

I get <read-write buffer for 0x7f7a93bb7080, size 25, offset 0 at 0x7f7a9339cf70>, which tells me this is a buffer. Obviously, memoryview(arr.data) returns a memoryview object.

Both of these statements raise the following exception in python 3.6:

NotImplementedError: memoryview: unsupported format T{I:blah:5s:bleh:=Q:nSamples:O:samples:}

This tells me that numpy is returning a different type with its data attribute access, a memoryview rather than a buffer. It also tells me that memoryviews worked in python 2.7 but don't in python 3.6 for this purpose.

I found a similar issue in numpy's issue tracker: https://github.com/numpy/numpy/issues/13617 However, the issue was closed quickly, with the numpy developer indicating that it is a bug in ctypes. Since ctypes is a builtin, I kind of gave up hope on just updating it to get a fix.

I did finally stumble upon a solution that works, though it takes roughly twice as long as the python 2.7 method. It is:

import struct
struct.pack_into(
    'B' * numBytesFixedPortion,  # fmt
    arr.data,                    # buffer
    0,                           # offset
    *buf[:numBytesFixedPortion]  # unpacked byte values
)

A coworker also suggested attempting to use this solution:

arrView = arr.view('u1')
arrView[:numBytesFixedPortion] = buf[:numBytesFixedPortion]

However, on doing this, I get the exception:

File "/home/tintedFrantic/anaconda2/envs/py3/lib/python3.6/site-packages/numpy/core/_internal.py", line 461, in _view_is_safe
    raise TypeError("Cannot change data-type for object array.")
TypeError: Cannot change data-type for object array.

Note that I get this exception in both python 2.7 and 3.6. It appears numpy disallows views on arrays with any object fields. (Aside: I was able to get numpy to do this correctly by commenting out the check for object-type fields in the numpy code, though that seems a dangerous solution (and not a very portable one either)).

I've also tried creating separate arrays, one with the fixed-dtype fields and one with the object-dtype field and then using numpy.lib.recfunctions.merge_arrays to merge them. That fails with a cryptic message that I can't remember.

I am at a bit of a loss. I just want to write some arbitrary bytes to the numpy array's underlying memory and do it efficiently. This doesn't seem like it should be too hard to do, but I haven't come across a good way to do it. I would like a solution that isn't a hack either, as this is going into systems that need high reliability. If nothing better exists, I will use the struct.pack_into() solution, but I am hoping someone out there knows a better way. By the way, NOT using object-dtype fields is NOT a viable option, as the cost of doing so would be prohibitive.

If it matters, I am using numpy 1.16.2 in python 2.7 and 1.17.4 for python 3.6.

1
  • 1
    It seems like you might be able to do this with using a memoryview and a cast. Commented Mar 9, 2020 at 16:12

1 Answer 1

1

Per the suggestion of @nawsleahcimnoraa, I found out that in python 3.3+ (so not in python 2.7), the memoryview object, which is returned by arr.data in my python 3 environment, has a cast() method. Thus, I can do

arr.data.cast('B')[startIdx:endIdx] = buf[:numBytes]

This is much more like what I had in python 2.7. It is a lot more concise and also performs a little better than the struct method above.

One thing I noticed in testing these solutions is that, in general, the python 3 solutions were slower than the python 2 versions. For example, I tried the struct solution both using python 2 and python 3 and found a significant increase in processing time for python 3.

I also found fairly sizable discrepancies between different python environments of the same version. For example, I found that my system install of python 3.6 performed better than a virtual environment install of python 3.6, so it seems that the results will likely depend largely on a given environment's configuration.

Overall, I am happy with the results of using the cast() method of the memoryview object returned by arr.data and will use that for now. However, if someone discovers something that works better, I would still love to hear about it.

Sign up to request clarification or add additional context in comments.

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.