
I am trying to concatenate the bytes of multiple NumPy arrays into a single bytearray so I can send them in an HTTP POST request.

The most efficient way of doing this that I can think of is to create a sufficiently large bytearray object and then write the bytes of all the NumPy arrays into it contiguously.

The code will look something like this:

import numpy as np

list_arr = [np.array([1, 2, 3]), np.array([4, 5, 6])]
total_nb_bytes = sum(a.nbytes for a in list_arr)
cb = bytearray(total_nb_bytes)

# TODO (not done here): generate the list of delimiters and metadata needed
# to decode the concatenated bytes on the receiving side

# concatenate the bytes into the preallocated buffer
offset = 0
for arr in list_arr:
    _bytes = arr.tobytes()                    # copies the raw data into a bytes object
    cb[offset:offset + arr.nbytes] = _bytes   # copies again, into the bytearray
    offset += arr.nbytes

The method tobytes() isn't a zero-copy method. It will copy the raw data of the numpy array into a bytes object.

In Python, buffers allow access to an object's raw inner data (this is known as the buffer protocol at the C level; see the Python documentation). NumPy offered this through a getbuffer() method up to NumPy 1.13 (link), yet that method is deprecated!
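For illustration, here is a minimal sketch (not from the original post) showing the difference between a buffer-protocol view and a tobytes() copy:

import numpy as np

arr = np.array([1, 2, 3])
view = memoryview(arr)         # zero-copy view via the buffer protocol
copied = arr.tobytes()         # full copy of the raw data

arr[0] = 99
print(bytes(view[:1]))         # bytes of 99: the view shares the array's memory
print(copied[:arr.itemsize])   # still the bytes of 1: tobytes() made a copy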

What is the right way of doing this?

3 Comments

  • I do not know, but your question reminded me of python.org/dev/peps/pep-0574, which might be relevant. Commented Oct 12, 2021 at 17:10
  • This is a nicely researched question. Kudos on putting some work into it. I'm always happy to see questions that are based on lots of RTFM. Commented Oct 12, 2021 at 17:42
  • @psarka Thank you for your suggestion! I currently use Python 3.6; I tried pickle protocol 5 and it does indeed solve the performance issue! It cuts the calculation time in half compared with protocol 4!! But I would have to move to Python 3.8 to use the new protocol. Commented Oct 13, 2021 at 15:45

2 Answers


Just use arr.data. This returns a memoryview object which references the array’s memory without copying. It can be indexed and sliced (creating new memoryviews without copying) and appended to a bytearray (copying just once into the bytearray).
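A minimal sketch of how this might look (the loop and variable names are mine, not from the answer); since arr.data is a memoryview, the raw data is copied only once, into the bytearray:

import numpy as np

list_arr = [np.array([1, 2, 3]), np.array([4, 5, 6])]

# Either append to an empty bytearray...
cb = bytearray()
for arr in list_arr:
    cb += arr.data                            # no intermediate bytes object

# ...or write into a preallocated bytearray through slice assignment.
cb2 = bytearray(sum(a.nbytes for a in list_arr))
offset = 0
for arr in list_arr:
    cb2[offset:offset + arr.nbytes] = arr.data
    offset += arr.nbytes

assert cb == cb2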


2 Comments

  • What a bad field name for the buffer! I was searching for it for hours and couldn't find it. Interestingly, it is not well documented: numpy.org/doc/stable/reference/generated/…
  • You don't really need to do that. Wrap your bytearray in a numpy array and concatenate directly into it.

You can make a numpy-compatible buffer out of your message bytearray and write to that efficiently using np.concatenate's out argument.

list_arr = [np.array([1,2,3]), np.array([4,5,6])]
total_nb_bytes = sum(a.nbytes for a in list_arr)
total_size = sum(a.size for a in list_arr)
cb = bytearray(total_nb_bytes)

np.concatenate(list_arr, out=np.ndarray(total_size, dtype=list_arr[0].dtype, buffer=cb))

And sure enough,

>>> cb
bytearray(b'\x01\x00\x00\x00\x02\x00\x00\x00\x03\x00\x00\x00\x04\x00\x00\x00\x05\x00\x00\x00\x06\x00\x00\x00')

This method implies that all your arrays have the same dtype. To lift that restriction, view your original arrays as np.uint8:

np.concatenate([a.view(np.uint8) for a in list_arr],
               out=np.ndarray(total_nb_bytes, dtype=np.uint8, buffer=cb))

This way, you don't need to compute total_size either, since you've already computed the number of bytes.

This approach is likely more efficient than looping through the list of arrays. You were right that the buffer protocol is your ticket to a solution. You can create an array object wrapped around the memory of any object supporting the buffer protocol using the low level np.ndarray constructor. From there, you can use all the usual numpy functions to interact with the buffer.
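For completeness, here is a hypothetical sketch of the receiving side; it assumes you also send each array's dtype and shape (the metadata the question's TODO comment refers to), which the original post does not specify:

import numpy as np

meta = [(a.dtype, a.shape) for a in list_arr]   # assumed to be sent alongside the bytes
payload = bytes(cb)                             # body of the received HTTP request

decoded, offset = [], 0
for dtype, shape in meta:
    count = int(np.prod(shape))
    # np.frombuffer returns a zero-copy (read-only) view into the payload
    arr = np.frombuffer(payload, dtype=dtype, count=count, offset=offset).reshape(shape)
    decoded.append(arr)
    offset += count * dtype.itemsize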

