What's the best way to convert numpy's recarray to a normal array?

I could call .tolist() first and then array() again, but that seems somewhat inefficient.

Example:

import numpy as np
a = np.recarray((2,), dtype=[('x', int), ('y', float), ('z', int)])

>>> a
  rec.array([(30408891, 9.2944097561804909e-296, 30261980),
   (44512448, 4.5273310988985789e-300, 29979040)], 
  dtype=[('x', '<i4'), ('y', '<f8'), ('z', '<i4')])

>>> np.array(a.tolist())
   array([[  3.04088910e+007,   9.29440976e-296,   3.02619800e+007],
   [  4.45124480e+007,   4.52733110e-300,   2.99790400e+007]])

2 Answers

By "normal array" I take it you mean a NumPy array of homogeneous dtype. Given a recarray, such as:

>>> a = np.array([(0, 1, 2),
...               (3, 4, 5)], [('x', int), ('y', float), ('z', int)]).view(np.recarray)
>>> a
rec.array([(0, 1.0, 2), (3, 4.0, 5)], 
      dtype=[('x', '<i4'), ('y', '<f8'), ('z', '<i4')])

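we must first make each column have the same dtype. We can then convert it to a "normal array" by viewing the data with that same dtype. The result is flat, so reshape it (for example with .reshape(len(a), -1)) if you want one row per record: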

>>> a.astype([('x', '<f8'), ('y', '<f8'), ('z', '<f8')]).view('<f8')
array([ 0.,  1.,  2.,  3.,  4.,  5.])
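To get one row per record rather than a flat array, the view can be reshaped. The following is only a minimal sketch: the names plain, float_fields, flat, table and alt are mine, not from the answer. It also tries numpy.lib.recfunctions.structured_to_unstructured, which is a different technique from astype/view and, as far as I know, is available from NumPy 1.16 onward.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

fields = [('x', int), ('y', float), ('z', int)]
plain = np.array([(0, 1, 2), (3, 4, 5)], dtype=fields)  # plain structured array
a = plain.view(np.recarray)                              # same data, seen as a recarray

# Cast every field to float64 so that all columns share one dtype.
float_fields = [(name, '<f8') for name in a.dtype.names]

# np.asarray drops the recarray subclass; view('<f8') reinterprets the buffer.
flat = np.asarray(a.astype(float_fields)).view('<f8')

# One row per record, one column per field.
table = flat.reshape(len(a), -1)

# Alternative technique (not the one used in the answer above).
alt = rfn.structured_to_unstructured(plain, dtype=np.float64)

print(table)
print(alt)
```

Both routes should produce a (2, 3) float64 array here. The reshape itself costs nothing, since it only changes the shape of the existing buffer.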

astype returns a new NumPy array, so the above requires additional memory in an amount proportional to the size of a. With the dtypes shown above, each row of a occupies 4+8+4=16 bytes, while each row of a.astype(...) occupies 8*3=24 bytes. Calling view requires no new memory for the data, since view just changes how the underlying data is interpreted.
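The byte counts can be checked directly. This is a small sketch that pins the field types to '<i4' and '<f8' explicitly, because plain int maps to 8 bytes on many platforms; the variable names are illustrative only.

```python
import numpy as np

# Explicit 4-byte ints and 8-byte floats, matching the dtypes printed above.
mixed = [('x', '<i4'), ('y', '<f8'), ('z', '<i4')]
all_f8 = [('x', '<f8'), ('y', '<f8'), ('z', '<f8')]

a = np.array([(0, 1, 2), (3, 4, 5)], dtype=mixed)
converted = a.astype(all_f8)      # allocates a new buffer
flat = converted.view('<f8')      # reinterprets that same buffer

print(a.dtype.itemsize)           # bytes per row in the original
print(converted.dtype.itemsize)   # bytes per row after astype
print(np.shares_memory(converted, flat))
print(np.shares_memory(a, converted))
```

The last two lines show where the extra memory comes from: the view shares its buffer with the astype result, while the astype result does not share memory with a.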

a.tolist() returns a new Python list. Each Python number is an object which requires more bytes than its equivalent representation in a numpy array. So a.tolist() requires more memory than a.astype(...).
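A rough way to see the per-number overhead, assuming CPython (sys.getsizeof reports implementation-specific sizes, so the exact figure will vary):

```python
import sys
import numpy as np

a = np.array([(0, 1.0, 2), (3, 4.0, 5)],
             dtype=[('x', '<i4'), ('y', '<f8'), ('z', '<i4')])

rows = a.tolist()                          # list of tuples of Python numbers
py_float_size = sys.getsizeof(rows[0][1])  # size of one Python float object
np_float_size = a.dtype['y'].itemsize      # size of one float64 inside the array

print(rows)
print(py_float_size, np_float_size)
```

This only counts the float objects themselves. The list and the tuples that hold them add further overhead on top.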

Calling a.astype(...).view(...) is also faster than np.array(a.tolist()):

In [8]: a = np.array(list(zip(*[iter(range(300))]*3)),[('x', int), ('y', float), ('z', int)]).view(np.recarray)

In [9]: %timeit a.astype([('x', '<f8'), ('y', '<f8'), ('z', '<f8')]).view('<f8')
10000 loops, best of 3: 165 us per loop

In [10]: %timeit np.array(a.tolist())
1000 loops, best of 3: 683 us per loop
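Outside IPython, the same comparison can be sketched with the standard timeit module. The helper names via_astype and via_tolist are hypothetical, and the numbers will differ by machine and NumPy version, so treat the output as indicative only.

```python
import timeit
import numpy as np

fields = [('x', int), ('y', float), ('z', int)]
rows = list(zip(*[iter(range(300))] * 3))      # 100 records of 3 values each
a = np.array(rows, dtype=fields).view(np.recarray)
float_fields = [(name, '<f8') for name in a.dtype.names]

def via_astype():
    # Cast all fields to float64, then reinterpret the buffer as plain floats.
    return np.asarray(a.astype(float_fields)).view('<f8')

def via_tolist():
    # Round-trip through Python objects.
    return np.array(a.tolist())

t_astype = timeit.timeit(via_astype, number=200)
t_tolist = timeit.timeit(via_tolist, number=200)
print("astype/view: %.6f s for 200 calls" % t_astype)
print("tolist:      %.6f s for 200 calls" % t_tolist)
```

Note that the two functions return different shapes: via_astype gives a flat array of 300 values, while via_tolist gives a (100, 3) array.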
1 Comment

You may need to ensure that the array is contiguous: np.ascontiguousarray(a, [('x', '<f8'), ('y', '<f8'), ('z', '<f8')]).view('<f8'), see stackoverflow.com/questions/29629157/…
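A minimal sketch of the situation this comment seems to describe, assuming the problem is a strided (non-contiguous) structured array. The array and variable names are made up for illustration. Here the fields are already float64, so np.ascontiguousarray is called without a dtype; passing the dtype as in the comment would cast as well.

```python
import numpy as np

f8_fields = [('x', '<f8'), ('y', '<f8'), ('z', '<f8')]
a = np.array([(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11)], dtype=f8_fields)

every_other = a[::2]              # strided view: records 0 and 2 only
print(every_other.flags.c_contiguous)

try:
    # Changing the item size on a non-contiguous array is expected to fail.
    every_other.view('<f8')
    direct_view_ok = True
except ValueError:
    direct_view_ok = False
print(direct_view_ok)

# Copy into a contiguous buffer first, then reinterpret and reshape.
packed = np.ascontiguousarray(every_other).view('<f8').reshape(-1, 3)
print(packed)
```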

Here is a relatively clean solution using pandas:

>>> import numpy as np
>>> import pandas as pd
>>> a = np.recarray((2,), dtype=[('x', int), ('y', float), ('z', int)])
>>> arr = pd.DataFrame(a).to_numpy()
>>> arr
array([[9.38925058e+013, 0.00000000e+000, 1.40380704e+014],
       [1.40380704e+014, 6.93572751e-310, 1.40380484e+014]])
>>> arr.shape
(2, 3)
>>> arr.dtype
dtype('float64')

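First the data from the recarray are loaded into a pd.DataFrame, then the data are exported using the DataFrame.to_numpy method. Because the columns mix int and float, to_numpy upcasts everything to a common dtype, which here is float64. (The values printed above are arbitrary, since np.recarray allocates the array without initializing it.)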
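The same pandas route with known values, so the result is easier to check. This is a sketch that assumes pandas 0.24 or later, where DataFrame.to_numpy is available; the sample values are mine.

```python
import numpy as np
import pandas as pd

a = np.array([(0, 1.0, 2), (3, 4.0, 5)],
             dtype=[('x', int), ('y', float), ('z', int)]).view(np.recarray)

df = pd.DataFrame(a)   # one column per field, named after the fields
arr = df.to_numpy()    # mixed int/float columns are upcast to a common dtype

print(list(df.columns))
print(arr)
print(arr.dtype)
```

This is convenient if pandas is already a dependency, though it builds an intermediate DataFrame along the way.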
