Converting (part of) a numpy recarray into a 2d array?

Question

We've got a set of recarrays of data for individual days - the first attribute is a timestamp and the rest are values.

Several of these:

    ts            a    b    c
2010-08-06 08:00, 1.2, 3.4, 5.6
2010-08-06 08:05, 1.2, 3.4, 5.6
2010-08-06 08:10, 1.2, 3.4, 5.6
2010-08-06 08:15, 2.2, 3.3, 5.6
2010-08-06 08:20, 1.2, 3.4, 5.6

We'd like to produce an array of the averages of each of the values, as if you laid all of the days' data on top of each other and averaged the values that line up. The timestamps (times of day) all match up, so we can do it by creating a result recarray with the timestamps and all the other columns set to 0, then doing something like:

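# accumulate each day's values, field by field
for day in day_data:
    result.a += day.a
    result.b += day.b
    result.c += day.c

# divide by the number of days to get the element-wise mean
result.a /= len(day_data)
result.b /= len(day_data)
result.c /= len(day_data)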

It seems like a better way would be to convert each day to a 2d array with just the numbers (lopping off the timestamps), then average them all element-wise in one operation, but we can't find a way to do this - it's always a 1d array of objects.

Does anyone know how to do this?

Joe Kington · Accepted Answer · 2013-07-30 02:01:33Z

There are several ways to do this. One way is to select multiple columns of the recarray and view them as floats (this relies on all of the selected fields being float64), then reshape back into a 2D array:

new_data = data[['a','b','c']].view(np.float64).reshape((data.size, 3))
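On NumPy 1.16 and later, indexing with a list of field names returns a view that keeps the original field offsets (including the space taken by `ts`), so the selection can't simply be reinterpreted or cast as a plain float array. A minimal sketch of the same conversion using `numpy.lib.recfunctions.structured_to_unstructured`, assuming one day is a plain structured array with a datetime64 field `ts` followed by float64 fields `a`, `b` and `c` (the sample rows are copied from the question):

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# Hypothetical sample day: a timestamp field followed by three float64 fields
dtype = [("ts", "datetime64[m]"), ("a", "f8"), ("b", "f8"), ("c", "f8")]
data = np.array(
    [
        (np.datetime64("2010-08-06T08:00"), 1.2, 3.4, 5.6),
        (np.datetime64("2010-08-06T08:05"), 1.2, 3.4, 5.6),
        (np.datetime64("2010-08-06T08:10"), 1.2, 3.4, 5.6),
    ],
    dtype=dtype,
)

# Select the numeric fields and convert them into a plain (n_rows, 3) float array
new_data = rfn.structured_to_unstructured(data[["a", "b", "c"]])
print(new_data.shape)
```

The helper handles the field offsets itself, so it does not depend on the fields being packed next to each other.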

Alternatively, you might consider something like this (negligibly slower, but more readable):

new_data = np.vstack([data[item] for item in ['a','b','c']]).T
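Applied to the original problem, either conversion gives one 2D array per day. Stacking those along a new first axis and taking the mean over that axis averages every aligned value in one operation. A sketch assuming `day_data` is a list of structured arrays whose rows line up by time of day, using the `vstack` conversion; the two days of values and the helper name `to_2d` are made up for illustration:

```python
import numpy as np

fields = ["a", "b", "c"]
dtype = [("ts", "datetime64[m]"), ("a", "f8"), ("b", "f8"), ("c", "f8")]

# Two made-up days whose times of day line up row for row
day_data = [
    np.array([(np.datetime64("2010-08-06T08:00"), 1.0, 2.0, 3.0),
              (np.datetime64("2010-08-06T08:05"), 4.0, 5.0, 6.0)], dtype=dtype),
    np.array([(np.datetime64("2010-08-07T08:00"), 3.0, 4.0, 5.0),
              (np.datetime64("2010-08-07T08:05"), 6.0, 7.0, 8.0)], dtype=dtype),
]

def to_2d(day):
    # Drop the timestamp: one column per numeric field -> shape (n_rows, 3)
    return np.vstack([day[name] for name in fields]).T

# Stack days -> shape (n_days, n_rows, 3), then average over the days axis
mean_values = np.stack([to_2d(day) for day in day_data]).mean(axis=0)

# Rebuild a structured result, reusing the first day's timestamps
result = np.zeros(len(day_data[0]), dtype=dtype)
result["ts"] = day_data[0]["ts"]
for i, name in enumerate(fields):
    result[name] = mean_values[:, i]

print(mean_values)
```

This replaces the per-field accumulate-and-divide loop from the question with a single `mean` call.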

Also note that it might be a good idea to look into pandas for operations such as these so that you can easily work with heterogeneous data.
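As a rough sketch of the pandas route, assuming pandas is installed and all days have been combined into one DataFrame with a datetime `ts` column: grouping by the time of day and taking the mean averages the aligned values across days without any manual bookkeeping. The column names follow the question; the values are made up.

```python
import pandas as pd

# Two made-up days of readings taken at the same times of day
df = pd.DataFrame({
    "ts": pd.to_datetime(["2010-08-06 08:00", "2010-08-06 08:05",
                          "2010-08-07 08:00", "2010-08-07 08:05"]),
    "a": [1.0, 4.0, 3.0, 6.0],
    "b": [2.0, 5.0, 4.0, 7.0],
    "c": [3.0, 6.0, 5.0, 8.0],
})

# Group rows by time of day (ignoring the date) and average each numeric column
averages = df.groupby(df["ts"].dt.time)[["a", "b", "c"]].mean()
print(averages)
```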

edited Jul 30, 2013 at 2:01

answered Aug 11, 2010 at 15:46

4 Comments

babbageclunk Over a year ago

That's great, thanks! I'm still struggling to get used to doing things on the arrays as a whole - my instinct is to do things to elements individually. One note from my testing - while the .view(np.float) part doesn't make a copy, the fancy slicing does.

Eric O. Lebigot Over a year ago

@Joe: If I'm not mistaken, @wilberforce is right about the copy: data[['a','b','c']].base is None, so this means that it owns its data and does not inherit it from data. This makes sense, as the fields are generally not contiguous. If you confirm this, it would be nice to update your answer. :)

Joe Kington Over a year ago

@EOL - You're absolutely right! (I don't know what I was thinking at the time...)

Joe Kington Over a year ago

@EOL - Also, indexing structured arrays with things like data[['a', 'b', 'c']] will return a view in future versions of numpy: github.com/numpy/numpy/pull/350/files As you mentioned, it doesn't at the moment, and hasn't in the past, though.
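One way to check the view-versus-copy behaviour discussed above on whatever NumPy version is installed is `np.shares_memory`. A minimal sketch (the array is made up): on NumPy 1.16 and later, where multi-field indexing returns a view as that pull request proposed, `is_view` should be `True`; on older versions the indexed result was a copy.

```python
import numpy as np

# Made-up structured array: a timestamp plus three float fields
data = np.zeros(4, dtype=[("ts", "datetime64[m]"), ("a", "f8"), ("b", "f8"), ("c", "f8")])

subset = data[["a", "b", "c"]]

# True if indexing with a list of field names returned a view into `data`
is_view = np.shares_memory(data, subset)
print(np.__version__, is_view)

# Writing through the subset only reaches `data` when the subset is a view
subset["a"] = 1.0
print(data["a"])
```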
