We've got a set of recarrays of data for individual days - the first attribute is a timestamp and the rest are values.

Several of these:

    ts                a    b    c
    2010-08-06 08:00, 1.2, 3.4, 5.6
    2010-08-06 08:05, 1.2, 3.4, 5.6
    2010-08-06 08:10, 1.2, 3.4, 5.6
    2010-08-06 08:15, 2.2, 3.3, 5.6
    2010-08-06 08:20, 1.2, 3.4, 5.6

We'd like to produce an array of the averages of each of the values (as if you laid all of the day data on top of each other, and averaged all of the values that line up). The timestamp times all match up, so we can do it by creating a result recarray with the timestamps, and the other columns all 0s, then doing something like:

for day in day_data:
    result.a += day.a
    result.b += day.b
    result.c += day.c

result.a /= len(day_data)
result.b /= len(day_data)
result.c /= len(day_data)

It seems like a better way would be to convert each day to a 2d array with just the numbers (lopping off the timestamps), then average them all element-wise in one operation, but we can't find a way to do this - it's always a 1d array of objects.

Does anyone know how to do this?

1 Answer

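There are several ways to do this. One way is to select multiple columns of the recarray, view them as floats, and reshape the result into a 2D array (this assumes every selected field is a float64):

new_data = data[['a','b','c']].view(np.float64).reshape((data.size, 3))

Note that since NumPy 1.16, multi-field indexing returns a view that keeps the original record layout, padding included, so the line above generally fails on newer versions when the array has other fields such as the timestamp. There, numpy.lib.recfunctions.structured_to_unstructured(data[['a','b','c']]) does the same job.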

Alternatively, you might consider something like this (negligibly slower, but more readable):

new_data = np.vstack([data[item] for item in ['a','b','c']]).T
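To tie this back to the question, here is a minimal sketch of the whole averaging step. It assumes every day has the same time slots in the same order and that all value fields are floats. The sample data and the names `make_day`, `value_fields`, `stacked` and `averages` are made up for illustration, and `np.column_stack` is used as a shorthand for the `vstack(...).T` form above.

```python
import numpy as np

# Hypothetical layout matching the question: a timestamp plus float fields.
dtype = [("ts", "datetime64[m]"), ("a", "f8"), ("b", "f8"), ("c", "f8")]

def make_day(date, rows):
    """Build one day's recarray from (time, a, b, c) rows (illustrative helper)."""
    records = [(np.datetime64(f"{date}T{t}"), a, b, c) for t, a, b, c in rows]
    return np.array(records, dtype=dtype).view(np.recarray)

day_data = [
    make_day("2010-08-06", [("08:00", 1.0, 2.0, 3.0), ("08:05", 4.0, 5.0, 6.0)]),
    make_day("2010-08-07", [("08:00", 3.0, 4.0, 5.0), ("08:05", 6.0, 7.0, 8.0)]),
]
value_fields = ["a", "b", "c"]

# One (n_rows, n_fields) float array per day, stacked into
# a single (n_days, n_rows, n_fields) array.
stacked = np.stack(
    [np.column_stack([day[name] for name in value_fields]) for day in day_data]
)

# Average across days in one operation.
averages = stacked.mean(axis=0)

# Put the averages back into a recarray with the first day's timestamps.
result = np.zeros(len(day_data[0]), dtype=dtype).view(np.recarray)
result["ts"] = day_data[0]["ts"]
for i, name in enumerate(value_fields):
    result[name] = averages[:, i]
```

Stacking along a new leading axis keeps each day intact, so other per-slot statistics (for example `stacked.std(axis=0)`) come from the same array.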

Also note that it might be a good idea to look into pandas for operations such as these so that you can easily work with heterogeneous data.
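As a rough idea of what the pandas route could look like, here is a small sketch. It assumes each day is already a DataFrame with a `ts` column (a recarray can usually be converted with `pd.DataFrame.from_records`). The sample frames and the names `day1`, `day2`, `all_days` and `averages` are invented for the example.

```python
import pandas as pd

# Hypothetical sample data: two days with the same 5-minute time slots.
day1 = pd.DataFrame({
    "ts": pd.date_range("2010-08-06 08:00", periods=3, freq="5min"),
    "a": [1.0, 2.0, 3.0],
    "b": [10.0, 20.0, 30.0],
})
day2 = pd.DataFrame({
    "ts": pd.date_range("2010-08-07 08:00", periods=3, freq="5min"),
    "a": [3.0, 4.0, 5.0],
    "b": [30.0, 40.0, 50.0],
})

# Lay all days on top of each other, then average rows that share a time of day.
all_days = pd.concat([day1, day2], ignore_index=True)
averages = all_days.groupby(all_days["ts"].dt.time)[["a", "b"]].mean()
```

Grouping by time of day rather than by row position means the days do not have to be the same length or in the same order, at the cost of some speed compared with the plain NumPy version.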

4 Comments

That's great, thanks! I'm still struggling to get used to doing things on the arrays as a whole - my instinct is to do things to elements individually. One note from my testing - while the .view(np.float) part doesn't make a copy, the fancy slicing does.
@Joe: If I'm not mistaken, @wilberforce is right about the copy: data[['a','b','c']].base is None, so this means that it owns its data and does not inherit it from data. This makes sense, as the fields are generally not contiguous. If you confirm this, it would be nice to update your answer. :)
@EOL - You're absolutely right! (I don't know what I was thinking at the time...)
@EOL - Also, indexing structured arrays with things like data[['a', 'b', 'c']] will return a view in future versions of numpy: github.com/numpy/numpy/pull/350/files As you mentioned, it doesn't at the moment, and hasn't in the past, though.
