Numpy Sum from a multidimensional array?

Question

If I had data such as:

data = [[3, 2014], [4, 2014], [6, 2013], [6, 2013]]  # etc.

What is the best way to calculate the sum of the values for each year in Python?

The data is a list of lists. Wouldn't the easiest way be to convert it to a NumPy array?
– Ben, Commented Mar 20, 2015 at 17:25
Are you wedded to using numpy? This is a simple groupby operation, and while you can do that in numpy it would take less time to do it in pandas than it took to write this sentence.
– DSM, Commented Mar 20, 2015 at 17:32
A numpy array would be easy if all the years had the same number of entries and they occurred in a regular pattern. Then you could slice and reshape to produce an array with one year per row. But if things are irregular, a default dictionary or groupby approach is better.
– hpaulj, Commented Mar 20, 2015 at 17:44
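hpaulj's slice-and-reshape idea for the regular case can be sketched in NumPy as follows. This is a minimal sketch that assumes every year has exactly the same number of entries (the hypothetical `n_per_year = 2` here) and that rows for the same year are contiguous. The assertion guards that assumption; if it fails, a dictionary or groupby approach is the safer choice.

```python
import numpy as np

data = [[3, 2014], [4, 2014], [6, 2013], [6, 2013]]
arr = np.array(data)

# Assumption: each year appears exactly n_per_year times, in one contiguous block
n_per_year = 2

values = arr[:, 0].reshape(-1, n_per_year)   # one row of values per year
years = arr[::n_per_year, 1]                 # the year label of each block

# Guard the assumption: each reshaped row of the year column must hold a single year
assert (arr[:, 1].reshape(-1, n_per_year) == years[:, None]).all()

totals = dict(zip(years.tolist(), values.sum(axis=1).tolist()))
print(totals)  # {2014: 7, 2013: 12}
```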

user · Answer · 2019-06-28 09:27:09Z

3

I would use a dict if you need both the year and sum:

from collections import defaultdict

data = [[3, 2014], [4, 2014], [6, 2013], [6,2013]]
d = defaultdict(int)

for v, k in data:
    d[k] += v
print(d)

Prints (Python 3):

defaultdict(<class 'int'>, {2014: 7, 2013: 12})

edited Jun 28, 2019 at 9:27

user

answered Mar 20, 2015 at 17:28

Padraic Cunningham

5 Comments

Ben Over a year ago

Thanks! Would it be easier with NumPy?

Padraic Cunningham Over a year ago

@Ben, I'm not sure it would be easier. If you want to be able to see which year is associated with which sum, a dict is pretty much exactly what you want. Another option would be pandas, but I really think a dict is what you want.

Julien Spronck Over a year ago

All the solutions work fine, but I want to point out one thing: sometimes the simplest solution is the fastest. I'm sure speed is not an issue here, but I ran a quick timeit on my machine for this solution, @Marcus's and my own. For 100,000 iterations: this one 0.88 s, Marcus's 6.9 s, mine 0.21 s.

Padraic Cunningham Over a year ago

@JulienSpronck, a defaultdict will be more efficient once you test it on something containing more than 4 elements. Try data = [choice(random.data) for _ in range(10000)]

Padraic Cunningham Over a year ago

@JulienSpronck, that should be random.choice(data)!
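On Ben's question of whether NumPy would be easier: one general-purpose NumPy approach (not taken from the answers here; a sketch) pairs `np.unique(..., return_inverse=True)`, which maps each row to the index of its year, with `np.bincount(..., weights=...)`, which sums the values that fall into each index. Unlike the reshape approach, it needs neither equal group sizes nor sorted input. Note that `bincount` with weights returns floats.

```python
import numpy as np

data = [[3, 2014], [4, 2014], [6, 2013], [6, 2013]]
arr = np.array(data)
values, year_col = arr[:, 0], arr[:, 1]

# years: sorted distinct years; idx: for each row, the position of its year in `years`
years, idx = np.unique(year_col, return_inverse=True)

# Sum the values (used as weights) for each year index; the result is float64
sums = np.bincount(idx, weights=values)

totals = dict(zip(years.tolist(), sums.tolist()))
print(totals)  # {2013: 12.0, 2014: 7.0}
```

If integer sums are needed, `np.add.at(sums, idx, values)` on a zero-initialized integer array of length `len(years)` is an alternative to `bincount`.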

Julien Spronck · Answer · 2015-03-20 17:55:26Z

1

Not sure I understand the question, but here is a simple answer that needs no extra modules:

dic = {}

for dat, year in data:
    if year not in dic:
        dic[year] = dat
    else:
        dic[year] += dat

or, if you prefer, with a conditional expression:

dic = {}
for dat, year in data:
    dic[year] = dat if year not in dic else dic[year] + dat
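A slightly shorter variant of the same loop (a sketch, not part of the original answer) uses `dict.get` with a default of 0, which avoids the explicit membership test:

```python
data = [[3, 2014], [4, 2014], [6, 2013], [6, 2013]]

dic = {}
for dat, year in data:
    # dict.get returns 0 when the year is not yet a key
    dic[year] = dic.get(year, 0) + dat

print(dic)  # {2014: 7, 2013: 12}
```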

edited Mar 20, 2015 at 17:55

answered Mar 20, 2015 at 17:32

Julien Spronck

head7 · Answer · 2015-03-21 23:52:59Z

As DSM suggested, this is easy with pandas and groupby:

import pandas as pd
data = [[3, 2014], [4, 2014], [6, 2013], [6,2013]]
df = pd.DataFrame(data, columns=['value', 'year'])
df.groupby(['year']).sum()

which returns:

      value
year       
2013     12
2014      7

It's nice because you can easily get other statistics too, such as the mean, median and standard deviation:

df.groupby(['year']).mean()
df.groupby(['year']).median() 
df.groupby(['year']).std()
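If several statistics are wanted at once, pandas can compute them in a single `groupby` by passing a list of function names to `agg`. A sketch, reusing the `value` and `year` column names from above:

```python
import pandas as pd

data = [[3, 2014], [4, 2014], [6, 2013], [6, 2013]]
df = pd.DataFrame(data, columns=['value', 'year'])

# One groupby, one column per statistic, one row per year
stats = df.groupby('year')['value'].agg(['sum', 'mean', 'median', 'std'])
print(stats)
```

Note that `std` here is the sample standard deviation (pandas' default `ddof=1`).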

user · Answer · 2019-06-28 09:30:45Z

1

There's a Python standard-library class made for this: Counter.

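from collections import Counter
from functools import reduce  # needed on Python 3, where reduce is no longer a builtin
from operator import add

data = [[3, 2014], [4, 2014], [6, 2013], [6, 2013]]

# One single-entry Counter {year: value} per row, then add them all together
counters = [Counter({row[1]: row[0]}) for row in data]
result = reduce(add, counters)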

The result is a dict-like Counter object:

Counter({2013: 12, 2014: 7})
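Building one Counter per row and then adding them pairwise creates many intermediate objects, which is a likely (unverified) reason this version was the slowest in Julien Spronck's timing comment. A leaner sketch keeps a single Counter and relies on the fact that a missing key reads as 0:

```python
from collections import Counter

data = [[3, 2014], [4, 2014], [6, 2013], [6, 2013]]

totals = Counter()
for value, year in data:
    # A missing key reads as 0, so no membership check is needed
    totals[year] += value

print(totals)  # Counter({2013: 12, 2014: 7})
```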

edited Jun 28, 2019 at 9:30

user

answered Mar 20, 2015 at 17:30

Marcus Müller

user · Answer · 2015-03-20 18:41:37Z

You can use collections.Counter and +=:

import collections
data = [[3, 2014], [4, 2014], [6, 2013], [6,2013]]

c = collections.Counter()

for i, j in data:
    c += collections.Counter({j: i})

print(c)

A Counter is a dict subclass for counting hashable objects. It is a collection where elements are stored as dictionary keys and their counts are stored as dictionary values (since Python 3.7 it also remembers insertion order, like dict).

You can add Counters, for example:

a = collections.Counter(a=1, b=2)
b = collections.Counter(a=3, c=3)    
print(a+b)

prints Counter({'a': 4, 'c': 3, 'b': 2}).
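One caveat when using Counter for sums rather than counts: Counter's `+` and `+=` keep only positive results, so a negative value, or a running total that dips to zero or below, silently disappears. A sketch with hypothetical data containing a negative value shows the difference from plain item assignment, which keeps any value:

```python
from collections import Counter

# Hypothetical data: the first 2014 entry is negative
data = [[-5, 2014], [8, 2014], [6, 2013]]

with_plus = Counter()
for value, year in data:
    with_plus += Counter({year: value})   # += drops results <= 0, so -5 is lost

with_index = Counter()
for value, year in data:
    with_index[year] += value             # item assignment keeps negative totals

print(with_plus)   # Counter({2014: 8, 2013: 6})  (wrong total for 2014)
print(with_index)  # Counter({2013: 6, 2014: 3})
```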