If I had data such as:
data = [[3, 2014], [4, 2014], [6, 2013], [6,2013]] etc...
What is the best way to calculate the sum by year in Python?
I would use a dict if you need both the year and sum:
from collections import defaultdict
data = [[3, 2014], [4, 2014], [6, 2013], [6,2013]]
d = defaultdict(int)
for v, k in data:
    d[k] += v
print(d)
Prints:
defaultdict(<type 'int'>, {2013: 12, 2014: 7})
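If a plain, year-sorted result is more convenient than a defaultdict, the same loop can be wrapped in a small helper. This is only a sketch: the name sum_by_year is made up for illustration, and it assumes every row is a [value, year] pair as in the question.

```python
from collections import defaultdict

def sum_by_year(rows):
    """Return {year: total} ordered by year (hypothetical helper name)."""
    totals = defaultdict(int)
    for value, year in rows:
        totals[year] += value
    # Convert to a regular dict so it prints cleanly and
    # looking up a missing year raises KeyError instead of creating it.
    return dict(sorted(totals.items()))

data = [[3, 2014], [4, 2014], [6, 2013], [6, 2013]]
result = sum_by_year(data)
print(result)  # {2013: 12, 2014: 7}
```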
As reported by DSM, using pandas and groupby makes this easy:
import pandas as pd
data = [[3, 2014], [4, 2014], [6, 2013], [6,2013]]
df = pd.DataFrame(data, columns=['value', 'year'])
df.groupby(['year']).sum()
which returns:
      value
year
2013     12
2014      7
This is nice because you can easily get more statistics such as the mean, median, and standard deviation:
df.groupby(['year']).mean()
df.groupby(['year']).median()
df.groupby(['year']).std()
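If several of these statistics are needed at once, they can probably be collected in a single call with agg rather than one groupby per statistic. A minimal sketch, assuming the same value/year column names as above; the variable name stats is just for illustration:

```python
import pandas as pd

data = [[3, 2014], [4, 2014], [6, 2013], [6, 2013]]
df = pd.DataFrame(data, columns=["value", "year"])

# Group once, then compute several statistics of "value" per year.
# Note that std here is the sample standard deviation (ddof=1).
stats = df.groupby("year")["value"].agg(["sum", "mean", "median", "std"])
print(stats)
```

The result is a DataFrame indexed by year with one column per statistic, so stats.loc[2013, "sum"] gives the total for 2013.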
You can use collections.Counter and +=:
import collections
data = [[3, 2014], [4, 2014], [6, 2013], [6,2013]]
c = collections.Counter()
for i, j in data:
    c += collections.Counter({j: i})
print(c)
A Counter is a dict subclass for counting hashable objects. It is an unordered collection where elements are stored as dictionary keys and their counts are stored as dictionary values.
You can add Counters, for example:
a = collections.Counter(a=1, b=2)
b = collections.Counter(a=3, c=3)
print(a+b)
prints Counter({'a': 4, 'c': 3, 'b': 2}).
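A variation on the Counter approach above: since a Counter returns 0 for missing keys, you can also add to the key directly instead of building a temporary Counter per row. One difference to be aware of is that Counter addition (+ and +=) keeps only positive totals, so the two forms can disagree if the values may be negative. The sketch below uses made-up signed data to show that:

```python
from collections import Counter

data = [[3, 2014], [4, 2014], [6, 2013], [6, 2013]]
totals = Counter()
for value, year in data:
    totals[year] += value  # missing years start at 0
print(totals)  # Counter({2013: 12, 2014: 7})

# Hypothetical data with a negative value, to show the difference.
signed = [[5, 2014], [-5, 2014], [2, 2013]]
by_index = Counter()
by_add = Counter()
for value, year in signed:
    by_index[year] += value           # keeps 2014 with a total of 0
    by_add += Counter({year: value})  # drops 2014 once its total hits 0
print(2014 in by_index, 2014 in by_add)  # True False
```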
Is data a list of lists or a NumPy array?
numpy? This is a simple groupby operation, and while you can do that in numpy it would take less time to do it in pandas than it took to write this sentence.
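For completeness, here is one way the numpy version mentioned in the comment above might look. It is a sketch rather than anything posted in the thread: it assumes data is a 2-D integer array with the value in column 0 and the year in column 1, and it uses np.unique with return_inverse together with np.bincount.

```python
import numpy as np

data = np.array([[3, 2014], [4, 2014], [6, 2013], [6, 2013]])
values, years = data[:, 0], data[:, 1]

# unique_years is sorted; inverse maps each row to its year's position in it.
unique_years, inverse = np.unique(years, return_inverse=True)

# Sum the values that share the same position; weights makes the result float.
sums = np.bincount(inverse, weights=values)

totals = dict(zip(unique_years.tolist(), sums.tolist()))
print(totals)  # {2013: 12.0, 2014: 7.0}
```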