
This is similar to MultiIndex-based indexing in pandas.

Is there a better way to iterate over the sub-series of a MultiIndexed Series?

import pandas as pd

df = pd.DataFrame([[1, 1, 1], [1, 2, 1], [1, 2, 2],
                   [2, 1, 1], [2, 2, 1], [2, 3, 1], [2, 3, 2], [2, 3, 3]],
                  columns=['a', 'b', 'c'])
# Count rows per (a, b) pair; the result is a Series with a two-level MultiIndex.
g = df.groupby(['a', 'b']).size()
# Iterate over the unique values of level 0 ('a') and index into g per label.
for label in g.index.levels[0]:
    print(label)
    print(g[label])

This will give:

1
b
1    1
2    2
dtype: int64
2
b
1    1
2    1
3    3
dtype: int64

Something like this pseudo-code:

for label, series in g.get_sub_series(level = 0):
    print(label)
    print(series)
  • Maybe for label, series in g.groupby(level = 0)? (See the sketch after these comments.) Commented Feb 1, 2017 at 14:43
  • Should I use groupby on a Series that is itself the result of another groupby plus a fast count? Won't this re-compute the groups? The DataFrame may be ~300 MiB. Commented Feb 1, 2017 at 15:36
  • 300 MB shouldn't be too large for pandas, and maybe you can try groupby('b') first and then, for each sub-group, groupby('a'). I am not sure what you are trying to do; this is what I can suggest. Commented Feb 1, 2017 at 15:48
  • @Psidom: Yes, 300 MiB is fine. My question was about doing the same work (grouping) again, which would take some time given the volume of data. And yes, I can group by 'b' first; that's not the issue. The code in my original post does the job. I want to know specifically how to iterate over level 0 of the MultiIndexed Series without re-computing anything and without n hash look-ups (if that is possible). That way I learn something new about pandas :) Commented Feb 1, 2017 at 19:55
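
A minimal sketch of the approach suggested in the first comment, using the question's own data. g.groupby(level=0) yields (label, sub-series) pairs directly; the grouping runs over the already-aggregated count series (one row per unique (a, b) pair), not over the original ~300 MiB frame, and it avoids a separate hash look-up per label. Note that each sub-series keeps the full (a, b) MultiIndex, so level 0 is dropped below to match the output shown above.

import pandas as pd

df = pd.DataFrame([[1, 1, 1], [1, 2, 1], [1, 2, 2],
                   [2, 1, 1], [2, 2, 1], [2, 3, 1], [2, 3, 2], [2, 3, 3]],
                  columns=['a', 'b', 'c'])
g = df.groupby(['a', 'b']).size()

# Group on the existing index level: iterates (label, sub-series)
# pairs without touching the original DataFrame again.
for label, sub in g.groupby(level=0):
    print(label)
    # sub retains both index levels; drop level 0 to match the
    # per-label output printed in the question.
    print(sub.droplevel(0))

Series.droplevel requires pandas 0.24+; on older versions, sub.reset_index(level=0, drop=True) gives the same result.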

