
I have a line of code:

g = x.groupby('Color')

The colors are Red, Blue, Green, Yellow, Purple, Orange, and Black. How do I get this list of colors? For similar attributes, I use x.Attribute and it works fine, but x.Color doesn't behave the same way.

  • So you want a list of unique values in Color? Commented Mar 4, 2015 at 1:36
  • You can get the unique values from your original df; there is no need to group: x['Color'].unique() Commented Mar 4, 2015 at 8:50
  • x['Color'].unique() ended up being exactly what I was looking for. Thank you. Commented Mar 5, 2015 at 2:34

6 Answers

There is a much easier way to do it:

g = x.groupby('Color')

g.groups.keys()

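groupby() does not return a dict; it returns a DataFrameGroupBy object. Its groups attribute, however, is a dict that maps each group name to the row labels in that group, so you can get the list of group names with the dict's built-in keys() method.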
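A minimal sketch of what g.groups contains, assuming a small made-up DataFrame x with a Color column (the data and the Value column are hypothetical):

```python
import pandas as pd

# Hypothetical stand-in for the question's DataFrame x
x = pd.DataFrame({
    "Color": ["Red", "Blue", "Red", "Green"],
    "Value": [1, 2, 3, 4],
})

g = x.groupby("Color")

# g itself is a DataFrameGroupBy object, not a dict
print(type(g).__name__)  # DataFrameGroupBy

# g.groups is a dict: group name -> row labels belonging to that group
red_rows = g.groups["Red"].tolist()
print(red_rows)  # [0, 2]

# The dict's keys are the group names (sorted, since sort=True by default)
names = list(g.groups.keys())
print(names)  # ['Blue', 'Green', 'Red']
```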

6 Comments

This is much more pandorable than other answers. :)
Please look at Erik Swan's answer below before you make a decision on which method to use. If consistent ordering of group names is an issue, go for Erik's way.
groupby() does not return a dict, but a DataFrameGroupBy object.
In Python3.x the above code will throw a TypeError and list(g.groups) would be preferred, see also the accepted answer in this question
@Adriaan I get no errors when running this on Python 3.10.1, maybe an update changed that?
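One likely way to reconcile the two comments above (an interpretation, not something either commenter states): calling g.groups.keys() works fine in Python 3, but it returns a dict_keys view, and indexing that view (for example with [0]) raises a TypeError. Converting to a list first avoids this. A small sketch with hypothetical data:

```python
import pandas as pd

# Hypothetical example data
x = pd.DataFrame({"Color": ["Red", "Blue", "Red"], "Value": [1, 2, 3]})
g = x.groupby("Color")

keys_view = g.groups.keys()
print(type(keys_view).__name__)  # dict_keys

try:
    keys_view[0]  # dict views do not support indexing
except TypeError as exc:
    print("TypeError:", exc)

first_name = list(g.groups)[0]  # build a list first, then index it
print(first_name)  # Blue
```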

If you do not care about the order of the groups, Yanqi Ma's answer will work fine:

g = x.groupby('Color')
g.groups.keys()
list(g.groups) # or this

However, note that g.groups is a dictionary, so in Python versions before 3.7 its keys are inherently unordered! This holds even if you pass sort=True to groupby to sort the groups (which is the default).

This actually bit me hard when it resulted in a different order on two platforms, especially since I was using list(g.groups), so it wasn't obvious at first that g.groups was a dict.

In my opinion, the best way to do this is to take advantage of the fact that the GroupBy object is iterable, and use a list comprehension to return the group names in the order they appear in the GroupBy object:

g = x.groupby('Color')
groups = [name for name,unused_df in g]

It's a little less readable, but this will always return the groups in the correct order.
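A quick way to check the ordering on a recent Python and pandas, sketched with hypothetical data. With the default sort=True, both approaches return the sorted names; with sort=False, iteration returns the groups in order of first appearance:

```python
import pandas as pd

# Hypothetical data: colors first appear as Red, Blue, Green
x = pd.DataFrame({"Color": ["Red", "Blue", "Red", "Green"]})

# Default sort=True: group names come back sorted
g_sorted = x.groupby("Color")
from_iter = [name for name, _ in g_sorted]
from_dict = list(g_sorted.groups)
print(from_iter)  # ['Blue', 'Green', 'Red']
print(from_dict)  # ['Blue', 'Green', 'Red']

# sort=False: groups appear in order of first appearance
g_unsorted = x.groupby("Color", sort=False)
unsorted_names = [name for name, _ in g_unsorted]
print(unsorted_names)  # ['Red', 'Blue', 'Green']
```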

4 Comments

Just wondering: how can I find the attributes of a GroupBy object? I assumed name would be one of its attributes, but I could not find any relevant information in the pandas documentation.
All of the methods and attributes of the GroupBy object are documented in the Pandas documentation.
The above concerns hold for Python versions prior to 3.7. For newer Python versions, dictionary keys are (insertion) ordered. I expect that list(g.groups)==[name for name,_ in g] is True, regardless of whether sort=True or sort=False.
Although the Pandas documentation doesn't explicitly state that, I agree that is probably true. Good to know this type of mistake is harder to make in Python 3.7+.

Here's how to do it.

groups = list()
for g, data in x.groupby('Color'):
    print(g, data)
    groups.append(g)

The core idea is this: iterating over a DataFrameGroupBy object yields two-tuples of (group name, filtered DataFrame), where the filtered DataFrame contains only the rows belonging to that group.
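A small sketch of that tuple unpacking, with hypothetical data and column names. Note that name here is simply the first element of each tuple produced by the loop, not an attribute of the GroupBy object:

```python
import pandas as pd

# Hypothetical data
x = pd.DataFrame({
    "Color": ["Red", "Blue", "Red"],
    "Size": [1, 2, 3],
})

groups = []
for name, sub_df in x.groupby("Color"):
    # name is the group key; sub_df holds only that group's rows
    print(name, len(sub_df))
    groups.append(name)

print(groups)  # ['Blue', 'Red']
```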

1 Comment

Alternatively, if you want to get the unique values present in each column, you can do numpy.unique(x[col_name].values)

It is my understanding that you have a DataFrame which contains multiple columns. One of the columns is "Color", which holds different colors. You want to return a list of the unique colors that exist.

colorGroups = df.groupby('Color')
for c in colorGroups.groups:
    print(c)

The above code prints every color that exists, without repeating any color name. Because groupby sorts the group keys by default, you should get output such as:

Black
Blue
Green
Orange
Purple
Red
Yellow

An alternative is the unique() method, which returns an array of all unique values in a Series, in order of first appearance. Thus, to get an array of all unique colors, you would do:

df['Color'].unique()

The output is an array, so, for example, print(df['Color'].unique()[3]) would give you Yellow, assuming the colors first appear in the data in the order listed in the question.
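Because the position of each color in the unique() result depends on the order of the data, it may help to compare it with numpy.unique (mentioned in a comment above), which sorts. A sketch with hypothetical data:

```python
import numpy as np
import pandas as pd

# Hypothetical data in which the colors first appear as Red, Blue, Green
x = pd.DataFrame({"Color": ["Red", "Blue", "Red", "Green"]})

# Series.unique() keeps the order of first appearance
appearance_order = list(x["Color"].unique())
print(appearance_order)  # ['Red', 'Blue', 'Green']

# numpy.unique returns the distinct values sorted
sorted_order = np.unique(x["Color"].to_numpy()).tolist()
print(sorted_order)  # ['Blue', 'Green', 'Red']
```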

Comments


I compared the runtime of the solutions above (on my data):

In [443]: d = df3.groupby("IND")

In [444]: %timeit groups = [name for name,unused_df in d]
377 ms ± 27.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [445]: %timeit list(d.groups)
1.08 µs ± 47.3 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [446]: %timeit d.groups.keys()
708 ns ± 7.18 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [447]: %timeit df3['IND'].unique()
5.33 ms ± 128 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

It seems that d.groups.keys() is the fastest method.
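The comments on this answer report that the first access to .groups is much slower than later ones, which suggests the result is cached on the GroupBy object, so timing d.groups.keys() repeatedly on the same d may mostly measure a cached lookup. A rough benchmark sketch that rebuilds the GroupBy inside every timed call; the column name IND follows the answer, while the synthetic data, sizes, and function names are made up. Timings will vary with the data's shape and the pandas version, so no numbers are given here:

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical synthetic data: 200,000 rows spread over up to 100 groups
rng = np.random.default_rng(0)
df = pd.DataFrame({"IND": rng.integers(0, 100, size=200_000)})

def via_groups_keys():
    # A fresh GroupBy each call, so no cached .groups is reused
    return list(df.groupby("IND").groups.keys())

def via_iteration():
    return [name for name, _ in df.groupby("IND")]

def via_unique():
    # unique() keeps first-appearance order, so sort to match groupby
    return sorted(df["IND"].unique())

for fn in (via_groups_keys, via_iteration, via_unique):
    seconds = timeit.timeit(fn, number=3) / 3
    print(f"{fn.__name__}: {seconds * 1000:.1f} ms per call")
```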

3 Comments

Please post the entire used command and your results, if you want to write an answer that is actually contributing. Otherwise use the comment option.
It's not that simple, runtime will depend on the structure of your data. In my case - a df with few groups but many members per group - I found the exact opposite result: the list comprehension was fastest (22 ms), while df.groupby(..).groups.keys() was slower: 124ms.
Note: in my experiment, the first time I run d.groups.keys(), it is much slower (again 100-300 ms), but the second time it is 4ms. So your results may only depend on the order you do the timing in.

Hope this helps. Happy coding :)

import pandas as pd

df = pd.DataFrame(data=[['red', '1', '1.5'], ['blue', '20', '2.5'], ['red', '15', '4']], columns=['color', 'column1', 'column2'])

# Group names, sorted by default
list_req = list(df.groupby('color').groups.keys())
print(list_req)  # ['blue', 'red']
