Pandas dataframe get index and header when in multiple columns

Question

I have a dataframe which looks like this:

import pandas as pd

d = {'id': ['Mc', 'Web', 'G', 'M', 'F'],
     'Person1': ['x', 'x', 'x', None, None],
     'Person2': ['x', None, 'x', 'x', None],
     'Person3': ['x', None, None, None, None]}

df = pd.DataFrame(d)
df.set_index('id', inplace=True)

    Person1 Person2 Person3
id                         
Mc        x       x       x
Web       x    None    None
G         x       x    None
M      None       x    None
F      None    None    None

How can I get the id value and the column headers whenever an id has entries for more than one person? For example, the above data frame should give the following dictionary:

{'Mc': ['Person1', 'Person2', 'Person3'], 'G': ['Person1', 'Person2']}

Any help would be very much appreciated.

BENY · Accepted Answer · 2017-11-13 15:40:04Z

4

df[df.notnull().sum(1)>1].stack().reset_index().\
     groupby('id')['level_1'].apply(list).to_dict()
Out[382]: {'G': ['Person1', 'Person2'], 'Mc': ['Person1', 'Person2', 'Person3']}
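The one-liner above can be unpacked into its individual steps. This is a sketch rather than the answerer's exact code: the extra `.dropna()` after `stack()` is an addition, included because newer pandas versions reimplement `stack()` so that it keeps missing values instead of dropping them (in pandas 2.1 the new behavior is opt-in via `future_stack=True`). On versions where `stack()` already drops them, the extra call changes nothing.

```python
import pandas as pd

d = {'id': ['Mc', 'Web', 'G', 'M', 'F'],
     'Person1': ['x', 'x', 'x', None, None],
     'Person2': ['x', None, 'x', 'x', None],
     'Person3': ['x', None, None, None, None]}
df = pd.DataFrame(d).set_index('id')

# Step 1: keep rows with more than one non-null cell (Mc and G here).
multi = df[df.notnull().sum(axis=1) > 1]

# Step 2: stack into a Series indexed by (id, column name).
# dropna() is an addition to the original one-liner: it removes the None
# cells on pandas versions whose stack() no longer drops them itself.
stacked = multi.stack().dropna()

# Step 3: move the MultiIndex into columns; the unnamed column level
# gets the default name 'level_1'.
long_df = stacked.reset_index()

# Step 4: collect the column names for each id into a list.
result = long_df.groupby('id')['level_1'].apply(list).to_dict()
print(result)
```

Because `groupby` sorts its keys, 'G' comes before 'Mc' in the resulting dictionary.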

answered Nov 13, 2017 at 15:40

BENY

324k · 22 gold badges · 176 silver badges · 250 bronze badges


2 Comments

dliv Over a year ago

Thank you for your help, Wen! Accepting because your answer works and you were the first to provide me with a solution.

BENY Over a year ago

@dliv Glad it helped, have a nice day :-)

jezrael · Answer · 2017-11-13 15:43:19Z

3

First filter the rows and convert them to a dictionary, then keep only the keys whose values are not None:

d = df[df.count(1) > 1].to_dict(orient='index')
print (d)
{'G': {'Person1': 'x', 'Person3': None, 'Person2': 'x'}, 
'Mc': {'Person1': 'x', 'Person3': 'x', 'Person2': 'x'}}

d1 = {k:[k1 for k1, v1 in v.items() if pd.notnull(v1)] for k,v in d.items()}
print (d1)
{'G': ['Person1', 'Person2'], 'Mc': ['Person1', 'Person3', 'Person2']}
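The same two steps can be wrapped in a small reusable function. This is a sketch: `people_per_id` and its `min_people` parameter are hypothetical names, and `min_people=2` reproduces the answer's `count(1) > 1` filter. On current Python and pandas versions the inner dictionaries should follow the column order, so the lists should come out as Person1, Person2, Person3.

```python
import pandas as pd


def people_per_id(frame, min_people=2):
    """Hypothetical helper: map every index label that has at least
    `min_people` non-null cells to the list of those cells' column names."""
    # Same filter as in the answer: count() counts non-null cells per row.
    rows = frame[frame.count(axis=1) >= min_people].to_dict(orient='index')
    # Keep only the column names whose value is not missing.
    return {key: [col for col, val in row.items() if pd.notnull(val)]
            for key, row in rows.items()}


d = {'id': ['Mc', 'Web', 'G', 'M', 'F'],
     'Person1': ['x', 'x', 'x', None, None],
     'Person2': ['x', None, 'x', 'x', None],
     'Person3': ['x', None, None, None, None]}
df = pd.DataFrame(d).set_index('id')

print(people_per_id(df))                # ids with two or more people
print(people_per_id(df, min_people=3))  # only ids with all three people
```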

answered Nov 13, 2017 at 15:43

jezrael

868k · 102 gold badges · 1.4k silver badges · 1.3k bronze badges

2 Comments

jezrael Over a year ago

You are welcome! By the way, I am a bit curious about the timings; is it possible to check them?

jezrael Over a year ago

With your real data, I think, because all the solutions are nice ;)
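A timing comparison like the one jezrael asks about could be run with a small `timeit` harness. This is a sketch on made-up random data (the 2,000 ids, the 50% fill rate and the seed are all assumptions), so whatever it prints says nothing about dliv's real data. `via_stack`, `via_to_dict` and `via_mask` are hypothetical wrapper names: the first two follow BENY's and jezrael's answers (with `.dropna()` added after `stack()` so it behaves the same on newer pandas), and the third is a plain boolean-mask variant that is not one of the posted answers.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical test data: made-up ids, each cell 'x' or None at random.
rng = np.random.default_rng(0)
n_rows = 2_000
values = np.where(rng.random((n_rows, 3)) < 0.5, 'x', None)
df = pd.DataFrame(values,
                  index=pd.Index([f'id{i}' for i in range(n_rows)], name='id'),
                  columns=['Person1', 'Person2', 'Person3'])


def via_stack(frame):
    # BENY's approach, with dropna() added for newer pandas versions.
    stacked = frame[frame.notnull().sum(axis=1) > 1].stack().dropna()
    return stacked.reset_index().groupby('id')['level_1'].apply(list).to_dict()


def via_to_dict(frame):
    # jezrael's approach.
    rows = frame[frame.count(axis=1) > 1].to_dict(orient='index')
    return {k: [c for c, v in row.items() if pd.notnull(v)] for k, row in rows.items()}


def via_mask(frame):
    # Boolean-mask variant (not one of the posted answers).
    mask = frame.notna()
    multi = mask[mask.sum(axis=1) > 1]
    return {idx: row[row].index.tolist() for idx, row in multi.iterrows()}


# The approaches should agree before their speed is compared.
assert via_stack(df) == via_to_dict(df) == via_mask(df)

# Each lambda is executed immediately by timeit, so capturing fn is safe.
timings = {fn.__name__: timeit.timeit(lambda: fn(df), number=5)
           for fn in (via_stack, via_to_dict, via_mask)}
for name, seconds in sorted(timings.items(), key=lambda item: item[1]):
    print(f'{name}: {seconds:.3f} s for 5 runs')
```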

Bharath M Shetty · Answer · 2017-11-13 15:47:33Z

3

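Use a mask, i.e.:

# Replace every non-null cell with its column name; null cells stay None
ndf = df.where(df.isnull(), pd.DataFrame({c: c for c in df.columns}, index=df.index))
temp = ndf[ndf.notnull().sum(axis=1) >= 2]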

   Person1  Person2  Person3
id                           
Mc  Person1  Person2  Person3
G   Person1  Person2     None

To get a dictionary, we can use:

di = {key: value[pd.notnull(value)].tolist() for key, value in zip(temp.index, temp.values)}

{'G': ['Person1', 'Person2'], 'Mc': ['Person1', 'Person2', 'Person3']}
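The mask idea can also be applied directly to the boolean frame returned by `notna()`, which skips building an intermediate frame of column names. This is a sketch of that variant, not the answerer's code: each row of the boolean mask is used to select its own `True` labels, which are exactly the column headers.

```python
import pandas as pd

d = {'id': ['Mc', 'Web', 'G', 'M', 'F'],
     'Person1': ['x', 'x', 'x', None, None],
     'Person2': ['x', None, 'x', 'x', None],
     'Person3': ['x', None, None, None, None]}
df = pd.DataFrame(d).set_index('id')

# Boolean mask: True wherever a person is marked for that id.
mask = df.notna()

# Keep only the ids marked for at least two people.
multi = mask[mask.sum(axis=1) >= 2]

# In each remaining row, the labels of the True cells are the column headers.
di = {idx: row[row].index.tolist() for idx, row in multi.iterrows()}
print(di)  # {'Mc': ['Person1', 'Person2', 'Person3'], 'G': ['Person1', 'Person2']}
```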

edited Nov 13, 2017 at 15:47

answered Nov 13, 2017 at 15:40

Bharath M Shetty

30.6k · 6 gold badges · 65 silver badges · 111 bronze badges

1 Comment

dliv Over a year ago

Thank you Bharath for another solution!

Bharath M Shetty · Answer · 2017-11-14 12:51:55Z

Late to the party, but I was curious whether this was possible with more native pandas features. I know an answer has already been accepted, but feel free to upvote if this adds another perspective :)

Got it down to two statements:

# Use dropna(thresh=2) to keep only the ids that have at least 2 non-null values
In[1]: basic_dict = df.dropna(thresh=2, axis=0).to_dict(orient="index")
Out[1]:
{'G': {'Person1': 'x', 'Person2': 'x', 'Person3': None},
 'Mc': {'Person1': 'x', 'Person2': 'x', 'Person3': 'x'}}

# Strip the dictionary to remove any remaining `None` values
In[2]: {k: [i for i in v if pd.notnull(v[i])] for k, v in basic_dict.items()}
Out[2]: {'G': ['Person1', 'Person2'], 'Mc': ['Person3', 'Person1', 'Person2']}

The returned lists aren't in the same order as the columns, but I guessed that wasn't critical.
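What `thresh` does can be checked directly: `dropna(thresh=n)` keeps the rows that have at least `n` non-null values, so it selects the same rows as counting non-null cells per row. A small sketch on the question's data:

```python
import pandas as pd

d = {'id': ['Mc', 'Web', 'G', 'M', 'F'],
     'Person1': ['x', 'x', 'x', None, None],
     'Person2': ['x', None, 'x', 'x', None],
     'Person3': ['x', None, None, None, None]}
df = pd.DataFrame(d).set_index('id')

# thresh=2: keep rows with at least 2 non-null values.
kept_by_thresh = df.dropna(thresh=2, axis=0)

# The same filter written as an explicit count of non-null cells.
kept_by_count = df[df.count(axis=1) >= 2]

# thresh=3: with three person columns, only fully filled rows survive.
only_all_three = df.dropna(thresh=3, axis=0).index.tolist()

print(kept_by_thresh.index.tolist())  # ['Mc', 'G']
print(kept_by_count.index.tolist())   # ['Mc', 'G']
print(only_all_three)                 # ['Mc']
```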
