2

I have a dataframe which looks like this:

import pandas as pd

d = {'id': ['Mc','Web','G','M','F'], 'Person1':['x','x','x',None,None],'Person2':['x',None,'x','x',None], 'Person3':['x',None, None,None, None]}

df = pd.DataFrame(d)
df.set_index('id', inplace=True)

    Person1 Person2 Person3
id                         
Mc        x       x       x
Web       x    None    None
G         x       x    None
M      None       x    None
F      None    None    None

How can I get the id value and the column headers for every id that appears with more than one person? For example, the above data frame should give the following dictionary:

{'Mc': ['Person1', 'Person2', 'Person3'], 'G': ['Person1', 'Person2']}

Any help would be very much appreciated.

4 Answers

4
df[df.notnull().sum(1)>1].stack().reset_index().\
     groupby('id')['level_1'].apply(list).to_dict()
Out[382]: {'G': ['Person1', 'Person2'], 'Mc': ['Person1', 'Person2', 'Person3']}
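
For readers who find the one-liner dense, here is a minimal step-by-step sketch of the same chain, assuming the sample data from the question. The intermediate names (`multi`, `stacked`, `result`) are illustrative only, and the extra `dropna()` is a precaution added here in case the installed pandas version keeps null entries when stacking.

```python
import pandas as pd

d = {'id': ['Mc', 'Web', 'G', 'M', 'F'],
     'Person1': ['x', 'x', 'x', None, None],
     'Person2': ['x', None, 'x', 'x', None],
     'Person3': ['x', None, None, None, None]}
df = pd.DataFrame(d).set_index('id')

# Step 1: keep only the ids with more than one non-null entry.
multi = df[df.notnull().sum(axis=1) > 1]

# Step 2: reshape wide to long, giving one (id, column) pair per cell,
# and drop any null cells that survive the reshape.
stacked = multi.stack().dropna()

# Step 3: the unnamed column level becomes 'level_1' after reset_index;
# collect those labels into one list per id.
result = stacked.reset_index().groupby('id')['level_1'].apply(list).to_dict()
print(result)
```

Because `groupby` sorts its keys, the ids come out alphabetically, while each list keeps the column order of the frame.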

2 Comments

Thank you for your help, Wen! Accepting because your answer works and you were the first to provide me with a solution.
@dliv glad it helped, have a nice day :-)
3

First filter the rows and convert them to a dictionary, then keep only the keys whose values are not None:

d = df[df.count(1) > 1].to_dict(orient='index')
print (d)
{'G': {'Person1': 'x', 'Person3': None, 'Person2': 'x'}, 
'Mc': {'Person1': 'x', 'Person3': 'x', 'Person2': 'x'}}

d1 = {k:[k1 for k1, v1 in v.items() if pd.notnull(v1)] for k,v in d.items()}
print (d1)
{'G': ['Person1', 'Person2'], 'Mc': ['Person1', 'Person3', 'Person2']}

2 Comments

You are welcome! Btw, I am a bit curious about timings; is it possible to check them?
With your real data I think, because all solutions are nice ;)
3

Use a mask, i.e.:

ndf = df.where(df.isnull(),df.apply(lambda x : x.index,1))
temp = ndf[ndf.notnull().sum(1)>=2]
   Person1  Person2  Person3
id                           
Mc  Person1  Person2  Person3
G   Person1  Person2     None

For a dictionary we can use

di = { key: value[pd.notnull(value)].tolist() for key,value in zip(temp.index,temp.values)}

{'G': ['Person1', 'Person2'], 'Mc': ['Person1', 'Person2', 'Person3']}
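
A variant of the same mask idea, offered only as a sketch and not as this answer's method: the boolean mask can index the column labels directly, which skips the intermediate frame of column names. It assumes the sample data from the question; `mask`, `keep` and `di` are illustrative names.

```python
import pandas as pd

d = {'id': ['Mc', 'Web', 'G', 'M', 'F'],
     'Person1': ['x', 'x', 'x', None, None],
     'Person2': ['x', None, 'x', 'x', None],
     'Person3': ['x', None, None, None, None]}
df = pd.DataFrame(d).set_index('id')

# True wherever a cell holds a value.
mask = df.notnull()

# Rows that have at least two values.
keep = mask.sum(axis=1) >= 2

# For each kept row, select the column labels where the mask is True.
di = {idx: df.columns[row].tolist()
      for idx, row in zip(df.index[keep], mask[keep].to_numpy())}
print(di)
```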

1 Comment

Thank you Bharath for another solution!
2

Late to the party, but I was curious to know if it was possible with more native Pandas features. I know it's already accepted but feel free to upvote if it adds another perspective :)

Got it down to two statements:

# Use dropna with thresh=2 to keep only the rows that have at least 2 non-null values
In[1]: basic_dict = df.dropna(thresh=2, axis=0).to_dict(orient="index")
Out[1]:
{'G': {'Person1': 'x', 'Person2': 'x', 'Person3': None},
 'Mc': {'Person1': 'x', 'Person2': 'x', 'Person3': 'x'}}

# Strip the dictionary to remove any remaining `None` values
In[2]:  { k:[i for i in v if v[i] == "x"] for k,v in basic_dict.items()}
Out[2]: {'G': ['Person1', 'Person2'], 'Mc': ['Person3', 'Person1', 'Person2']}

The returned lists aren't in the same order, but I'm guessing that isn't critical.
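
If the order does matter, one possible sketch is to iterate over the frame's columns explicitly, so that each list follows the column order regardless of how the intermediate dictionary is ordered. It assumes the sample data from the question; `kept` and `ordered` are illustrative names.

```python
import pandas as pd

d = {'id': ['Mc', 'Web', 'G', 'M', 'F'],
     'Person1': ['x', 'x', 'x', None, None],
     'Person2': ['x', None, 'x', 'x', None],
     'Person3': ['x', None, None, None, None]}
df = pd.DataFrame(d).set_index('id')

# Keep the rows with at least two non-null values.
kept = df.dropna(thresh=2)

# Walk the columns in their original order and keep the non-null ones.
ordered = {idx: [col for col in kept.columns if pd.notnull(kept.at[idx, col])]
           for idx in kept.index}
print(ordered)
```

On Python 3.7 and later, dictionaries keep insertion order, so the shuffled output shown above is probably an artifact of an older interpreter.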
