I have a pandas DataFrame that looks like this:

              tags   value
[tag1, tag2, tag3]       0
[tag2, tag3]            10
[tag1, tag3]            50
                       ...

For each tag in each row of this DataFrame, I want to create a new row with a 'tag' column and a 'related_tags' column holding the other tags from the same row. Here is an example of the expected output:

 tag   value    related_tags
tag1       0    [tag2, tag3] 
tag2       0    [tag1, tag3] 
tag3       0    [tag1, tag2] 
tag2      10    [tag3]     
tag3      10    [tag2]    
tag1      50    [tag3]   
tag3      50    [tag1]

I am familiar with Spark DataFrames but not with pandas. Is there a simple way to achieve this?

1 Answer

This is first of all an unnesting problem. Once the list column tags is exploded, the rest of the question becomes clearer:

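# Explode the list column; reset_index() keeps the original row label in an 'index' column
newdf = unnesting(df, ['tags']).reset_index()

# Look up each exploded row's full tag list through its original row label
newdf['related_tags'] = newdf['index'].map(df.tags)

# Remove the row's own tag from that list
newdf['related_tags'] = [list(set(y) - {x}) for x, y in zip(newdf.tags, newdf.related_tags)]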
newdf
Out[48]: 
   index  tags  value  related_tags
0      0  tag1      0  [tag2, tag3]
1      0  tag2      0  [tag3, tag1]
2      0  tag3      0  [tag2, tag1]
3      1  tag2     10        [tag3]
4      1  tag3     10        [tag2]
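
Because the related tags are built with a set difference, their order inside each list is not guaranteed (note [tag3, tag1] in the output above), and duplicate tags within a row would be collapsed. If the original order matters, a list comprehension is one possible replacement for that step. The helper name related_tags below is hypothetical and only serves as an illustration:

```python
def related_tags(tag, tags):
    """Return every entry of `tags` except `tag`, keeping the original order."""
    return [t for t in tags if t != tag]


# One row's tag list, as in the first row of the question
row_tags = ["tag1", "tag2", "tag3"]
pairs = [(tag, related_tags(tag, row_tags)) for tag in row_tags]
```

In the answer's code this would replace list(set(y)-{x}) with related_tags(x, y).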

Data input

import numpy as np
import pandas as pd

df = pd.DataFrame({'tags': [['tag1', 'tag2', 'tag3'], ['tag2', 'tag3']], 'value': [0, 10]})

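Self-defined function

def unnesting(df, explode):
    # Repeat each row label once per element of its list
    idx = df.index.repeat(df[explode[0]].str.len())
    # Flatten every list column into one long column
    df1 = pd.concat([pd.DataFrame({x: np.concatenate(df[x].values)}) for x in explode], axis=1)
    df1.index = idx
    # Re-attach the remaining columns by row label (the positional axis argument to drop was removed in pandas 2.0)
    return df1.join(df.drop(columns=explode), how='left')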
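
On pandas 0.25 or later, the built-in DataFrame.explode can probably stand in for the self-defined helper. The following is a minimal sketch under that assumption, not the answer's original method; the function name expand_tags and the temporary column all_tags are hypothetical. It also renames the exploded column to 'tag' so the result matches the layout asked for in the question:

```python
import pandas as pd

df = pd.DataFrame({
    "tags": [["tag1", "tag2", "tag3"], ["tag2", "tag3"], ["tag1", "tag3"]],
    "value": [0, 10, 50],
})


def expand_tags(df):
    # Copy the list column first so the full list survives the explode
    out = df.assign(all_tags=df["tags"]).explode("tags")
    out = out.rename(columns={"tags": "tag"})
    # For each exploded row, keep the other tags of the same original row
    out["related_tags"] = [
        [t for t in all_tags if t != tag]
        for tag, all_tags in zip(out["tag"], out["all_tags"])
    ]
    out = out.drop(columns="all_tags").reset_index(drop=True)
    return out[["tag", "value", "related_tags"]]


result = expand_tags(df)
print(result)
```

Copying the list column before exploding avoids the index lookup through map, at the cost of one temporary column.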