0
data = {
    'date': ['2020-04-27', '2020-04-27', '2020-04-27'],
    'user': ['Steeve', 'Pam', 'Olive'],
    'mentions': ["['sport', 'basket']", "['politique']", "[]"],
    'reply_to': [
        "[{'user_id': '123', 'username': 'aaa'}, {'user_id': '234', 'username': 'bbb'}, {'user_id': '456', 'username': 'ccc'}]",
        "[{'user_id': '567', 'username': 'zzz'}, {'user_id': '458', 'username': 'vfd'}]",
        "[{'user_id': '666', 'username': 'ggg'}]"],
    'text': ['textfromSteeve', 'textfromPam', 'textfromOlive']
}

stack = pd.DataFrame(data, columns=['date', 'user','mentions','reply_to','text'])

From this dataframe, I'm trying to convert both mentions and reply_to columns into nested list. The goal is then to apply an pandas explode function to display one row for each number of mentions. For instance, I'd like 3 rows of user 'Pam' with one mention for each line (Steeve, Olive and Marc).

So far, I've done the following:

def nested_list(li):
    temp = []
    for elem in li:
        temp.append([elem]) 
    return temp
stack['mentions_nested= stack.mentions.apply(lambda x: nested_list(x))
stack['replies_nested= stack.reply_to.apply(lambda x: nested_list(x))

The problem is when there is only only one name (string) in the column. It splits each letter into a distinct list (ex: [[P], [a], [m]]).

Regarding to the reply_to column, where the dictionary's length is equal to 1, it returns something like this: [[id],[username]].

Do you guys have any idea on how I could do this?

FYI: I'm not going to apply the explode function on both mentions an reply_to columns on the meantime. This is going to be two different process.python

1
  • 1
    Please share the expected output. Commented Apr 26, 2020 at 10:48

2 Answers 2

1

I believe you need replace non list values to lists with map and isinstance:

for c in ['mentions','reply_to']:
    stack[c] = stack[c].map(lambda x: x if isinstance(x, list) else [x])

print (stack)
     user               mentions  \
0  Steeve                  [Pam]   
1     Pam  [Steeve, Olive, Marc]   
2   Olive            [Paul, Lou]   

                                            reply_to            text  
0               [{'id': '123', 'username': 'alpha'}]  textfromSteeve  
1  [{'id': '231', 'username': 'beta'}, {'id': '45...     textfromPam  
2                 [{'id': '789', 'username': 'olo'}]   textfromOlive  

Then is possible create dictionaries in list comprehension with assign key from index values, pass to DataFrame and last use DataFrame.join for original, last use DataFrame.explode:

L = [dict(**{'idx':k}, **x) for 
     k, v in stack.pop('reply_to').items() 
     for x in v]

df = pd.DataFrame(L).join(stack, on='idx').explode('mentions').reset_index(drop=True)
print (df)
   idx    id username    user mentions            text
0    0   123    alpha  Steeve      Pam  textfromSteeve
1    1   231     beta     Pam   Steeve     textfromPam
2    1   231     beta     Pam    Olive     textfromPam
3    1   231     beta     Pam     Marc     textfromPam
4    1  4580    omega     Pam   Steeve     textfromPam
5    1  4580    omega     Pam    Olive     textfromPam
6    1  4580    omega     Pam     Marc     textfromPam
7    2   789      olo   Olive     Paul   textfromOlive
8    2   789      olo   Olive      Lou   textfromOlive
Sign up to request clarification or add additional context in comments.

9 Comments

I'm afraid I'm getting an error with you solution "type object argument after ** must be a mapping, not str"...
I've upadted the dataframe, they was an error... sorry
@Grégoire de Kermel Can you omit first part solution, for c in ['mentions','reply_to']: stack[c] = stack[c].map(lambda x: x if isinstance(x, list) else [x]) and run solution?
I still get an TypeError: type object argument after ** must be a mapping, not str ...
I found a solution! I've simply apply an eval on my "mentions" column before doing your process and it works perfectly! Many thanks :) df['mentions'] = df.mentions.apply(lambda x: eval(x))
|
0

use json normalize on the reply to column

from pandas import json_normalize

res = pd.concat([json_normalize(ent)
                .assign(index=ind)
                 for ind,ent
                in zip(stack.index,stack['reply_to'])
                ])

res

     id username    index
0   123     alpha   0
0   231     beta    1
1   4580    omega   1
0   789     olo     2

join back to the original dataframe, with some cleanups

  (stack
   .join(res.set_index('index'))
   .explode('mentions')
   .drop('reply_to',axis=1)
   )

    user    mentions    text         id    username
0   Steeve  Pam     textfromSteeve  123     alpha
1   Pam    Steeve   textfromPam     231     beta
1   Pam     Olive   textfromPam     231     beta
1   Pam     Marc    textfromPam     231     beta
1   Pam     Steeve  textfromPam     4580    omega
1   Pam     Olive   textfromPam     4580    omega
1   Pam     Marc    textfromPam     4580    omega
2   Olive   Paul    textfromOlive   789     olo
2   Olive   Lou     textfromOlive   789     olo

2 Comments

I'm sorry but I've update a wrong dataframe (I've since updated it). You're solution doesn't seems to be working with the update dataframe
what's the error message? also, kindly post in ur question ur expected output dataframe

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.