Nested list to list into new column

Question

data = {
    'date': ['2020-04-27', '2020-04-27', '2020-04-27'],
    'user': ['Steeve', 'Pam', 'Olive'],
    'mentions': ["['sport', 'basket']", "['politique']", "[]"],
    'reply_to': [
        "[{'user_id': '123', 'username': 'aaa'}, {'user_id': '234', 'username': 'bbb'}, {'user_id': '456', 'username': 'ccc'}]",
        "[{'user_id': '567', 'username': 'zzz'}, {'user_id': '458', 'username': 'vfd'}]",
        "[{'user_id': '666', 'username': 'ggg'}]"],
    'text': ['textfromSteeve', 'textfromPam', 'textfromOlive']
}

stack = pd.DataFrame(data, columns=['date', 'user','mentions','reply_to','text'])

From this dataframe, I'm trying to convert both mentions and reply_to columns into nested list. The goal is then to apply an pandas explode function to display one row for each number of mentions. For instance, I'd like 3 rows of user 'Pam' with one mention for each line (Steeve, Olive and Marc).

So far, I've done the following:

def nested_list(li):
    temp = []
    for elem in li:
        temp.append([elem]) 
    return temp
stack['mentions_nested= stack.mentions.apply(lambda x: nested_list(x))
stack['replies_nested= stack.reply_to.apply(lambda x: nested_list(x))

The problem is when there is only only one name (string) in the column. It splits each letter into a distinct list (ex: [[P], [a], [m]]).

Regarding to the reply_to column, where the dictionary's length is equal to 1, it returns something like this: [[id],[username]].

Do you guys have any idea on how I could do this?

FYI: I'm not going to apply the explode function on both mentions an reply_to columns on the meantime. This is going to be two different process.python

Please share the expected output.

Mayank Porwal
– Mayank Porwal

2020-04-26 10:48:03 +00:00
Commented Apr 26, 2020 at 10:48 — Mayank Porwal
– Mayank Porwal, Commented Apr 26, 2020 at 10:48

jezrael · Accepted Answer · 2020-04-26 11:01:37Z

I believe you need replace non list values to lists with map and isinstance:

for c in ['mentions','reply_to']:
    stack[c] = stack[c].map(lambda x: x if isinstance(x, list) else [x])

print (stack)
     user               mentions  \
0  Steeve                  [Pam]   
1     Pam  [Steeve, Olive, Marc]   
2   Olive            [Paul, Lou]   

                                            reply_to            text  
0               [{'id': '123', 'username': 'alpha'}]  textfromSteeve  
1  [{'id': '231', 'username': 'beta'}, {'id': '45...     textfromPam  
2                 [{'id': '789', 'username': 'olo'}]   textfromOlive

Then is possible create dictionaries in list comprehension with assign key from index values, pass to DataFrame and last use DataFrame.join for original, last use DataFrame.explode:

L = [dict(**{'idx':k}, **x) for 
     k, v in stack.pop('reply_to').items() 
     for x in v]

df = pd.DataFrame(L).join(stack, on='idx').explode('mentions').reset_index(drop=True)
print (df)
   idx    id username    user mentions            text
0    0   123    alpha  Steeve      Pam  textfromSteeve
1    1   231     beta     Pam   Steeve     textfromPam
2    1   231     beta     Pam    Olive     textfromPam
3    1   231     beta     Pam     Marc     textfromPam
4    1  4580    omega     Pam   Steeve     textfromPam
5    1  4580    omega     Pam    Olive     textfromPam
6    1  4580    omega     Pam     Marc     textfromPam
7    2   789      olo   Olive     Paul   textfromOlive
8    2   789      olo   Olive      Lou   textfromOlive

I'm afraid I'm getting an error with you solution "type object argument after ** must be a mapping, not str"...
@Grégoire de Kermel Can you omit first part solution, for c in ['mentions','reply_to']: stack[c] = stack[c].map(lambda x: x if isinstance(x, list) else [x]) and run solution?
I still get an TypeError: type object argument after ** must be a mapping, not str ...
I found a solution! I've simply apply an eval on my "mentions" column before doing your process and it works perfectly! Many thanks :) df['mentions'] = df.mentions.apply(lambda x: eval(x))

sammywemmy · Accepted Answer · 2020-04-28 22:50:27Z

use json normalize on the reply to column

from pandas import json_normalize

res = pd.concat([json_normalize(ent)
                .assign(index=ind)
                 for ind,ent
                in zip(stack.index,stack['reply_to'])
                ])

res

     id username    index
0   123     alpha   0
0   231     beta    1
1   4580    omega   1
0   789     olo     2

join back to the original dataframe, with some cleanups

  (stack
   .join(res.set_index('index'))
   .explode('mentions')
   .drop('reply_to',axis=1)
   )

    user    mentions    text         id    username
0   Steeve  Pam     textfromSteeve  123     alpha
1   Pam    Steeve   textfromPam     231     beta
1   Pam     Olive   textfromPam     231     beta
1   Pam     Marc    textfromPam     231     beta
1   Pam     Steeve  textfromPam     4580    omega
1   Pam     Olive   textfromPam     4580    omega
1   Pam     Marc    textfromPam     4580    omega
2   Olive   Paul    textfromOlive   789     olo
2   Olive   Lou     textfromOlive   789     olo

I'm sorry but I've update a wrong dataframe (I've since updated it). You're solution doesn't seems to be working with the update dataframe
what's the error message? also, kindly post in ur question ur expected output dataframe

Collectives™ on Stack Overflow

Nested list to list into new column

2 Answers 2

9 Comments

2 Comments

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

9 Comments

2 Comments

Related