import pandas as pd

data = {
    'date': ['2020-04-27', '2020-04-27', '2020-04-27'],
    'user': ['Steeve', 'Pam', 'Olive'],
    'mentions': ["['sport', 'basket']", "['politique']", "[]"],
    'reply_to': [
        "[{'user_id': '123', 'username': 'aaa'}, {'user_id': '234', 'username': 'bbb'}, {'user_id': '456', 'username': 'ccc'}]",
        "[{'user_id': '567', 'username': 'zzz'}, {'user_id': '458', 'username': 'vfd'}]",
        "[{'user_id': '666', 'username': 'ggg'}]"],
    'text': ['textfromSteeve', 'textfromPam', 'textfromOlive']
}
stack = pd.DataFrame(data, columns=['date', 'user', 'mentions', 'reply_to', 'text'])
From this dataframe, I'm trying to convert both the mentions and reply_to columns into nested lists. The goal is then to apply the pandas explode function to get one row per mention. For instance, I'd like 3 rows for user 'Pam', with one mention on each row (Steeve, Olive and Marc).
So far, I've done the following:
def nested_list(li):
    temp = []
    for elem in li:
        temp.append([elem])
    return temp
stack['mentions_nested'] = stack.mentions.apply(lambda x: nested_list(x))
stack['replies_nested'] = stack.reply_to.apply(lambda x: nested_list(x))
The problem arises when a cell holds only one name (a string): each letter is split into its own list (e.g. [[P], [a], [m]]).
As for the reply_to column, when a cell holds a single dictionary, it returns something like this: [[id], [username]].
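Both symptoms can be reproduced outside pandas. The sketch below assumes the cells contain either a plain string or a single dict rather than a real list; it reuses the nested_list helper in a shortened form. Parsing with ast.literal_eval from the standard library is only one possible way to recover a list from its string form, and whether it suits the real data is an assumption.

```python
import ast


def nested_list(li):
    # Same behaviour as the helper in the question: wrap every element
    # of an iterable in its own one-element list.
    return [[elem] for elem in li]


# A plain string is iterated character by character.
print(nested_list("Pam"))  # [['P'], ['a'], ['m']]

# A dict is iterated key by key, so only the keys come out.
print(nested_list({'user_id': '666', 'username': 'ggg'}))  # [['user_id'], ['username']]

# Once the string form of a list is parsed, iteration yields the items.
parsed = ast.literal_eval("['sport', 'basket']")
print(nested_list(parsed))  # [['sport'], ['basket']]
```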
Do you guys have any idea on how I could do this?
FYI: I'm not going to apply the explode function to both the mentions and reply_to columns at the same time. These will be two separate processes.
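For reference, here is a minimal sketch of the kind of result I'm after, done separately for each column. It assumes the cells are string representations of Python lists (as in the sample data), or occasionally a bare name or a single dict. The to_list helper and the names df, mentions_long and replies_long are hypothetical, introduced only for this illustration.

```python
import ast

import pandas as pd

# Reduced copy of the sample data, with the columns stored as strings.
df = pd.DataFrame({
    'user': ['Steeve', 'Pam', 'Olive'],
    'mentions': ["['sport', 'basket']", "['politique']", "[]"],
    'reply_to': [
        "[{'user_id': '123', 'username': 'aaa'}, {'user_id': '234', 'username': 'bbb'}]",
        "[{'user_id': '567', 'username': 'zzz'}]",
        "[{'user_id': '666', 'username': 'ggg'}]",
    ],
})


def to_list(value):
    """Hypothetical helper: always return a list, whatever the cell holds."""
    if isinstance(value, str):
        try:
            value = ast.literal_eval(value)
        except (ValueError, SyntaxError):
            # A bare name such as 'Pam' is not a Python literal: keep it whole.
            return [value]
    if isinstance(value, (list, tuple)):
        return list(value)
    # A single dict (or any other scalar) becomes a one-element list.
    return [value]


# One row per mention; an empty list becomes a single row holding NaN.
mentions_long = (
    df.assign(mentions=df['mentions'].apply(to_list))
      .explode('mentions')
      .reset_index(drop=True)
)

# One row per replied-to user, handled as a separate step.
replies_long = (
    df.assign(reply_to=df['reply_to'].apply(to_list))
      .explode('reply_to')
      .reset_index(drop=True)
)

print(mentions_long[['user', 'mentions']])
print(replies_long[['user', 'reply_to']])
```

DataFrame.explode needs pandas 0.25 or later. Rows with an empty list are kept with NaN, so they would have to be dropped afterwards if they are not wanted.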