I have a pandas DataFrame where the Items column contains lists.

cust_id   Items
100     ['item1','item2','item3']
101     ['item5','item8','item9']
102     ['item2','item4']

I want to convert the above dataframe to the below format.

cust_id  Items
100     item1 item2 item3
101     item5 item8 item9
102     item2 item4

I tried using the pandas built-in replace function, but it returns the original column without actually performing the string replacement.

df['Items']=(df['Items'].astype(str)).replace({"['":"", "', '":" ", "']":"" },method='string')

Please advise

Update:

I used the below code to create the original dataframe.

df=df1.groupby(['cust_id'])['Items'].apply(list).reset_index()
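If the lists are only an intermediate step, the join can probably be done directly in the groupby instead of collecting lists first. A minimal sketch, assuming df1 is in long format with one item string per row (the sample rows below are made up to reproduce the example output):

```python
import pandas as pd

# Hypothetical long-format input: one item per row (assumed shape of df1)
df1 = pd.DataFrame({
    "cust_id": [100, 100, 100, 101, 101, 101, 102, 102],
    "Items": ["item1", "item2", "item3", "item5", "item8", "item9", "item2", "item4"],
})

# Join each customer's items with a space instead of building a list
df = df1.groupby("cust_id")["Items"].agg(" ".join).reset_index()
print(df)
```

groupby keeps the original row order within each group, so the items appear in the order they occur in df1.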
  • Sorry, are the elements lists or a string representation of a list? Can you post code to construct your df to avoid ambiguity? Commented Sep 23, 2015 at 12:08
  • Why not just use an apply on the column, and do something like lambda lst: ' '.join(lst) Commented Sep 23, 2015 at 12:10
  • @EdChum I have added the code to reconstruct my original df. Commented Sep 23, 2015 at 13:08
  • @Brian This is new to me; I will check it out. Commented Sep 23, 2015 at 13:09
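The first comment's question (real lists or the string form of a list?) can be checked by inspecting an element's type. If the column holds strings such as "['item1', 'item2']" (for example after a round trip through a CSV file, an assumed scenario here), ast.literal_eval can turn them back into lists before applying the lambda join suggested in the comments. A sketch with made-up data:

```python
import ast
import pandas as pd

# Column holding the *string* form of each list (assumed, e.g. read back from CSV)
df = pd.DataFrame({
    "cust_id": [100, 101, 102],
    "Items": ["['item1', 'item2', 'item3']", "['item5', 'item8', 'item9']", "['item2', 'item4']"],
})

print(type(df.loc[0, "Items"]))  # str here, not list

# Parse each string safely into a Python list, then join with a space
df["Items"] = df["Items"].apply(ast.literal_eval).apply(lambda lst: " ".join(lst))
print(df)
```

ast.literal_eval only evaluates Python literals, so it is safer than eval for this kind of parsing.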

1 Answer

If the elements really are lists, you can use str.join() on each list together with the Series.apply method. Example:

In [159]: df = pd.DataFrame([[100,['item1','item2','item3']],[101,['item5','item8','item9']],[102,['item2','item4']]],columns=['cust_id','Items'])

In [160]: df
Out[160]:
   cust_id                  Items
0      100  [item1, item2, item3]
1      101  [item5, item8, item9]
2      102         [item2, item4]

In [161]: df['Items'] = df['Items'].apply(' '.join)

In [162]: df
Out[162]:
   cust_id                Items
0      100    item1 item2 item3
1      101    item5 item8 item9
2      102          item2 item4
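pandas also provides Series.str.join, which joins the elements of list-like values in each cell, as an alternative to apply(' '.join). A sketch; note that ' '.join raises a TypeError if a list contains non-strings, so a hypothetical column of integer IDs would need each element converted with str first:

```python
import pandas as pd

df = pd.DataFrame(
    [[100, ["item1", "item2", "item3"]], [101, ["item5", "item8", "item9"]], [102, ["item2", "item4"]]],
    columns=["cust_id", "Items"],
)

# Alternative to df['Items'].apply(' '.join): join the list in each cell
joined = df["Items"].str.join(" ")
print(joined.tolist())

# Hypothetical case: integer IDs instead of strings, so convert each element to str
ids = pd.Series([[1, 2, 3], [5, 8]])
joined_ids = ids.apply(lambda lst: " ".join(map(str, lst)))
print(joined_ids.tolist())
```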

Comments

You can just use df['Items'].apply(' '.join) here.
What does ' ' mean in the apply function?
@mrcet007 It's the space character, which is used as the separator between the elements.
Thanks! Can someone also explain why pd.replace doesn't work and what is wrong with the code I tried?
Because Series.replace() replaces whole values (a cell must match a key exactly), none of your substrings matched. What you actually wanted was Series.str.replace(), but it does not accept a dict; you would have to provide a regular expression or chain multiple .str.replace() calls.
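The difference described in the last comment can be seen directly. A sketch using the question's astype(str) approach on made-up data; the regex character class is one possible pattern, not the only one:

```python
import pandas as pd

# String form of each list, as produced by astype(str) in the question
s = pd.Series([["item1", "item2", "item3"], ["item2", "item4"]]).astype(str)
print(s.tolist())  # ["['item1', 'item2', 'item3']", "['item2', 'item4']"]

# Series.replace matches whole cell values, so these substrings never match
unchanged = s.replace({"['": "", "', '": " ", "']": ""})
print(unchanged.equals(s))  # True

# Series.str.replace works on substrings: chain one call per literal pattern
chained = (
    s.str.replace("['", "", regex=False)
     .str.replace("', '", " ", regex=False)
     .str.replace("']", "", regex=False)
)

# Or use one regular expression that strips brackets, quotes and commas
via_regex = s.str.replace(r"[\[\]',]", "", regex=True)
print(chained.tolist(), via_regex.tolist())
```

Passing regex explicitly avoids depending on its default, which changed from True to False in pandas 2.0.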