Evening Chaps,

Quick one: what's the best way to concatenate strings across the columns of a DataFrame?

I have a CSV which is the output of a form. In true if/else form fashion, the answers to any child questions have been written to new columns.

As the majority of the form is child questions, I want to write a small script to do the following:

1. Drop irrelevant columns.
2. Concatenate the remaining columns with a delimiter (',').
3. Create a new DF by adding in the merged column and the irrelevant columns from step 1.

My attempt:

import pandas as pd
import os
df = pd.read_csv('survey.csv')

df
  Qual, Qual2,  Qual3, Qual4, Qual5,  Qual6
0 IT    Digital NaN    NaN    NaN     NaN
1 NaN   NaN     Maths  NaN    Algebra NaN

df['Combined_Data'] = df.fillna('').astype(str).sum(axis=1)

df:
  Qual, Qual2,  Qual3, Qual4, Qual5,  Qual6 Combined_Data
0 IT    Digital NaN    NaN    NaN     NaN   ITDigital
1 NaN   NaN     Maths  NaN    Algebra NaN   MathsAlgebra

I'm unsure how to add a ',' between each value, or whether sum is even the right way to do this. Probably not, but it's what I found after several Google searches.

Any help would be most appreciated.

  • df.fillna('').astype(str).apply(lambda x: ",".join(x), axis=1) ? Commented Jul 5, 2018 at 11:28
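
For what it's worth, the one-liner in the comment above joins every cell, so each blanked-out NaN should still contribute a separator. A minimal sketch on a hand-built frame shaped like the second table in the question (the frame itself, `dtype=object`, and the names `joined_all` and `joined_nonempty` are assumptions for illustration, not the real survey data):

```python
import numpy as np
import pandas as pd

# Hand-built stand-in for the question's frame (assumed, not the real survey.csv).
# dtype=object keeps every column as generic Python objects for this demo.
df = pd.DataFrame(
    {
        "Qual": ["IT", np.nan],
        "Qual2": ["Digital", np.nan],
        "Qual3": [np.nan, "Maths"],
        "Qual4": [np.nan, np.nan],
        "Qual5": [np.nan, "Algebra"],
        "Qual6": [np.nan, np.nan],
    },
    dtype=object,
)

# The commented suggestion: NaN becomes '', and every cell is joined.
joined_all = df.fillna("").astype(str).apply(lambda x: ",".join(x), axis=1)
print(joined_all.tolist())  # ['IT,Digital,,,,', ',,Maths,,Algebra,']

# Skipping the empty strings while joining removes the stray commas.
joined_nonempty = df.fillna("").astype(str).apply(
    lambda x: ",".join(v for v in x if v), axis=1
)
print(joined_nonempty.tolist())  # ['IT,Digital', 'Maths,Algebra']
```

So the comment's version is fine if the empty slots are wanted as placeholders; otherwise the empty values need filtering out before the join.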

1 Answer

Use apply with dropna:

# Join each row's non-missing values with ', ' (NaN cells are dropped first)
df['Combined_Data'] = df.apply(lambda x: ', '.join(x.dropna()), axis=1)
print(df)
  Qual,   Qual2, Qual3,  Qual4,  Qual5,    Qual6   Combined_Data
0    IT  Digital    NaN     NaN     NaN      NaN     IT, Digital
1   NaN      NaN  Maths     NaN     NaN  Algebra  Maths, Algebra
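
To tie this back to the three steps in the question, here is a rough end-to-end sketch. The `Respondent` column, the `other_cols` list, and the inline frame are hypothetical stand-ins, since the question doesn't show which columns count as irrelevant; swap in the real names after `pd.read_csv('survey.csv')`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for survey.csv; "Respondent" is an assumed
# column that should not be merged into the combined string.
df = pd.DataFrame(
    {
        "Respondent": ["A", "B"],
        "Qual": ["IT", np.nan],
        "Qual2": ["Digital", np.nan],
        "Qual3": [np.nan, "Maths"],
        "Qual4": [np.nan, np.nan],
        "Qual5": [np.nan, np.nan],
        "Qual6": [np.nan, "Algebra"],
    }
)

# Step 1: set aside the columns that are irrelevant to the merge.
other_cols = ["Respondent"]
answers = df.drop(columns=other_cols)

# Step 2: join each row's non-missing answers with a delimiter.
# astype(str) guards against non-string answers such as numbers.
combined = answers.apply(
    lambda row: ", ".join(row.dropna().astype(str)), axis=1
)

# Step 3: build the new frame from the set-aside columns plus the merged one.
result = df[other_cols].assign(Combined_Data=combined)
print(result)
```

Row-wise `apply` is not the fastest option on large frames, but for form exports of modest size it keeps the intent easy to read.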

1 Comment

As always, you come to my rescue, sir. Green tick from me!
