Evening Chaps,
Quick one: what's the best method to concatenate strings across columns in a DataFrame?
I have a CSV which is the output of a form. True to if/else form fashion, the outputs for any child questions have been written to new columns.
As the majority of the form is child questions, I want to write a small script to do the following:
1. Drop irrelevant columns.
2. Concatenate the remaining columns with a delimiter (',').
3. Create a new DF by adding in the merged column and the irrelevant columns from step 1.
my attempt:
import pandas as pd
import os
df = pd.read_csv('survey.csv')
df
   Qual    Qual2  Qual3  Qual4    Qual5  Qual6
0    IT  Digital    NaN    NaN      NaN    NaN
1   NaN      NaN  Maths    NaN  Algebra    NaN
df['Combined_Data'] = df.fillna('').astype(str).sum(axis=1)
df:
   Qual    Qual2  Qual3  Qual4    Qual5  Qual6 Combined_Data
0    IT  Digital    NaN    NaN      NaN    NaN     ITDigital
1   NaN      NaN  Maths    NaN  Algebra    NaN  MathsAlgebra
I'm unsure how to add a ',' between each value, or whether sum is even the correct way to do this (probably not), but this is what I found after several Google searches.
Any help would be most appreciated.
df.fillna('').astype(str).apply(lambda x: ",".join(x), axis=1)?
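One thing to watch with joining after fillna(''): the empty strings are joined too, so rows with gaps end up with doubled or trailing commas. A minimal sketch of one way around that is to drop the missing values per row before joining. The sample frame below is rebuilt by hand from the question's data, and the "Respondent" column is a hypothetical stand-in for the other columns you want to keep; swap in your own read_csv call and column names.

```python
import numpy as np
import pandas as pd

# Hand-built stand-in for the survey CSV; "Respondent" is a hypothetical
# example of a column that should be kept alongside the merged answers.
df = pd.DataFrame({
    "Respondent": ["A", "B"],
    "Qual":  ["IT", np.nan],
    "Qual2": ["Digital", np.nan],
    "Qual3": [np.nan, "Maths"],
    "Qual4": [np.nan, np.nan],
    "Qual5": [np.nan, "Algebra"],
    "Qual6": [np.nan, np.nan],
})

# Columns to merge (assumed here to share the "Qual" prefix).
qual_cols = [c for c in df.columns if c.startswith("Qual")]

# Join only the non-missing answers in each row, separated by commas.
df["Combined_Data"] = df[qual_cols].apply(
    lambda row: ",".join(row.dropna().astype(str)), axis=1
)

# Keep the other columns plus the merged one.
result = df.drop(columns=qual_cols)
print(result)
```

Selecting the columns to merge first, rather than dropping the others and adding them back, covers all three steps in one pass. apply with axis=1 runs a Python function per row, which should be fine for a form export but may be slow on very large frames.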