I am looking to create a loop in Python that will concatenate multiple rows of strings together. I have listed the table I have now as "Before" and the table I am trying to create as "After". Any thoughts on how to do this? I am currently using the following code, but it gives me just one string, and I need to be able to loop over the entire data frame:

df['Text'].str.cat(sep='')
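The attempt above operates on a single column, not on the whole frame. `.str` is a Series accessor, and `Series.str.cat` called with only `sep` joins every value of the column into one string. Here is a minimal sketch, assuming a hypothetical frame that holds just the question's Text column:

```python
import pandas as pd

# Hypothetical frame mirroring the Text column of the "Before" table
df = pd.DataFrame({"Text": ["string1", "string2", "string3", "string4"]})

# Series.str.cat with sep='' joins every value in the column into one string
joined = df["Text"].str.cat(sep="")
print(joined)  # string1string2string3string4
```

That is why the call returns a single string. Getting one string per block requires a group label for each row, which is what the answer below builds.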

Before:

Text     | Channel | Destination | Amount | Total
string1  | NaN     | NaN         | NaN    | NaN
string2  | DKI     | US          | 34     | 5
string3  | NaN     | NaN         | NaN    | NaN
string4  | DKI     | CA          | 39     | 20

After:

Text           | Channel | Destination | Amount | Total
string1string2 | DKI     | US          | 34     | 5
string3string4 | DKI     | CA          | 39     | 20
  • Please show your current attempts, and clarify the logic by which you want to concatenate the strings (why do All and purpose go together, for instance?) Commented Jun 27, 2018 at 14:00
  • @sacul I am trying to just concatenate strings. I have updated the tables. Commented Jun 27, 2018 at 14:02
  • Possible duplicate of How to concatenate values of all rows in a dataframe into a single row without altering the columns? Commented Jun 27, 2018 at 14:03
  • But what tells us that string1 and string2 go together? Commented Jun 27, 2018 at 14:03
  • @sacul The way to determine that string1 and string2 go together is to concatenate everything from the NaN row down to the first row that has numbers, then essentially restart the concatenation. Commented Jun 27, 2018 at 14:05

1 Answer

Create a helper Series with shift, check for non-NaN values with notna, and create group labels with cumsum.

Then aggregate with a dict of functions, remove the index name with rename_axis(None), and restore the original column order with reindex:

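# helper Series: a new group starts on the row after each non-NaN 'total'
a = df['total'].shift().notna().cumsum()
# for older pandas versions (before 0.21), use notnull instead of notna
#a = df['total'].shift().notnull().cumsum()
# per-column aggregation: first 'row', last 'total', joined 'Text'
d = {'row':'first', 'total':'last', 'Text':''.join}

df = df.groupby(a).agg(d).rename_axis(None).reindex(columns=df.columns)
print(df)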
   row            Text  total
0    1  string1string2    3.0
1    3  string3string4    1.0
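To see why the helper Series works, here is a minimal sketch that prints each intermediate step on the Total column from the question's "Before" table. The answer's code uses the lowercase 'total' column name, so this sketch assumes the capitalized 'Total' from the tables. A row with a number in Total closes a block, so the group counter has to increase on the row after it. shift moves each value down one row, which makes the increment land in the right place.

```python
import numpy as np
import pandas as pd

# 'Total' column from the question's "Before" table (NaN marks a text-only row)
total = pd.Series([np.nan, 5, np.nan, 20], name="Total")

shifted = total.shift()        # NaN, NaN, 5.0, NaN
started = shifted.notna()      # False, False, True, False
groups = started.cumsum()      # 0, 0, 1, 1

print(pd.DataFrame({"Total": total, "shifted": shifted,
                    "new_group": started, "group": groups}))
```

Rows 0 and 1 share label 0 and rows 2 and 3 share label 1. These are exactly the pairs string1/string2 and string3/string4. Because the counter only depends on where numbers appear, blocks with several NaN rows before the numeric row are grouped the same way.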

5 Comments

@ScottBoston - I tested it with multiple NaNs.
Gotcha. I see. I knew there was a reason you're doing it that way. Thank you.
@jezrael What if I have multiple metrics to group and also multiple text columns?
@jumpman23 - Then change aggregate dictionary like d = { 'Text':''.join, 'Channel':'last', 'Destination':'last', 'Amount':'last', 'Total':'last'}
@jezrael Doesn't a in your code need to be adjusted as well?
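Here is a minimal end-to-end sketch that applies the aggregation dictionary from the comments to the question's current columns. It assumes the "Before" table is rebuilt by hand with NaN for the empty cells. The helper Series is also built from 'Total' (capitalized), since the answer's code refers to 'total'.

```python
import numpy as np
import pandas as pd

# The question's "Before" table; NaN marks rows that carry only text
df = pd.DataFrame({
    "Text": ["string1", "string2", "string3", "string4"],
    "Channel": [np.nan, "DKI", np.nan, "DKI"],
    "Destination": [np.nan, "US", np.nan, "CA"],
    "Amount": [np.nan, 34, np.nan, 39],
    "Total": [np.nan, 5, np.nan, 20],
})

# Helper Series built from 'Total' (capitalized, matching this table)
a = df["Total"].shift().notna().cumsum()

# Join the text; keep the last non-NaN value of every other column
d = {"Text": "".join, "Channel": "last", "Destination": "last",
     "Amount": "last", "Total": "last"}

out = df.groupby(a).agg(d).rename_axis(None).reindex(columns=df.columns)
print(out)
```

'last' in a groupby returns the last non-null value in each group, so the metrics come from the row that closes each block. Amount and Total come back as floats (34.0, 5.0, ...) because the NaN values forced those columns to float64. More text columns can be added to the dictionary with "".join, provided they contain no NaN values.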