I have a pandas DataFrame and want to concatenate two columns while leaving all other columns in the DataFrame unchanged. Based on the documentation, I tried the following:

df['Code2']= df['Code'] + df['Period']

The result looks almost right for some rows, but for other rows it is clearly wrong.

See the result below in column "Code2".

+---------+--------+---------+
|  Code   | Period |  Code2  |
+---------+--------+---------+
| 1000000 |   2017 | 1002017 |
| 1100000 |   2017 | 1102017 |
| 1101000 |   2017 | 1103017 |
| 1101100 |   2017 | 1103117 |
| 1101110 |   2017 | 1103127 |
+---------+--------+---------+

Note that the values in column 'Period' are not all equal to 2017. They are only so in the extract above.

The desired result would be the following:

+---------+--------+--------------+
|  Code   | Period |    Code2     |
+---------+--------+--------------+
| 1000000 |   2017 | 1000000_2017 |
| 1100000 |   2017 | 1100000_2017 |
| 1101000 |   2017 | 1101000_2017 |
| 1101100 |   2017 | 1101100_2017 |
| 1101110 |   2017 | 1101110_2017 |
+---------+--------+--------------+
  • The two columns hold integers here, not strings, so + adds the values together. Commented Jan 6, 2020 at 20:32
  • @WillemVanOnsem. Hah, wow. Makes sense. How can I join them without adding them? Commented Jan 6, 2020 at 20:33

2 Answers

Both columns hold integers, so + adds the numbers instead of joining them. If you convert both columns to strings first, + concatenates them, for example with:

df['Code2'] = df['Code'].astype(str) + df['Period'].astype(str)

This then yields:

>>> df
      Code  Period
0  1000000    2017
1  1100000    2017
2  1101000    2017
3  1101100    2017
4  1101110    2017
>>> df['Code2'] = df['Code'].astype(str) + df['Period'].astype(str)
>>> df
      Code  Period        Code2
0  1000000    2017  10000002017
1  1100000    2017  11000002017
2  1101000    2017  11010002017
3  1101100    2017  11011002017
4  1101110    2017  11011102017

Or, if you want to separate the two parts with an underscore:

df['Code2'] = df['Code'].astype(str) + '_' + df['Period'].astype(str)

which gives us:

>>> df['Code2'] = df['Code'].astype(str) + '_' + df['Period'].astype(str)
>>> df
      Code  Period         Code2
0  1000000    2017  1000000_2017
1  1100000    2017  1100000_2017
2  1101000    2017  1101000_2017
3  1101100    2017  1101100_2017
4  1101110    2017  1101110_2017
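
For reference, here is a minimal self-contained sketch that rebuilds the sample data from the question and contrasts the two behaviours. The variable names added and code2_cat are only illustrative, and Series.str.cat is shown as an alternative spelling of the same concatenation, not as something used above.

```python
import pandas as pd

# Sample data taken from the question
df = pd.DataFrame({
    "Code": [1000000, 1100000, 1101000, 1101100, 1101110],
    "Period": [2017, 2017, 2017, 2017, 2017],
})

# With integer columns, + is element-wise arithmetic addition
added = df["Code"] + df["Period"]

# After converting to strings, + concatenates element-wise
df["Code2"] = df["Code"].astype(str) + "_" + df["Period"].astype(str)

# Alternative (not from the answer above): Series.str.cat with a separator
code2_cat = df["Code"].astype(str).str.cat(df["Period"].astype(str), sep="_")

print(added.tolist())
print(df)
```

Note that Code and Period themselves stay integers; only the new Code2 column holds strings.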

If you have more than two columns, a good solution here is agg with str.join. Convert your integer columns to strings so as to concatenate them (not add them arithmetically).

df[['Code', 'Period']].astype(str).agg('_'.join, axis=1)

0    1000000_2017
1    1100000_2017
2    1101000_2017
3    1101100_2017
4    1101110_2017
dtype: object

If your DataFrame contains only these two columns, you can apply the same call to the whole frame:

df['Code2'] = df.astype(str).agg('_'.join, axis=1)
df

      Code  Period         Code2
0  1000000    2017  1000000_2017
1  1100000    2017  1100000_2017
2  1101000    2017  1101000_2017
3  1101100    2017  1101100_2017
4  1101110    2017  1101110_2017
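
Since the question mentions that the real DataFrame has other columns, a safer variant is to select the columns to join explicitly. The sketch below assumes a hypothetical extra column named Amount and made-up sample values; key_columns is just an illustrative name.

```python
import pandas as pd

# Illustrative data; "Amount" is a hypothetical extra column
df = pd.DataFrame({
    "Code": [1000000, 1100000],
    "Period": [2017, 2018],
    "Amount": [12.5, 7.0],
})

# Join only the listed columns, row by row, with "_" as the separator
key_columns = ["Code", "Period"]
df["Code2"] = df[key_columns].astype(str).agg("_".join, axis=1)

print(df)
```

Adding more names to key_columns extends the same call to three or more columns, and any column left out of the list, such as Amount here, is untouched.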
