Adding rows to a DataFrame column that doesn't exist yet

Question

I want to split a string that I have in a specific column of a DataFrame, get the numbers from the two new series, and assign the values to four new columns.

Before any modification the "Score" column on Saison looks like this:

0    \n3:2 (1:1) \n
1    \n0:2 (0:2) \n
2    \n1:1 (1:0) \n
3    \n1:1 (1:1) \n
4    \n2:0 (2:0) \n

The output that I want is this:

  Tore_Heim Tore_Auswärts Tore_Heim_HZ Tore_Auswärts_HZ
0         3             2            1                1
1         0             2            0                2
2         1             1            1                0
3         1             1            1                1
4         2             0            2                0

I have found a solution using list comprehension like this:

scores["Tore_Heim"] = pd.DataFrame([re.findall(r"\d+", scores[0][i]) for i in range(len(scores))]).loc[:, 0]
scores["Tore_Auswärts"] = pd.DataFrame([re.findall(r"\d+", scores[0][i]) for i in range(len(scores))]).loc[:, 1]
scores["Tore_Heim_HZ"] = pd.DataFrame([re.findall(r"\d+", scores[1][i]) for i in range(len(scores))]).loc[:, 0]
scores["Tore_Auswärts_HZ"] = pd.DataFrame([re.findall(r"\d+", scores[1][i]) for i in range(len(scores))]).loc[:, 1]

A second question is whether lines 2 and 3 could be combined into one.

Actually, my solution has the problem that it assigns a list of strings to the column, not integers.
– iuvbio, Commented Nov 24, 2017 at 0:54

cs95 · Accepted Answer · 2017-11-24 00:52:29Z

You can use str.extractall + unstack:

df
              Col
0  \n3:2 (1:1) \n
1  \n0:2 (0:2) \n
2  \n1:1 (1:0) \n
3  \n1:1 (1:1) \n
4  \n2:0 (2:0) \n

import re

# A raw string keeps \d from being read as an (invalid) string escape.
v = df.Col.str.extractall(r'(\d+)', flags=re.M).unstack()
v.columns = ['Tore_Heim', 'Tore_Auswärts', 'Tore_Heim_HZ', 'Tore_Auswärts_HZ']
v

  Tore_Heim Tore_Auswärts Tore_Heim_HZ Tore_Auswärts_HZ
0         3             2            1                1
1         0             2            0                2
2         1             1            1                0
3         1             1            1                1
4         2             0            2                0
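
To see why unstack is needed, it can help to look at what extractall returns before the pivot. The following is a minimal sketch on the sample data from the question; the variable names long and wide are made up for illustration, and the single capture-group column is selected with [0] so that the unstacked result has flat column labels.

```python
import pandas as pd

# Sample data copied from the question; the column name "Col" follows the answer.
df = pd.DataFrame({"Col": ["\n3:2 (1:1) \n", "\n0:2 (0:2) \n", "\n1:1 (1:0) \n",
                           "\n1:1 (1:1) \n", "\n2:0 (2:0) \n"]})

# extractall returns one row per match, indexed by (original row, match number).
long = df["Col"].str.extractall(r"(\d+)")
print(long.head(4))

# unstack pivots the "match" index level into columns: one row per original row.
wide = long[0].unstack()
print(wide)
```

Each original row contributes four matches, so the long frame has 20 rows and the wide frame has 5 rows and 4 columns, still holding strings at this point.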

To convert to numeric type, apply pd.to_numeric across columns -

v = v.apply(pd.to_numeric, errors='coerce')

Or, perform an astype conversion -

v = v.astype(float) # .astype(int) will work if you don't have NaNs in your data
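
Putting the steps together and attaching the result to the original frame could look like the sketch below. It assumes the source column is called "Score" as in the question, and it uses pandas' nullable "Int64" dtype as one possible way to keep integers even when a row yields fewer matches; the variable name goals is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"Score": ["\n3:2 (1:1) \n", "\n0:2 (0:2) \n", "\n1:1 (1:0) \n",
                             "\n1:1 (1:1) \n", "\n2:0 (2:0) \n"]})
cols = ["Tore_Heim", "Tore_Auswärts", "Tore_Heim_HZ", "Tore_Auswärts_HZ"]

# One row per original row, one column per number found in the string.
goals = df["Score"].str.extractall(r"(\d+)")[0].unstack()
goals.columns = cols

# Strings -> numbers; "Int64" (capital I) is the nullable integer dtype,
# so a missing match would become <NA> instead of forcing floats.
goals = goals.apply(pd.to_numeric, errors="coerce").astype("Int64")

# join aligns on the index, so the new columns line up with the original rows.
df = df.join(goals)
print(df[cols])
```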

edited Nov 24, 2017 at 0:52

answered Nov 24, 2017 at 0:41


2 Comments

iuvbio Over a year ago

That's great, thanks a lot! So much shorter. Two questions: what does flags=re.M do, and how do I get the output as integers instead of strings?

cs95 Over a year ago

@iuvbio re.M is the multiline flag. I added it because your data contains newlines, but it only changes how ^ and $ match, so for an unanchored pattern like (\d+) it can be dropped. As for integers, I've already addressed that with a subsequent edit.
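
The effect of re.M on this pattern can be checked with the standard library alone. In Python's re module the flag only changes how ^ and $ behave, so for an unanchored pattern such as \d+ the results should be identical with and without it. The sketch below uses one value from the question plus a made-up two-line string to show where the flag does make a difference.

```python
import re

text = "\n3:2 (1:1) \n"  # one value from the question's Score column

with_flag = re.findall(r"\d+", text, flags=re.M)
without_flag = re.findall(r"\d+", text)
print(with_flag == without_flag)  # True: no ^ or $ in the pattern

# The flag matters only once the pattern is anchored.
lines = "3:2\n1:1"  # hypothetical two-line input
print(re.findall(r"^\d+", lines))              # ['3']
print(re.findall(r"^\d+", lines, flags=re.M))  # ['3', '1']
```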
