I'm trying to process three CSV files and create a single file that merges the useful data.

Now, I'm stuck on this problem:

I have two columns (SUFFIX and COD_METEL) with 1.5 million rows, and I need to process them to create another column containing the result.

        SUFFIX    COD_METEL
0          CBR   CBR8901027
1          CBR   CBR8901028
2          CBR   CBR8904001
3          CBR   CBR8904002
4          CBR   CBR8904008
5          CBR   CBR8904027
6          CBR   CBR8904039
7          THO  THO96666290
8          THO  THO96666294
9          THO  THO96666298
10         THO  THO96666302
11         THO  THO96666322
12         THO  THO96666326
13          ZV   ZV111900NI
14          ZV   ZV111910NI
15          ZX    ZX2021.AC
16          ZX    ZX2021.AC
17          ZX    ZX6066.AC
18          ZX    ZX6111.AC
19          ZX    ZX6111.AC
20          ZX    ZX6380.AC
21          ZX       ZX9030
22          ZX       ZX9030
23          ZX       ZX9030
24          ZZ   ZZ00012565

Here I need to "subtract" the SUFFIX value from COD_METEL, like this:

df["RESULT"] = df["COD_METEL"] - df["SUFFIX"]

        SUFFIX    COD_METEL     RESULT
0          CBR   CBR8901027    8901027
1          CBR   CBR8901028    8901028
2          CBR   CBR8904001    8904001

I know it is not possible to use the "-" operator on strings, so I'm asking for tips on how to solve this and strip the value from every row quickly.

I have already tried the following:

replaceList = list(set(df["SUFFIX"]))
for to_replace in replaceList:
    df["RESULT"] = df["COD_METEL"].str.replace(to_replace,"")

2 Answers

You can try a list comprehension if there are no missing values:

df['new'] = [j.replace(i, '') for i, j in zip(df['SUFFIX'], df['COD_METEL'])]
print(df)
   SUFFIX    COD_METEL       new
0     CBR   CBR8901027   8901027
1     CBR   CBR8901028   8901028
2     CBR   CBR8904001   8904001
3     CBR   CBR8904002   8904002
4     CBR   CBR8904008   8904008
5     CBR   CBR8904027   8904027
6     CBR   CBR8904039   8904039
7     THO  THO96666290  96666290
8     THO  THO96666294  96666294
9     THO  THO96666298  96666298
10    THO  THO96666302  96666302
11    THO  THO96666322  96666322
12    THO  THO96666326  96666326
13     ZV   ZV111900NI  111900NI
14     ZV   ZV111910NI  111910NI
15     ZX    ZX2021.AC   2021.AC
16     ZX    ZX2021.AC   2021.AC
17     ZX    ZX6066.AC   6066.AC
18     ZX    ZX6111.AC   6111.AC
19     ZX    ZX6111.AC   6111.AC
20     ZX    ZX6380.AC   6380.AC
21     ZX       ZX9030      9030
22     ZX       ZX9030      9030
23     ZX       ZX9030      9030
24     ZZ   ZZ00012565  00012565
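One caveat worth noting: `str.replace` removes every occurrence of the substring, not only the one at the start. The sample data never repeats the SUFFIX value inside the code, so both approaches give the same result there. The following is a minimal sketch, assuming the SUFFIX value should only be stripped when it appears at the beginning of COD_METEL. The helper name `strip_prefix` and the last row (`ZZ00ZZ1`) are hypothetical, added only to show the difference.

```python
import pandas as pd

# Small sample; the last row is a made-up value where the prefix repeats.
df = pd.DataFrame({
    "SUFFIX": ["CBR", "THO", "ZX", "ZZ", "ZZ"],
    "COD_METEL": ["CBR8901027", "THO96666290", "ZX2021.AC", "ZZ00012565", "ZZ00ZZ1"],
})


def strip_prefix(code, prefix):
    """Remove `prefix` only if `code` starts with it (hypothetical helper)."""
    return code[len(prefix):] if code.startswith(prefix) else code


# Same list-comprehension pattern as above, but prefix-only.
df["new"] = [strip_prefix(j, i) for i, j in zip(df["SUFFIX"], df["COD_METEL"])]

# For comparison: replace() also removes the second "ZZ" in the last row.
df["replaced"] = [j.replace(i, "") for i, j in zip(df["SUFFIX"], df["COD_METEL"])]
print(df)
```

If the codes can never contain the SUFFIX value twice, the plain `replace` version is enough.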

Performance:

#[250000 rows x 2 columns]
df = pd.concat([df] * 10000, ignore_index=True)
#print (df)

In [289]: %timeit df['RESULT'] = df.apply(lambda x: x['COD_METEL'].replace(x['SUFFIX'], ''), axis=1)
5.05 s ± 347 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [290]: %timeit df['new'] = [j.replace(i, '') for i, j in zip(df['SUFFIX'], df['COD_METEL'])]
98.7 ms ± 8.8 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
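The comparison above uses IPython's `%timeit`. Outside IPython, a rough equivalent can be sketched with the standard-library `timeit` module. This is only an illustration: the function names `with_apply` and `with_comprehension` are made up here, the sample is smaller than the one timed above, and the absolute numbers will depend on the machine and the pandas version.

```python
import timeit

import pandas as pd

base = pd.DataFrame({
    "SUFFIX": ["CBR", "THO", "ZV", "ZX", "ZZ"],
    "COD_METEL": ["CBR8901027", "THO96666290", "ZV111900NI", "ZX2021.AC", "ZZ00012565"],
})
# Repeat the sample to get 10,000 rows (smaller than the 250,000 used above).
big = pd.concat([base] * 2000, ignore_index=True)


def with_apply(frame):
    # Row-wise apply: builds a Series object for every row, which is slow.
    return frame.apply(lambda x: x["COD_METEL"].replace(x["SUFFIX"], ""), axis=1).tolist()


def with_comprehension(frame):
    # Plain Python loop over the two columns zipped together.
    return [j.replace(i, "") for i, j in zip(frame["SUFFIX"], frame["COD_METEL"])]


# Both approaches should produce identical values.
same_result = with_apply(big) == with_comprehension(big)

t_apply = timeit.timeit(lambda: with_apply(big), number=3)
t_comp = timeit.timeit(lambda: with_comprehension(big), number=3)
print(f"same result: {same_result}")
print(f"apply: {t_apply:.3f}s, comprehension: {t_comp:.3f}s (3 runs each)")
```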

5 Comments

I think this is the best solution for me. I added a condition to the loop to check that i is a str, to handle rows where the value is 0 (which happens sometimes). I'm now getting an error that I can't understand: "Length of values does not match length of index". The current values of i and j are: i = 'ZZ' and j = 'ZZ00012565'
@CarloZanocco - yes, or use df['new'] = [str(j).replace(str(i), '') for i, j in zip(df['SUFFIX'], df['COD_METEL'])]
@CarloZanocco - Or use df['new'] = [j.replace(i, '') if isinstance(j, str) else '' for i, j in zip(df['SUFFIX'], df['COD_METEL'])]
The else branch is needed because the list must contain one value per row.
Thanks, the last comment solved all the problems: df['new'] = [j.replace(i, '') if isinstance(i, str) else '' for i, j in zip(df['SUFFIX'], df['COD_METEL'])]
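The comments above check either i or j. A minimal sketch that checks both, assuming non-string values such as 0 or a missing value should produce an empty string, could look like this. The helper name `remove_suffix_value` and the sample rows are hypothetical. The important point is that the conditional returns a value for every row, so the list keeps the same length as the DataFrame index.

```python
import pandas as pd

# Hypothetical sample with a numeric SUFFIX and a missing COD_METEL.
df = pd.DataFrame({
    "SUFFIX": ["CBR", 0, "ZZ"],
    "COD_METEL": ["CBR8901027", "THO96666290", None],
})


def remove_suffix_value(suffix, code):
    """Return `code` without `suffix`, or '' if either value is not a string."""
    if isinstance(suffix, str) and isinstance(code, str):
        return code.replace(suffix, "")
    return ""


# One output value per row, so the assignment never fails on length.
df["new"] = [remove_suffix_value(i, j) for i, j in zip(df["SUFFIX"], df["COD_METEL"])]
print(df)
```

By contrast, a filtering `if` placed at the end of the comprehension drops rows, which is one way to end up with the "Length of values does not match length of index" error.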

Another approach would be:

df['RESULT'] = df.apply(lambda x: x['COD_METEL'].replace(x['SUFFIX'], ''), axis=1)
df

   SUFFIX    COD_METEL    RESULT
0     CBR   CBR8901027   8901027
1     CBR   CBR8901028   8901028
2     CBR   CBR8904001   8904001
3     CBR   CBR8904002   8904002
4     CBR   CBR8904008   8904008
5     CBR   CBR8904027   8904027
6     CBR   CBR8904039   8904039
7     THO  THO96666290  96666290
8     THO  THO96666294  96666294
9     THO  THO96666298  96666298
10    THO  THO96666302  96666302
11    THO  THO96666322  96666322
12    THO  THO96666326  96666326
13     ZV   ZV111900NI  111900NI
14     ZV   ZV111910NI  111910NI
15     ZX    ZX2021.AC   2021.AC
16     ZX    ZX2021.AC   2021.AC
17     ZX    ZX6066.AC   6066.AC
18     ZX    ZX6111.AC   6111.AC
19     ZX    ZX6111.AC   6111.AC
20     ZX    ZX6380.AC   6380.AC
21     ZX       ZX9030      9030
22     ZX       ZX9030      9030
23     ZX       ZX9030      9030
24     ZZ   ZZ00012565  00012565
