Python Pandas Replace String in Column within For Loop

Question

I am trying to concatenate all files in the file list file_list:

result = pd.concat([pd.read_csv(f).set_index(['a', 'b', 'c']) for f in file_list])

The challenge is that I would like to replace the string 'xyz' with nothing (an empty string) in column b before calling set_index. How can I achieve this in the same line?

jezrael · Accepted Answer · 2017-11-27 06:25:06Z

1

I believe you need DataFrame.replace with a nested dict, which limits the replacement to column b:

dfs=[pd.read_csv(f).replace({'b':{'xyz':''}}).set_index(['a', 'b', 'c']) for f in file_list]
result = pd.concat(dfs)
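A minimal runnable sketch of the nested-dict approach, assuming two small in-memory CSV files (the io.StringIO objects and their contents are hypothetical stand-ins for the real file_list). It shows that, without regex=True, only cells whose whole value equals 'xyz' are replaced:

```python
import io

import pandas as pd

# Hypothetical sample data: in-memory CSV text stands in for the real files
csv_texts = [
    "a,b,c,d\n1,xyz,x,10\n2,foo,y,20\n",
    "a,b,c,d\n3,xyz,z,30\n4,xyzabc,w,40\n",
]
file_list = [io.StringIO(text) for text in csv_texts]

# {column: {old: new}} restricts the replacement to column 'b'; without
# regex=True only cells whose entire value equals 'xyz' are replaced
dfs = [pd.read_csv(f).replace({'b': {'xyz': ''}}).set_index(['a', 'b', 'c'])
       for f in file_list]
result = pd.concat(dfs)

print(result.index.get_level_values('b').tolist())  # ['', 'foo', '', 'xyzabc']
```

Note that 'xyzabc' is left untouched because it is not an exact match; removing 'xyz' as a substring needs regex=True.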

Or, if the 'xyz' strings do not appear in columns a and c, it is possible to create the MultiIndex first and then replace every 'xyz' label:

dfs = [pd.read_csv(f, index_col=['a','b','c']).rename({'xyz':''}) for f in file_list]
result = pd.concat(dfs)
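The condition about columns a and c matters because DataFrame.rename with a dict maps labels on every level of a MultiIndex. A short sketch under hypothetical sample data (where 'xyz' also appears in column c) compares that with rename's documented level= parameter, which restricts the mapping to level 'b' only:

```python
import io

import pandas as pd

# Hypothetical sample data; note 'xyz' also appears in column 'c' of the first file
csv_texts = [
    "a,b,c,d\n1,xyz,xyz,10\n2,foo,y,20\n",
    "a,b,c,d\n3,xyz,z,30\n",
]

# index_col builds the MultiIndex while reading; rename() with a dict then
# maps labels on every index level, so level 'c' is affected too
dfs = [pd.read_csv(io.StringIO(t), index_col=['a', 'b', 'c']).rename({'xyz': ''})
       for t in csv_texts]
result_all = pd.concat(dfs)

# level='b' restricts the renaming to that single index level
dfs_b = [pd.read_csv(io.StringIO(t), index_col=['a', 'b', 'c'])
         .rename(index={'xyz': ''}, level='b')
         for t in csv_texts]
result_b = pd.concat(dfs_b)

print(result_all.index.get_level_values('c').tolist())  # ['', 'y', 'z']
print(result_b.index.get_level_values('c').tolist())    # ['xyz', 'y', 'z']
```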

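Finally, if the removed values should become missing values (NaN) rather than empty strings, use {'xyz': np.nan} instead of {'xyz': ''} (this needs import numpy as np).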
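A minimal sketch of the NaN variant with the nested-dict replace, assuming a single hypothetical in-memory CSV file; the replaced cell becomes a missing value, which carries over into the 'b' level of the MultiIndex:

```python
import io

import numpy as np
import pandas as pd

# Hypothetical sample data standing in for one CSV file
csv_text = "a,b,c,d\n1,xyz,x,10\n2,foo,y,20\n"

# Replace 'xyz' in column 'b' with a missing value instead of an empty string
df = pd.read_csv(io.StringIO(csv_text)).replace({'b': {'xyz': np.nan}})
result = df.set_index(['a', 'b', 'c'])

print(df['b'].isna().tolist())  # [True, False]
```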

EDIT, based on a comment:

To replace using a regular expression:

dfs= [pd.read_csv(f).replace({'b':{'xyz*':''}}, regex=True).set_index(['a', 'b', 'c']) for f in file_list]
result = pd.concat(dfs)
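With regex=True the pattern is applied as a substring substitution rather than a whole-cell match. Note that in 'xyz*' the * applies only to the z, so the pattern matches "xy" followed by zero or more "z" characters. A sketch with hypothetical sample values compares it with the plain pattern 'xyz':

```python
import io

import pandas as pd

# Hypothetical sample values chosen to show how each pattern behaves
csv_text = "a,b,c\n1,xyz123,x\n2,abcxy,y\n3,foo,z\n"

# 'xyz*' = "xy" plus zero or more "z", so a bare "xy" is removed as well
star = pd.read_csv(io.StringIO(csv_text)).replace({'b': {'xyz*': ''}}, regex=True)

# 'xyz' matches only the literal substring "xyz"
literal = pd.read_csv(io.StringIO(csv_text)).replace({'b': {'xyz': ''}}, regex=True)

print(star['b'].tolist())     # ['123', 'abc', 'foo']
print(literal['b'].tolist())  # ['123', 'abcxy', 'foo']
```

If only the literal text "xyz" should be stripped, the plain pattern is the safer choice.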

edited Nov 27, 2017 at 6:25

answered Nov 26, 2017 at 13:13

jezrael

867k · 102 gold badges · 1.4k silver badges · 1.3k bronze badges


2 Comments

lovechillcool Over a year ago

Just to add, I used a regular expression: dfs=[pd.read_csv(f).replace({'b':{'xyz*':''}}, regex=True).

jezrael Over a year ago

I added it to the answer too.
