I am trying to concatenate all the files listed in file_list:

result = pd.concat([pd.read_csv(f).set_index(['a', 'b', 'c']) for f in file_list])

The challenge is that I would like to replace the string 'xyz' with an empty string in column b before calling set_index. How can I do this in the same line?

1 Answer

I believe you need replace with a nested dict, where the outer key is the column name and the inner dict maps old values to new ones:

dfs=[pd.read_csv(f).replace({'b':{'xyz':''}}).set_index(['a', 'b', 'c']) for f in file_list]
result = pd.concat(dfs)
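As a quick check of how the nested dict behaves, here is a minimal sketch. The in-memory CSV strings and the val column are made-up stand-ins for the real files in file_list, so treat the data as an assumption. It shows that only whole-cell matches in column b are replaced, while an 'xyz' in column c is left alone.

```python
import io
import pandas as pd

# Hypothetical in-memory stand-ins for the CSV files in file_list.
csv_texts = [
    "a,b,c,val\n1,xyz,p,10\n2,foo,q,20\n",
    "a,b,c,val\n3,xyz,r,30\n4,xyz,xyz,40\n",
]

# Outer key 'b' restricts the replacement to column b;
# the inner dict maps the old value to the new one.
dfs = [
    pd.read_csv(io.StringIO(text))
      .replace({'b': {'xyz': ''}})
      .set_index(['a', 'b', 'c'])
    for text in csv_texts
]
result = pd.concat(dfs)

b_values = result.index.get_level_values('b').tolist()
c_values = result.index.get_level_values('c').tolist()
print(b_values)
print(c_values)
```

Without regex=True the match is on the whole cell value, so a cell such as 'xyz_1' would not be touched.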

Or, if the string xyz does not occur in columns a and c, it is possible to create the MultiIndex directly in read_csv and then rename every xyz label in the index:

dfs = [pd.read_csv(f, index_col=['a','b','c']).rename({'xyz':''}) for f in file_list]
result = pd.concat(dfs)
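The caveat about columns a and c matters because rename with a plain dict is applied to every level of the MultiIndex. The sketch below uses made-up data to show this. It also shows the level keyword of DataFrame.rename, which is not part of the answer above but can restrict the rename to level b if xyz may appear in other levels.

```python
import io
import pandas as pd

# Hypothetical data: 'xyz' appears in both b and c.
csv_text = "a,b,c,val\n1,xyz,p,10\n2,foo,xyz,20\n"
df = pd.read_csv(io.StringIO(csv_text), index_col=['a', 'b', 'c'])

# A plain dict renames matching labels in every index level.
renamed_all = df.rename({'xyz': ''})

# Restricting to one level leaves the other levels untouched.
renamed_b = df.rename(index={'xyz': ''}, level='b')

all_b = renamed_all.index.get_level_values('b').tolist()
all_c = renamed_all.index.get_level_values('c').tolist()
only_b = renamed_b.index.get_level_values('b').tolist()
only_c = renamed_b.index.get_level_values('c').tolist()
```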

Finally, if "nothing" should mean NaN, use {'xyz':np.nan} instead of {'xyz':''} (this needs import numpy as np).
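A minimal sketch of the NaN variant, again with made-up in-memory data rather than the real files, to show that the replaced entries end up as missing values in level b of the index:

```python
import io
import numpy as np
import pandas as pd

# Hypothetical stand-in for one CSV file.
csv_text = "a,b,c\n1,xyz,p\n2,foo,q\n"

df = (
    pd.read_csv(io.StringIO(csv_text))
      .replace({'b': {'xyz': np.nan}})  # 'xyz' becomes a missing value
      .set_index(['a', 'b', 'c'])
)

# Boolean mask of missing labels in level b.
b_level = df.index.get_level_values('b')
missing_mask = b_level.isna().tolist()
```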

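EDIT (from the comments):

To replace using a regular expression, pass regex=True; matching substrings are then replaced, not only whole values: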

dfs= [pd.read_csv(f).replace({'b':{'xyz*':''}}, regex=True).set_index(['a', 'b', 'c']) for f in file_list]
result = pd.concat(dfs)
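One thing worth checking with the regex version: in regular-expression syntax 'xyz*' means "xy followed by zero or more z", not a shell-style wildcard. The sketch below, on made-up data, compares it with the plain pattern 'xyz' so the difference is visible; which one is wanted depends on the actual contents of column b.

```python
import io
import pandas as pd

# Hypothetical data with several near-matches in column b.
csv_text = "a,b,c\n1,xyz,p\n2,xyzzz_tail,q\n3,xy_only,r\n4,foo,s\n"
df = pd.read_csv(io.StringIO(csv_text))

# 'xyz*' matches 'xy', 'xyz', 'xyzz', ... anywhere in the string.
loose = df.replace({'b': {'xyz*': ''}}, regex=True)

# 'xyz' matches exactly those three characters anywhere in the string.
literal = df.replace({'b': {'xyz': ''}}, regex=True)

loose_b = loose['b'].tolist()
literal_b = literal['b'].tolist()
print(loose_b)
print(literal_b)
```

To replace only cells that are exactly xyz while still using regex=True, an anchored pattern such as '^xyz$' would be one option.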

2 Comments

Just to add, I used regular expression dfs=[pd.read_csv(f).replace({'b':{'xyz*':''}}, regex=True).
I've added it to the answer too.
