I am working with pandas and a rather large Excel document. My goal is to find particular characters in a string and replace them with nothing, essentially removing them. The strings are in a particular column. Below is the code I wrote to do the find and replace. Python does not give me an error message, but when I checked the saved file nothing had changed. What am I doing wrong?

import pandas as pd

df1 = pd.read_csv('2020.csv')

(df1.loc[(df1['SKU Code'].str.contains ('-DG'))])

dfDGremoved = (df1.loc[(df1['SKU Code'].str.contains('-DG'))].replace('-DG',''))

dfDGremoved.to_csv('2020DRAFT.csv')
  • 1
    Why check to see if the string contains what you're replacing? Just replace it first. Does this not work: df1['SKU Code'] = df1['SKU Code'].replace('-DG', ''), and then just df1.to_csv('2020DRAFT.csv')? Commented Mar 3, 2020 at 20:12
  • 1
    The line (df1.loc[(df1['SKU Code'].str.contains ('-DG'))]) doesn't have any effect. Commented Mar 3, 2020 at 20:27

2 Answers

1

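Your original code changes nothing because DataFrame.replace('-DG', '') only replaces cells whose entire value is '-DG'; it does not look inside longer strings. The code is also a bit overengineered: Python's str.replace method leaves strings that do not contain the substring unchanged, so the contains call is unnecessary. Creating a second DataFrame is unnecessary as well, since you can assign the result back to the same column.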
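
As a small illustration of why the saved file came out unchanged, here is a sketch on a made-up frame (the SKU values and the Qty column are hypothetical, not taken from the asker's file). It assumes default arguments, where DataFrame.replace with a plain string matches whole cell values only, while the .str accessor works on substrings:

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's 2020.csv
df = pd.DataFrame({"SKU Code": ["A100-DG", "B200", "-DG"], "Qty": [1, 2, 3]})

# DataFrame.replace with a plain string compares against whole cell values,
# so "A100-DG" is left untouched and only the cell equal to "-DG" changes.
whole_value = df.replace("-DG", "")

# Series.str.replace operates on substrings inside each string.
substring = df["SKU Code"].str.replace("-DG", "", regex=False)

print(whole_value["SKU Code"].tolist())
print(substring.tolist())
```

In the first result the 'A100-DG' entry still carries its suffix, which matches what the asker saw in the saved file.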

To achieve the result you want, you can use map, which applies a function to every element of a Series (which is what a single DataFrame column is), combined with a lambda function:

import pandas as pd

df1 = pd.read_csv('2020.csv')
# Strip '-DG' from every SKU code; strings without it come back unchanged
df1['SKU Code'] = df1['SKU Code'].map(lambda x: x.replace('-DG', ''))
df1.to_csv('2020DRAFT.csv')

Unpacking this a bit:

df1['SKU Code'] = df1['SKU Code'].map(lambda x: x.replace('-DG', ''))
  |                     |          |         └─ Create a nameless function which 
  |                     |          |            takes a string and removes '-DG'
  |                     |          |            from it 
  |                     |          |
  |                     |          └─ ...and run this function on every element...
  |                     |
  |                     └─ ... of the 'SKU Code' column in df1...
  |
  └── ... Then store the results in that same column
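
One caveat with the map approach, sketched on made-up data: if the column has missing values, the lambda is handed a NaN (a float), which has no replace method, and the call raises AttributeError. Passing na_action='ignore' to Series.map leaves those entries alone. The sample values below are hypothetical, and whether the real file has gaps in 'SKU Code' is an assumption.

```python
import pandas as pd

# Hypothetical column with one missing entry, as read_csv would produce
# for an empty cell.
skus = pd.Series(["A100-DG", float("nan"), "B200"], name="SKU Code")

# na_action="ignore" skips missing values instead of passing them
# to the lambda, so x is always a string here.
cleaned = skus.map(lambda x: x.replace("-DG", ""), na_action="ignore")

print(cleaned)
```

The missing entry stays missing in the result, and the other two values are cleaned as before.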

1

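You can use pandas.Series.str.replace(). Before pandas 2.0 it treats the pattern as a regular expression by default; from 2.0 on, the default is a literal string match. '-DG' contains no regex metacharacters, so both modes give the same result here.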

dfDGremoved = df1.copy()
dfDGremoved['SKU Code'] = dfDGremoved['SKU Code'].str.replace('-DG','')
dfDGremoved.to_csv('2020DRAFT.csv')
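
Whether the pattern is treated as a regular expression starts to matter once it contains metacharacters. A minimal sketch with a hypothetical '.DG' suffix (not from the question), passing regex explicitly so the behavior does not depend on the installed pandas version:

```python
import pandas as pd

# Hypothetical SKU codes; ".DG" is chosen because "." is a regex metacharacter.
skus = pd.Series(["A100.DG", "B2XDG", "C300"])

# Literal match: only the exact three characters ".DG" are removed.
literal = skus.str.replace(".DG", "", regex=False)

# Regex match: "." matches any character, so "XDG" is removed as well.
as_regex = skus.str.replace(".DG", "", regex=True)

print(literal.tolist())   # ['A100', 'B2XDG', 'C300']
print(as_regex.tolist())  # ['A100', 'B2', 'C300']
```

For plain suffixes like '-DG', regex=False is arguably the safer choice, since the pattern can never be misread as a regular expression.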
