I'm new to Python and trying to learn how to do data analysis with it. I read a CSV file into a pandas data frame (called "data") using pd.read_csv(). I want to recode a variable, GEND, which has three values (1, 2, 3), by replacing every instance of 3 with missing (NaN). However, I can't figure out how to do it. So far I've tried a for loop, which doesn't raise an error but also doesn't change the values:

for value in data.GEND:
    if value == 3:
        value = np.nan

I've also tried this, which doesn't show an error, but also doesn't do anything:

data.GEND.loc[3] = np.nan

and this, which correctly changes the 3s in GEND to NaN but also changes the value of the ID variable to "3":

data.GEND.replace(to_replace=3, value = nan) 

What am I missing here? I'd also like to know how I can do the above but create a new column in the data frame that contains the new information (so I can keep the original values if I mess up).

1 Answer

You can use loc to replace the 3's:

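import numpy as np
import pandas as pd

df = pd.DataFrame({'GEND': [1, 2, 1, 2, 3, 1, 2, 3, 1, 2, 1, 2]})
# Boolean mask selects the rows where GEND equals 3; only those cells are overwritten
df.loc[df.GEND == 3, 'GEND'] = np.nan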

    GEND
0   1
1   2
2   1
3   2
4   NaN
5   1
6   2
7   NaN
8   1
9   2
10  1
11  2
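
For context on why the two earlier attempts in the question appeared to do nothing, here is a small sketch. The frame and its ID values are made up for illustration and are not the asker's data. It contrasts a label lookup (loc[3]) with a boolean mask, and shows that rebinding a loop variable never writes back to the column.

```python
import pandas as pd

# Hypothetical stand-in for the asker's frame (values invented for illustration)
data = pd.DataFrame({'ID': [10, 11, 12, 13, 14], 'GEND': [1, 2, 3, 1, 3]})

# Label lookup: the single value stored at index label 3, whatever it is
by_label = data['GEND'].loc[3]

# Boolean mask: every row whose GEND value equals 3
by_value = data.loc[data['GEND'] == 3, 'GEND']

# Iterating yields the values themselves; rebinding the loop variable
# only changes the local name and leaves the column untouched
for value in data['GEND']:
    if value == 3:
        value = float('nan')

print(by_label)
print(by_value)
print(data['GEND'].tolist())
```

So loc[3] addresses the row labelled 3, not the rows whose value is 3, which is why the mask form is needed.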

You can also get the same result with where, which keeps the values where the condition holds and puts NaN everywhere else:

df.GEND = df.GEND.where(df.GEND != 3)
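
As for keeping the original values, one possible approach is to assign the result to a new column instead of overwriting GEND. This is only a sketch: the column names GEND_recode and GEND_recode2 and the sample data are made up for illustration. Note that replace returns a new Series rather than modifying the column in place, so its result has to be assigned somewhere.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the CSV from the question
data = pd.DataFrame({'ID': [101, 102, 103, 104], 'GEND': [1, 2, 3, 3]})

# where keeps values where the condition is True and fills NaN elsewhere;
# assigning to a new column name leaves the original GEND untouched
data['GEND_recode'] = data['GEND'].where(data['GEND'] != 3)

# replace also returns a new Series, so the result must be assigned
data['GEND_recode2'] = data['GEND'].replace(3, np.nan)

print(data)
```

Both new columns come out as floats, since NaN cannot be stored in an integer column.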

4 Comments

That replaces the 3rd loc with NaN, print out what df.GEND.loc[3] is and you should see what it is doing.
@Daniel loc performs label indexing, so it returns just the row where the index is 3
Thanks, guys! This was super frustrating for me and you helped me a lot! The code works now!
The following code worked for me: data.loc[data.GEND == 3, 'GEND'] = np.NaN