
I have a DataFrame (df) in Python with a few features, but I'm only going to work with the Age and Age_Mean columns.

The Age column has several null values. I would like to replace each null value with the value at the same index in the Age_Mean column.

Here is the code I used:

    for i in df:
        if df['Age'].isnull().iloc[i] == True:
            df['Age'].iloc[i] == df['Age_Mean'].iloc[i]

This is my error message:

KeyError: 'the label [Age] is not in the [index]'

Please let me know what is wrong with this code.

  • What error are you getting? - Just saying "I get error" doesn't help anyone. Commented Jun 20, 2019 at 9:42
  • Here is the error I get: KeyError: 'the label [Age] is not in the [index]' Commented Jun 20, 2019 at 9:51
  • Can you give some example data? And why aren't you just using df['Age'].fillna(df['Age_Mean'])? Commented Jun 20, 2019 at 9:55
  • Please provide a minimal reproducible example and edit your question including both the example and error. Commented Jun 20, 2019 at 9:57
  • @NilsWerner Thank you very much, Nils. Yes, I could use your code as well, but I wanted to understand the issue with the for loop, and thanks to Parthasarathy, I found my mistake. Commented Jun 20, 2019 at 11:54
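For reference, the fillna approach suggested in the comments could look like the minimal sketch below. The sample values are invented for illustration; the column names follow the question (Age and Age_Mean).

```python
import numpy as np
import pandas as pd

# Small illustrative frame; the numbers are made up
df = pd.DataFrame({
    "Age": [22.0, np.nan, 35.0, np.nan],
    "Age_Mean": [30.0, 28.0, 31.0, 29.0],
})

# fillna accepts a Series: each NaN in Age is replaced by the
# Age_Mean value that has the same index label
df["Age"] = df["Age"].fillna(df["Age_Mean"])
print(df)
```

Because fillna aligns on the index, no loop or explicit index lookup is needed.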

1 Answer


The statement for i in df iterates over the column names, not over the row positions. Let's take an example to understand this better:

    import numpy as np
    import pandas as pd
    df = pd.DataFrame({"Age": np.array([2, 3, np.nan, 8, np.nan]), "Age_mean": np.array([2, 5, 9, 2, 1])})
    df

So the DataFrame will look like this:

    Age Age_mean
0   2.0 2
1   3.0 5
2   NaN 9
3   8.0 2
4   NaN 1

Now let's see what the for loop iterates over:

    for i in df:
        print(i)

OUTPUT

    Age
    Age_mean

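So when you execute df['Age'].isnull().iloc[i], it throws an error, because i is the column name 'Age' rather than an integer row position. Note also that the line inside the if block uses ==, which compares the two values instead of assigning one to the other, so even with correct indices the loop would not change df.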
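If you do want to keep the loop, a minimal sketch of a corrected version might iterate over df.index (the row labels) and use .loc with a single = to write each value back into df. It rebuilds the example frame from above so it runs on its own; for large DataFrames the vectorized solution below is typically much faster.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Age": np.array([2, 3, np.nan, 8, np.nan]),
                   "Age_mean": np.array([2, 5, 9, 2, 1])})

# Iterate over row labels (df.index), not over df itself,
# which would yield the column names
for i in df.index:
    if pd.isna(df.loc[i, "Age"]):
        # Single = assigns; .loc[row, column] writes directly into df
        df.loc[i, "Age"] = df.loc[i, "Age_mean"]

print(df)
```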

PROPOSED SOLUTION:

We can do this without a for loop as shown below:

    nan_index = df['Age'].index[df['Age'].apply(np.isnan)]
    df.loc[nan_index, "Age"] = df.loc[nan_index, "Age_mean"]

The first line returns the index labels of the rows where Age is NaN. Once we have those, we just need to replace the values in those rows with the corresponding values from the Age_mean column, which is what the second statement does.
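A slightly shorter variant, sketched here as an alternative rather than the answer's own method, skips extracting the index and passes the boolean mask from isna() straight to .loc as the row selector:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Age": np.array([2, 3, np.nan, 8, np.nan]),
                   "Age_mean": np.array([2, 5, 9, 2, 1])})

# Boolean Series: True where Age is missing
mask = df["Age"].isna()

# Use the mask as the row selector on both sides of the assignment
df.loc[mask, "Age"] = df.loc[mask, "Age_mean"]
print(df)
```

The result is the same as with nan_index; the mask just avoids the intermediate index object.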

OUTPUT

    Age Age_mean
0   2.0 2
1   3.0 5
2   9.0 9
3   8.0 2
4   1.0 1