Sunderam Dubey

The statement for i in df iterates over the column names of the DataFrame, not over its rows. Let's take an example to understand this better:

import numpy as np
import pandas as pd
df = pd.DataFrame({"Age":np.array([2,3,np.nan,8,np.nan]),"Age_mean":np.array([2,5,9,2,1])})
df

The DataFrame looks like this:

    Age Age_mean
0   2.0 2
1   3.0 5
2   NaN 9
3   8.0 2
4   NaN 1

Now let's see what the for loop iterates over:

for i in df:
    print(i)

OUTPUT

Age
Age_mean

So when you execute df['Age'].isnull().iloc[i], it raises an error: i holds the string 'Age' (a column name), while iloc accepts only integer positions.
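To make the failure concrete, here is a minimal sketch that reproduces it and then loops over row positions instead. The names columns_seen, error_name and nan_positions are mine, not from the question, and the exception type is what I would expect from current pandas releases, so treat it as an assumption for older versions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Age": [2, 3, np.nan, 8, np.nan], "Age_mean": [2, 5, 9, 2, 1]})

# Iterating over a DataFrame yields its column labels.
columns_seen = list(df)

# Passing a column label to iloc fails because iloc is position-based.
try:
    df["Age"].isnull().iloc["Age"]
    error_name = None
except TypeError as exc:  # assumption: recent pandas raises TypeError here
    error_name = type(exc).__name__

# If a loop is really wanted, iterate over integer positions instead.
nan_positions = [i for i in range(len(df)) if df["Age"].isnull().iloc[i]]
print(columns_seen, error_name, nan_positions)
```

Looping over range(len(df)) works, but it is slow on large frames, which is why a vectorized approach is usually preferable.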

PROPOSED SOLUTION:

We can do this without a for loop as shown below:

nan_index = df['Age'].index[df['Age'].apply(np.isnan)]
df.loc[nan_index,"Age"] = df.loc[nan_index,"Age_mean"]

The first line returns the index labels of the rows where Age is NaN. The second line then replaces those values with the corresponding values from the Age_mean column.
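One detail worth spelling out: nan_index holds index labels, not positions, which is why it is used with loc rather than iloc. A small sketch with a non-default index shows this; the string labels "a" to "e" and the name nan_labels are hypothetical and only there for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"Age": [2, 3, np.nan, 8, np.nan], "Age_mean": [2, 5, 9, 2, 1]},
    index=["a", "b", "c", "d", "e"],  # hypothetical labels for illustration
)

# Same two statements as in the answer, applied to a labelled index.
nan_index = df["Age"].index[df["Age"].apply(np.isnan)]
nan_labels = list(nan_index)  # labels of the rows where Age is NaN

df.loc[nan_index, "Age"] = df.loc[nan_index, "Age_mean"]
print(nan_labels)
print(df)
```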

OUTPUT

    Age Age_mean
0   2.0 2
1   3.0 5
2   9.0 9
3   8.0 2
4   1.0 1
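As an alternative, the same result can probably be reached more directly with a boolean mask or with Series.fillna, which accepts another Series and aligns it on the index. This is a sketch of a different technique, not the approach used above; the names filled and mask are my own.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Age": [2, 3, np.nan, 8, np.nan], "Age_mean": [2, 5, 9, 2, 1]})

# Option 1: fillna takes a Series and fills each NaN from the matching row.
filled = df["Age"].fillna(df["Age_mean"])

# Option 2: build a boolean mask once and assign through loc.
mask = df["Age"].isna()
df.loc[mask, "Age"] = df.loc[mask, "Age_mean"]
print(df)
```

Option 1 returns a new Series and leaves df untouched until you assign it back; option 2 modifies df in place.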



