How do I impute missing values using pandas?

Question

I am trying to impute missing values as the mean of other values in the column; however, my code is having no effect. Does anyone know what I may be doing wrong? Thanks!

My code:

    from sklearn.preprocessing import Imputer
    imputer = Imputer(missing_values='NaN', strategy='mean', axis=0)
    imputer = imputer.fit(x[:, 1:3])
    x[:, 1:3] = imputer.transform(x[:, 1:3])
    print(dataset)

Output

   Country   Age   Salary Purchased
0   France  44.0  72000.0        No
1    Spain  27.0  48000.0       Yes
2  Germany  30.0  54000.0        No
3    Spain  38.0  61000.0        No
4  Germany  40.0      NaN       Yes
5   France  35.0  58000.0       Yes
6    Spain   NaN  52000.0        No
7   France  48.0  79000.0       Yes
8  Germany  50.0  83000.0        No
9   France  37.0  67000.0       Yes

YOLO · Accepted Answer · 2018-12-28 19:32:11Z

You can do the following, assuming df is your dataset:

from sklearn.preprocessing import Imputer
imputer = Imputer(missing_values ='NaN', strategy = 'mean', axis = 0)

df[['Age','Salary']]=imputer.fit_transform(df[['Age','Salary']])

print(df)

   Country        Age        Salary Purchased
0   France  44.000000  72000.000000        No
1    Spain  27.000000  48000.000000       Yes
2  Germany  30.000000  54000.000000        No
3    Spain  38.000000  61000.000000        No
4  Germany  40.000000  63777.777778       Yes
5   France  35.000000  58000.000000       Yes
6    Spain  38.777778  52000.000000        No
7   France  48.000000  79000.000000       Yes
8  Germany  50.000000  83000.000000        No
9   France  37.000000  67000.000000       Yes

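In newer versions of scikit-learn (0.20 and later), use from sklearn.impute import SimpleImputer instead; Imputer was removed in 0.22. SimpleImputer takes no axis argument, and missing values are specified as np.nan (its default).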
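For reference, here is a minimal sketch of the same approach using SimpleImputer. The DataFrame is rebuilt by hand from the output shown in the question (the original came from a CSV file), so the construction step is an assumption made only to keep the snippet self-contained.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Assumption: data typed in from the question's printed output, not read from the CSV.
df = pd.DataFrame({
    "Country": ["France", "Spain", "Germany", "Spain", "Germany",
                "France", "Spain", "France", "Germany", "France"],
    "Age": [44, 27, 30, 38, 40, 35, np.nan, 48, 50, 37],
    "Salary": [72000, 48000, 54000, 61000, np.nan,
               58000, 52000, 79000, 83000, 67000],
    "Purchased": ["No", "Yes", "No", "No", "Yes",
                  "Yes", "No", "Yes", "No", "Yes"],
})

# SimpleImputer always works column-wise, so there is no axis argument.
imputer = SimpleImputer(missing_values=np.nan, strategy="mean")

# Assign the result back to the DataFrame columns so the change is visible in df.
df[["Age", "Salary"]] = imputer.fit_transform(df[["Age", "Salary"]])

print(df)
```

The filled values should match the output above: the missing Age becomes the mean of the nine known ages, and likewise for Salary.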

Danielle M. · Accepted Answer · 2018-12-28 19:21:01Z

You're assigning an Imputer object to the variable imputer:

imputer = Imputer(missing_values ='NaN', strategy = 'mean', axis = 0)

You then call the fit() function on your Imputer object, and then the transform() function.

You then print the dataset variable, but it's not clear where that comes from. Did you mean to print the Imputer object, or the result of one of those calls, instead?
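To illustrate why printing dataset might show no change, here is a small sketch. It assumes x was extracted with something like dataset.iloc[:, :-1].values, which the question does not show; on mixed-type columns that call returns a separate NumPy array, so edits to x never reach dataset. SimpleImputer stands in for the older Imputer, and the three-row frame is made up for the demonstration.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical miniature of the question's data.
dataset = pd.DataFrame({
    "Country": ["Germany", "France", "Spain"],
    "Age": [40.0, 35.0, np.nan],
    "Salary": [np.nan, 58000.0, 52000.0],
    "Purchased": ["Yes", "Yes", "No"],
})

# Assumption: this is how x was created. With mixed dtypes, .values builds
# a new object array, i.e. a copy that is independent of dataset.
x = dataset.iloc[:, :-1].values

imputer = SimpleImputer(missing_values=np.nan, strategy="mean")
x[:, 1:3] = imputer.fit_transform(x[:, 1:3].astype(float))

# x is now filled in, but dataset still holds its NaN values.
missing_before = int(dataset.isna().sum().sum())
print("NaN left in dataset:", missing_before)

# Writing the imputed columns back makes the change visible in dataset.
dataset[["Age", "Salary"]] = x[:, 1:3].astype(float)
print(dataset)
```

If x was created some other way, the details may differ, but the general point holds: print the object you actually modified, or write the result back.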

1 Comment

M. Jole Over a year ago

Hey Danielle! The dataset variable was created earlier in my code: dataset = pd.read_csv('Data1.csv'). I printed it to see whether the mean value had been imputed in the Age and Salary columns in place of the NaN values. Printing it produced the output shown below my code. The NaN values had not been replaced, which led me to believe that my code had no effect.
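Since the goal is to see the means filled in on dataset itself, a pandas-only route may be the simplest. The sketch below uses DataFrame.fillna with the column means; the small frame is a made-up stand-in for the contents of Data1.csv.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for pd.read_csv('Data1.csv').
dataset = pd.DataFrame({
    "Country": ["Germany", "France", "Spain"],
    "Age": [40.0, 35.0, np.nan],
    "Salary": [np.nan, 58000.0, 52000.0],
    "Purchased": ["Yes", "Yes", "No"],
})

cols = ["Age", "Salary"]

# mean() skips NaN by default; fillna matches each mean to its column by name.
dataset[cols] = dataset[cols].fillna(dataset[cols].mean())

print(dataset)
```

Because the result is assigned straight back to dataset, printing it afterwards shows the imputed values.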
