
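Below is a simple example using MinMaxScaler, PolynomialFeatures and a LinearRegression model. Doing the same steps via a pipeline and manually gives the same intercept and coefficients, but a different test score.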

Doing it via a pipeline:

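from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler, PolynomialFeatures

pipeLine = make_pipeline(MinMaxScaler(), PolynomialFeatures(), LinearRegression())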

pipeLine.fit(X_train,Y_train)
print(pipeLine.score(X_test,Y_test))
print(pipeLine.steps[2][1].intercept_)
print(pipeLine.steps[2][1].coef_)

0.4433729905419167
3.4067909278765605
[ 0.         -7.60868833  5.87162697]

Doing it manually:

X_trainScaled = MinMaxScaler().fit_transform(X_train)
X_trainScaledandPoly = PolynomialFeatures().fit_transform(X_trainScaled)

X_testScaled = MinMaxScaler().fit_transform(X_test)
X_testScaledandPoly = PolynomialFeatures().fit_transform(X_testScaled)

reg = LinearRegression()
reg.fit(X_trainScaledandPoly,Y_train)
print(reg.score(X_testScaledandPoly,Y_test))
print(reg.intercept_)
print(reg.coef_)
print(reg.intercept_ == pipeLine.steps[2][1].intercept_)
print(reg.coef_ == pipeLine.steps[2][1].coef_)

0.44099256691782807
3.4067909278765605
[ 0.         -7.60868833  5.87162697]
True
[ True  True  True]
  • Is it possible that X_test and X_train have different min/max values? Can you try it with a defined dataset and add it to your question? Commented Jul 27, 2019 at 9:34
  • You are not supposed to fit_transform twice. You are supposed to fit using the training data and then ONLY call transform for the test data. Commented Jul 27, 2019 at 13:47
  • Thanks guys :) I can see the error of my ways now :) Commented Jul 27, 2019 at 14:19
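The first two comments identify the cause. A MinMaxScaler fitted on X_test learns the test set's own min/max, so the test rows are not scaled the way the training rows were. The model itself is trained identically in both versions (hence the matching coefficients), but it is then scored on a differently transformed test set. A minimal sketch with made-up one-feature data (the arrays below are hypothetical, not from the question) shows the difference:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Made-up toy data with a single feature; the test rows cover a narrower range than the training rows
X_train = np.array([[0.0], [5.0], [10.0]])
X_test = np.array([[2.0], [4.0]])

# Correct: learn min/max from the training data, then only transform the test data
train_scaler = MinMaxScaler().fit(X_train)
test_scaled_correct = train_scaler.transform(X_test)

# What the manual code in the question does: fit a new scaler on the test data itself
test_scaled_refit = MinMaxScaler().fit_transform(X_test)

print(test_scaled_correct.ravel())  # [0.2 0.4], scaled with the training min/max (0 and 10)
print(test_scaled_refit.ravel())    # [0. 1.], scaled with the test min/max (2 and 4)
```

With the training-fitted scaler, the test values stay on the scale the model was trained on; the refitted scaler stretches them to fill [0, 1] regardless of the training range.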

1 Answer


The problem lies in your manual steps: you refit the scaler on the test data. You need to fit it on the training data only and then use that fitted instance to transform the test data. See here for details: How to normalize the Train and Test data using MinMaxScaler sklearn and StandardScaler before and after splitting data

from sklearn.datasets import make_regression
from sklearn.preprocessing import MinMaxScaler, PolynomialFeatures
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

X, y = make_regression(n_features=3, n_samples=50, n_informative=1, noise=1)
X_train, X_test, Y_train, Y_test = train_test_split(X, y)

pipeLine = make_pipeline(MinMaxScaler(),PolynomialFeatures(), LinearRegression())

pipeLine.fit(X_train,Y_train)
print(pipeLine.score(X_test,Y_test))
print(pipeLine.steps[2][1].intercept_)
print(pipeLine.steps[2][1].coef_)

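# Fit the transformers on the training data only
scaler = MinMaxScaler().fit(X_train)
X_trainScaled = scaler.transform(X_train)
poly = PolynomialFeatures().fit(X_trainScaled)
X_trainScaledandPoly = poly.transform(X_trainScaled)

# Reuse the already-fitted transformers on the test data: transform only, no refit
X_testScaled = scaler.transform(X_test)
X_testScaledandPoly = poly.transform(X_testScaled)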

reg = LinearRegression()
reg.fit(X_trainScaledandPoly,Y_train)
print(reg.score(X_testScaledandPoly,Y_test))
print(reg.intercept_)
print(reg.coef_)
print(reg.intercept_ == pipeLine.steps[2][1].intercept_)
print(reg.coef_ == pipeLine.steps[2][1].coef_)
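To check that the corrected manual steps now mirror the pipeline exactly, you can compare the features the pipeline feeds to its regressor with the manually built ones. This is a sketch assuming scikit-learn 0.21 or later (for pipeline slicing with `pipe[:-1]`); the `random_state` values are added only to make the run reproducible and are not part of the answer above.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler, PolynomialFeatures

# Same synthetic setup as above; random_state fixed only for reproducibility
X, y = make_regression(n_features=3, n_samples=50, n_informative=1, noise=1, random_state=0)
X_train, X_test, Y_train, Y_test = train_test_split(X, y, random_state=0)

pipe = make_pipeline(MinMaxScaler(), PolynomialFeatures(), LinearRegression())
pipe.fit(X_train, Y_train)

# Manual version: every transformer is fitted on X_train only, then reused on X_test
scaler = MinMaxScaler().fit(X_train)
poly = PolynomialFeatures().fit(scaler.transform(X_train))
X_train_manual = poly.transform(scaler.transform(X_train))
X_test_manual = poly.transform(scaler.transform(X_test))
reg = LinearRegression().fit(X_train_manual, Y_train)

# pipe[:-1] is the fitted preprocessing part of the pipeline (every step except the regressor)
X_test_pipe = pipe[:-1].transform(X_test)

features_match = np.allclose(X_test_manual, X_test_pipe)
scores_match = np.isclose(reg.score(X_test_manual, Y_test), pipe.score(X_test, Y_test))
print(features_match, scores_match)
```

If the scaler were refit on X_test instead, `features_match` would generally be False, which is exactly the score discrepancy seen in the question.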