When reproducing this cross-validation example with a 2x4 train matrix (xtrain), len(b.get_support()) returns 1,000,000. Does this mean that 1,000,000 features have been created in the model? Or only 2, since only 2 features have an impact? Thanks!

%matplotlib inline

import numpy as np
import matplotlib.pyplot as plt

from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.model_selection import cross_val_score, KFold
from sklearn.linear_model import LinearRegression
### create data
def hidden_model(x):
    #y is a linear combination of columns 5 and 10...
    result = x[:, 5] + x[:, 10]
    #... with a little noise
    result += np.random.normal(0, .005, result.shape)
    return result


def make_x(nobs):
    return np.random.uniform(0, 3, (nobs, 10 ** 6))

x = make_x(20)
y = hidden_model(x)

scores = []
clf = LinearRegression()

#sklearn.cross_validation was removed from scikit-learn; KFold now lives in model_selection
for train, test in KFold(n_splits=5).split(x):
    xtrain, xtest, ytrain, ytest = x[train], x[test], y[train], y[test]

    b = SelectKBest(f_regression, k=2)
    b.fit(xtrain,ytrain)
    xtrain = xtrain[:, b.get_support()] #get_support: get mask or integer index of selected features
    xtest = xtest[:, b.get_support()]
    print(len(b.get_support()))

    clf.fit(xtrain, ytrain)
    scores.append(clf.score(xtest, ytest))

    yp = clf.predict(xtest)
    plt.plot(yp, ytest, 'o')
    plt.plot(ytest, ytest, 'r-')

plt.xlabel('Predicted')
plt.ylabel('Observed')

print("CV Score (R_square) is", np.mean(scores))

1 Answer

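get_support() returns the boolean mask that you can apply to the columns of your x to keep the features selected by SelectKBest. It has one entry per input feature, so its length is the number of columns in x, not the number of selected features.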

print(x.shape)
print(b.get_support().shape)
print(np.bincount(b.get_support()))

Outputs:

(20, 1000000)
(1000000,)
[999998      2]

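This shows that you have 20 examples of 1,000,000-dimensional data and a boolean mask of length 1,000,000 in which only two entries are True. No features are created: after indexing with the mask, the regression is fit on only the 2 selected columns.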
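
If it helps, here is a small sketch of the same idea on a much smaller matrix. It assumes 50 columns instead of 10 ** 6 and a fixed random seed, and the names selector, mask, indices, x_selected and model are only illustrative. It compares the boolean mask with the integer form returned by get_support(indices=True), and checks how many coefficients the fitted regression ends up with.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import LinearRegression

rng = np.random.RandomState(0)

# Small stand-in for the question's data: 20 rows, 50 columns (an assumption,
# chosen only to keep the example fast).
x = rng.uniform(0, 3, (20, 50))
# The target depends only on columns 5 and 10, plus a little noise.
y = x[:, 5] + x[:, 10] + rng.normal(0, 0.005, 20)

selector = SelectKBest(f_regression, k=2).fit(x, y)

mask = selector.get_support()                 # boolean, one entry per input column
indices = selector.get_support(indices=True)  # integer positions of the kept columns

# Indexing with the mask keeps only the selected columns.
x_selected = x[:, mask]
model = LinearRegression().fit(x_selected, y)

print(mask.shape)         # (50,)   one flag per original column
print(mask.sum())         # 2       number of columns kept
print(indices)            # positions of the two kept columns
print(x_selected.shape)   # (20, 2)
print(model.coef_.shape)  # (2,)    the regression only sees the selected columns
```

Indexing with the mask should give the same matrix as selector.transform(x), so either form can be used inside the cross-validation loop.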

Hope that helps!
