I split the data into training and test sets without using train_test_split.
My function:

import numpy as np

def split(X, y):
    # Draw one uniform random number per row.
    arr_rand = np.random.rand(X.shape[0])
    # Rows whose draw is below the 75th percentile (about 75%) go to training.
    mask = arr_rand < np.percentile(arr_rand, 75)
    X_train = X[mask]
    y_train = y[mask]
    X_test = X[~mask]
    y_test = y[~mask]

    # print(len(X_train), len(y_train), len(X_test), len(y_test))
    return X_train, y_train, X_test, y_test

My problem is that when I output X_train, I am told that it has 76 rows x 8 columns.
However, when I print X_test, this information is missing. This is what it looks like. My df is loaded from a CSV file: [image]

I needed to split it into features X and labels y, which I did like this:
X, y = df.iloc[:, 0:8], df.iloc[:, 8:9]
And later: X_train, y_train, X_test, y_test = split(X, y)

This is the output. Why is the shape information missing?

Results: [image]

  • What happens if you manually check the shape with X_test.shape? Commented Mar 23, 2021 at 14:35
  • @Darina Nothing, it returns (26, 8) Commented Mar 23, 2021 at 14:37
  • So your data frame is fine, it's just an error in Jupyter rendering. I wouldn't worry about it. Commented Mar 24, 2021 at 15:46

1 Answer

When all the rows fit in the result cell (in your example, X_test has only 26 rows), the shape information is not shown. pandas appends the "[n rows x m columns]" footer only when the displayed frame is truncated. By default, the maximum number of rows shown is 60 (unless you change pandas.options.display.max_rows), so if X_test has 60 rows or fewer, it is displayed in full and the footer is omitted.

Try X_test.shape to see the shape.
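
To see the behaviour described above without the original CSV, here is a minimal sketch. The two frames are hypothetical stand-ins filled with random numbers (76 and 26 rows, 8 columns each), not your data, and the display options are set explicitly to the usual defaults so the result does not depend on local settings. The last part assumes you want the footer every time, which display.show_dimensions set to True should give you.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for X_train and X_test (random values, not the real data).
rng = np.random.default_rng(0)
X_train = pd.DataFrame(rng.random((76, 8)))
X_test = pd.DataFrame(rng.random((26, 8)))

# Pin the relevant options to their usual defaults for this sketch.
defaults = (
    "display.max_rows", 60,
    "display.max_columns", 20,
    "display.show_dimensions", "truncate",
)

with pd.option_context(*defaults):
    train_repr = repr(X_train)  # 76 rows > 60, so the output is truncated
    test_repr = repr(X_test)    # 26 rows <= 60, so every row is printed

# The footer is only expected on the truncated frame.
print("[76 rows x 8 columns]" in train_repr)
print("[26 rows x 8 columns]" in test_repr)

# Ask pandas to print the footer regardless of truncation.
with pd.option_context("display.max_columns", 20, "display.show_dimensions", True):
    forced_repr = repr(X_test)
print("[26 rows x 8 columns]" in forced_repr)

# The shape itself is always available, whatever the display shows.
print(X_test.shape)
```

If you only need the numbers, X_test.shape or len(X_test) is simpler than changing display options.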

2 Comments

X_train.info gives me the columns and row values, but X_test.info does not.
In the output of X_test.info(), the number of rows (entries) is on the second line of the message.
