I am a newbie to ML and want to perform the simplest classification with Keras: if y > 0.5, then label = 1 (regardless of x), and if y < 0.5, then label = 0 (regardless of x).

As far as I understand, one neuron with a sigmoid activation can perform this linear classification.

import tensorflow.keras as keras
import numpy as np

train_data = np.empty((0,2),float)
train_labels = np.empty((0,1),float)


train_data = np.append(train_data, [[0, 0]], axis=0)
train_labels = np.append(train_labels, 0)

train_data = np.append(train_data, [[1, 0]], axis=0)
train_labels = np.append(train_labels, 0)

train_data = np.append(train_data, [[0, 1]], axis=0)
train_labels = np.append(train_labels, 1)

train_data = np.append(train_data, [[1, 1]], axis=0)
train_labels = np.append(train_labels, 1)


model = keras.models.Sequential()
model.add(keras.layers.BatchNormalization())
model.add(keras.layers.Dense(1, input_dim=2, activation='sigmoid'))

model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])

model.fit(train_data, train_labels, epochs=5)

Training:

Epoch 1/5
4/4 [==============================] - 1s 150ms/step - loss: 0.4885 - acc: 0.7500
Epoch 2/5
4/4 [==============================] - 0s 922us/step - loss: 0.4880 - acc: 0.7500
Epoch 3/5
4/4 [==============================] - 0s 435us/step - loss: 0.4875 - acc: 0.7500
Epoch 4/5
4/4 [==============================] - 0s 396us/step - loss: 0.4869 - acc: 0.7500
Epoch 5/5
4/4 [==============================] - 0s 465us/step - loss: 0.4863 - acc: 0.7500

And the predictions are not good:

predict_data = np.empty((0,2),float)
predict_data = np.append(predict_data, [[0, 0]], axis=0)
predict_data = np.append(predict_data, [[1, 0]], axis=0)
predict_data = np.append(predict_data, [[1, 1]], axis=0)
predict_data = np.append(predict_data, [[1, 1]], axis=0)

predict_labels = model.predict(predict_data)
print(predict_labels)

[[0.49750862]
 [0.51616406]
 [0.774486  ]
 [0.774486  ]]

How can I solve this problem?

After all, I tried to train the model on 2000 points (in my mind, that's more than enough for this simple problem), but with no success...

train_data = np.empty((0,2),float)
train_labels = np.empty((0,1),float)

for i in range(0, 1000):
  train_data = np.append(train_data, [[i, 0]], axis=0)
  train_labels = np.append(train_labels, 0)
  train_data = np.append(train_data, [[i, 1]], axis=0)
  train_labels = np.append(train_labels, 1)

model = keras.models.Sequential()
model.add(keras.layers.BatchNormalization())
model.add(keras.layers.Dense(1, input_dim=2, activation='sigmoid'))

model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])

model.fit(train_data, train_labels, epochs=5)

Epoch 1/5
2000/2000 [==============================] - 1s 505us/step - loss: 7.9669 - acc: 0.5005
Epoch 2/5
2000/2000 [==============================] - 0s 44us/step - loss: 7.9598 - acc: 0.5010
Epoch 3/5
2000/2000 [==============================] - 0s 45us/step - loss: 7.9511 - acc: 0.5010
Epoch 4/5
2000/2000 [==============================] - 0s 50us/step - loss: 7.9408 - acc: 0.5010
Epoch 5/5
2000/2000 [==============================] - 0s 53us/step - loss: 7.9279 - acc: 0.5015


Prediction:

predict_data = np.empty((0,2),float)
predict_data = np.append(predict_data, [[0, 0]], axis=0)
predict_data = np.append(predict_data, [[1, 0]], axis=0)
predict_data = np.append(predict_data, [[1, 1]], axis=0)
predict_data = np.append(predict_data, [[1, 1]], axis=0)

predict_labels = model.predict(predict_data)
print(predict_labels)

[[0.6280617 ]
 [0.48020774]
 [0.8395983 ]
 [0.8395983 ]]

0.6280617 for (0,0) is very bad.

  • You don't have enough data to learn from. Four data points are not enough for machine learning in general, and even less so for neural networks / deep learning.

2 Answers


Your problem setup is a bit unusual in that you only have four data points, yet you want to learn model weights with gradient descent (or Adam). Also, the batch normalization does not really make sense here, so I would suggest removing it.
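
For instance, here is the same setup with the batchnorm layer removed (a sketch; everything else is kept from the question):

model = keras.models.Sequential()
model.add(keras.layers.Dense(1, input_dim=2, activation='sigmoid'))
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])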

Apart from that, your network predicts numbers between 0 and 1 ('probabilities'), not class labels. To get the predicted class labels, you can use model.predict_classes(predict_data) instead of model.predict().
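
Note that predict_classes has been removed from Sequential in recent TensorFlow releases; if you are on such a version, thresholding the predicted probabilities yourself works just as well. A minimal sketch:

import numpy as np

# Threshold the sigmoid outputs at 0.5 to get hard 0/1 labels
probs = model.predict(predict_data)
labels = (probs > 0.5).astype(int)
print(labels)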

If you are new to ML and you want to experiment with toy datasets, you can also have a look at scikit-learn, which is a library that implements more traditional ML algorithms, whereas Keras is specifically for deep learning. Consider for instance logistic regression, which is the same thing as a single neuron with a sigmoid activation but is implemented with different solvers in sklearn:

from sklearn.linear_model import LogisticRegression

model = LogisticRegression()
model = model.fit(train_data, train_labels)
model.predict(predict_data)
> array([0., 0., 1., 1.])
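
If you also want the predicted probabilities, comparable to the raw sigmoid outputs from Keras, LogisticRegression exposes predict_proba; column 1 holds the probability of class 1:

model.predict_proba(predict_data)[:, 1]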

The scikit-learn website contains lots of examples that illustrate these different algorithms on toy datasets.

In your second scenario, you are not allowing any variation in the second feature, which is the only one that matters: it only ever takes the exact values 0 and 1. If you want to train the model on 1000 data points, you can generate data around the four points in your original dataset and add some random noise to those:

import keras
import numpy as np
import matplotlib.pyplot as plt

# Generate toy dataset
train_data = np.random.randint(0, 2, size=(1000, 2))
# Add gaussian noise
train_data = train_data + np.random.normal(scale=2e-1, size=train_data.shape)
train_labels = (train_data[:, 1] > 0.5).astype(int)

# Visualize the data, color-coded by their classes
fig, ax = plt.subplots()
ax.scatter(train_data[:, 0], train_data[:, 1], c=train_labels)

[Figure: scatter plot of the noisy training data, color-coded by class]

# Train a simple neural net
model = keras.models.Sequential()
model.add(keras.layers.Dense(1, input_shape=(2,), activation='sigmoid'))
model.compile(optimizer='sgd', loss='binary_crossentropy', metrics=['accuracy'])

history = model.fit(train_data, train_labels, epochs=20)

You can use the history object to visualize how the loss or accuracy evolved during training:

fig, ax = plt.subplots()
# The metric key is 'acc' in older Keras versions and 'accuracy' in newer ones
ax.plot(history.history['acc'])

[Figure: training accuracy over the 20 epochs]

Finally, test the model on some test data:

from sklearn.metrics import accuracy_score
# Test on test data
test_data = np.random.randint(0, 2, size=(100, 2))
# Add gaussian noise
test_data = test_data + np.random.normal(scale=2e-1, size=test_data.shape)
test_labels = (test_data[:, 1] > 0.5).astype(int)

# Pass the full 2-column test_data, since the model was trained on both features
accuracy_score(test_labels, model.predict_classes(test_data))

However, be aware that you could solve the entire problem by just using the second coordinate. So it works just fine if you throw the first one away:

# Use only the second coordinate
model = keras.models.Sequential()
model.add(keras.layers.Dense(1, input_shape=(1,), activation='sigmoid'))
model.compile(optimizer='sgd', loss='binary_crossentropy', metrics=['accuracy'])

# train_data[:, 1:] keeps the array 2-D, as Dense expects
history = model.fit(train_data[:, 1:], train_labels, epochs=20)

This model quickly achieves high accuracy:

[Figure: training accuracy over epochs for the single-feature model]
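
For completeness, you could sanity-check this single-feature model on the noisy test set generated above (a sketch reusing test_data and test_labels from before):

# Keras returns [loss, accuracy] since the model was compiled with that metric
loss, acc = model.evaluate(test_data[:, 1:], test_labels)
print(acc)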


Yes, first of all, BatchNorm and Adam don't really make sense in this situation. And the reason why your predictions don't work is that your model is too weak to solve your equations. If you try to solve it mathematically, you will have:

sigmoid(w1*x1 + w2*x2 + b0) = y

So with your training data you get:

1) sigmoid(b0) = 0           => b0 -> -infinity
2) sigmoid(w1 + b0) = 0      => w1 = any constant
3) sigmoid(w2 + b0) = 1      => w2 >> |b0| (already starting to break...)
4) sigmoid(w1 + w2 + b0) = 1 => same as 3)

So, in my opinion, the optimizer will oscillate between constraints 2) and 3), pushing each one higher than the other in turn, and you will never reach the desired predictions with this model.
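
To see this tension concretely, here is a minimal numeric sketch (the weight values are hypothetical, chosen only for illustration): driving sigmoid(b0) toward 0 requires a very negative b0, which in turn demands an even larger w2 to satisfy constraint 3).

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical weights: b0 very negative for constraint 1),
# but w2 has not (yet) outgrown |b0| as constraint 3) requires
w1, w2, b0 = 0.0, 10.0, -20.0

for x1, x2 in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    print((x1, x2), sigmoid(w1 * x1 + w2 * x2 + b0))

# (0, 1) and (1, 1) still come out near 0 even though their label is 1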

And if you look at the 75% accuracy, it makes sense: you have 4 training examples and, as stated above, one of the predictions is not achievable, so you get 3/4 accuracy.

6 Comments

The classes are linearly separable in input space, so I wouldn't say that the model is too weak, but rather that the optimization problem is ill-posed.
@sdcbr But don't you think it may not be appropriate to apply that theory to 4 equally distant points in space?
Well, the model here is actually even too complex, since we don't even need the first feature to do a correct classification. So yes, it's definitely 'inappropriate' to throw batchnorm, Adam and a neural network at the problem. I think it's much more informative to work with simple but workable toy datasets, like in the sklearn examples: scikit-learn.org/stable/auto_examples/linear_model/…
Updated question
Updated my answer.
