Confusion Matrix: Evaluation Measures for Classification Problems

Angad Gupta, MIEEE, BITS-Pilani

Published Jun 16, 2020

In data mining, classification is the problem of predicting which category or class a new observation belongs to. The derived model (classifier) is based on the analysis of a set of training data in which each data point has a class label. The trained model is then used to predict the class label for new, unseen data. To understand classification metrics, one of the most important concepts is the confusion matrix.

Confusion matrix :

A Confusion matrix is an N x N matrix used for evaluating the performance of a classification model, where N is the number of target classes. The matrix compares the actual target values with those predicted by the machine learning model. This gives us a holistic view of how well our classification model is performing and what kinds of errors it is making.

The target variable has two values: Positive or Negative
The columns represent the actual values of the target variable
The rows represent the predicted values of the target variable

Interpretation: Each cell's name has two parts. The first is TRUE or FALSE and the second is POSITIVE or NEGATIVE. Here is an easy way to remember them.

For TRUE or FALSE, write Positive as 1 and Negative as 0 and check whether the row (predicted) and column (actual) labels match: 1 and 1 match (True), 0 and 0 match (True), and the remaining two combinations do not match (False). So the first row, first column (1, 1) is True; the first row, second column (1, 0) is False; the second row, first column (0, 1) is False; and the second row, second column (0, 0) is True. Note that this is an equality check rather than a plain AND, since 0 AND 0 would give 0.

For the second part, Positive or Negative is chosen from the row label, that is, the predicted value: Positive for the first row and Negative for the second row.

In this way, we can easily remember TP, FP, FN and TN.

Understanding True Positive, True Negative, False Positive and False Negative in a Confusion Matrix

True Positive (TP)

The predicted value matches the actual value
The actual value was positive and the model predicted a positive value

True Negative (TN)

The predicted value matches the actual value
The actual value was negative and the model predicted a negative value

False Positive (FP) – Type 1 error

The predicted value does not match the actual value
The actual value was negative but the model predicted a positive value
Also known as a Type 1 error

False Negative (FN) – Type 2 error

The predicted value does not match the actual value
The actual value was positive but the model predicted a negative value
Also known as a Type 2 error
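The four definitions above can be sketched as a small function. This is a minimal illustration, not part of the original article: the name label_outcome is hypothetical, and it assumes the positive class is encoded as True.

```python
def label_outcome(actual: bool, predicted: bool) -> str:
    """Return 'TP', 'TN', 'FP' or 'FN' for one (actual, predicted) pair."""
    # First letter: T when the prediction matches the actual value, else F.
    correctness = "T" if actual == predicted else "F"
    # Second letter: taken from the predicted class (P for positive, N for negative).
    polarity = "P" if predicted else "N"
    return correctness + polarity


# Enumerate all four combinations of actual and predicted values.
for actual in (True, False):
    for predicted in (True, False):
        print(f"actual={actual!s:5} predicted={predicted!s:5} -> {label_outcome(actual, predicted)}")
```

The second letter always follows the prediction, which is why a wrong positive prediction is a False Positive even though the actual value is negative.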

Example for a better understanding of TP, TN, FP and FN

True Positive (TP) = 560; meaning 560 positive class data points were correctly classified by the model
True Negative (TN) = 330; meaning 330 negative class data points were correctly classified by the model
False Positive (FP) = 60; meaning 60 negative class data points were incorrectly classified as belonging to the positive class by the model
False Negative (FN) = 50; meaning 50 positive class data points were incorrectly classified as belonging to the negative class by the model
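As a quick sanity check on the example counts above, the totals for the actual and predicted classes can be derived from the four cells. The variable names below are illustrative only.

```python
# Counts taken from the example above.
tp, tn, fp, fn = 560, 330, 60, 50

total = tp + tn + fp + fn            # every data point falls in exactly one cell
actual_positives = tp + fn           # positives the model found plus positives it missed
actual_negatives = tn + fp           # negatives it got right plus false alarms
predicted_positives = tp + fp        # everything the model labelled positive
predicted_negatives = tn + fn        # everything the model labelled negative
correct = tp + tn                    # predictions that match the actual value

print("total:", total)
print("actual positives / negatives:", actual_positives, actual_negatives)
print("predicted positives / negatives:", predicted_positives, predicted_negatives)
print("correct predictions:", correct)
```

Every metric discussed later is a ratio of two of these sums.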

Mathematical Interpretation of confusion matrix

import numpy as np
import sklearn.datasets
import sklearn.linear_model
import sklearn.metrics
from sklearn.model_selection import train_test_split

# do not change for reproducibility
np.random.seed(42) 

# Importing the dataset
dataset = sklearn.datasets.fetch_covtype()

# Only use a random subset for speed - pretend the rest of the data doesn't exist.
# Note: np.random.choice samples with replacement by default, so a few rows may repeat.
random_sample = np.random.choice(len(dataset.data), len(dataset.data) // 10)

# We are only interested in the Class 3 forest type.
COVER_TYPE = 3
features = dataset.data[random_sample, :]
target = dataset.target[random_sample] == COVER_TYPE

# Doing the 80-20% train test split of the data
X_train, X_test, y_train, y_test = train_test_split(features, target, test_size = 0.2)

# Building the basic Logistic Regression
classifier = sklearn.linear_model.LogisticRegression(solver='liblinear')
classifier.fit(X_train,  y_train)
predictions = classifier.predict(X_test)

# Printing out the confusion matrix for our predictions
print(sklearn.metrics.confusion_matrix(y_test, predictions))

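On execution, you will see the confusion matrix for our predictions. Note that scikit-learn puts the actual classes on the rows and the predicted classes on the columns, with the negative class first, so the layout is [[TN, FP], [FN, TP]].

[[10766   169]
 [  235   451]]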
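The four counts can be read off this output as follows. This is a small sketch that hard-codes the printed matrix and assumes the scikit-learn layout (rows are actual classes, columns are predicted classes, negative class first); with a real NumPy result, confusion_matrix(y_test, predictions).ravel() returns the same four values in the same order.

```python
# The matrix printed above, copied as a nested list.
matrix = [[10766, 169],
          [235, 451]]

# Row 0 holds the actual negatives, row 1 the actual positives.
(tn, fp), (fn, tp) = matrix

print(f"TN={tn} FP={fp} FN={fn} TP={tp}")
print("test set size:", tn + fp + fn + tp)
```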

1. Accuracy

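The accuracy of a classifier is the number of correct predictions divided by the total number of instances, usually expressed as a percentage. Mathematically,

Accuracy = (TP + TN) / (TP + TN + FP + FN)

In the confusion matrix:

Accuracy = (451 + 10766)/(451 + 10766 + 169 + 235) = 96.52%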

If the accuracy of the classifier is considered acceptable, the classifier can be used to classify future data tuples for which the class label is not known. But is it the case with us? Let's see if accuracy is the right evaluation metric for this problem.

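Accuracy is reliable when the classes are roughly balanced (close to a 50-50 split of positive and negative labels) and can be misleading when the data set is unbalanced. Here only 686 of the 11621 test examples (about 6%) are positive, so a model that always predicts the negative class would still score about 94% accuracy while finding no positives at all. For unbalanced problems like this one, accuracy alone says little about the errors that matter.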
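To make the point about unbalanced data concrete, the sketch below compares the model's accuracy with that of a hypothetical classifier that always predicts the negative class. The counts are the ones from the confusion matrix above; the baseline is an illustration, not something the article trained.

```python
# Counts from the confusion matrix above.
tn, fp, fn, tp = 10766, 169, 235, 451
total = tn + fp + fn + tp

# Accuracy of the logistic regression model.
accuracy = (tp + tn) / total

# Hypothetical baseline: always predict negative.
# It gets every actual negative right (tn + fp) and every actual positive wrong.
baseline_accuracy = (tn + fp) / total

print(f"model accuracy:    {accuracy:.2%}")
print(f"baseline accuracy: {baseline_accuracy:.2%}")
```

The gap between the two numbers is only a couple of percentage points, which is why the remaining sections look at metrics that focus on the positive class.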

2. Recall

Recall is one of the most used evaluation metrics for an unbalanced dataset. It calculates how many of the actual positives our model predicted as positives (True Positive).

Recall is also known as true positive rate (TPR), sensitivity, or probability of detection.

Mathematically,

Recall = TP / (TP + FN)

In the confusion matrix:

Recall = 451/(451+235) = 65.74%

3. Precision

Precision describes how trustworthy the model's positive predictions are: out of the cases predicted positive, how many are actually positive?

Precision is also called a measure of exactness or quality, or positive predictive value.

Mathematically,

Precision = TP / (TP + FP)

In the confusion matrix:

Precision = 451/(451+169) = 72.74%
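Precision and recall share the same numerator and differ only in the denominator, which the following sketch makes explicit. The helper safe_ratio is a hypothetical name introduced here; returning 0.0 when the denominator is zero is one common convention, assumed for illustration.

```python
def safe_ratio(numerator: int, denominator: int) -> float:
    """Divide, returning 0.0 when the denominator is zero (an assumed convention)."""
    return numerator / denominator if denominator else 0.0


# Counts from the confusion matrix above.
tp, fp, fn = 451, 169, 235

precision = safe_ratio(tp, tp + fp)  # of everything predicted positive, how much was right
recall = safe_ratio(tp, tp + fn)     # of everything actually positive, how much was found

print(f"precision: {precision:.2%}")
print(f"recall:    {recall:.2%}")
```

The guard matters for a model that never predicts the positive class, where TP + FP is zero and precision is otherwise undefined.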

4. F1 Score

When both recall and precision matter, the F1 score comes into the picture. It balances the two in a single number. On an unbalanced dataset it is often more informative than accuracy because the F1 score does not use the true negatives, so a large negative class cannot inflate it.

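Mathematically, it is defined as the harmonic mean of recall and precision:

F1 Score = 2 x Precision x Recall / (Precision + Recall)

In the confusion matrix:

F1 Score = 2 x 72.74 x 65.74/(72.74+65.74) ≈ 69.07% (computed from the unrounded precision and recall)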

The F score reaches the best value, meaning perfect precision and recall, at a value of 1. The worst F score, which means the lowest precision and lowest recall, would be a value of 0.
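The harmonic-mean form can also be written directly in terms of the counts as 2TP / (2TP + FP + FN), which shows that true negatives never enter the F1 score. The sketch below checks that both forms agree on the counts from the confusion matrix above.

```python
# Counts from the confusion matrix above.
tp, fp, fn = 451, 169, 235

precision = tp / (tp + fp)
recall = tp / (tp + fn)

# Harmonic mean of precision and recall.
f1_harmonic = 2 * precision * recall / (precision + recall)

# Equivalent form using only the counts; TN does not appear.
f1_counts = 2 * tp / (2 * tp + fp + fn)

print(f"F1 (harmonic mean): {f1_harmonic:.2%}")
print(f"F1 (from counts):   {f1_counts:.2%}")
```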

5. ROC Curve

Sometimes it's not easy to decide which evaluation metric to use, and visualizing a model's performance across different classification thresholds can help us choose.

Receiver Operating Characteristic curves, or ROC curves, are graphs that show the performance of a classification model at all classification thresholds. An ROC curve is a useful visual tool for comparing two classification models. It depicts the trade-off between the true positive rate (TPR) and the false positive rate (FPR) of a classification model.

Mathematically,

TPR = TP / (TP + FN)

FPR = FP / (FP + TN)

When we lower the threshold of a classifier, it classifies more items as positive, thus increasing both false positives and true positives.

ROC is one of the most popular plots, which helps in the interpretation of a classifier.
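The threshold sweep behind an ROC curve can be sketched in a few lines. This is an illustrative implementation on made-up labels and scores, not the article's data; the function names roc_points and auc are hypothetical, and the code assumes both classes are present. In practice a library routine such as sklearn.metrics.roc_curve does the same job.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) pairs for thresholds swept from high to low."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    # A threshold above every score predicts nothing as positive.
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        # Items scoring at or above the threshold are predicted positive.
        tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and not y)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Made-up example: 1 marks the positive class, scores are predicted probabilities.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

points = roc_points(y_true, scores)
for fpr, tpr in points:
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
print(f"AUC={auc(points):.3f}")
```

Each step down in threshold can only add predicted positives, so both TPR and FPR are non-decreasing along the curve, matching the behaviour described above.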

6. Specificity
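Under its standard definition, specificity is the true negative rate, TN / (TN + FP): the share of actual negatives that the model correctly labels negative. It is the complement of the false positive rate used in the ROC curve. A minimal sketch using the counts from the confusion matrix above:

```python
# Counts from the confusion matrix above.
tn, fp = 10766, 169

# Share of actual negatives correctly predicted as negative.
specificity = tn / (tn + fp)

# Share of actual negatives wrongly predicted as positive.
false_positive_rate = fp / (tn + fp)

print(f"specificity:         {specificity:.2%}")
print(f"false positive rate: {false_positive_rate:.2%}")
```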

7. Summary

Precision is how certain you are of your predicted positives. Recall is how certain you are that you are not missing any positives.
Choose Recall if false negatives are unacceptable. For example, when screening for diabetes, you would rather accept some extra false positives (false alarms) than miss actual cases (false negatives).
Choose Precision if you want to be more confident of your predicted positives. For example, with spam filtering, you would rather have some spam emails in your inbox than some regular emails in your spam box, so you want to be extra sure that email X is spam before putting it in the spam box.
Choose Specificity if you want to cover all true negatives, meaning you do not want any false alarms or false positives. For example, in a drug test where everyone who tests positive immediately goes to jail, you would not want anyone drug-free going to jail.

