Most people striving to master machine learning start by learning regression. It is simple to understand and use. But is that it? Of course not! There is a lot more to machine learning (ML) than logistic regression and regression problems. For instance, have you heard of support vector regression and the support vector machine (SVM) algorithm?
Think of machine learning algorithms as an armory packed with axes, swords, blades, bows, daggers, etc. You have various tools, but you ought to learn to use them at the right time. As an analogy, think of ‘Regression’ as a sword capable of slicing and dicing data efficiently, but incapable of dealing with highly complex data. ‘Support Vector Machines’, on the other hand, is like a sharp knife: it works on smaller datasets, but on complex ones it can be much more powerful for building machine learning models.
In this article, we’ll explore the fundamentals of SVM in machine learning, understand the algorithm, and learn how to implement SVM in Python and R for effective data classification.
Learning Objectives
- Understand what a support vector machine (SVM) is and when to use it.
- Learn the roles of hyperplanes, support vectors, margins, and kernels.
- Implement the SVM algorithm in Python and R.
- Learn how to tune SVM parameters such as kernel, C, and gamma.
Support Vector Machine (SVM) is a supervised machine learning algorithm that can be used for both classification and regression challenges. However, it is mostly used for classification problems, such as text classification. In the SVM algorithm, we plot each data item as a point in n-dimensional space (where n is the number of features you have), with the value of each feature being the value of a particular coordinate. Then, we perform classification by finding the optimal hyperplane that best separates the two classes.
Support vectors are the individual observations that lie closest to the decision boundary, and the hyperplane is that boundary itself. The SVM classifier is the frontier (the hyperplane) that best separates the two classes.
Now that we have gotten accustomed to the process of segregating the two classes with a hyperplane, the burning question is, “How can we identify the right hyperplane?”. Don’t worry, it’s not as hard as you think! Let’s understand:
When the two classes are linearly separable, finding a linear hyperplane between them is easy. But what if they are not? Should we add a new feature manually so that a hyperplane can separate them? The answer is no! The SVM algorithm has a technique called the kernel trick. The SVM kernel implicitly transforms a low-dimensional input space into a higher-dimensional space where the classes become separable, which is what makes SVM effective on non-linearly separable data.
When we map this higher-dimensional hyperplane back to the original input space, the decision boundary looks like a circle.
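As a small illustration of the idea (a minimal sketch of my own using scikit-learn's make_circles, not data from the article), adding the feature z = x1² + x2² by hand makes two concentric circles linearly separable, and the RBF kernel achieves the same effect implicitly:

import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC
# two concentric circles: not separable by a straight line in the original 2-D space
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)
print(SVC(kernel='linear').fit(X, y).score(X, y))  # poor fit with a linear boundary
# lifting the data with z = x1^2 + x2^2 makes a flat hyperplane sufficient
X_lifted = np.hstack([X, (X ** 2).sum(axis=1, keepdims=True)])
print(SVC(kernel='linear').fit(X_lifted, y).score(X_lifted, y))
# the RBF kernel performs an equivalent transformation implicitly (the kernel trick)
print(SVC(kernel='rbf', gamma='scale').fit(X, y).score(X, y))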
Now, let’s look at the methods to apply the SVM classifier algorithm in a data science challenge.
In an SVM, a hyperplane is a decision boundary that separates different classes of data points. For instance, in a two-dimensional space, the hyperplane is a line; in a three-dimensional space, it is a plane. The goal of the SVM is to find the optimal hyperplane that maximizes the margin between the classes. The margin is defined as the distance between the hyperplane and the nearest data points from either class.
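In symbols (the standard formulation, not spelled out above): the hyperplane is the set of points x satisfying w · x + b = 0, and SVM chooses w and b to maximize the margin 2 / ||w|| subject to y_i (w · x_i + b) ≥ 1 for every training point (x_i, y_i), where y_i ∈ {−1, +1}.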
Support vectors are the data points that are closest to the hyperplane. These points are critical because they determine the position and orientation of the hyperplane. If you remove a support vector, it can change the hyperplane’s position.
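As a quick illustration (a minimal sketch of my own, using scikit-learn's make_blobs rather than the article's data), a fitted linear classifier exposes both the hyperplane coefficients and the support vectors:

from sklearn.datasets import make_blobs
from sklearn.svm import SVC
# two well-separated clusters of points in 2-D
X, y = make_blobs(n_samples=60, centers=2, random_state=6)
clf = SVC(kernel='linear', C=1.0).fit(X, y)
# w and b define the separating hyperplane w.x + b = 0
print("w:", clf.coef_[0], "b:", clf.intercept_[0])
# only these points fix the position of the hyperplane; removing any other
# point would leave the decision boundary unchanged
print("support vectors:\n", clf.support_vectors_)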
There are two types of SVMs: Linear SVM, used when the data can be separated by a straight line (a linear hyperplane), and Non-Linear SVM, which uses kernel functions to handle data that cannot be separated linearly.
Kernels are functions that take a low-dimensional input space and transform it into a higher-dimensional space. SVM can create complex decision boundaries by using kernel functions. Here are some popular kernel functions:
Linear Kernel: K(x, y) = x · y. Used when the data is linearly separable.
Polynomial Kernel: K(x, y) = (x · y + c)^d, where c is a constant and d is the degree of the polynomial. This kernel is useful for classifying data with polynomial relationships.
RBF (Radial Basis Function) Kernel: K(x, y) = exp(−γ ‖x − y‖²), where γ is a parameter that defines the influence of a single training example. This is one of the most popular kernels for non-linear data.
Sigmoid Kernel: K(x, y) = tanh(α x · y + c), where α and c are kernel parameters. It behaves like a neural network’s activation function.
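As a small sketch (not from the original article; the dataset and values are illustrative), here is how those kernels and their parameters (degree for d, coef0 for c, gamma for γ) map onto scikit-learn's SVC:

from sklearn.datasets import make_moons
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
# a toy non-linear dataset; degree, coef0, and gamma correspond to d, c, and γ above
X, y = make_moons(n_samples=300, noise=0.2, random_state=0)
kernels = {
    'linear':  SVC(kernel='linear'),
    'poly':    SVC(kernel='poly', degree=3, coef0=1.0),
    'rbf':     SVC(kernel='rbf', gamma='scale'),
    'sigmoid': SVC(kernel='sigmoid', gamma='scale', coef0=0.0),
}
for name, clf in kernels.items():
    print(name, cross_val_score(clf, X, y, cv=5).mean())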
import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm, datasets
# import some data to play with
iris = datasets.load_iris()
X = iris.data[:, :2] # we only take the first two features. We could
# avoid this ugly slicing by using a two-dim dataset
y = iris.target
# we create an instance of SVM and fit our data. We do not scale our
# data since we want to plot the support vectors
C = 1.0 # SVM regularization parameter
svc = svm.SVC(kernel='linear', C=C).fit(X, y)  # gamma is not used by the linear kernel
# create a mesh to plot in
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
h = (x_max - x_min) / 100  # step size in the mesh
xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
np.arange(y_min, y_max, h))
plt.subplot(1, 1, 1)
Z = svc.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
plt.contourf(xx, yy, Z, cmap=plt.cm.Paired, alpha=0.8)
plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.Paired)
plt.xlabel('Sepal length')
plt.ylabel('Sepal width')
plt.xlim(xx.min(), xx.max())
plt.title('SVC with linear kernel')
plt.show()
Change the kernel type to rbf in the line below and look at the impact:
svc = svm.SVC(kernel='rbf', C=1, gamma='auto').fit(X, y)
I would suggest you go for a linear SVM kernel if you have a large number of features (>1000) because it is more likely that the data is linearly separable in high-dimensional space. Also, you can use RBF, but do not forget to cross-validate for its parameters to avoid over-fitting.
svc = svm.SVC(kernel='rbf', C=1, gamma='auto').fit(X, y)
gamma: Kernel coefficient for the ‘rbf’, ‘poly’, and ‘sigmoid’ kernels. A higher gamma value tries to fit the training data more exactly, which can lead to over-fitting.
C: Penalty parameter of the error term. It controls the trade-off between a smooth decision boundary and classifying the training points correctly.
We should always look at the cross-validation score to effectively combine these parameters and avoid over-fitting.
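As a minimal sketch of that advice (assuming the iris features X and y loaded in the earlier Python example; the grid values are illustrative), C and gamma can be cross-validated with scikit-learn's GridSearchCV:

from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC
# candidate values for the two most influential parameters
param_grid = {'C': [0.1, 1, 10, 100], 'gamma': [0.001, 0.01, 0.1, 1]}
# 5-fold cross-validation picks the combination that generalizes best
grid = GridSearchCV(SVC(kernel='rbf'), param_grid, cv=5)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)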
Tuning the parameters’ values for machine learning algorithms effectively improves model performance. Therefore, let’s look at the list of parameters available with SVM.
sklearn.svm.SVC(C=1.0, kernel='rbf', degree=3, gamma='scale', coef0=0.0, shrinking=True, probability=False, tol=0.001, cache_size=200, class_weight=None, verbose=False, max_iter=-1, random_state=None)
The parameters having a higher impact on model performance are “kernel,” “gamma,” and “C.”
The kernel parameter can take values such as linear, rbf, poly, and others. The e1071 package is used to create an SVM in R with ease. It provides helper functions as well as an implementation of the Naive Bayes classifier. Creating a support vector machine in R follows an approach similar to Python. Let’s take a look at the following code:
#Import Library
require(e1071) #Contains the SVM
Train <- read.csv(file.choose())
Test <- read.csv(file.choose())
# there are various options associated with SVM training, like changing the kernel, gamma, and cost values.
# create model
model <- svm(Target ~ Predictor1 + Predictor2 + Predictor3, data = Train, kernel = 'linear', gamma = 0.2, cost = 100)
#Predict Output
preds <- predict(model,Test)
table(preds)
In R, the SVM algorithm can be tuned in the same way as in Python. The corresponding parameters in the e1071 package are kernel, gamma, cost (the counterpart of C), degree, and coef0.
| Pros | Cons |
|---|---|
| It works well with a clear margin of separation. | It doesn’t perform well with large datasets due to high training time. |
| It is effective in high-dimensional spaces. | It doesn’t perform well with noisy datasets where classes overlap. |
| It is effective when the number of dimensions is greater than the number of samples. | It doesn’t directly provide probability estimates; these require an expensive five-fold cross-validation. |
| It uses a subset of the training set in the decision function (the support vectors), making it memory efficient. | |
Find the right additional feature to have a hyperplane for segregating the classes in the snapshot below:
In this article, we looked at the machine learning algorithm, Support Vector Machine. We discussed the concept of its working, the process of its implementation in Python and R, and the tricks to make the model more efficient by tuning its parameters. Towards the end, we also pointed out the pros and cons of the algorithm. I suggest you try solving the problem above to practice your SVM algorithm skills, and also try to analyze the power of this model by tuning its parameters.
Key Takeaways
- SVM is a supervised algorithm used mainly for classification; it separates classes with the hyperplane that maximizes the margin.
- Support vectors are the data points closest to the hyperplane, and they alone determine its position.
- Kernel functions (linear, polynomial, RBF, sigmoid) let SVM handle data that is not linearly separable.
- Tuning the kernel, C, and gamma with cross-validation is essential to avoid over-fitting.

Frequently Asked Questions
Q. What are support vector machines (SVMs), with examples?
A. Support vector machines (SVMs) are supervised learning models used for classification and regression tasks. For instance, they can classify emails as spam or non-spam. They can also be used to identify handwritten digits in image recognition.
Q. What does “support” mean in a support vector machine?
A. “Support” refers to the data points (support vectors) that are closest to the decision boundary. These points are critical in defining the optimal hyperplane for classification.
Q. What is the function of SVM?
A. The function of SVM is to classify data by finding the optimal hyperplane that separates different classes. It works well for both linear and non-linear classification problems by transforming the data with kernel functions.