Most people striving to master machine learning start by learning regression. It is simple to understand and use. But is that it? Of course not! There is a lot more to machine learning (ML) than logistic regression and regression problems. For instance, have you heard of support vector regression and the support vector machine (SVM) algorithm?
Think of machine learning algorithms as an armory packed with axes, swords, blades, bows, daggers, etc. You have various tools, but you ought to learn to use them at the right time. As an analogy, think of ‘Regression’ as a sword capable of slicing and dicing data efficiently, but incapable of dealing with highly complex data. ‘Support Vector Machines’, on the other hand, is like a sharp knife: it works on smaller datasets, but on complex ones it can be much more powerful for building machine learning models.
In this article, we’ll explore the fundamentals of SVM in machine learning, understand the algorithm, and learn how to implement SVM in Python and R for effective data classification.
Learning Objectives
- Understand what a support vector machine (SVM) is and when to use it.
- Learn the roles of hyperplanes, support vectors, margins, and kernels.
- Implement the SVM algorithm in Python and R.
- Learn how to tune SVM parameters such as kernel, C, and gamma.
Support Vector Machine (SVM) is a supervised machine learning algorithm that can be used for both classification and regression challenges. However, it is mostly used for classification problems, such as text classification. In the SVM algorithm, we plot each data item as a point in n-dimensional space (where n is the number of features you have), with the value of each feature being the value of a particular coordinate. Then, we perform classification by finding the optimal hyperplane that best separates the two classes.
Support vectors are the individual observations that lie closest to the decision boundary, and the hyperplane is that boundary itself. The SVM classifier is the frontier (the hyperplane) that best separates the two classes.
Now that we have gotten accustomed to the process of segregating the two classes with a hyperplane, the burning question is, “How can we identify the right hyperplane?”. Don’t worry, it’s not as hard as you think! Let’s understand:
When the two classes are linearly separable, finding a linear hyperplane between them is easy. But what if they are not? Should we add a new feature manually so that a hyperplane can separate them? The answer is no! The SVM algorithm has a technique called the kernel trick. The SVM kernel implicitly transforms a low-dimensional input space into a higher-dimensional space where the classes become separable, which is what makes SVM effective on non-linearly separable data.
When we map this higher-dimensional hyperplane back to the original input space, the decision boundary looks like a circle.
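As a small illustration of the idea (a minimal sketch of my own using scikit-learn's make_circles, not data from the article), adding the feature z = x1² + x2² by hand makes two concentric circles linearly separable, and the RBF kernel achieves the same effect implicitly:

import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC
# two concentric circles: not separable by a straight line in the original 2-D space
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)
print(SVC(kernel='linear').fit(X, y).score(X, y))  # poor fit with a linear boundary
# lifting the data with z = x1^2 + x2^2 makes a flat hyperplane sufficient
X_lifted = np.hstack([X, (X ** 2).sum(axis=1, keepdims=True)])
print(SVC(kernel='linear').fit(X_lifted, y).score(X_lifted, y))
# the RBF kernel performs an equivalent transformation implicitly (the kernel trick)
print(SVC(kernel='rbf', gamma='scale').fit(X, y).score(X, y))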
Now, let’s look at the methods to apply the SVM classifier algorithm in a data science challenge.
In an SVM, a hyperplane is a decision boundary that separates different classes of data points. For instance, in a two-dimensional space, the hyperplane is a line; in a three-dimensional space, it is a plane. The goal of the SVM is to find the optimal hyperplane that maximizes the margin between the classes. The margin is defined as the distance between the hyperplane and the nearest data points from either class.
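In symbols (the standard formulation, not spelled out above): the hyperplane is the set of points x satisfying w · x + b = 0, and SVM chooses w and b to maximize the margin 2 / ||w|| subject to y_i (w · x_i + b) ≥ 1 for every training point (x_i, y_i), where y_i ∈ {−1, +1}.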
Support vectors are the data points that are closest to the hyperplane. These points are critical because they determine the position and orientation of the hyperplane. If you remove a support vector, it can change the hyperplane’s position.
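As a quick illustration (a minimal sketch of my own, using scikit-learn's make_blobs rather than the article's data), a fitted linear classifier exposes both the hyperplane coefficients and the support vectors:

from sklearn.datasets import make_blobs
from sklearn.svm import SVC
# two well-separated clusters of points in 2-D
X, y = make_blobs(n_samples=60, centers=2, random_state=6)
clf = SVC(kernel='linear', C=1.0).fit(X, y)
# w and b define the separating hyperplane w.x + b = 0
print("w:", clf.coef_[0], "b:", clf.intercept_[0])
# only these points fix the position of the hyperplane; removing any other
# point would leave the decision boundary unchanged
print("support vectors:\n", clf.support_vectors_)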
There are two types of SVMs: Linear SVM, used when the data can be separated by a straight line (a linear hyperplane), and Non-Linear SVM, which uses kernel functions to handle data that cannot be separated linearly.
Kernels are functions that take a low-dimensional input space and transform it into a higher-dimensional space. SVM can create complex decision boundaries by using kernel functions. Here are some popular kernel functions:
Linear Kernel: K(x, y) = x · y. Used when the data is linearly separable.
Polynomial Kernel: K(x, y) = (x · y + c)^d, where c is a constant and d is the degree of the polynomial. This kernel is useful for classifying data with polynomial relationships.
RBF (Radial Basis Function) Kernel: K(x, y) = exp(−γ ‖x − y‖²), where γ is a parameter that defines the influence of a single training example. This is one of the most popular kernels for non-linear data.
Sigmoid Kernel: K(x, y) = tanh(α x · y + c), where α and c are kernel parameters. It behaves like a neural network’s activation function.
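As a small sketch (not from the original article; the dataset and values are illustrative), here is how those kernels and their parameters (degree for d, coef0 for c, gamma for γ) map onto scikit-learn's SVC:

from sklearn.datasets import make_moons
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
# a toy non-linear dataset; degree, coef0, and gamma correspond to d, c, and γ above
X, y = make_moons(n_samples=300, noise=0.2, random_state=0)
kernels = {
    'linear':  SVC(kernel='linear'),
    'poly':    SVC(kernel='poly', degree=3, coef0=1.0),
    'rbf':     SVC(kernel='rbf', gamma='scale'),
    'sigmoid': SVC(kernel='sigmoid', gamma='scale', coef0=0.0),
}
for name, clf in kernels.items():
    print(name, cross_val_score(clf, X, y, cv=5).mean())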
import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm, datasets
# import some data to play with
iris = datasets.load_iris()
X = iris.data[:, :2] # we only take the first two features. We could
# avoid this ugly slicing by using a two-dim dataset
y = iris.target
# we create an instance of SVM and fit our data. We do not scale our
# data since we want to plot the support vectors
C = 1.0 # SVM regularization parameter
svc = svm.SVC(kernel='linear', C=C).fit(X, y)  # gamma is not used by the linear kernel
# create a mesh to plot in
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
h = (x_max - x_min) / 100  # step size in the mesh
xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
np.arange(y_min, y_max, h))
plt.subplot(1, 1, 1)
Z = svc.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
plt.contourf(xx, yy, Z, cmap=plt.cm.Paired, alpha=0.8)
plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.Paired)
plt.xlabel('Sepal length')
plt.ylabel('Sepal width')
plt.xlim(xx.min(), xx.max())
plt.title('SVC with linear kernel')
plt.show()
Change the kernel type to rbf in the line below and look at the impact:
svc = svm.SVC(kernel='rbf', C=1, gamma='auto').fit(X, y)
I would suggest you go for a linear SVM kernel if you have a large number of features (>1000) because it is more likely that the data is linearly separable in high-dimensional space. Also, you can use RBF, but do not forget to cross-validate for its parameters to avoid over-fitting.
svc = svm.SVC(kernel='rbf', C=1, gamma='auto').fit(X, y)
gamma: Kernel coefficient for the ‘rbf’, ‘poly’, and ‘sigmoid’ kernels. A higher gamma value tries to fit the training data more exactly, which can lead to over-fitting.
C: Penalty parameter of the error term. It controls the trade-off between a smooth decision boundary and classifying the training points correctly.
We should always look at the cross-validation score to effectively combine these parameters and avoid over-fitting.
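As a minimal sketch of that advice (assuming the iris features X and y loaded in the earlier Python example; the grid values are illustrative), C and gamma can be cross-validated with scikit-learn's GridSearchCV:

from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC
# candidate values for the two most influential parameters
param_grid = {'C': [0.1, 1, 10, 100], 'gamma': [0.001, 0.01, 0.1, 1]}
# 5-fold cross-validation picks the combination that generalizes best
grid = GridSearchCV(SVC(kernel='rbf'), param_grid, cv=5)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)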
Tuning the parameters’ values for machine learning algorithms effectively improves model performance. Therefore, let’s look at the list of parameters available with SVM.
sklearn.svm.SVC(C=1.0, kernel='rbf', degree=3, gamma='scale', coef0=0.0, shrinking=True, probability=False, tol=0.001, cache_size=200, class_weight=None, verbose=False, max_iter=-1, random_state=None)
The parameters having a higher impact on model performance are “kernel,” “gamma,” and “C.”
The kernel parameter can take values such as linear, rbf, poly, and others. The e1071 package is used to create an SVM in R with ease. It provides helper functions as well as an implementation of the Naive Bayes classifier. Creating a support vector machine in R follows an approach similar to Python. Let’s take a look at the following code:
#Import Library
require(e1071) #Contains the SVM
Train <- read.csv(file.choose())
Test <- read.csv(file.choose())
# there are various options associated with SVM training, like changing the kernel, gamma, and cost values.
# create model
model <- svm(Target ~ Predictor1 + Predictor2 + Predictor3, data = Train, kernel = 'linear', gamma = 0.2, cost = 100)
#Predict Output
preds <- predict(model,Test)
table(preds)
In R, the SVM algorithm can be tuned in the same way as in Python. The corresponding parameters in the e1071 package are kernel, gamma, cost (the counterpart of C), degree, and coef0.
| Pros | Cons |
|---|---|
| It works well with a clear margin of separation. | It doesn’t perform well with large datasets due to high training time. |
| It is effective in high-dimensional spaces. | It doesn’t perform well with noisy datasets where classes overlap. |
| It is effective when the number of dimensions is greater than the number of samples. | It doesn’t directly provide probability estimates; these require an expensive five-fold cross-validation. |
| It uses a subset of the training set in the decision function (the support vectors), making it memory efficient. | |
Find the right additional feature to have a hyperplane for segregating the classes in the snapshot below:
In this article, we looked at the machine learning algorithm, Support Vector Machine. We discussed the concept of its working, the process of its implementation in Python and R, and the tricks to make the model more efficient by tuning its parameters. Towards the end, we also pointed out the pros and cons of the algorithm. I suggest you try solving the problem above to practice your SVM algorithm skills, and also try to analyze the power of this model by tuning its parameters.
Key Takeaways
- SVM is a supervised algorithm used mainly for classification; it separates classes with the hyperplane that maximizes the margin.
- Support vectors are the data points closest to the hyperplane, and they alone determine its position.
- Kernel functions (linear, polynomial, RBF, sigmoid) let SVM handle data that is not linearly separable.
- Tuning the kernel, C, and gamma with cross-validation is essential to avoid over-fitting.

Frequently Asked Questions
Q. What are support vector machines (SVMs), with examples?
A. Support vector machines (SVMs) are supervised learning models used for classification and regression tasks. For instance, they can classify emails as spam or non-spam. They can also be used to identify handwritten digits in image recognition.
Q. What does “support” mean in a support vector machine?
A. “Support” refers to the data points (support vectors) that are closest to the decision boundary. These points are critical in defining the optimal hyperplane for classification.
Q. What is the function of SVM?
A. The function of SVM is to classify data by finding the optimal hyperplane that separates different classes. It works well for both linear and non-linear classification problems by transforming the data with kernel functions.