What is Clustering?

Nischal Yadav

Published Jul 20, 2021

K-means clustering is an unsupervised learning algorithm used to solve clustering problems in machine learning and data science. In this article, we will learn what the k-means clustering algorithm is, how it works, and how to implement it in Python.

It is an iterative algorithm that divides an unlabeled dataset into K different clusters so that each data point belongs to exactly one cluster, and the points within a cluster have similar properties.
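In the standard formulation of k-means, "similar properties" is measured by Euclidean distance. Given clusters C_1 to C_K with centroids μ_1 to μ_K, the algorithm looks for the assignment that minimizes the within-cluster sum of squared distances J, where each centroid is the mean of the points in its cluster:

```latex
J = \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2,
\qquad
\mu_k = \frac{1}{|C_k|} \sum_{x_i \in C_k} x_i
```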

How does the K-Means Algorithm Work?

1. Select the number K to decide the number of clusters.
2. Select K random points as the initial centroids (they do not have to come from the input dataset).
3. Assign each data point to its closest centroid; this forms the K clusters.
4. Compute the mean of the points in each cluster and place the cluster's new centroid there (the mean is the point that minimizes the within-cluster variance).
5. Repeat the third step: reassign each data point to the new closest centroid.
6. If any reassignment occurred, go back to step 4; otherwise, go to FINISH.
7. The model is ready.
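The steps above can be sketched in plain Python as Lloyd's algorithm, the classic form of k-means. This is a minimal sketch, not a production implementation: the names `kmeans`, `squared_distance`, and `mean` are illustrative, points are assumed to be tuples of floats, and the initial centroids are sampled from the dataset itself, which is one common choice among several.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points: List[Point]) -> Point:
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(data: List[Point], k: int, max_iter: int = 100,
           seed: int = 0) -> Tuple[List[Point], List[int]]:
    """Lloyd's k-means: returns (centroids, labels) for `data`."""
    rng = random.Random(seed)
    # Steps 1-2: K is given; pick K distinct data points as initial centroids.
    centroids = list(rng.sample(data, k))
    labels: List[int] = []
    for _ in range(max_iter):
        # Steps 3 and 5: assign every point to its closest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in data
        ]
        # Step 6: if no point changed cluster, the algorithm has converged.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 4: move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, lab in zip(data, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = mean(members)
    return centroids, labels


# Two well-separated groups of 2-D points (toy data).
data = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
centroids, labels = kmeans(data, k=2)
print("labels:", labels)
print("centroids:", centroids)
```

Because the starting centroids are random, different seeds can converge to different local optima; libraries such as scikit-learn's `KMeans` rerun the algorithm several times (`n_init`) and keep the result with the lowest within-cluster sum of squares.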

Use Case in the security domain:

Identifying crime localities:

Given data on crimes in specific localities of a city, the category of each crime, the area where it occurred, and the association between the two can give useful insight into crime-prone areas within a city or locality.
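As a hedged illustration of this idea, the sketch below clusters incident locations into K hotspots with k-means and then counts the crime categories inside each hotspot, which is one simple way to look at the association between area and category. The incidents, their (x, y) map-grid positions, and the choice of K = 2 are hypothetical toy values, not real crime data.

```python
import random
from collections import Counter


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's k-means on (x, y) tuples; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = list(rng.sample(points, k))
    labels = []
    for _ in range(max_iter):
        # Assign each incident to the nearest hotspot centre.
        new_labels = [
            min(range(k), key=lambda c: (x - centroids[c][0]) ** 2
                + (y - centroids[c][1]) ** 2)
            for x, y in points
        ]
        if new_labels == labels:  # no reassignment: converged
            break
        labels = new_labels
        # Move each centre to the mean location of its incidents.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = (
                    sum(px for px, _ in members) / len(members),
                    sum(py for _, py in members) / len(members),
                )
    return centroids, labels


# Hypothetical incidents: ((x, y) map-grid position, crime category).
incidents = [
    ((1.0, 1.0), "theft"),
    ((1.5, 2.0), "theft"),
    ((1.0, 1.5), "burglary"),
    ((8.0, 8.0), "assault"),
    ((8.5, 9.0), "assault"),
    ((9.0, 8.0), "theft"),
]
locations = [loc for loc, _ in incidents]
centroids, labels = kmeans(locations, k=2)

# Count crime categories inside each hotspot.
hotspots = {c: Counter() for c in range(2)}
for (_, category), lab in zip(incidents, labels):
    hotspots[lab][category] += 1

for c, counts in hotspots.items():
    cx, cy = centroids[c]
    print(f"hotspot {c} centred at ({cx:.2f}, {cy:.2f}): {dict(counts)}")
```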

Call detail record analysis

A call detail record (CDR) is the information captured by telecom companies during a customer's calls, SMS, and internet activity. Combined with customer demographics, this information provides greater insight into customers' needs.

Automatic clustering of IT alerts

Large enterprise infrastructure components such as networks, storage, and databases generate large volumes of alert messages. Because these messages can point to operational issues, they must be screened manually and prioritized for downstream processes. Automatically clustering similar alerts can reduce this manual screening effort.

Delivery store optimization

Optimize goods delivery with truck-launched drones by combining k-means, which finds the optimal number of launch locations, with a genetic algorithm that solves the truck route as a travelling salesman problem.
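The article does not say how the optimal number of launch locations is chosen. One common heuristic, assumed here, is the elbow method: run k-means for several values of K and pick the K after which the within-cluster sum of squares (often called inertia) stops dropping sharply. The addresses below are made-up coordinates, the helper names are illustrative, and the genetic-algorithm routing step is not shown.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two (x, y) points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's k-means; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = list(rng.sample(points, k))
    labels = []
    for _ in range(max_iter):
        # Assign each address to its nearest candidate launch location.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:  # converged
            break
        labels = new_labels
        # Move each launch location to the mean of its addresses.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return centroids, labels


def inertia(points, centroids, labels):
    """Within-cluster sum of squared distances (lower = tighter clusters)."""
    return sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))


# Hypothetical delivery addresses as (x, y) positions in km.
addresses = [
    (0.0, 0.0), (1.0, 0.0), (0.0, 1.0),
    (10.0, 0.0), (11.0, 1.0), (10.0, 1.0),
    (5.0, 9.0), (6.0, 9.0), (5.0, 10.0),
]

inertias = {}
for k in range(1, 5):
    centroids, labels = kmeans(addresses, k)
    inertias[k] = inertia(addresses, centroids, labels)
    print(f"K={k}: inertia={inertias[k]:.2f}")
```

With three well-separated groups of addresses, the drop in inertia would typically flatten after K = 3, although a single random initialization can occasionally settle in a worse local optimum, so trying several seeds is a sensible safeguard.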

Drawbacks

The k-means algorithm is good at capturing the structure of the data when clusters have a roughly spherical shape, because it always builds a spherical cluster around each centroid. As a result, as soon as the clusters have complicated geometric shapes, k-means does a poor job of clustering the data.
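A small demonstration of this drawback, using a hypothetical toy dataset: points on two concentric circles form two natural groups, but with K = 2 the nearest-centroid rule always splits the plane along a straight line (the perpendicular bisector of the two centroids), so at least one k-means cluster must mix points from both rings. The helper names `ring` and `kmeans` are illustrative.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's k-means on (x, y) tuples; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = list(rng.sample(points, k))
    labels = []
    for _ in range(max_iter):
        # Nearest-centroid assignment by squared Euclidean distance.
        new_labels = [
            min(range(k), key=lambda c: (x - centroids[c][0]) ** 2
                + (y - centroids[c][1]) ** 2)
            for x, y in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return centroids, labels


def ring(radius, n):
    """n points evenly spaced on a circle of the given radius around the origin."""
    return [
        (radius * math.cos(2 * math.pi * i / n),
         radius * math.sin(2 * math.pi * i / n))
        for i in range(n)
    ]


inner, outer = ring(1.0, 12), ring(5.0, 24)
points = inner + outer
true_ring = [0] * len(inner) + [1] * len(outer)  # 0 = inner ring, 1 = outer ring

centroids, labels = kmeans(points, k=2)
for c in range(2):
    rings_here = sorted({r for r, lab in zip(true_ring, labels) if lab == c})
    print(f"k-means cluster {c} contains points from ring(s) {rings_here}")
```

Algorithms that do not assume spherical clusters, such as density-based or spectral clustering, are the usual alternatives for shapes like these.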

Conclusion

The k-means algorithm is relatively simple and useful for undirected knowledge discovery. It is widely used in many fields, including unsupervised learning of neural networks, pattern recognition, classification analysis, artificial intelligence, image processing, and machine vision.

Thank You For Reading
