What is Clustering?

K-Means Clustering is an unsupervised learning algorithm used to solve clustering problems in machine learning and data science. In this article, we will look at the K-means clustering algorithm, how it works, and the Python implementation of k-means clustering.

It is an iterative algorithm that divides an unlabeled dataset into K different clusters in such a way that each data point belongs to only one group, whose members have similar properties.

How does the K-Means Algorithm Work?

  • Choose the number of clusters, K.
  • Select K random points as the initial centroids. (They need not be points from the input dataset.)
  • Assign each data point to its closest centroid, which forms the K clusters.
  • Compute the mean of the points in each cluster and move that cluster's centroid to it.
  • Repeat the assignment step: reassign each data point to the closest of the new centroids.
  • If any reassignment occurred, go back to the centroid update step; otherwise, finish.
  • The model is ready.
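
The steps above can be sketched in Python with NumPy. This is a minimal illustration, not a production implementation: the function name `kmeans`, its parameters, and the synthetic two-blob data are assumptions made for this example, and the initial centroids are simply drawn from the data points.

```python
import numpy as np


def kmeans(points, k, max_iters=100, seed=0):
    """Cluster `points` (n_samples x n_features) into k groups.

    Illustrative sketch only; the name and parameters are hypothetical.
    """
    rng = np.random.default_rng(seed)
    points = np.asarray(points, dtype=float)

    # Initialisation: pick k distinct data points as the starting centroids.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    labels = None

    for _ in range(max_iters):
        # Assignment step: each point goes to its nearest centroid
        # (Euclidean distance).
        distances = np.linalg.norm(
            points[:, None, :] - centroids[None, :, :], axis=2
        )
        new_labels = distances.argmin(axis=1)

        # Stopping rule: finish once no point changes cluster.
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels

        # Update step: move each centroid to the mean of its points.
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # an empty cluster keeps its old centroid
                centroids[j] = members.mean(axis=0)

    return labels, centroids


# Usage on synthetic data: two well-separated blobs (made up for this example).
rng = np.random.default_rng(42)
blob_a = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2))
blob_b = rng.normal(loc=[10.0, 10.0], scale=0.5, size=(50, 2))
data = np.vstack([blob_a, blob_b])

labels, centroids = kmeans(data, k=2)
print(centroids)
```

The result depends on the random starting centroids, so practical implementations usually run the algorithm several times with different seeds and keep the best run.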

Use cases in the security domain:

Identifying crime localities:

With crime data available for specific localities in a city, the category of crime, the area where it occurred, and the association between the two can give useful insight into crime-prone areas within a city or locality.

Call detail record analysis

  • A call detail record (CDR) is the information captured by telecom companies during a customer's call, SMS, and internet activity. Combined with customer demographics, this information provides greater insight into the customer's needs.

Automatic clustering of IT alerts

  • Large enterprise IT infrastructure components such as network, storage, or database systems generate large volumes of alert messages. Because alert messages potentially point to operational issues, they must otherwise be screened manually to prioritize them for downstream processes.

Delivery store optimization

  • Optimize the process of goods delivery using truck drones by combining k-means, to find the optimal number of launch locations, with a genetic algorithm, to solve the truck route as a travelling salesman problem.

Drawbacks

The k-means algorithm is good at capturing the structure of the data when the clusters are roughly spherical. It always tries to form a spherical cluster around each centroid, so when the clusters have more complicated geometric shapes, k-means does a poor job of clustering the data.
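
A small experiment can make this drawback concrete. The sketch below is an assumption-laden illustration: the function `lloyd_kmeans`, the ring radii, and the point counts are all made up for this example. It runs plain k-means with K = 2 on two concentric rings. Because k-means with two centroids splits the plane with a single straight boundary, it cannot separate an inner ring from the ring that surrounds it.

```python
import numpy as np


def lloyd_kmeans(points, k, iters=100, seed=0):
    """Plain k-means; returns one cluster label per point (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        dists = np.linalg.norm(
            points[:, None, :] - centroids[None, :, :], axis=2
        )
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its points (empty clusters stay put).
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels


# Two concentric rings: radius 1 (ring 0) and radius 5 (ring 1), 100 points each.
angles = np.linspace(0.0, 2.0 * np.pi, 100, endpoint=False)
circle = np.column_stack([np.cos(angles), np.sin(angles)])
rings = np.vstack([1.0 * circle, 5.0 * circle])
true_ring = np.repeat([0, 1], 100)

labels = lloyd_kmeans(rings, k=2)

# Fraction of points whose cluster matches their ring, under the better of
# the two possible label-to-ring mappings.
match = np.mean(labels == true_ring)
agreement = max(match, 1.0 - match)
print(f"agreement with the true rings: {agreement:.2f}")
```

An agreement of 1.0 would mean the rings were recovered perfectly; on this data k-means falls well short of that, whichever starting centroids are chosen.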

Conclusion

The K-means algorithm is relatively simple and useful for undirected knowledge discovery. It has found widespread use in many fields, including unsupervised learning of neural networks, pattern recognition, classification analysis, artificial intelligence, image processing, and machine vision.



Thank You For Reading


More articles by Nischal Yadav
