Member-only story
Implementation of K-means++ — Know the smarter brother of K-means
K-means clustering algorithm is one of the well-known algorithms among ML enthusiasts and also one of the first algorithms that are taught in machine learning classes in college.
To give you a recap of what K-means clustering means .. it is a method of vector quantization, originally from signal processing, that aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, serving as a prototype of the cluster.
So what’s wrong with our beloved K-means algorithm?
This algorithm is sensitive to the initialization of the centroids or the mean points. For instance , if a centroid is initialized to be a far-off point, it might just end up with no points associated with it, and at the same time, more than one cluster might end up linked with a single centroid.
Here comes the smarter version of K-means! K-means++
K-means++ is exactly the same algorithm apart from one thing! This algorithm ensures a smarter initialization of the centroids and improves the quality of the clustering.
Implementation of K-means ++ Algorithm
# importing…