Implementation of K-means++ — Know the smarter brother of K-means
The K-means clustering algorithm is one of the best-known algorithms among ML enthusiasts, and often one of the first algorithms taught in college machine learning classes.

To recap what K-means clustering is: it is a method of vector quantization, originally from signal processing, that aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, which serves as a prototype of the cluster.
So what’s wrong with our beloved K-means algorithm?
This algorithm is sensitive to the initialization of the centroids (the mean points). For instance, if a centroid is initialized to a far-off point, it might end up with no points assigned to it, while more than one true cluster ends up linked to a single centroid.
Here comes the smarter version of K-means! K-means++
K-means++ is exactly the same algorithm apart from one thing: it uses a smarter initialization of the centroids, which improves the quality of the clustering. Roughly, the first centroid is picked uniformly at random from the data, and each subsequent centroid is sampled with probability proportional to its squared distance from the nearest centroid chosen so far, so the initial centroids tend to be spread out.
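The initialization step described above can be sketched as follows. This is a minimal illustration of D²-weighted sampling, not the article's implementation; the function name `kmeans_pp_init` and its arguments are made up for this sketch.

```python
import numpy as np

def kmeans_pp_init(data, k, seed=0):
    """Pick k initial centroids from `data` via K-means++ (D^2 sampling). Illustrative sketch."""
    rng = np.random.default_rng(seed)
    # first centroid: a uniformly random data point
    centroids = [data[rng.integers(len(data))]]
    for _ in range(k - 1):
        # squared distance from each point to its nearest already-chosen centroid
        diffs = data[:, None, :] - np.array(centroids)[None, :, :]
        d2 = np.min((diffs ** 2).sum(axis=-1), axis=1)
        # sample the next centroid with probability proportional to that distance
        probs = d2 / d2.sum()
        centroids.append(data[rng.choice(len(data), p=probs)])
    return np.array(centroids)
```

Because far-away points get higher sampling weight, the chosen centroids tend to land in different clusters instead of crowding together.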
Implementation of K-means++ Algorithm
# importing dependencies
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import sys

# creating data: four 2-D Gaussian clusters of 100 points each
mean_01 = np.array([0.0, 0.0])
cov_01 = np.array([[1, 0.3], [0.3, 1]])
dist_01 = np.random.multivariate_normal(mean_01, cov_01, 100)

mean_02 = np.array([6.0, 7.0])
cov_02 = np.array([[1.5, 0.3], [0.3, 1]])
dist_02 = np.random.multivariate_normal(mean_02, cov_02, 100)

mean_03 = np.array([7.0, -5.0])
cov_03 = np.array([[1.2, 0.5], [0.5, 1.3]])
dist_03 = np.random.multivariate_normal(mean_03, cov_03, 100)

mean_04 = np.array([2.0, -7.0])
cov_04 = np.array([[1.2, 0.5], [0.5, 1.3]])
dist_04 = np.random.multivariate_normal(mean_04, cov_04, 100)

# stacking the four clusters into one dataset and shuffling the rows
data = np.vstack((dist_01, dist_02, dist_03, dist_04))
np.random.shuffle(data)
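A quick scatter plot is a useful sanity check that the four blobs are well separated before clustering. The snippet below regenerates a simplified version of the data (identity covariances, fixed seed) so it runs on its own; the means match the ones above.

```python
import numpy as np
import matplotlib.pyplot as plt

# self-contained stand-in for the data above: four blobs at the same means,
# identity covariance and a fixed seed for reproducibility
rng = np.random.default_rng(0)
means = [(0.0, 0.0), (6.0, 7.0), (7.0, -5.0), (2.0, -7.0)]
data = np.vstack([rng.multivariate_normal(m, np.eye(2), 100) for m in means])
rng.shuffle(data)

plt.scatter(data[:, 0], data[:, 1], s=10)
plt.title("Synthetic data: four Gaussian blobs")
plt.show()
```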