Python Programming
Task description: In machine learning, clustering is used to analyze and group data that has no pre-labeled classes, or no class attribute at all. K-Means clustering and hierarchical clustering are both unsupervised learning algorithms.
A cluster is a collection of objects that are similar to one another and dissimilar to the objects in other clusters. K-Means is a partitional method: it divides the objects into clusters such that each object is in exactly one cluster, not several.
In hierarchical clustering, the clusters form a tree-like structure with parent-child relationships. In the agglomerative (bottom-up) variant, each object starts as its own cluster, and the two most similar clusters are merged repeatedly until all objects are in the same cluster.
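The merging process described above can be illustrated with a small sketch. The five one-dimensional points are made up for illustration and are not part of the task; SciPy's `linkage` and `fcluster` are used here as one possible way to inspect the merge order and then cut the tree into flat clusters.

```python
import numpy as np
from scipy.cluster import hierarchy

# Hypothetical toy data: five points on a line (not part of the task).
points = np.array([[0.0], [1.0], [5.0], [7.0], [20.0]])

# Average linkage: the distance between two clusters is the mean of all
# pairwise distances between their members.
Z = hierarchy.linkage(points, method="average")

# Each row of Z records one merge:
# [cluster id a, cluster id b, merge distance, size of the new cluster]
for step, (a, b, dist, size) in enumerate(Z, start=1):
    print(f"merge {step}: clusters {int(a)} and {int(b)} "
          f"at distance {dist:.2f} -> size {int(size)}")

# Cutting the tree into two flat clusters gives every point exactly one label.
labels = hierarchy.fcluster(Z, t=2, criterion="maxclust")
print(labels)
```

The last row of `Z` always describes the merge that puts every object into a single cluster, which is the root of the dendrogram.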
In this task, you will use the K-Means and Agglomerative Hierarchical algorithms to cluster a synthetic dataset and compare the results.
You are given:
- np.random.seed(0)
- make_blobs function with inputs: n_samples=200, centers=[[3, 2], [6, 4], [10, 5]], cluster_std=0.9
- KMeans() class with settings: init="k-means++", n_clusters=3, n_init=12
- AgglomerativeClustering() class with settings: n_clusters=3, linkage="average"
- Other settings of your choice
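As a starting point, the given settings could be wired together roughly as follows. This is a minimal sketch using scikit-learn; the variable names (`X`, `y`, `k_means`, `agglom`) are arbitrary choices, not prescribed by the task.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs

# Seed NumPy's global random state, as given in the task.
np.random.seed(0)

# Synthetic dataset: 200 two-dimensional points around three centers.
X, y = make_blobs(
    n_samples=200,
    centers=[[3, 2], [6, 4], [10, 5]],
    cluster_std=0.9,
)

# K-Means with the given settings.
k_means = KMeans(init="k-means++", n_clusters=3, n_init=12)
k_means.fit(X)

# Agglomerative clustering with the given settings.
agglom = AgglomerativeClustering(n_clusters=3, linkage="average")
agglom.fit(X)

print(X.shape)                         # feature matrix dimensions
print(k_means.cluster_centers_.shape)  # one center per cluster
print(np.unique(agglom.labels_))       # cluster labels assigned
```

After fitting, both models expose a `labels_` array with one cluster label per sample, which is what the plots in the next part can be colored by.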
You are asked to:
- Plot the dataset you created
- Plot the clusters found by each of the two models on that dataset
- Give the K-Means plot the title "K-Means"
- Give the Agglomerative Hierarchical plot the title "Agglomerative Hierarchical"
- Calculate the distance matrix for Agglomerative Clustering using the input feature matrix (linkage="complete")
- Display the dendrogram
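One possible way to approach the steps above is sketched below. It assumes Matplotlib for plotting and SciPy for the distance matrix and dendrogram. The figure layout, the "Dataset" and "Dendrogram" titles, and the variable names are illustrative choices. The sketch passes the condensed distances from `pdist` to `hierarchy.linkage`, since SciPy's `linkage` expects either a condensed distance vector or the raw observations; the square matrix is computed only to show the full pairwise distances.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; drop this line to view windows
import matplotlib.pyplot as plt
import numpy as np
from scipy.cluster import hierarchy
from scipy.spatial.distance import pdist, squareform
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs

np.random.seed(0)
X, _ = make_blobs(n_samples=200, centers=[[3, 2], [6, 4], [10, 5]], cluster_std=0.9)

# Fit both models and keep one cluster label per sample.
km_labels = KMeans(init="k-means++", n_clusters=3, n_init=12).fit_predict(X)
ag_labels = AgglomerativeClustering(n_clusters=3, linkage="average").fit_predict(X)

# Raw dataset next to the two clustering results.
fig, axes = plt.subplots(1, 3, figsize=(15, 4))
axes[0].scatter(X[:, 0], X[:, 1], s=15)
axes[0].set_title("Dataset")
axes[1].scatter(X[:, 0], X[:, 1], c=km_labels, s=15)
axes[1].set_title("K-Means")
axes[2].scatter(X[:, 0], X[:, 1], c=ag_labels, s=15)
axes[2].set_title("Agglomerative Hierarchical")

# Pairwise Euclidean distances computed from the feature matrix.
condensed = pdist(X, metric="euclidean")  # condensed vector of length n*(n-1)/2
dist_matrix = squareform(condensed)       # full 200 x 200 symmetric matrix

# Build the hierarchy with complete linkage and draw the dendrogram.
Z = hierarchy.linkage(condensed, method="complete")
fig_d, ax_d = plt.subplots(figsize=(12, 5))
dendro = hierarchy.dendrogram(Z, ax=ax_d, no_labels=True)
ax_d.set_title("Dendrogram")

# plt.show()  # uncomment when running in an interactive session
```

Comparing the two scatter plots shows where the models disagree, which is most likely near the boundary between neighboring blobs. Note that the cluster label numbers are arbitrary, so the same group may be colored differently in each plot.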