(8 pts) P5-1: We learned hierarchical clustering algorithms. Please list four measuring strategies and their strengths and limitations. - means & its variants -
Added by Dana S.
Step 1
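Hierarchical clustering groups data points into a nested hierarchy of clusters, built either bottom-up (agglomerative) or top-down (divisive). The measuring strategies the question asks about are most plausibly the linkage criteria: rules that define the distance between two clusters and therefore decide which pair is merged (or split) at each step.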
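The answer above is cut off before it names the four strategies, so the sketch below assumes they are the four inter-cluster distances most often listed for this question: minimum (single linkage), maximum (complete linkage), average, and centroid distance. The function names and the two sample clusters are illustrative, not taken from the original answer, and Euclidean distance is an arbitrary choice for the point-to-point metric.

```python
import math

def single_link(a, b):
    # Minimum distance over all cross-cluster pairs.
    return min(math.dist(p, q) for p in a for q in b)

def complete_link(a, b):
    # Maximum distance over all cross-cluster pairs.
    return max(math.dist(p, q) for p in a for q in b)

def average_link(a, b):
    # Mean distance over all cross-cluster pairs.
    return sum(math.dist(p, q) for p in a for q in b) / (len(a) * len(b))

def centroid(cluster):
    # Coordinate-wise mean of the points in a cluster.
    n = len(cluster)
    return tuple(sum(coords) / n for coords in zip(*cluster))

def centroid_link(a, b):
    # Distance between the two cluster centroids.
    return math.dist(centroid(a), centroid(b))

# Hypothetical sample clusters, chosen only to make the four values differ.
cluster_a = [(0, 0), (0, 2)]
cluster_b = [(3, 0), (3, 2)]

for name, fn in [("single", single_link), ("complete", complete_link),
                 ("average", average_link), ("centroid", centroid_link)]:
    print(f"{name:>8}: {fn(cluster_a, cluster_b):.3f}")
```

As a rough guide to the trade-offs: single linkage can follow elongated or irregular shapes but tends to chain clusters together through noise points; complete linkage favors compact clusters but is sensitive to outliers; average linkage sits between the two; centroid distance is cheap to update but can produce merges at a smaller distance than an earlier merge, which makes the dendrogram harder to read.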
Please answer the multiple-choice questions below:

1. When may cosine distance be a good choice to measure differences between observations?
  - When dealing with observations that have both quantitative and categorical variables.
  - When similar patterns across the variables of two observations are more relevant to the application than similarity in the magnitude of the values.
  - When dealing with observations consisting of binary variables.
  - When dealing with observations consisting of ordinal variables.

2. If one wants to keep the largest difference between two observations in a cluster as small as possible, which linkage method may be most appropriate?
  - Ward's linkage.
  - Single linkage.
  - Group average linkage.
  - Complete linkage.

3. Why is it recommended to implement the k-means clustering algorithm with multiple starts?
  - The Euclidean distance measure commonly used by k-means clustering is inappropriate when there are non-globular clusters.
  - Multiple starts are necessary because k-means clustering tends to result in globular clusters.
  - The location of the initial k randomly selected centroids can have an impact on the final clusters.
  - The algorithm requires several iterations of assigning observations to centroids and recomputing cluster centroids to obtain the final clusters.

4. Which of the following is a true statement about Euclidean distance versus Manhattan distance?
  - Manhattan distance scales better to higher dimensions.
  - Euclidean distance is more applicable to binary variables.
  - Manhattan distance is distorted less by outlier observations.
  - Euclidean distance is less expensive to compute.

5. What is a recommended way to determine the number of clusters in a k-means approach?
  - All of these.
  - Silhouette score.
  - Cluster interpretability.
  - Cluster stability.
Aishwarya K.
Consider the following data points on the X-Y plane and apply complete linkage hierarchical clustering to them.

Data points = {(1,1), (2,1), (4,1), (2,6), (3,3), (4,4), (5,3)}

The complete linkage distance between clusters A and B is

d(A, B) = max { δ(x, x') : x ∈ A, x' ∈ B }

where δ is the Manhattan distance (e.g., δ((1,2), (3,1)) = |1-3| + |2-1| = 3). (Note: Show your calculation at each iteration to get full credit.)

1. (10 pts) Implement complete linkage hierarchical clustering over the above data points. (Show the steps!)
2. (5 pts) Draw the dendrogram after obtaining the clusters from complete linkage-based hierarchical clustering.
3. (5 pts) Can hierarchical clustering help in detecting outliers? Why or how?
4. (5 pts) In general, can we prune the dendrogram? How?
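A minimal sketch of the procedure the question describes, using the seven data points and the Manhattan-based complete linkage distance given above. The function names are illustrative. Several candidate merges tie at the same distance on this data, and the question does not specify a tie-breaking rule, so the sketch simply takes the first minimal pair in iteration order; a different rule can yield a different but equally valid hierarchy.

```python
from itertools import combinations

def manhattan(p, q):
    # delta(p, q) = |p_x - q_x| + |p_y - q_y|
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def complete_linkage(a, b):
    # d(A, B): largest Manhattan distance over all cross-cluster pairs.
    return max(manhattan(p, q) for p in a for q in b)

def agglomerate(points):
    # Start with one singleton cluster per point.
    clusters = [(p,) for p in points]
    merges = []
    while len(clusters) > 1:
        # Pick the closest pair of clusters. Ties are broken by iteration
        # order, which is an assumption of this sketch.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: complete_linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        d = complete_linkage(clusters[i], clusters[j])
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges

points = [(1, 1), (2, 1), (4, 1), (2, 6), (3, 3), (4, 4), (5, 3)]
merges = agglomerate(points)
for step, (a, b, d) in enumerate(merges, start=1):
    print(f"step {step}: merge {a} and {b} at distance {d}")
```

The recorded merge distances are the heights at which branches join in the dendrogram, so the printed list can be used to check a hand-drawn tree. A point that joins the hierarchy only at one of the last, largest merges is a candidate outlier, and cutting the tree at a chosen height is one way to prune it.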
Akash M.
Compare and contrast measures of similarity between clusters.
Sri K.