2. (k-means clustering [6pt]) This exercise is related to the reweighted linear regression problem.
In lectures, we considered the clustering problem with equal weights for all samples. In this
problem, we weight each sample $\vec{x}_i$ by a weight $w_i \ge 0$ that represents the importance
(significance) of that sample.
(a) [2pt] Write down the weighted k-means clustering problem. There are two sets of variables: the
centers $c_j$ and the partition sets $S_j$, for $j = 1, \dots, k$. The objective function is the sum of the
weighted squared Euclidean distances from each sample to its corresponding cluster center.
Answer:
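One way the problem described in (a) could be written is sketched below. It assumes the samples are indexed $i = 1, \dots, n$ (the symbol $n$ is not in the original statement) and that each $S_j$ is a set of sample indices.

```latex
\min_{c_1,\dots,c_k,\; S_1,\dots,S_k} \;
  \sum_{j=1}^{k} \sum_{i \in S_j} w_i \,\lVert \vec{x}_i - c_j \rVert_2^2
\quad \text{s.t.} \quad
  \bigcup_{j=1}^{k} S_j = \{1,\dots,n\}, \qquad
  S_j \cap S_l = \emptyset \;\; \forall j \ne l
```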
(b) [2pt] Consider the first step of the k-means heuristic algorithm. In this step, we treat the centers
$c_j$ as fixed and optimize over the partition variables $S_j$. What is the optimal partition?
Answer:
$S_j = $
(c) [2pt] Consider the second step of the k-means heuristic algorithm. In this step, we treat the partition
$S_j$ as fixed and optimize over the locations of the centers $c_j$. What are the optimal centers?
Answer:
$c_j = $
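The two alternating steps in (b) and (c) can be illustrated numerically. The following is a minimal pure-Python sketch, not a reference solution: the function names `assign`, `update`, and `weighted_kmeans` are hypothetical, ties are broken toward the lowest cluster index, and a cluster with zero total weight simply keeps its previous center (both are assumptions made here for illustration).

```python
def assign(points, centers):
    """Step 1: with centers fixed, put each sample in the cluster of its nearest center.

    Because w_i >= 0 scales every candidate distance of sample i equally,
    the weight does not change which center is nearest.
    """
    clusters = [[] for _ in centers]
    for i, x in enumerate(points):
        dists = [sum((a - b) ** 2 for a, b in zip(x, c)) for c in centers]
        clusters[dists.index(min(dists))].append(i)  # ties go to the lowest index
    return clusters


def update(points, weights, clusters, centers):
    """Step 2: with the partition fixed, move each center to the weighted mean of its cluster."""
    dim = len(points[0])
    new_centers = []
    for j, members in enumerate(clusters):
        total = sum(weights[i] for i in members)
        if total == 0:
            # Assumption: an empty or zero-weight cluster keeps its old center.
            new_centers.append(centers[j])
            continue
        new_centers.append(tuple(
            sum(weights[i] * points[i][d] for i in members) / total
            for d in range(dim)
        ))
    return new_centers


def weighted_kmeans(points, weights, centers, iters=100):
    """Alternate the two steps until the centers stop changing or iters is reached."""
    for _ in range(iters):
        clusters = assign(points, centers)
        new_centers = update(points, weights, clusters, centers)
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign(points, centers)
```

For example, calling `weighted_kmeans([(0.0,), (1.0,), (10.0,)], [1.0, 3.0, 2.0], [(0.0,), (10.0,)])` groups the first two samples together, and the heavier weight on the sample at 1.0 pulls that cluster's center toward it.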