00:01
Hello students. The main difference between supervised and unsupervised learning is the presence of labeled data.
00:06
In supervised learning, the algorithm is trained on a labeled data set, where each data point has an associated target, or label.
00:15
So the goal is to learn a mapping from the input to the output based on the provided labels.
00:20
A benefit of a supervised algorithm is that it can make accurate predictions on new, unseen data.
00:25
A drawback is that it requires a large amount of labeled data for training, which can be expensive and time-consuming to obtain.
00:34
So training takes time, but the model can then make accurate predictions.
00:41
So here, the training data we get is labeled data.
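To make the idea of learning from labeled data concrete, here is a minimal sketch using a 1-nearest-neighbour rule. The lecture does not name a particular supervised algorithm, so this choice, the toy data set `train`, and the helper `predict` are all assumptions made for illustration.

```python
import math

# Hypothetical labeled training set: each entry is (features, label).
train = [
    ((1.0, 1.2), "small"),
    ((0.8, 1.0), "small"),
    ((5.0, 5.5), "large"),
    ((5.2, 4.9), "large"),
]


def predict(x, labeled_data):
    """Return the label of the training point closest to x (1-nearest neighbour)."""
    _, nearest_label = min(labeled_data, key=lambda pair: math.dist(x, pair[0]))
    return nearest_label


# New, unseen points receive the label of their nearest labeled example.
print(predict((1.1, 0.9), train))
print(predict((4.8, 5.1), train))
```

The labels do all the work here: without the second element of each pair, there would be nothing for `predict` to return.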
00:47
An unsupervised algorithm is given unlabeled data.
00:51
So here the data is not labeled; it is an unlabeled data set.
00:55
So here the task is to find the patterns or the structure within the data itself.
01:01
It aims to discover hidden patterns, group similar data points, or reduce the dimensionality of the data.
01:07
A benefit of unsupervised learning is its ability to reveal underlying structure in the data without the need for labeled examples.
01:15
A drawback is that the results may be less interpretable, since there are no predefined labels.
01:24
And after this, the next question is about different initializations.
01:28
Yes, different initializations of the k-means algorithm can lead to different results.
01:35
So here the final cluster assignments and the centroids can vary depending on the initial positions of the centroids.
01:42
So, as we know very well, in the k-means algorithm the final cluster assignments and the centroids can vary depending on the initial positions of the centroids.
01:53
So here we take the centroids and form the clusters around them.
01:58
K-means is sensitive to the initial locations of the centroids because it can converge to a local minimum of the objective function.
02:07
So to mitigate this issue, it is common practice to run k-means multiple times with different initializations and choose the best result based on some criterion, such as minimizing the sum of squared distances.
02:19
So here we take the different clusterings and compare them by their distances.
02:24
So we can choose the one for which the distance is minimized.
02:29
So here we can say yes, different initializations of the k-means algorithm can lead to different results.
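The restart strategy described above can be sketched as follows. This is a minimal pure-Python version, not a reference implementation: the helper names (`sq_dist`, `kmeans`, `best_of_restarts`), the toy `points`, the choice to initialize centroids by sampling data points, and the rule of keeping the old centroid when a cluster is empty are all assumptions for illustration.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, rng, max_iter=100):
    """One run of k-means; returns (centroids, labels, objective)."""
    centroids = rng.sample(points, k)  # assumed init: k random data points
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:  # nothing changed, so the run has converged
            break
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # assumed rule: keep the old centroid if empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    objective = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, objective


def best_of_restarts(points, k, n_restarts=10, seed=0):
    """Run k-means several times and keep the run with the smallest objective."""
    rng = random.Random(seed)
    runs = [kmeans(points, k, rng) for _ in range(n_restarts)]
    return min(runs, key=lambda run: run[2]), [run[2] for run in runs]


# Hypothetical data: three small clumps of four points each.
points = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
    (8.0, 8.0), (8.0, 9.0), (9.0, 8.0), (9.0, 9.0),
    (0.0, 8.0), (1.0, 9.0), (0.0, 9.0), (1.0, 8.0),
]
best, objectives = best_of_restarts(points, k=3)
print("objective of each run:", objectives)
print("best objective:", best[2])
```

If the printed objectives differ between runs, that is the initialization sensitivity the lecture describes. Taking the minimum is the selection criterion.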
02:35
Now, the k-means algorithm will converge in a finite number of iterations because the objective function never increases from one iteration to the next. That objective is the sum of squared distances between the data points and their assigned centroids.
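Written out, the objective just described might look like this. The symbols are assumptions, since the lecture gives the objective only in words: x_i is the i-th of n data points, c(i) is the index of the cluster it is assigned to, and \mu_j is the centroid of cluster j.

```latex
J(c, \mu) = \sum_{i=1}^{n} \left\lVert x_i - \mu_{c(i)} \right\rVert^{2}
```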
02:50
So here is a sketch of the proof.
02:51
The algorithm starts with an initial assignment of the data points to the clusters.
02:56
First we do the initialization by assigning them.
03:02
In each iteration, it reassigns each point to its nearest centroid and updates the cluster centroids to minimize the sum of squared distances.
03:07
So here we have this squared distance to minimize.
03:13
So this is the distance of each point from its centroid.
03:17
Each step decreases the value of the objective function or keeps it the same.
03:20
So in this way we reduce the objective function.
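The claim that the objective never goes up can be checked numerically. The sketch below records the objective after every assignment step and every update step of one k-means run. The function names, the toy data, and the deliberately poor starting centroids are hypothetical, chosen only to show the trace.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def objective(points, centroids, labels):
    """Sum of squared distances from each point to its assigned centroid."""
    return sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))


def kmeans_trace(points, init_centroids, max_iter=100):
    """Run k-means and return the objective after every half-step."""
    centroids = list(init_centroids)
    k = len(centroids)
    labels, history = None, []
    for _ in range(max_iter):
        # Assignment step: centroids fixed, each point picks its nearest one.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:  # no change, so the algorithm has converged
            break
        labels = new_labels
        history.append(objective(points, centroids, labels))
        # Update step: labels fixed, each centroid moves to its cluster mean.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # assumed rule: keep the old centroid if empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
        history.append(objective(points, centroids, labels))
    return history


# Hypothetical data: two clumps, with both starting centroids in the first one.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (6.0, 6.0), (7.0, 6.0), (6.0, 7.0)]
history = kmeans_trace(points, init_centroids=[(0.0, 0.0), (1.0, 0.0)])
print(history)

# Each recorded value is no larger than the one before it (up to rounding).
assert all(a >= b - 1e-9 for a, b in zip(history, history[1:]))
```

Both half-steps can only help: with the centroids fixed, the nearest centroid is the best assignment for a point, and with the assignments fixed, the mean is the point that minimizes the sum of squared distances to a cluster's members.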
03:24
Since there is a finite number of data points and a finite number of possible cluster assignments, there are only a finite number of ways to assign the data points to the clusters...