1. The difference between classification and clustering
Classification: supervised learning with labels.
Clustering: unsupervised learning without labels.
Classification and clustering are two learning methods that sort objects into groups based on one or more features. The two processes look similar, but in data mining they differ in one key respect. Classification is a supervised learning technique: predefined labels are assigned to instances according to their properties. Clustering, by contrast, is an unsupervised learning technique: similar instances are grouped together based on their features or properties, without any predefined labels.
2. The difference between k-means and k-NN
k-means: an unsupervised algorithm used for clustering.
k-NN: a supervised algorithm used for classification.
3. K-NN algorithm
K-nearest neighbours (k-NN) needs labelled data to train on. Given that data, k-NN classifies new, unlabelled data by analysing the k nearest labelled data points.
Steps
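- Compute the distance between the test data point and every training data point.
- Sort the training points in order of increasing distance.
- Select the K points with the smallest distances.
- Count how often each class occurs among these K points.
- Return the most frequent class among the K points as the predicted class of the test data point.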
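The five steps above can be sketched in plain Python as follows. This is a minimal illustration, not a reference implementation: it assumes numeric feature tuples and Euclidean distance (other metrics are possible), and the names `euclidean` and `knn_predict` are hypothetical. Ties in the vote go to whichever tied class appears first among the nearest points.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_points, train_labels, query, k):
    """Predict the class of `query` by majority vote among its k nearest training points."""
    # Step 1: distance from the query to every training point, kept with that point's label.
    distances = [(euclidean(p, query), label) for p, label in zip(train_points, train_labels)]
    # Step 2: sort by increasing distance (key avoids comparing labels).
    distances.sort(key=lambda pair: pair[0])
    # Step 3: keep the k closest points.
    nearest = distances[:k]
    # Step 4: count how often each class occurs among them.
    counts = Counter(label for _, label in nearest)
    # Step 5: return the most frequent class; on a tie, Counter keeps first-encountered order,
    # so the class of the nearer point wins.
    return counts.most_common(1)[0][0]


points = [(1.0, 1.0), (1.5, 2.0), (5.0, 5.0), (6.0, 5.5)]
labels = ["a", "a", "b", "b"]
print(knn_predict(points, labels, (1.2, 1.4), k=3))
```

Note that there is no real "training" step: the labelled points are simply stored and every prediction scans all of them, which is why k-NN is often called a lazy learner.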
4. K-means algorithm
Steps
- Initially, randomly pick k centroids/cluster centers. Try to make them near the data but different from one another.
- Then assign each data point to the closest centroid.
- Move each centroid to the average location (mean) of the data points assigned to it.
- Repeat the preceding two steps until the assignments don’t change, or change very little.
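A minimal Python sketch of this loop (often called Lloyd's algorithm) is below. It assumes squared Euclidean distance and initialises the centroids by sampling k of the data points at random, which is one simple way to keep them near the data and apart from each other; smarter schemes such as k-means++ exist. The names `kmeans`, `max_iters` and `seed` are hypothetical, chosen for this illustration.

```python
import random


def kmeans(points, k, max_iters=100, seed=0):
    """Cluster numeric tuples into k groups; returns (centroids, assignments)."""
    rng = random.Random(seed)
    # Step 1: use k randomly sampled data points as the initial centroids (assumed init scheme).
    centroids = [list(p) for p in rng.sample(points, k)]
    assignments = None
    for _ in range(max_iters):
        # Step 2: assign each point to its closest centroid (squared Euclidean distance).
        new_assignments = [
            min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])))
            for p in points
        ]
        # Step 4: stop once the assignments no longer change.
        if new_assignments == assignments:
            break
        assignments = new_assignments
        # Step 3: move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(coord) / len(members) for coord in zip(*members)]
    return centroids, assignments


data = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
centroids, clusters = kmeans(data, k=2)
print(clusters)
```

Because the outcome depends on the initial centroids, k-means is commonly run several times with different seeds and the result with the lowest within-cluster distance is kept.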