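1: Definition
A hyperparameter is a parameter whose value is set before the learning process starts. Model parameters, by contrast, get their values from training.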
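2: Common hyperparameters
k-nearest neighbors has three commonly tuned hyperparameters: k (the number of neighbors), the vote weighting, and p, the exponent in the Minkowski distance formula. All three are arguments to the constructor of sklearn's KNeighborsClassifier class, named n_neighbors, weights and p.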
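As a quick check, the sketch below assumes a recent scikit-learn version. It reads the three hyperparameters back with get_params() on an unfitted classifier, where they still have their defaults (n_neighbors=5, weights='uniform', p=2). It then shows that a learned attribute such as classes_ only appears after fit().

```python
from sklearn.datasets import load_digits
from sklearn.neighbors import KNeighborsClassifier

# Hyperparameters are fixed when the object is constructed, before any training.
knn = KNeighborsClassifier()
params = knn.get_params()
print(params["n_neighbors"], params["weights"], params["p"])

# Attributes learned from data (trailing underscore) only exist after fit().
print(hasattr(knn, "classes_"))
digits = load_digits()
knn.fit(digits.data, digits.target)
print(knn.classes_)
```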
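3: Shared code
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn import datasets

# Load the handwritten digits dataset (8x8 images flattened to 64 features)
digits = datasets.load_digits()
x = digits.data
y = digits.target

# Hold out 20% of the samples as a test set.
# No random_state is set, so the split (and the best values found below) changes from run to run.
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)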
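The shared code does not fix random_state, so each run produces a different split and possibly a different "best" value. This minimal variant assumes you want repeatable results. It passes a fixed seed (42 is an arbitrary choice). With 1797 digit images and test_size=0.2, scikit-learn puts 360 samples in the test set and 1437 in the training set.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split

digits = datasets.load_digits()
x = digits.data
y = digits.target

# random_state=42 is an arbitrary fixed seed: the same split is produced on every run.
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=42
)
print(x_train.shape, x_test.shape)  # (1437, 64) (360, 64)
```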
4: Best value of k
best_score = 0.0
best_k = -1
# Try k = 1..10 and keep the k with the highest test accuracy
for k in range(1, 11):
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(x_train, y_train)
    t = knn.score(x_test, y_test)  # accuracy on the test set
    if t > best_score:
        best_score = t
        best_k = k
print(best_k)
print(best_score)
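If k is chosen by its score on the same test set that is then reported, the reported accuracy tends to be optimistic. One common alternative is shown here as a swapped-in technique, not the author's approach. It scores each k with 5-fold cross-validation on the training set only (cross_val_score) and keeps the test set for a single final check. The seed is again an arbitrary assumption.

```python
from sklearn import datasets
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier

digits = datasets.load_digits()
x_train, x_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=42
)

best_k, best_cv = -1, 0.0
for k in range(1, 11):
    knn = KNeighborsClassifier(n_neighbors=k)
    # Mean accuracy over 5 folds of the training data only.
    cv_score = cross_val_score(knn, x_train, y_train, cv=5).mean()
    if cv_score > best_cv:
        best_k, best_cv = k, cv_score

# Refit on the whole training set with the chosen k, then evaluate once on the test set.
final = KNeighborsClassifier(n_neighbors=best_k).fit(x_train, y_train)
test_score = final.score(x_test, y_test)
print(best_k, best_cv, test_score)
```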
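5: Best value of weights
With weights='uniform' (the default), each of the k nearest neighbors gets an equal vote. Suppose k = 3 and the three nearest neighbors all belong to different classes. The vote is then a tie, and sklearn has to pick one of the classes regardless of how close each neighbor is. With weights='distance', each neighbor's vote is weighted by the inverse of its distance. If the three neighbors are at distances 1, 3 and 4, their weights are 1, 1/3 and 1/4. The class of the neighbor at distance 1 therefore wins, and it would win even if the other two shared a class, since 1 > 1/3 + 1/4.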
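Both weighting schemes can be checked directly on the example above. In this sketch the toy data is made up for illustration. Three training points lie on a line at distances 1, 3 and 4 from the query, each in its own class, and predict_proba reports the normalized vote for each class.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical toy data: one training point per class, at x = 1, 3 and 4.
X_toy = np.array([[1.0], [3.0], [4.0]])
y_toy = np.array([0, 1, 2])
query = np.array([[0.0]])  # distances to the training points: 1, 3 and 4

probas = {}
for method in ["uniform", "distance"]:
    knn = KNeighborsClassifier(n_neighbors=3, weights=method).fit(X_toy, y_toy)
    probas[method] = knn.predict_proba(query)[0]
    print(method, probas[method].round(3))

# Manual inverse-distance weights 1, 1/3, 1/4, normalized to sum to 1.
w = 1 / np.array([1.0, 3.0, 4.0])
print(w / w.sum())
```

With uniform weights every class gets 1/3, so the vote is a tie. With distance weights, class 0 gets 12/19 (about 0.632), which matches the manual computation.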
best_score = 0.0
best_k = -1
best_method = ''
# Search over both weighting schemes and k = 1..10
for method in ['uniform', 'distance']:
    for k in range(1, 11):
        knn = KNeighborsClassifier(n_neighbors=k, weights=method)
        knn.fit(x_train, y_train)
        t = knn.score(x_test, y_test)
        if t > best_score:
            best_score = t
            best_k = k
            best_method = method
print(best_score)
print(best_k)
print(best_method)
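6: Best value of p
p is the exponent of the Minkowski distance d(x, y) = (sum over i of |x_i - y_i|^p)^(1/p), which KNeighborsClassifier uses under its default metric='minkowski'. p = 1 gives the Manhattan distance and p = 2 (the default) the Euclidean distance. p works with either weights='uniform' or weights='distance'. The loop in this section simply fixes weights='distance' and varies k and p.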
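To make the role of p concrete, this sketch computes the Minkowski distance by hand for one pair of points and compares it with scipy.spatial.distance.minkowski. It then fits a classifier on the digits data with weights='uniform' and p=1, which shows that p does not require distance weighting. The points and the seed are arbitrary choices.

```python
import numpy as np
from scipy.spatial.distance import minkowski
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

def minkowski_by_hand(a, b, p):
    """(sum_i |a_i - b_i|^p)^(1/p): p=1 is Manhattan, p=2 is Euclidean."""
    return np.sum(np.abs(a - b) ** p) ** (1 / p)

a = np.array([0.0, 0.0])
b = np.array([3.0, 4.0])
dists = {p: minkowski_by_hand(a, b, p) for p in (1, 2, 3)}
for p, d in dists.items():
    print(p, d, minkowski(a, b, p))  # p=1 -> 7.0, p=2 -> 5.0

# p works with uniform weights too; weights='distance' is not required.
digits = datasets.load_digits()
x_train, x_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=42
)
knn = KNeighborsClassifier(n_neighbors=3, weights="uniform", p=1)
knn.fit(x_train, y_train)
manhattan_score = knn.score(x_test, y_test)
print(manhattan_score)
```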
best_score = 0.0
best_k = -1
best_p = 1
# Search over p = 1..5 and k = 1..10 with distance weighting
for i in range(1, 6):
    for k in range(1, 11):
        knn = KNeighborsClassifier(n_neighbors=k, weights='distance', p=i)
        knn.fit(x_train, y_train)
        t = knn.score(x_test, y_test)
        if t > best_score:
            best_k = k
            best_score = t
            best_p = i
print(best_p)
print(best_score)
print(best_k)
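The nested search loops in this post can also be written with scikit-learn's GridSearchCV. It tries every combination in a parameter grid using cross-validation on the training set. This is a swapped-in alternative to the author's test-set search, not the method used above. The grid (k = 1..10, both weighting schemes, p = 1..3), cv=3 and the seed are assumptions chosen to keep the run short.

```python
from sklearn import datasets
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

digits = datasets.load_digits()
x_train, x_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=42
)

# Every combination below is evaluated with 3-fold cross-validation (10*2*3 = 60 candidates).
param_grid = {
    "n_neighbors": list(range(1, 11)),
    "weights": ["uniform", "distance"],
    "p": [1, 2, 3],
}
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=3)
search.fit(x_train, y_train)

print(search.best_params_)  # chosen n_neighbors, weights and p
print(search.best_score_)   # mean cross-validated accuracy of that combination
# best_estimator_ is refit on the full training set (refit=True by default).
test_score = search.best_estimator_.score(x_test, y_test)
print(test_score)
```

Passing n_jobs=-1 to GridSearchCV spreads the fits across all CPU cores, which helps when the grid grows.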