交叉验证
https://sklearn.apachecn.org/docs/0.21.3/30.html
cross_val_score
>>> from sklearn.model_selection import cross_val_score
>>> clf = svm.SVC(kernel='linear', C=1)
>>> scores = cross_val_score(clf, iris.data, iris.target, cv=5)
>>> scores
array([0.96..., 1. ..., 0.96..., 0.96..., 1. ])
>>> from sklearn import metrics
>>> scores = cross_val_score(
... clf, iris.data, iris.target, cv=5, scoring='f1_macro')
>>> scores
array([0.96..., 1. ..., 0.96..., 0.96..., 1. ])
>> from sklearn.pipeline import make_pipeline
>> clf = make_pipeline(preprocessing.StandardScaler(), svm.SVC(C=1))
>> cross_val_score(clf, iris.data, iris.target, cv=5)
...
array([ 0.97..., 0.93..., 0.95...])
K 折
KFold
将所有的样例划分为 k个组,称为折叠 (fold) (如果k=n, 这等价于 Leave One Out(留一) 策略),都具有相同的大小(如果可能)。预测函数学习时使用 k-1个折叠中的数据,最后一个剩下的折叠会用于测试。
在 4 个样例的数据集上使用 2-fold 交叉验证的例子:
>>> import numpy as np
>>> from sklearn.model_selection import KFold
>>> X = ["a", "b", "c", "d"]
>>> kf = KFold(n_splits=2)
>>> for train, test in kf.split(X):
... print("%s %s" % (train, test))
[2 3] [0 1]
[0 1] [2 3]
分层 k 折
StratifiedKFold 是 k-fold 的变种,会返回 stratified(分层) 的折叠:每个小集合中, 各个类别的样例比例大致和完整数据集中相同。
在有 10 个样例的,有两个略不均衡类别的数据集上进行分层 3-fold 交叉验证的例子:
>>> from sklearn.model_selection import StratifiedKFold
>>> X = np.ones(10)
>>> y = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
>>> skf = StratifiedKFold(n_splits=3)
>>> for train, test in skf.split(X, y):
... print("%s %s" % (train, test))
[2 3 6 7 8 9] [0 1 4 5]
[0 1 3 4 5 8 9] [2 6 7]
[0 1 2 4 5 6 7] [3 8 9]
调整超参数
https://sklearn.apachecn.org/docs/0.21.3/31.html
param_grid = [
{'C': [1, 10, 100, 1000], 'kernel': ['linear']},
{'C': [1, 10, 100, 1000], 'gamma': [0.001, 0.0001], 'kernel': ['rbf']},
]
from sklearn.metrics import fbeta_score, make_scorer
from sklearn.model_selection import GridSearchCV
from sklearn.svm import LinearSVC
ftwo_scorer = make_scorer(fbeta_score, beta=2)
grid = GridSearchCV(LinearSVC(), param_grid={'C': [1, 10]}, scoring=ftwo_scorer)
corer(fbeta_score, beta=2)
grid = GridSearchCV(LinearSVC(), param_grid={'C': [1, 10]}, scoring=ftwo_scorer)