Cross-Validation and Grid Search Notes

Cross-Validation

https://sklearn.apachecn.org/docs/0.21.3/30.html

cross_val_score

>>> from sklearn import datasets, svm
>>> from sklearn.model_selection import cross_val_score
>>> iris = datasets.load_iris()
>>> clf = svm.SVC(kernel='linear', C=1)
>>> scores = cross_val_score(clf, iris.data, iris.target, cv=5)
>>> scores                                              
array([0.96..., 1.  ..., 0.96..., 0.96..., 1.        ])
>>> from sklearn import metrics
>>> scores = cross_val_score(
...     clf, iris.data, iris.target, cv=5, scoring='f1_macro')
>>> scores                                              
array([0.96..., 1.  ..., 0.96..., 0.96..., 1.        ])
>>> from sklearn import preprocessing
>>> from sklearn.model_selection import ShuffleSplit
>>> from sklearn.pipeline import make_pipeline
>>> cv = ShuffleSplit(n_splits=3, test_size=0.3, random_state=0)
>>> clf = make_pipeline(preprocessing.StandardScaler(), svm.SVC(C=1))
>>> cross_val_score(clf, iris.data, iris.target, cv=cv)
array([0.97..., 0.93..., 0.95...])
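
Since cross_val_score returns one score per fold, the usual next step is to report their mean and spread. A minimal, self-contained sketch (reporting mean +/- 2 standard deviations is just one common convention):

from sklearn import datasets, svm
from sklearn.model_selection import cross_val_score

iris = datasets.load_iris()
clf = svm.SVC(kernel='linear', C=1)
scores = cross_val_score(clf, iris.data, iris.target, cv=5)

# Summarize the five fold scores as mean +/- 2 standard deviations.
print("Accuracy: %0.2f (+/- %0.2f)" % (scores.mean(), scores.std() * 2))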

K-Fold

KFold divides all the samples into k groups, called folds, of equal size where possible (if k = n, this is equivalent to the Leave One Out strategy). The prediction function is learned on k - 1 of the folds, and the remaining fold is used for testing.

Example of 2-fold cross-validation on a dataset with 4 samples:

>>> import numpy as np
>>> from sklearn.model_selection import KFold

>>> X = ["a", "b", "c", "d"]
>>> kf = KFold(n_splits=2)
>>> for train, test in kf.split(X):
...     print("%s  %s" % (train, test))
[2 3] [0 1]
[0 1] [2 3]
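
The index arrays returned by split can be used directly to slice numpy arrays into training and test sets. A small sketch (the toy data below is made up for illustration):

import numpy as np
from sklearn.model_selection import KFold

# Hypothetical toy data: 4 samples with 2 features each, plus labels.
X = np.array([[0., 0.], [1., 1.], [-1., -1.], [2., 2.]])
y = np.array([0, 1, 0, 1])

kf = KFold(n_splits=2)
for train, test in kf.split(X):
    # Fancy indexing with the fold indices yields the actual data splits.
    X_train, X_test = X[train], X[test]
    y_train, y_test = y[train], y[test]
    print(X_train.shape, X_test.shape)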

Stratified K-Fold

StratifiedKFold is a variation of k-fold which returns stratified folds: each fold contains approximately the same proportion of samples of each class as the complete dataset.

Example of stratified 3-fold cross-validation on a dataset with 10 samples from two slightly unbalanced classes:

>>> from sklearn.model_selection import StratifiedKFold

>>> X = np.ones(10)
>>> y = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
>>> skf = StratifiedKFold(n_splits=3)
>>> for train, test in skf.split(X, y):
...     print("%s  %s" % (train, test))
[2 3 6 7 8 9] [0 1 4 5]
[0 1 3 4 5 8 9] [2 6 7]
[0 1 2 4 5 6 7] [3 8 9]
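
A splitter such as StratifiedKFold can also be passed to cross_val_score through the cv parameter instead of an integer. A minimal sketch on the iris data (shuffle and random_state are illustrative choices, not part of the example above):

from sklearn import datasets, svm
from sklearn.model_selection import StratifiedKFold, cross_val_score

iris = datasets.load_iris()
clf = svm.SVC(kernel='linear', C=1)

# Explicit splitter object instead of cv=3: same stratification, with shuffling under our control.
skf = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
print(cross_val_score(clf, iris.data, iris.target, cv=skf))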

Tuning hyper-parameters

https://sklearn.apachecn.org/docs/0.21.3/31.html

An example param_grid for SVC: the first dict searches a linear kernel over four values of C, the second searches an RBF kernel over C and gamma:

param_grid = [
    {'C': [1, 10, 100, 1000], 'kernel': ['linear']},
    {'C': [1, 10, 100, 1000], 'gamma': [0.001, 0.0001], 'kernel': ['rbf']},
]
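
A param_grid given as a list of dicts is searched grid by grid, so the grid above yields 4 linear-kernel candidates plus 4 x 2 = 8 RBF candidates. The expansion can be inspected with ParameterGrid, a quick sketch:

from sklearn.model_selection import ParameterGrid

param_grid = [
    {'C': [1, 10, 100, 1000], 'kernel': ['linear']},
    {'C': [1, 10, 100, 1000], 'gamma': [0.001, 0.0001], 'kernel': ['rbf']},
]

# Expand the grid into the explicit list of candidate settings GridSearchCV would try.
candidates = list(ParameterGrid(param_grid))
print(len(candidates))   # 4 + 8 = 12 candidates
print(candidates[0])     # e.g. {'C': 1, 'kernel': 'linear'}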

A custom scorer built with make_scorer can be passed to GridSearchCV through the scoring parameter:

from sklearn.metrics import fbeta_score, make_scorer
from sklearn.model_selection import GridSearchCV
from sklearn.svm import LinearSVC

# F-beta with beta=2 weights recall more heavily than precision.
ftwo_scorer = make_scorer(fbeta_score, beta=2)

grid = GridSearchCV(LinearSVC(), param_grid={'C': [1, 10]}, scoring=ftwo_scorer)
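
After construction, the search is run by calling fit, and the winning setting is then available through best_params_ and best_score_. A self-contained sketch on a small synthetic dataset (make_classification and its arguments are only illustrative):

from sklearn.datasets import make_classification
from sklearn.metrics import fbeta_score, make_scorer
from sklearn.model_selection import GridSearchCV
from sklearn.svm import LinearSVC

# Hypothetical toy data for illustration.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

ftwo_scorer = make_scorer(fbeta_score, beta=2)
grid = GridSearchCV(LinearSVC(), param_grid={'C': [1, 10]},
                    scoring=ftwo_scorer, cv=5)

# Runs cross-validation for every candidate C and refits the best model on the full data.
grid.fit(X, y)
print(grid.best_params_)   # e.g. {'C': 1}
print(grid.best_score_)    # mean cross-validated F2 score of the best candidate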