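Classification: learning k-nearest neighbors classification with the iris dataset (including standardization in data preprocessing) (environment: PyCharm)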

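Standardization: center each feature column by subtracting that column's mean, then scale it by dividing by the column's standard deviation.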
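In formula form each value becomes z = (x − μ) / σ, where μ and σ are the mean and standard deviation of its column. The sketch below checks this by hand against scikit-learn's StandardScaler on a small made-up matrix (the values are purely illustrative). It assumes StandardScaler divides by the population standard deviation (ddof=0), which is also NumPy's default for np.std.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy feature matrix: 4 samples, 2 feature columns on very different scales
X = np.array([[1.0, 200.0],
              [2.0, 300.0],
              [3.0, 400.0],
              [4.0, 500.0]])

# Manual standardization: per column, subtract the mean, then divide by the
# population standard deviation (np.std uses ddof=0 by default)
X_manual = (X - X.mean(axis=0)) / X.std(axis=0)

# The same transformation with scikit-learn
X_sklearn = StandardScaler().fit_transform(X)

print(np.allclose(X_manual, X_sklearn))  # True
print(X_manual.mean(axis=0))             # each column now has mean 0 (up to rounding)
print(X_manual.std(axis=0))              # and standard deviation 1 (up to rounding)
```

After standardization both columns live on the same scale, which matters for k-NN because it compares samples by distance.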

1. Exploring model accuracy

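from sklearn.datasets import load_iris  # the iris dataset
from sklearn.neighbors import KNeighborsClassifier  # k-nearest neighbors classifier
from sklearn.model_selection import train_test_split as tsplit
from sklearn.preprocessing import StandardScaler  # standardization
X, y = load_iris(return_X_y=True)  # features X and labels y, returned as NumPy arrays
X_train, X_test, y_train, y_test = tsplit(X, y, test_size=0.1)  # random split: 10% of samples for testing, 90% for training
transfer = StandardScaler()  # standardizer
X_train = transfer.fit_transform(X_train)  # learn mean/std from the training set and apply them
X_test = transfer.transform(X_test)  # apply the training-set statistics to the test set
estimator = KNeighborsClassifier()  # instantiate the model; n_neighbors defaults to 5
estimator.fit(X_train, y_train)  # train on the training set (not on the test set)
print(estimator.score(X_test, y_test))  # mean accuracy on the test set, between 0 and 1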
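A single random 90/10 split gives a noisy accuracy that changes from run to run. One way to explore accuracy more systematically, sketched here as an assumption rather than the author's method, is to compare several n_neighbors values with 5-fold cross-validation. Wrapping StandardScaler and KNeighborsClassifier in a pipeline (make_pipeline) refits the scaler inside every fold, so no fold is standardized with statistics from its own validation data.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

results = {}
for k in range(1, 16, 2):  # odd k values: 1, 3, ..., 15
    # Scaler and classifier are refit together on each training fold
    model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    scores = cross_val_score(model, X, y, cv=5)  # one accuracy score per fold
    results[k] = scores.mean()
    print(f"k={k:2d}  mean accuracy={results[k]:.3f}")

best_k = max(results, key=results.get)  # k with the highest mean accuracy
print("best k:", best_k)
```

Odd k values are used here only to reduce voting ties; the range 1 to 15 is an arbitrary illustrative choice.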

2. Plotting the iris classes

from sklearn.datasets import load_iris
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
plt.rcParams["font.sans-serif"] = ["SimHei"]  # use a Chinese font (SimHei) so the Chinese title renders
plt.rcParams["axes.unicode_minus"] = False  # keep minus signs rendering correctly with this font
X, y = load_iris(return_X_y=True)
iris_d = pd.DataFrame(X, columns=['Sepal_Length', 'Sepal_Width', 'Petal_Length', 'Petal_Width'])  # convert the NumPy array to a DataFrame
iris_d['Species'] = y  # add a column with the class label (0, 1, 2)
def plot_iris(iris, col1, col2):
    sns.lmplot(x=col1, y=col2, data=iris, hue="Species", fit_reg=False)  # scatter plot colored by species, no regression line
    plt.xlabel(col1)
    plt.ylabel(col2)
    plt.title('鸢尾花种类分布图')  # "Iris species distribution"
    plt.show()
plot_iris(iris_d, 'Petal_Width', 'Sepal_Length')
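With the numeric labels 0, 1 and 2 the plot legend does not say which species is which. A small variation, assuming you would rather see names, loads the full dataset object and maps the codes to iris.target_names with pandas' Categorical.from_codes; the resulting iris_d can be passed to plot_iris unchanged.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()  # dataset object with .data, .target and .target_names
iris_d = pd.DataFrame(iris.data,
                      columns=['Sepal_Length', 'Sepal_Width', 'Petal_Length', 'Petal_Width'])
# Map integer codes 0/1/2 to 'setosa', 'versicolor', 'virginica' for a readable legend
iris_d['Species'] = pd.Categorical.from_codes(iris.target, iris.target_names)
print(iris_d['Species'].value_counts())  # 50 samples per species
```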

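3. Appendix: predicting how popular a man is

Three features (monthly flight distance in km, amount of dessert eaten, time spent gaming) and three target classes (1: poor, 2: medium, 3: good).
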
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
import pandas as pd

Step 1: Load the data

man = pd.read_csv(r"E:\dating.csv")  # read the CSV file (raw string so the backslash is not treated as an escape)
man_1 = man[['milage', 'Liters', 'Consumtime']]  # feature columns
man_2 = man['target']  # target column
data = man_1.to_numpy()  # feature matrix as a NumPy array
target = man_2.to_numpy()  # labels as a NumPy array

Step 2: Split into training and test sets (80% / 20%)

x_train, x_test, y_train, y_test = train_test_split(data, target, test_size=0.2)

Step 3: Feature engineering (standardization)

transfer = StandardScaler()
x_train = transfer.fit_transform(x_train)
x_test = transfer.transform(x_test)
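The split between fit_transform and transform is deliberate: the scaler learns its mean and standard deviation from the training set only, and the test set is scaled with those same numbers. Since dating.csv is not available here, the sketch below uses the iris features as a stand-in (an assumption) and checks transform() against the stored mean_ and scale_ attributes by hand.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Stand-in data: the iris features instead of dating.csv
X, y = load_iris(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

transfer = StandardScaler()
x_train_std = transfer.fit_transform(x_train)  # learns mean_ and scale_ from x_train
x_test_std = transfer.transform(x_test)        # reuses the training statistics

# transform() equals applying the *training* mean and std by hand
manual = (x_test - transfer.mean_) / transfer.scale_
print(np.allclose(x_test_std, manual))                     # True
print(np.allclose(transfer.mean_, x_train.mean(axis=0)))   # True
```

Refitting the scaler on the test set would standardize it with different numbers than the model was trained on, and would let test data influence preprocessing.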

Step 4: Machine learning (model training)

estimator = KNeighborsClassifier(n_neighbors=9)
estimator.fit(x_train, y_train)
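To see what fit and predict amount to for k-NN, here is a from-scratch majority-vote classifier with Euclidean distance. It is an illustrative sketch, not scikit-learn's internal implementation (which can use tree-based neighbor search); the helper name knn_predict is hypothetical, and the iris data again stands in for dating.csv. It uses k=9 to match the step above and compares its predictions with KNeighborsClassifier's.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

def knn_predict(x_train, y_train, x_query, k=9):
    """Hypothetical helper: plain majority-vote k-NN with Euclidean distance."""
    preds = []
    for q in x_query:
        dists = np.sqrt(((x_train - q) ** 2).sum(axis=1))  # distance to every training sample
        nearest = np.argsort(dists, kind="stable")[:k]      # indices of the k closest samples
        votes = np.bincount(y_train[nearest])               # count each label among them
        preds.append(np.argmax(votes))                      # most common label; ties go to the smallest label
    return np.array(preds)

X, y = load_iris(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
scaler = StandardScaler()
x_train = scaler.fit_transform(x_train)
x_test = scaler.transform(x_test)

manual_pred = knn_predict(x_train, y_train, x_test, k=9)
sk_pred = KNeighborsClassifier(n_neighbors=9).fit(x_train, y_train).predict(x_test)
print("agreement with scikit-learn:", (manual_pred == sk_pred).mean())
```

"Training" a k-NN model mostly means storing the training set; the real work happens at prediction time, when distances to all stored samples are computed.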

Step 5: Model evaluation

Method 1: compare the true and predicted values

y_predict = estimator.predict(x_test)
print("Predictions:\n", y_predict)
print("Prediction equals true value:\n", y_predict == y_test)

Method 2: compute the accuracy directly

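score = estimator.score(x_test, y_test)
print("Accuracy:\n", score)
print("True values:\n", y_test)  # a bare `y_test` only displays in a notebook, so print it in a script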
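The two methods measure the same thing: accuracy is the fraction of entries where y_predict == y_test is True. The sketch below, which uses iris as a stand-in dataset (an assumption), shows that the element-wise comparison, sklearn.metrics.accuracy_score and estimator.score agree, and adds a confusion matrix to show which classes get mixed up.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
transfer = StandardScaler()
x_train = transfer.fit_transform(x_train)
x_test = transfer.transform(x_test)
estimator = KNeighborsClassifier(n_neighbors=9).fit(x_train, y_train)

y_predict = estimator.predict(x_test)
acc_manual = np.mean(y_predict == y_test)        # method 1, summarized as a fraction
acc_metric = accuracy_score(y_test, y_predict)   # same value via sklearn.metrics
acc_score = estimator.score(x_test, y_test)      # method 2
print(acc_manual, acc_metric, acc_score)

cm = confusion_matrix(y_test, y_predict)  # rows: true class, columns: predicted class
print(cm)
```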

Reference (original article): https://blog.csdn.net/weixin_44868393/article/details/106683294
