05 Metrics for Evaluating Regression Algorithms: MSE vs MAE
In [3]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
import datetime
print("Run by CYJ,", datetime.datetime.now())
Run by CYJ, 2022-01-20 12:53:42.123449
Boston housing data
In [4]:
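# load_boston was deprecated in scikit-learn 1.0 and removed in 1.2, so this cell needs scikit-learn < 1.2
boston = datasets.load_boston()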
In [5]:
boston.keys()
Out[5]:
dict_keys(['data', 'target', 'feature_names', 'DESCR', 'filename'])
In [6]:
# print(boston.DESCR)
In [7]:
boston.feature_names
Out[7]:
array(['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT'], dtype='<U7')
In [13]:
x = boston.data[:, 5]  # use only the RM feature (average number of rooms per dwelling)
In [14]:
x.shape
Out[14]:
(506,)
In [15]:
y = boston.target
In [16]:
y.shape
Out[16]:
(506,)
In [17]:
plt.scatter(x, y)
plt.show()
In [18]:
np.max(y)
Out[18]:
50.0
In [19]:
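# Drop the samples whose price sits at the 50.0 ceiling found above.
# Filter x first: its mask must be computed from the still-unfiltered y.
x = x[y < 50.0]
y = y[y < 50.0]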
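The filter above only works because `x` is filtered before `y`; once `y` has been shortened, `y < 50.0` no longer lines up with `x`. Computing the boolean mask once, before touching either array, removes that order dependence. A toy sketch with made-up values (not the Boston data):

```python
import numpy as np

# Made-up stand-ins for the room-count feature x and the price target y
x = np.array([6.5, 7.1, 5.9, 8.0, 6.2])
y = np.array([24.0, 50.0, 19.5, 50.0, 22.1])

mask = y < 50.0          # evaluate the condition once, on the unfiltered targets
x, y = x[mask], y[mask]  # apply the same mask to both arrays so (x, y) pairs stay aligned

print(x, y)
```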
In [20]:
x.shape
Out[20]:
(490,)
In [21]:
y.shape
Out[21]:
(490,)
In [22]:
plt.scatter(x, y)
plt.show()
Using simple linear regression
In [23]:
from playML.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, seed=666)
In [24]:
x_train.shape
Out[24]:
(392,)
In [25]:
y_train.shape
Out[25]:
(392,)
In [26]:
x_test.shape
Out[26]:
(98,)
In [27]:
y_test.shape
Out[27]:
(98,)
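`playML` is the author's own package and its source is not shown here. The 392/98 split of 490 samples is consistent with holding out 20% for testing. A minimal sketch of such a function, assuming a default `test_ratio=0.2` and a seeded shuffle (both assumptions; it uses NumPy's `default_rng`, so it will not reproduce the exact split above):

```python
import numpy as np

def train_test_split(X, y, test_ratio=0.2, seed=None):
    """Hypothetical sketch: shuffle sample indices, hold out test_ratio of them for testing."""
    assert X.shape[0] == y.shape[0], "X and y must have the same number of samples"
    assert 0.0 <= test_ratio <= 1.0, "test_ratio must be in [0, 1]"

    rng = np.random.default_rng(seed)   # seeded generator for a reproducible shuffle
    shuffled = rng.permutation(len(X))  # random ordering of sample indices

    test_size = int(len(X) * test_ratio)
    test_idx, train_idx = shuffled[:test_size], shuffled[test_size:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

# Usage on synthetic data with the same sample count as the filtered Boston set
x = np.arange(490, dtype=float)
y = 2 * x + 1
x_train, x_test, y_train, y_test = train_test_split(x, y, seed=666)
print(x_train.shape, x_test.shape)
```

Shuffling indices rather than the arrays themselves keeps each feature row paired with its target.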
In [28]:
from playML.SimpleLinearRegression import SimpleLinearRegression
In [29]:
reg = SimpleLinearRegression()
reg.fit(x_train, y_train)
Out[29]:
SimpleLinearRegression()
In [30]:
reg.a_
Out[30]:
7.8608543562689555
In [31]:
reg.b_
Out[31]:
-27.459342806705543
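Here `reg.a_` (about 7.86) is the fitted slope and `reg.b_` (about -27.46) the intercept: each additional room raises the predicted price by roughly 7.86 target units. The `playML` class body isn't shown, so the following is a minimal sketch using the standard closed-form least-squares solution, with the attribute names and `repr` mirroring the cells above (the internals are an assumption):

```python
import numpy as np

class SimpleLinearRegression:
    """Hypothetical sketch of a one-feature least-squares fit y ≈ a * x + b."""

    def __init__(self):
        self.a_ = None  # slope
        self.b_ = None  # intercept

    def fit(self, x_train, y_train):
        assert x_train.ndim == 1, "simple linear regression handles a single feature"
        x_mean, y_mean = np.mean(x_train), np.mean(y_train)
        dx, dy = x_train - x_mean, y_train - y_mean
        # Closed-form least squares: a = sum(dx * dy) / sum(dx ** 2), b = y_mean - a * x_mean
        self.a_ = np.dot(dx, dy) / np.dot(dx, dx)
        self.b_ = y_mean - self.a_ * x_mean
        return self  # returning self lets the notebook display the fitted object

    def predict(self, x_predict):
        assert self.a_ is not None, "call fit before predict"
        return self.a_ * x_predict + self.b_

    def __repr__(self):
        return "SimpleLinearRegression()"

# Usage on noise-free data generated from y = 3x - 2
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 3.0 * x - 2.0
reg = SimpleLinearRegression().fit(x, y)
print(reg.a_, reg.b_)
```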
In [32]:
plt.scatter(x_train, y_train)
plt.plot(x_train, reg.predict(x_train), color='r')
plt.show()
In [35]:
plt.scatter(x_train, y_train)
plt.scatter(x_test, y_test, color="c")
plt.plot(x_train, reg.predict(x_train), color='r')
plt.show()
In [36]:
y_predict = reg.predict(x_test)
MSE
In [37]:
mse_test = np.sum((y_predict - y_test)**2) / len(y_test)
mse_test
Out[37]:
24.156602134387438
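Written out, with $m$ test samples (here $m = 98$), true targets $y^{(i)}$ and predictions $\hat{y}^{(i)}$ (`y_predict`), the mean squared error is:

```latex
\mathrm{MSE} = \frac{1}{m}\sum_{i=1}^{m}\left(y^{(i)} - \hat{y}^{(i)}\right)^2
```

Dividing by $m$ makes the value independent of the test-set size, but because each residual is squared, the result is in squared units of the target.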
RMSE
In [38]:
from math import sqrt
rmse_test = sqrt(mse_test)
rmse_test
Out[38]:
4.914936635846635
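The root mean squared error takes the square root of the MSE, which brings the error back to the units of $y$ (here $\sqrt{24.1566} \approx 4.915$). With $m$ test samples, targets $y^{(i)}$ and predictions $\hat{y}^{(i)}$:

```latex
\mathrm{RMSE} = \sqrt{\mathrm{MSE}} = \sqrt{\frac{1}{m}\sum_{i=1}^{m}\left(y^{(i)} - \hat{y}^{(i)}\right)^2}
```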
MAE
In [39]:
mae_test = np.sum(np.absolute(y_predict - y_test)) / len(y_test)
mae_test
Out[39]:
3.5430974409463873
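The mean absolute error averages the absolute residuals instead of squaring them, so it is already in the units of $y$. With $m$ test samples, targets $y^{(i)}$ and predictions $\hat{y}^{(i)}$:

```latex
\mathrm{MAE} = \frac{1}{m}\sum_{i=1}^{m}\left|y^{(i)} - \hat{y}^{(i)}\right|
\qquad\text{and}\qquad
\mathrm{MAE} \le \mathrm{RMSE}
```

The inequality holds for any set of predictions because a quadratic mean is never smaller than the arithmetic mean of the same non-negative values; here 3.543 ≤ 4.915.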
Wrapping our own evaluation functions
The code lives in the `playML.metrics` module.
In [40]:
from playML.metrics import mean_squared_error
from playML.metrics import root_mean_squared_error
from playML.metrics import mean_absolute_error
In [41]:
mean_squared_error(y_test, y_predict)
Out[41]:
24.156602134387438
In [42]:
root_mean_squared_error(y_test, y_predict)
Out[42]:
4.914936635846635
In [43]:
mean_absolute_error(y_test, y_predict)
Out[43]:
3.5430974409463873
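The three `playML.metrics` functions reproduce the hand-computed values. Since the module's source isn't shown here, a minimal sketch of what it might contain, assuming the `(y_true, y_predict)` argument order used in the calls above and simply wrapping the same expressions as the earlier cells:

```python
import numpy as np
from math import sqrt

def mean_squared_error(y_true, y_predict):
    """Hypothetical sketch: average of the squared residuals."""
    assert len(y_true) == len(y_predict), "y_true and y_predict must be the same length"
    return np.sum((y_true - y_predict) ** 2) / len(y_true)

def root_mean_squared_error(y_true, y_predict):
    """Square root of the MSE, expressed in the same units as y."""
    return sqrt(mean_squared_error(y_true, y_predict))

def mean_absolute_error(y_true, y_predict):
    """Hypothetical sketch: average of the absolute residuals."""
    assert len(y_true) == len(y_predict), "y_true and y_predict must be the same length"
    return np.sum(np.absolute(y_true - y_predict)) / len(y_true)

# Usage on a tiny hand-checkable example: residuals are 0.5, -0.5, 0.0, -1.0
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])
mse = mean_squared_error(y_true, y_pred)        # (0.25 + 0.25 + 0 + 1) / 4
rmse = root_mean_squared_error(y_true, y_pred)
mae = mean_absolute_error(y_true, y_pred)       # (0.5 + 0.5 + 0 + 1) / 4
print(mse, rmse, mae)
```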
MSE and MAE in scikit-learn
In [44]:
from sklearn.metrics import mean_squared_error
from sklearn.metrics import mean_absolute_error
In [45]:
mean_squared_error(y_test, y_predict)
Out[45]:
24.156602134387438
In [46]:
mean_absolute_error(y_test, y_predict)
Out[46]:
3.5430974409463873
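scikit-learn returns the same values as the hand-written versions. The practical difference between MSE/RMSE and MAE is how they weight large errors: squaring lets a single big residual dominate. A small NumPy-only illustration with made-up residuals (not the Boston results), where two prediction sets have identical MAE but different RMSE:

```python
import numpy as np

def mse(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

def mae(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))

y_true = np.array([20.0, 22.0, 25.0, 30.0, 35.0])
# Spread-out errors: every residual has magnitude 1
y_spread = y_true + np.array([1.0, -1.0, 1.0, -1.0, 1.0])
# Concentrated error: a single residual of magnitude 5, same total absolute error
y_outlier = y_true + np.array([0.0, 0.0, 0.0, 0.0, 5.0])

mae_spread, mae_outlier = mae(y_true, y_spread), mae(y_true, y_outlier)
rmse_spread, rmse_outlier = np.sqrt(mse(y_true, y_spread)), np.sqrt(mse(y_true, y_outlier))

print(f"MAE:  spread={mae_spread:.3f}  outlier={mae_outlier:.3f}")
print(f"RMSE: spread={rmse_spread:.3f}  outlier={rmse_outlier:.3f}")
```

Both MAEs are 1.0, while RMSE grows from 1.0 to √5 ≈ 2.236 when the error is concentrated in one sample. With scikit-learn, RMSE can be obtained the same way, as `np.sqrt(mean_squared_error(y_test, y_predict))`.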