This time we will use linear regression to predict house prices in Boston.
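First, import the Boston house-price data, which ships with the datasets module of sklearn (load_boston was deprecated in scikit-learn 1.0 and removed in 1.2, so this call needs an older version):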
from sklearn import datasets
boston = datasets.load_boston()
Start by inspecting the dataset with the keys() method:
print(boston.keys())
This gives:
dict_keys(['data', 'target', 'feature_names', 'DESCR'])
Here data has 13 dimensions (one per feature), and target is the house price we want to predict. Next, look at feature_names:
print(boston['feature_names'])
['CRIM' 'ZN' 'INDUS' 'CHAS' 'NOX' 'RM' 'AGE' 'DIS' 'RAD' 'TAX' 'PTRATIO' 'B' 'LSTAT']
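The 'RM' column is the number of rooms, which is the only feature we need here. To make the data easier to work with, we convert it to DataFrame objects and then split it into a training set and a test set: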
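import pandas as pd
from sklearn.model_selection import train_test_split

data = pd.DataFrame(boston['data'], columns=boston['feature_names'])
# Use only the RM column as the single input feature
x = pd.DataFrame(data['RM'], columns=['RM'])
y = pd.DataFrame(boston['target'], columns=['target'])
# Hold out 33% of the rows for testing; the fixed seed makes the split reproducible
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.33, random_state=42)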
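To make the effect of test_size=0.33 concrete, here is a rough pure-Python sketch of a shuffled split. The helper simple_split is hypothetical, the 506-row count is the commonly cited size of the Boston data rather than something shown above, and the shuffle does not reproduce the exact rows that train_test_split picks. It only illustrates the idea of shuffling indices and cutting off a test portion, with the test count rounded up.

```python
import math
import random

def simple_split(rows, test_size, seed):
    """Hypothetical helper: shuffle row indices, then cut off a test portion."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    # Assumption: the test count is rounded up, the rest goes to training
    n_test = math.ceil(test_size * len(rows))
    test_idx = indices[:n_test]
    train_idx = indices[n_test:]
    train = [rows[i] for i in train_idx]
    test = [rows[i] for i in test_idx]
    return train, test

rows = list(range(506))  # stand-in for the rows of the Boston data
train_rows, test_rows = simple_split(rows, test_size=0.33, seed=42)
print(len(train_rows), len(test_rows))
```

Under these assumptions the split leaves 339 rows for training and 167 for testing.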
Next, train the linear regression model and use it to make predictions:
from sklearn.linear_model import LinearRegression

lr = LinearRegression()
lr.fit(x_train, y_train)
y_pre = lr.predict(x_test)
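With a single feature such as RM, an ordinary least-squares fit has a simple closed form: the slope is the covariance of x and y divided by the variance of x, and the intercept follows from the means. The sketch below shows that calculation in plain Python on made-up numbers (not Boston data). The function fit_simple_ols is a hypothetical name used only for illustration.

```python
def fit_simple_ols(xs, ys):
    """Closed-form least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of x
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up data lying exactly on the line y = 3x - 1
rooms = [4.0, 5.0, 6.0, 7.0, 8.0]
prices = [11.0, 14.0, 17.0, 20.0, 23.0]

slope, intercept = fit_simple_ols(rooms, prices)
predictions = [slope * x + intercept for x in [5.5, 6.5]]
print(slope, intercept, predictions)
```

Because the toy data is perfectly linear, the fit recovers a slope of 3 and an intercept of -1. Real data such as RM against price will leave residuals, which is what the evaluation metrics measure.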
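To judge how good the model is, we evaluate it with the following metrics: mean squared error (MSE), root mean squared error (RMSE), mean absolute error (MAE), and R squared.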
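For reference, the standard definitions of these four metrics can be written as follows. The notation is an assumption introduced here: $y_i$ is the true value, $\hat{y}_i$ the prediction, $\bar{y}$ the mean of the true values, and $n$ the number of test samples.

```latex
\begin{aligned}
\mathrm{MSE}  &= \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \\
\mathrm{RMSE} &= \sqrt{\mathrm{MSE}} \\
\mathrm{MAE}  &= \frac{1}{n}\sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert \\
R^2           &= 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
\end{aligned}
```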
from sklearn.metrics import r2_score
from sklearn.metrics import mean_squared_error
from sklearn.metrics import mean_absolute_error

print(r2_score(y_test, y_pre))
print(mean_absolute_error(y_test, y_pre))
print(mean_squared_error(y_test, y_pre))
The results (R squared, MAE, and MSE, in that order) are:
0.4834590168919489
4.271512885857222
39.09105111486995
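The sklearn calls above print R squared, MAE, and MSE, but not RMSE. Since RMSE is just the square root of MSE, one simple way to get it is to take the square root of the reported MSE value, as in this small sketch:

```python
import math

mse = 39.09105111486995  # MSE reported above for the test set
rmse = math.sqrt(mse)    # RMSE is the square root of MSE
print(round(rmse, 4))
```

This comes out at roughly 6.25, in the same units as the target, which makes it easier to compare against the MAE of about 4.27.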
Next, we implement these four metrics by hand in Python:
import numpy as np

def MSE(y_test, y_pre):
    # Mean of the squared errors
    print(((y_test - y_pre)**2).sum() / len(y_pre))

def RMSE(y_test, y_pre):
    # Square root of the MSE
    print((((y_test - y_pre)**2).sum() / len(y_pre))**0.5)

def MAE(y_test, y_pre):
    # Mean of the absolute errors
    y1 = np.array(y_test)
    y2 = np.array(y_pre)
    print(np.sum(np.absolute(y1 - y2)) / len(y1))

def r2_score_(y_test, y_pre):
    # 1 - (residual sum of squares / total sum of squares)
    print(1 - ((y_test - y_pre)**2).sum() / ((y_test - y_test.mean())**2).sum())

MSE(y_test, y_pre)
MAE(y_test, y_pre)
r2_score_(y_test, y_pre)
The results (MSE, MAE, and R squared, in that order) are:
target    39.091051
dtype: float64
4.271512885857222
target    0.483459
dtype: float64
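The "target ... dtype: float64" wrapping in the MSE and R squared output appears because y_test is a DataFrame, so the column-wise sum is a pandas Series rather than a plain number. As a cross-check of the definitions, here is a sketch of the same four metrics written against plain Python lists so that each one returns a float. The function names and the toy numbers are made up for illustration and are not taken from the Boston data.

```python
import math

def mse(y_true, y_pred):
    # Mean of the squared errors
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    # Square root of the MSE
    return math.sqrt(mse(y_true, y_pred))

def mae(y_true, y_pred):
    # Mean of the absolute errors
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    # 1 - (residual sum of squares / total sum of squares)
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Toy values: the errors are 1, 0, -1, -2
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 11.0]

print(mse(y_true, y_pred), rmse(y_true, y_pred), mae(y_true, y_pred), r2(y_true, y_pred))
```

On the toy values the squared errors sum to 6 over 4 samples, so MSE is 1.5, MAE is 1.0, and R squared is 1 - 6/20 = 0.7.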