Kaggle House Price Prediction Project, Part 1

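Project introduction

Project URL

Project goal

Data description

Goal: predict the sale price of each house. For each Id in the test set, you must predict the value of the SalePrice variable.

Evaluation metric

Submissions are evaluated on the root mean squared error (RMSE) between the logarithm of the predicted value and the logarithm of the observed sale price. (Taking logs means that errors in predicting expensive houses and cheap houses will affect the result equally.)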
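To make the metric concrete, here is a minimal sketch of how it could be computed locally. The function name rmse_log and the sample prices are assumptions chosen for illustration; they are not part of the competition's tooling.

```python
import numpy as np

def rmse_log(y_true, y_pred):
    """RMSE between log(y_pred) and log(y_true); both must be strictly positive."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((np.log(y_pred) - np.log(y_true)) ** 2)))

# Overpredicting by 10% costs the same for a cheap house and an expensive one,
# because the metric only sees the ratio between prediction and truth.
cheap = rmse_log([100_000], [110_000])
expensive = rmse_log([500_000], [550_000])
print(cheap, expensive)
```

Both calls give roughly the same value, which is the "errors affect the result equally" property the metric description refers to.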

Loading the dataset

Import the packages and read the data

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns


from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import cross_val_score

import warnings
warnings.filterwarnings('ignore')
# Show all columns
pd.set_option('display.max_columns', None)
# Show all rows
pd.set_option('display.max_rows', None)
# Set the maximum displayed column width to 100 characters (the default is 50)
pd.set_option('display.max_colwidth', 100)
data_sample_submission = pd.read_csv('./data/sample_submission.csv')
data_train = pd.read_csv('./data/train.csv')
data_test = pd.read_csv('./data/test.csv')

Basic information

data_sample_submission.head()
Id SalePrice
0 1461 169277.052498
1 1462 187758.393989
2 1463 183583.683570
3 1464 179317.477511
4 1465 150730.079977
data_sample_submission.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1459 entries, 0 to 1458
Data columns (total 2 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   Id         1459 non-null   int64  
 1   SalePrice  1459 non-null   float64
dtypes: float64(1), int64(1)
memory usage: 22.9 KB
data_train.head()
Id MSSubClass MSZoning LotFrontage LotArea Street Alley LotShape LandContour Utilities LotConfig LandSlope Neighborhood Condition1 Condition2 BldgType HouseStyle OverallQual OverallCond YearBuilt YearRemodAdd RoofStyle RoofMatl Exterior1st Exterior2nd MasVnrType MasVnrArea ExterQual ExterCond Foundation BsmtQual BsmtCond BsmtExposure BsmtFinType1 BsmtFinSF1 BsmtFinType2 BsmtFinSF2 BsmtUnfSF TotalBsmtSF Heating HeatingQC CentralAir Electrical 1stFlrSF 2ndFlrSF LowQualFinSF GrLivArea BsmtFullBath BsmtHalfBath FullBath HalfBath BedroomAbvGr KitchenAbvGr KitchenQual TotRmsAbvGrd Functional Fireplaces FireplaceQu GarageType GarageYrBlt GarageFinish GarageCars GarageArea GarageQual GarageCond PavedDrive WoodDeckSF OpenPorchSF EnclosedPorch 3SsnPorch ScreenPorch PoolArea PoolQC Fence MiscFeature MiscVal MoSold YrSold SaleType SaleCondition SalePrice
0 1 60 RL 65.0 8450 Pave NaN Reg Lvl AllPub Inside Gtl CollgCr Norm Norm 1Fam 2Story 7 5 2003 2003 Gable CompShg VinylSd VinylSd BrkFace 196.0 Gd TA PConc Gd TA No GLQ 706 Unf 0 150 856 GasA Ex Y SBrkr 856 854 0 1710 1 0 2 1 3 1 Gd 8 Typ 0 NaN Attchd 2003.0 RFn 2 548 TA TA Y 0 61 0 0 0 0 NaN NaN NaN 0 2 2008 WD Normal 208500
1 2 20 RL 80.0 9600 Pave NaN Reg Lvl AllPub FR2 Gtl Veenker Feedr Norm 1Fam 1Story 6 8 1976 1976 Gable CompShg MetalSd MetalSd None 0.0 TA TA CBlock Gd TA Gd ALQ 978 Unf 0 284 1262 GasA Ex Y SBrkr 1262 0 0 1262 0 1 2 0 3 1 TA 6 Typ 1 TA Attchd 1976.0 RFn 2 460 TA TA Y 298 0 0 0 0 0 NaN NaN NaN 0 5 2007 WD Normal 181500
2 3 60 RL 68.0 11250 Pave NaN IR1 Lvl AllPub Inside Gtl CollgCr Norm Norm 1Fam 2Story 7 5 2001 2002 Gable CompShg VinylSd VinylSd BrkFace 162.0 Gd TA PConc Gd TA Mn GLQ 486 Unf 0 434 920 GasA Ex Y SBrkr 920 866 0 1786 1 0 2 1 3 1 Gd 6 Typ 1 TA Attchd 2001.0 RFn 2 608 TA TA Y 0 42 0 0 0 0 NaN NaN NaN 0 9 2008 WD Normal 223500
3 4 70 RL 60.0 9550 Pave NaN IR1 Lvl AllPub Corner Gtl Crawfor Norm Norm 1Fam 2Story 7 5 1915 1970 Gable CompShg Wd Sdng Wd Shng None 0.0 TA TA BrkTil TA Gd No ALQ 216 Unf 0 540 756 GasA Gd Y SBrkr 961 756 0 1717 1 0 1 0 3 1 Gd 7 Typ 1 Gd Detchd 1998.0 Unf 3 642 TA TA Y 0 35 272 0 0 0 NaN NaN NaN 0 2 2006 WD Abnorml 140000
4 5 60 RL 84.0 14260 Pave NaN IR1 Lvl AllPub FR2 Gtl NoRidge Norm Norm 1Fam 2Story 8 5 2000 2000 Gable CompShg VinylSd VinylSd BrkFace 350.0 Gd TA PConc Gd TA Av GLQ 655 Unf 0 490 1145 GasA Ex Y SBrkr 1145 1053 0 2198 1 0 2 1 4 1 Gd 9 Typ 1 TA Attchd 2000.0 RFn 3 836 TA TA Y 192 84 0 0 0 0 NaN NaN NaN 0 12 2008 WD Normal 250000
data_train.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1460 entries, 0 to 1459
Data columns (total 81 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Id             1460 non-null   int64  
 1   MSSubClass     1460 non-null   int64  
 2   MSZoning       1460 non-null   object 
 3   LotFrontage    1201 non-null   float64
 4   LotArea        1460 non-null   int64  
 5   Street         1460 non-null   object 
 6   Alley          91 non-null     object 
 7   LotShape       1460 non-null   object 
 8   LandContour    1460 non-null   object 
 9   Utilities      1460 non-null   object 
 10  LotConfig      1460 non-null   object 
 11  LandSlope      1460 non-null   object 
 12  Neighborhood   1460 non-null   object 
 13  Condition1     1460 non-null   object 
 14  Condition2     1460 non-null   object 
 15  BldgType       1460 non-null   object 
 16  HouseStyle     1460 non-null   object 
 17  OverallQual    1460 non-null   int64  
 18  OverallCond    1460 non-null   int64  
 19  YearBuilt      1460 non-null   int64  
 20  YearRemodAdd   1460 non-null   int64  
 21  RoofStyle      1460 non-null   object 
 22  RoofMatl       1460 non-null   object 
 23  Exterior1st    1460 non-null   object 
 24  Exterior2nd    1460 non-null   object 
 25  MasVnrType     1452 non-null   object 
 26  MasVnrArea     1452 non-null   float64
 27  ExterQual      1460 non-null   object 
 28  ExterCond      1460 non-null   object 
 29  Foundation     1460 non-null   object 
 30  BsmtQual       1423 non-null   object 
 31  BsmtCond       1423 non-null   object 
 32  BsmtExposure   1422 non-null   object 
 33  BsmtFinType1   1423 non-null   object 
 34  BsmtFinSF1     1460 non-null   int64  
 35  BsmtFinType2   1422 non-null   object 
 36  BsmtFinSF2     1460 non-null   int64  
 37  BsmtUnfSF      1460 non-null   int64  
 38  TotalBsmtSF    1460 non-null   int64  
 39  Heating        1460 non-null   object 
 40  HeatingQC      1460 non-null   object 
 41  CentralAir     1460 non-null   object 
 42  Electrical     1459 non-null   object 
 43  1stFlrSF       1460 non-null   int64  
 44  2ndFlrSF       1460 non-null   int64  
 45  LowQualFinSF   1460 non-null   int64  
 46  GrLivArea      1460 non-null   int64  
 47  BsmtFullBath   1460 non-null   int64  
 48  BsmtHalfBath   1460 non-null   int64  
 49  FullBath       1460 non-null   int64  
 50  HalfBath       1460 non-null   int64  
 51  BedroomAbvGr   1460 non-null   int64  
 52  KitchenAbvGr   1460 non-null   int64  
 53  KitchenQual    1460 non-null   object 
 54  TotRmsAbvGrd   1460 non-null   int64  
 55  Functional     1460 non-null   object 
 56  Fireplaces     1460 non-null   int64  
 57  FireplaceQu    770 non-null    object 
 58  GarageType     1379 non-null   object 
 59  GarageYrBlt    1379 non-null   float64
 60  GarageFinish   1379 non-null   object 
 61  GarageCars     1460 non-null   int64  
 62  GarageArea     1460 non-null   int64  
 63  GarageQual     1379 non-null   object 
 64  GarageCond     1379 non-null   object 
 65  PavedDrive     1460 non-null   object 
 66  WoodDeckSF     1460 non-null   int64  
 67  OpenPorchSF    1460 non-null   int64  
 68  EnclosedPorch  1460 non-null   int64  
 69  3SsnPorch      1460 non-null   int64  
 70  ScreenPorch    1460 non-null   int64  
 71  PoolArea       1460 non-null   int64  
 72  PoolQC         7 non-null      object 
 73  Fence          281 non-null    object 
 74  MiscFeature    54 non-null     object 
 75  MiscVal        1460 non-null   int64  
 76  MoSold         1460 non-null   int64  
 77  YrSold         1460 non-null   int64  
 78  SaleType       1460 non-null   object 
 79  SaleCondition  1460 non-null   object 
 80  SalePrice      1460 non-null   int64  
dtypes: float64(3), int64(35), object(43)
memory usage: 924.0+ KB
data_test.head()
Id MSSubClass MSZoning LotFrontage LotArea Street Alley LotShape LandContour Utilities LotConfig LandSlope Neighborhood Condition1 Condition2 BldgType HouseStyle OverallQual OverallCond YearBuilt YearRemodAdd RoofStyle RoofMatl Exterior1st Exterior2nd MasVnrType MasVnrArea ExterQual ExterCond Foundation BsmtQual BsmtCond BsmtExposure BsmtFinType1 BsmtFinSF1 BsmtFinType2 BsmtFinSF2 BsmtUnfSF TotalBsmtSF Heating HeatingQC CentralAir Electrical 1stFlrSF 2ndFlrSF LowQualFinSF GrLivArea BsmtFullBath BsmtHalfBath FullBath HalfBath BedroomAbvGr KitchenAbvGr KitchenQual TotRmsAbvGrd Functional Fireplaces FireplaceQu GarageType GarageYrBlt GarageFinish GarageCars GarageArea GarageQual GarageCond PavedDrive WoodDeckSF OpenPorchSF EnclosedPorch 3SsnPorch ScreenPorch PoolArea PoolQC Fence MiscFeature MiscVal MoSold YrSold SaleType SaleCondition
0 1461 20 RH 80.0 11622 Pave NaN Reg Lvl AllPub Inside Gtl NAmes Feedr Norm 1Fam 1Story 5 6 1961 1961 Gable CompShg VinylSd VinylSd None 0.0 TA TA CBlock TA TA No Rec 468.0 LwQ 144.0 270.0 882.0 GasA TA Y SBrkr 896 0 0 896 0.0 0.0 1 0 2 1 TA 5 Typ 0 NaN Attchd 1961.0 Unf 1.0 730.0 TA TA Y 140 0 0 0 120 0 NaN MnPrv NaN 0 6 2010 WD Normal
1 1462 20 RL 81.0 14267 Pave NaN IR1 Lvl AllPub Corner Gtl NAmes Norm Norm 1Fam 1Story 6 6 1958 1958 Hip CompShg Wd Sdng Wd Sdng BrkFace 108.0 TA TA CBlock TA TA No ALQ 923.0 Unf 0.0 406.0 1329.0 GasA TA Y SBrkr 1329 0 0 1329 0.0 0.0 1 1 3 1 Gd 6 Typ 0 NaN Attchd 1958.0 Unf 1.0 312.0 TA TA Y 393 36 0 0 0 0 NaN NaN Gar2 12500 6 2010 WD Normal
2 1463 60 RL 74.0 13830 Pave NaN IR1 Lvl AllPub Inside Gtl Gilbert Norm Norm 1Fam 2Story 5 5 1997 1998 Gable CompShg VinylSd VinylSd None 0.0 TA TA PConc Gd TA No GLQ 791.0 Unf 0.0 137.0 928.0 GasA Gd Y SBrkr 928 701 0 1629 0.0 0.0 2 1 3 1 TA 6 Typ 1 TA Attchd 1997.0 Fin 2.0 482.0 TA TA Y 212 34 0 0 0 0 NaN MnPrv NaN 0 3 2010 WD Normal
3 1464 60 RL 78.0 9978 Pave NaN IR1 Lvl AllPub Inside Gtl Gilbert Norm Norm 1Fam 2Story 6 6 1998 1998 Gable CompShg VinylSd VinylSd BrkFace 20.0 TA TA PConc TA TA No GLQ 602.0 Unf 0.0 324.0 926.0 GasA Ex Y SBrkr 926 678 0 1604 0.0 0.0 2 1 3 1 Gd 7 Typ 1 Gd Attchd 1998.0 Fin 2.0 470.0 TA TA Y 360 36 0 0 0 0 NaN NaN NaN 0 6 2010 WD Normal
4 1465 120 RL 43.0 5005 Pave NaN IR1 HLS AllPub Inside Gtl StoneBr Norm Norm TwnhsE 1Story 8 5 1992 1992 Gable CompShg HdBoard HdBoard None 0.0 Gd TA PConc Gd TA No ALQ 263.0 Unf 0.0 1017.0 1280.0 GasA Ex Y SBrkr 1280 0 0 1280 0.0 0.0 2 0 2 1 Gd 5 Typ 0 NaN Attchd 1992.0 RFn 2.0 506.0 TA TA Y 0 82 0 0 144 0 NaN NaN NaN 0 1 2010 WD Normal
data_test.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1459 entries, 0 to 1458
Data columns (total 80 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Id             1459 non-null   int64  
 1   MSSubClass     1459 non-null   int64  
 2   MSZoning       1455 non-null   object 
 3   LotFrontage    1232 non-null   float64
 4   LotArea        1459 non-null   int64  
 5   Street         1459 non-null   object 
 6   Alley          107 non-null    object 
 7   LotShape       1459 non-null   object 
 8   LandContour    1459 non-null   object 
 9   Utilities      1457 non-null   object 
 10  LotConfig      1459 non-null   object 
 11  LandSlope      1459 non-null   object 
 12  Neighborhood   1459 non-null   object 
 13  Condition1     1459 non-null   object 
 14  Condition2     1459 non-null   object 
 15  BldgType       1459 non-null   object 
 16  HouseStyle     1459 non-null   object 
 17  OverallQual    1459 non-null   int64  
 18  OverallCond    1459 non-null   int64  
 19  YearBuilt      1459 non-null   int64  
 20  YearRemodAdd   1459 non-null   int64  
 21  RoofStyle      1459 non-null   object 
 22  RoofMatl       1459 non-null   object 
 23  Exterior1st    1458 non-null   object 
 24  Exterior2nd    1458 non-null   object 
 25  MasVnrType     1443 non-null   object 
 26  MasVnrArea     1444 non-null   float64
 27  ExterQual      1459 non-null   object 
 28  ExterCond      1459 non-null   object 
 29  Foundation     1459 non-null   object 
 30  BsmtQual       1415 non-null   object 
 31  BsmtCond       1414 non-null   object 
 32  BsmtExposure   1415 non-null   object 
 33  BsmtFinType1   1417 non-null   object 
 34  BsmtFinSF1     1458 non-null   float64
 35  BsmtFinType2   1417 non-null   object 
 36  BsmtFinSF2     1458 non-null   float64
 37  BsmtUnfSF      1458 non-null   float64
 38  TotalBsmtSF    1458 non-null   float64
 39  Heating        1459 non-null   object 
 40  HeatingQC      1459 non-null   object 
 41  CentralAir     1459 non-null   object 
 42  Electrical     1459 non-null   object 
 43  1stFlrSF       1459 non-null   int64  
 44  2ndFlrSF       1459 non-null   int64  
 45  LowQualFinSF   1459 non-null   int64  
 46  GrLivArea      1459 non-null   int64  
 47  BsmtFullBath   1457 non-null   float64
 48  BsmtHalfBath   1457 non-null   float64
 49  FullBath       1459 non-null   int64  
 50  HalfBath       1459 non-null   int64  
 51  BedroomAbvGr   1459 non-null   int64  
 52  KitchenAbvGr   1459 non-null   int64  
 53  KitchenQual    1458 non-null   object 
 54  TotRmsAbvGrd   1459 non-null   int64  
 55  Functional     1457 non-null   object 
 56  Fireplaces     1459 non-null   int64  
 57  FireplaceQu    729 non-null    object 
 58  GarageType     1383 non-null   object 
 59  GarageYrBlt    1381 non-null   float64
 60  GarageFinish   1381 non-null   object 
 61  GarageCars     1458 non-null   float64
 62  GarageArea     1458 non-null   float64
 63  GarageQual     1381 non-null   object 
 64  GarageCond     1381 non-null   object 
 65  PavedDrive     1459 non-null   object 
 66  WoodDeckSF     1459 non-null   int64  
 67  OpenPorchSF    1459 non-null   int64  
 68  EnclosedPorch  1459 non-null   int64  
 69  3SsnPorch      1459 non-null   int64  
 70  ScreenPorch    1459 non-null   int64  
 71  PoolArea       1459 non-null   int64  
 72  PoolQC         3 non-null      object 
 73  Fence          290 non-null    object 
 74  MiscFeature    51 non-null     object 
 75  MiscVal        1459 non-null   int64  
 76  MoSold         1459 non-null   int64  
 77  YrSold         1459 non-null   int64  
 78  SaleType       1458 non-null   object 
 79  SaleCondition  1459 non-null   object 
dtypes: float64(11), int64(26), object(43)
memory usage: 912.0+ KB
data_train.describe()
Id MSSubClass LotFrontage LotArea OverallQual OverallCond YearBuilt YearRemodAdd MasVnrArea BsmtFinSF1 BsmtFinSF2 BsmtUnfSF TotalBsmtSF 1stFlrSF 2ndFlrSF LowQualFinSF GrLivArea BsmtFullBath BsmtHalfBath FullBath HalfBath BedroomAbvGr KitchenAbvGr TotRmsAbvGrd Fireplaces GarageYrBlt GarageCars GarageArea WoodDeckSF OpenPorchSF EnclosedPorch 3SsnPorch ScreenPorch PoolArea MiscVal MoSold YrSold SalePrice
count 1460.000000 1460.000000 1201.000000 1460.000000 1460.000000 1460.000000 1460.000000 1460.000000 1452.000000 1460.000000 1460.000000 1460.000000 1460.000000 1460.000000 1460.000000 1460.000000 1460.000000 1460.000000 1460.000000 1460.000000 1460.000000 1460.000000 1460.000000 1460.000000 1460.000000 1379.000000 1460.000000 1460.000000 1460.000000 1460.000000 1460.000000 1460.000000 1460.000000 1460.000000 1460.000000 1460.000000 1460.000000 1460.000000
mean 730.500000 56.897260 70.049958 10516.828082 6.099315 5.575342 1971.267808 1984.865753 103.685262 443.639726 46.549315 567.240411 1057.429452 1162.626712 346.992466 5.844521 1515.463699 0.425342 0.057534 1.565068 0.382877 2.866438 1.046575 6.517808 0.613014 1978.506164 1.767123 472.980137 94.244521 46.660274 21.954110 3.409589 15.060959 2.758904 43.489041 6.321918 2007.815753 180921.195890
std 421.610009 42.300571 24.284752 9981.264932 1.382997 1.112799 30.202904 20.645407 181.066207 456.098091 161.319273 441.866955 438.705324 386.587738 436.528436 48.623081 525.480383 0.518911 0.238753 0.550916 0.502885 0.815778 0.220338 1.625393 0.644666 24.689725 0.747315 213.804841 125.338794 66.256028 61.119149 29.317331 55.757415 40.177307 496.123024 2.703626 1.328095 79442.502883
min 1.000000 20.000000 21.000000 1300.000000 1.000000 1.000000 1872.000000 1950.000000 0.000000 0.000000 0.000000 0.000000 0.000000 334.000000 0.000000 0.000000 334.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 2.000000 0.000000 1900.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 1.000000 2006.000000 34900.000000
25% 365.750000 20.000000 59.000000 7553.500000 5.000000 5.000000 1954.000000 1967.000000 0.000000 0.000000 0.000000 223.000000 795.750000 882.000000 0.000000 0.000000 1129.500000 0.000000 0.000000 1.000000 0.000000 2.000000 1.000000 5.000000 0.000000 1961.000000 1.000000 334.500000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 5.000000 2007.000000 129975.000000
50% 730.500000 50.000000 69.000000 9478.500000 6.000000 5.000000 1973.000000 1994.000000 0.000000 383.500000 0.000000 477.500000 991.500000 1087.000000 0.000000 0.000000 1464.000000 0.000000 0.000000 2.000000 0.000000 3.000000 1.000000 6.000000 1.000000 1980.000000 2.000000 480.000000 0.000000 25.000000 0.000000 0.000000 0.000000 0.000000 0.000000 6.000000 2008.000000 163000.000000
75% 1095.250000 70.000000 80.000000 11601.500000 7.000000 6.000000 2000.000000 2004.000000 166.000000 712.250000 0.000000 808.000000 1298.250000 1391.250000 728.000000 0.000000 1776.750000 1.000000 0.000000 2.000000 1.000000 3.000000 1.000000 7.000000 1.000000 2002.000000 2.000000 576.000000 168.000000 68.000000 0.000000 0.000000 0.000000 0.000000 0.000000 8.000000 2009.000000 214000.000000
max 1460.000000 190.000000 313.000000 215245.000000 10.000000 9.000000 2010.000000 2010.000000 1600.000000 5644.000000 1474.000000 2336.000000 6110.000000 4692.000000 2065.000000 572.000000 5642.000000 3.000000 2.000000 3.000000 2.000000 8.000000 3.000000 14.000000 3.000000 2010.000000 4.000000 1418.000000 857.000000 547.000000 552.000000 508.000000 480.000000 738.000000 15500.000000 12.000000 2010.000000 755000.000000
data_train.shape
(1460, 81)
data_test.shape
(1459, 80)
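The training set has one more column than the test set. A quick way to confirm which column differs is to compare the column indexes. The sketch below uses two small toy frames (toy_train and toy_test are hypothetical stand-ins, not the real data) so that it runs on its own; the same two lines would apply to data_train and data_test.

```python
import pandas as pd

# Toy stand-ins for data_train / data_test (the real frames have 81 and 80 columns).
toy_train = pd.DataFrame({"Id": [1, 2], "LotArea": [8450, 9600], "SalePrice": [208500, 181500]})
toy_test = pd.DataFrame({"Id": [1461, 1462], "LotArea": [11622, 14267]})

# Index.difference returns the labels present in one index but not in the other.
only_in_train = toy_train.columns.difference(toy_test.columns).tolist()
only_in_test = toy_test.columns.difference(toy_train.columns).tolist()
print(only_in_train)  # columns the test set lacks
print(only_in_test)   # columns the training set lacks
```

On the real data one would expect the only difference to be the target, SalePrice, since that is the value to be predicted for the test set.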

Exploratory data analysis (EDA)

Missing data

# Helper that summarizes missing values per column
def missing_data(data):
    total = data.isnull().sum().sort_values(ascending = False)
    percent = (data.isnull().sum()/data.isnull().count()*100).sort_values(ascending = False)
    return pd.concat([total, percent], axis=1, keys=['Total', 'Percent'])
missing_data(data_train)
Total Percent
PoolQC 1453 99.520548
MiscFeature 1406 96.301370
Alley 1369 93.767123
Fence 1179 80.753425
FireplaceQu 690 47.260274
LotFrontage 259 17.739726
GarageCond 81 5.547945
GarageType 81 5.547945
GarageYrBlt 81 5.547945
GarageFinish 81 5.547945
GarageQual 81 5.547945
BsmtExposure 38 2.602740
BsmtFinType2 38 2.602740
BsmtFinType1 37 2.534247
BsmtCond 37 2.534247
BsmtQual 37 2.534247
MasVnrArea 8 0.547945
MasVnrType 8 0.547945
Electrical 1 0.068493
Utilities 0 0.000000
YearRemodAdd 0 0.000000
MSSubClass 0 0.000000
Foundation 0 0.000000
ExterCond 0 0.000000
ExterQual 0 0.000000
Exterior2nd 0 0.000000
Exterior1st 0 0.000000
RoofMatl 0 0.000000
RoofStyle 0 0.000000
YearBuilt 0 0.000000
LotConfig 0 0.000000
OverallCond 0 0.000000
OverallQual 0 0.000000
HouseStyle 0 0.000000
BldgType 0 0.000000
Condition2 0 0.000000
BsmtFinSF1 0 0.000000
MSZoning 0 0.000000
LotArea 0 0.000000
Street 0 0.000000
Condition1 0 0.000000
Neighborhood 0 0.000000
LotShape 0 0.000000
LandContour 0 0.000000
LandSlope 0 0.000000
SalePrice 0 0.000000
HeatingQC 0 0.000000
BsmtFinSF2 0 0.000000
EnclosedPorch 0 0.000000
Fireplaces 0 0.000000
GarageCars 0 0.000000
GarageArea 0 0.000000
PavedDrive 0 0.000000
WoodDeckSF 0 0.000000
OpenPorchSF 0 0.000000
3SsnPorch 0 0.000000
BsmtUnfSF 0 0.000000
ScreenPorch 0 0.000000
PoolArea 0 0.000000
MiscVal 0 0.000000
MoSold 0 0.000000
YrSold 0 0.000000
SaleType 0 0.000000
Functional 0 0.000000
TotRmsAbvGrd 0 0.000000
KitchenQual 0 0.000000
KitchenAbvGr 0 0.000000
BedroomAbvGr 0 0.000000
HalfBath 0 0.000000
FullBath 0 0.000000
BsmtHalfBath 0 0.000000
BsmtFullBath 0 0.000000
GrLivArea 0 0.000000
LowQualFinSF 0 0.000000
2ndFlrSF 0 0.000000
1stFlrSF 0 0.000000
CentralAir 0 0.000000
SaleCondition 0 0.000000
Heating 0 0.000000
TotalBsmtSF 0 0.000000
Id 0 0.000000
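Most rows of the table above are zeros. A possible variant, sketched here under the assumption that only columns with at least one missing value are of interest, filters the summary down to those columns. The name missing_only and the toy frame are illustrative, not part of the original notebook.

```python
import numpy as np
import pandas as pd

def missing_only(data):
    """Like missing_data, but keeps only the columns that have at least one null."""
    total = data.isnull().sum()
    percent = data.isnull().mean() * 100  # mean of a boolean mask = fraction missing
    table = pd.concat([total, percent], axis=1, keys=["Total", "Percent"])
    return table[table["Total"] > 0].sort_values("Total", ascending=False)

# Small toy frame so the example runs without the Kaggle files.
toy = pd.DataFrame({
    "PoolQC": [np.nan, np.nan, np.nan, "Gd"],
    "LotFrontage": [65.0, np.nan, 68.0, 60.0],
    "LotArea": [8450, 9600, 11250, 9550],
})
result = missing_only(toy)
print(result)
```

Calling missing_only(data_train) should then list only the 19 columns with a nonzero Total shown above.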

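Exploring the features

# Discrete (categorical) features: "lisan" is pinyin for "discrete"
def lisan_plot(column, data):
    fig = plt.figure(figsize=(10,4))
    # Left panel: number of rows per category
    plt.subplot2grid((1,2),(0,0))
    counts = data[column].value_counts()
    sns.barplot(x=counts.index, y=counts.values)
    plt.title(column)
    plt.ylabel('Count')

    # Right panel: SalePrice distribution per category
    plt.subplot2grid((1,2),(0,1))
    sns.boxplot(x=column, y='SalePrice', data=data)
    plt.show()

# Continuous features: "lianxu" is pinyin for "continuous"
def lianxu_plot(column, data):
    fig = plt.figure(figsize=(10,4))
    # Left panel: distribution of the feature (histplot replaces the deprecated distplot)
    plt.subplot2grid((1,2),(0,0))
    sns.histplot(data[column].dropna(), kde=True)
    plt.xlabel(column)
    plt.ylabel('Count')

    # Right panel: feature against SalePrice (recent seaborn requires x and y as keywords)
    plt.subplot2grid((1,2),(0,1))
    sns.scatterplot(x=data[column], y=data['SalePrice'])
    plt.show()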
1.MSSubClass:

Identifies the type of dwelling involved in the sale.

column = 'MSSubClass'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
15
20     536
60     299
50     144
120     87
30      69
160     63
70      60
80      58
90      52
190     30
85      20
75      16
45      12
180     10
40       4
Name: MSSubClass, dtype: int64

2.MSZoning:

Identifies the general zoning classification of the sale.

column = 'MSZoning'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
5
RL         1151
RM          218
FV           65
RH           16
C (all)      10
Name: MSZoning, dtype: int64

3.LotFrontage:

Linear feet of street connected to property

column = 'LotFrontage'
print(len(data_train[column].unique()))
print('Max and min:',data_train[column].max(), data_train[column].min())
print(data_train[column].unique())
lianxu_plot(column, data_train)
111
Max and min: 313.0 21.0
[ 65.  80.  68.  60.  84.  85.  75.  nan  51.  50.  70.  91.  72.  66.
 101.  57.  44. 110.  98.  47. 108. 112.  74. 115.  61.  48.  33.  52.
 100.  24.  89.  63.  76.  81.  95.  69.  21.  32.  78. 121. 122.  40.
 105.  73.  77.  64.  94.  34.  90.  55.  88.  82.  71. 120. 107.  92.
 134.  62.  86. 141.  97.  54.  41.  79. 174.  99.  67.  83.  43. 103.
  93.  30. 129. 140.  35.  37. 118.  87. 116. 150. 111.  49.  96.  59.
  36.  56. 102.  58.  38. 109. 130.  53. 137.  45. 106. 104.  42.  39.
 144. 114. 128. 149. 313. 168. 182. 138. 160. 152. 124. 153.  46.]

Possible outliers: data_train[data_train['LotFrontage'] > 300]

4.LotArea:

Lot size in square feet

column = 'LotArea'
print(len(data_train[column].unique()))
print('Max and min:',data_train[column].max(), data_train[column].min())
print(data_train[column].unique())
lianxu_plot(column, data_train)
1073
Max and min: 215245 1300
[ 8450  9600 11250 ... 17217 13175  9717]

Possible outliers: data_train[data_train['LotArea'] > 100000]
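The two candidate thresholds noted for LotFrontage (> 300) and LotArea (> 100000) can be checked in one pass. The following is a rough sketch on a toy frame; the frame and the thresholds dictionary are assumptions for illustration, and whether the flagged rows should actually be dropped is a separate modelling decision.

```python
import pandas as pd

# Toy stand-in for data_train so the example runs without the Kaggle files.
toy = pd.DataFrame({
    "Id": [1, 2, 3, 4],
    "LotFrontage": [65.0, 313.0, None, 80.0],
    "LotArea": [8450, 9600, 215245, 11250],
})

# Candidate thresholds taken from the two observations in the text.
thresholds = {"LotFrontage": 300, "LotArea": 100000}

# Flag a row when any listed column exceeds its threshold.
# Comparisons against NaN evaluate to False, so missing values are never flagged.
mask = pd.Series(False, index=toy.index)
for col, limit in thresholds.items():
    mask |= toy[col] > limit

flagged = toy[mask]
print(flagged)
```

Inspecting the flagged rows before removing them is usually safer than filtering blindly, since a very large lot is not necessarily a data error.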

5.Street:

Type of road access to property

column = 'Street'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
2
Pave    1454
Grvl       6
Name: Street, dtype: int64

6.Alley:

Type of alley access to property

column = 'Alley'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
3
Grvl    50
Pave    41
Name: Alley, dtype: int64
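Note that the first number printed is 3 while value_counts lists only two categories. The difference comes from missing values: unique() includes NaN, whereas value_counts() and nunique() skip it by default. A small self-contained sketch (the alley Series is a toy example, not the real column):

```python
import numpy as np
import pandas as pd

alley = pd.Series(["Grvl", "Pave", np.nan, np.nan, "Grvl"], name="Alley")

n_with_nan = len(alley.unique())            # NaN counts as one distinct value
n_without_nan = alley.nunique()             # dropna=True by default
counts = alley.value_counts(dropna=False)   # keep a row for NaN as well
print(n_with_nan, n_without_nan)
print(counts)
```

Passing dropna=False to value_counts is one way to see how many rows are missing next to the real categories.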

7.LotShape:

General shape of property

   Reg	Regular
   IR1	Slightly irregular
   IR2	Moderately Irregular
   IR3	Irregular
column = 'LotShape'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
4
Reg    925
IR1    484
IR2     41
IR3     10
Name: LotShape, dtype: int64

8.LandContour:

Flatness of the property

   Lvl	Near Flat/Level
   Bnk	Banked - Quick and significant rise from street grade to building
   HLS	Hillside - Significant slope from side to side
   Low	Depression
column = 'LandContour'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
4
Lvl    1311
Bnk      63
HLS      50
Low      36
Name: LandContour, dtype: int64

9.Utilities:

Type of utilities available

   AllPub	All public Utilities (E,G,W,& S)
   NoSewr	Electricity, Gas, and Water (Septic Tank)
   NoSeWa	Electricity and Gas Only
   ELO	Electricity only
column = 'Utilities'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
2
AllPub    1459
NoSeWa       1
Name: Utilities, dtype: int64

10.LotConfig:

Lot configuration

   Inside	Inside lot
   Corner	Corner lot
   CulDSac	Cul-de-sac
   FR2	Frontage on 2 sides of property
   FR3	Frontage on 3 sides of property
column = 'LotConfig'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
5
Inside     1052
Corner      263
CulDSac      94
FR2          47
FR3           4
Name: LotConfig, dtype: int64

11.LandSlope:

Slope of property

   Gtl	Gentle slope
   Mod	Moderate Slope
   Sev	Severe Slope
column = 'LandSlope'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
3
Gtl    1382
Mod      65
Sev      13
Name: LandSlope, dtype: int64

12.Neighborhood:

Physical locations within Ames city limits

   Blmngtn	Bloomington Heights
   Blueste	Bluestem
   BrDale	Briardale
   BrkSide	Brookside
   ClearCr	Clear Creek
   CollgCr	College Creek
   Crawfor	Crawford
   Edwards	Edwards
   Gilbert	Gilbert
   IDOTRR	Iowa DOT and Rail Road
   MeadowV	Meadow Village
   Mitchel	Mitchell
   NAmes	North Ames
   NoRidge	Northridge
   NPkVill	Northpark Villa
   NridgHt	Northridge Heights
   NWAmes	Northwest Ames
   OldTown	Old Town
   SWISU	South & West of Iowa State University
   Sawyer	Sawyer
   SawyerW	Sawyer West
   Somerst	Somerset
   StoneBr	Stone Brook
   Timber	Timberland
   Veenker	Veenker
column = 'Neighborhood'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
25
NAmes      225
CollgCr    150
OldTown    113
Edwards    100
Somerst     86
Gilbert     79
NridgHt     77
Sawyer      74
NWAmes      73
SawyerW     59
BrkSide     58
Crawfor     51
Mitchel     49
NoRidge     41
Timber      38
IDOTRR      37
ClearCr     28
StoneBr     25
SWISU       25
Blmngtn     17
MeadowV     17
BrDale      16
Veenker     11
NPkVill      9
Blueste      2
Name: Neighborhood, dtype: int64

13.Condition1:

Proximity to various conditions

   Artery	Adjacent to arterial street
   Feedr	Adjacent to feeder street
   Norm	Normal
   RRNn	Within 200' of North-South Railroad
   RRAn	Adjacent to North-South Railroad
   PosN	Near positive off-site feature--park, greenbelt, etc.
   PosA	Adjacent to positive off-site feature
   RRNe	Within 200' of East-West Railroad
   RRAe	Adjacent to East-West Railroad
column = 'Condition1'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
9
Norm      1260
Feedr       81
Artery      48
RRAn        26
PosN        19
RRAe        11
PosA         8
RRNn         5
RRNe         2
Name: Condition1, dtype: int64

14.Condition2:

Proximity to various conditions (if more than one is present)

   Artery	Adjacent to arterial street
   Feedr	Adjacent to feeder street
   Norm	Normal
   RRNn	Within 200' of North-South Railroad
   RRAn	Adjacent to North-South Railroad
   PosN	Near positive off-site feature--park, greenbelt, etc.
   PosA	Adjacent to positive off-site feature
   RRNe	Within 200' of East-West Railroad
   RRAe	Adjacent to East-West Railroad
column = 'Condition2'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
8
Norm      1445
Feedr        6
Artery       2
PosN         2
RRNn         2
PosA         1
RRAn         1
RRAe         1
Name: Condition2, dtype: int64

15.BldgType:

Type of dwelling

   1Fam	Single-family Detached
   2FmCon	Two-family Conversion; originally built as one-family dwelling
   Duplx	Duplex
   TwnhsE	Townhouse End Unit
   TwnhsI	Townhouse Inside Unit

Note: in data_train three of these codes are spelled differently, as the output below shows: 2fmCon, Duplex, and Twnhs.
column = 'BldgType'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
5
1Fam      1220
TwnhsE     114
Duplex      52
Twnhs       43
2fmCon      31
Name: BldgType, dtype: int64

16.HouseStyle:

Style of dwelling

   1Story	One story
   1.5Fin	One and one-half story: 2nd level finished
   1.5Unf	One and one-half story: 2nd level unfinished
   2Story	Two story
   2.5Fin	Two and one-half story: 2nd level finished
   2.5Unf	Two and one-half story: 2nd level unfinished
   SFoyer	Split Foyer
   SLvl	Split Level
column = 'HouseStyle'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
8
1Story    726
2Story    445
1.5Fin    154
SLvl       65
SFoyer     37
1.5Unf     14
2.5Unf     11
2.5Fin      8
Name: HouseStyle, dtype: int64

17.OverallQual:

Rates the overall material and finish of the house

   10	Very Excellent
   9	Excellent
   8	Very Good
   7	Good
   6	Above Average
   5	Average
   4	Below Average
   3	Fair
   2	Poor
   1	Very Poor
column = 'OverallQual'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
10
5     397
6     374
7     319
8     168
4     116
9      43
3      20
10     18
2       3
1       2
Name: OverallQual, dtype: int64

18.OverallCond:

Rates the overall condition of the house

   10	Very Excellent
   9	Excellent
   8	Very Good
   7	Good
   6	Above Average	
   5	Average
   4	Below Average	
   3	Fair
   2	Poor
   1	Very Poor
column = 'OverallCond'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
9
5    821
6    252
7    205
8     72
4     57
3     25
9     22
2      5
1      1
Name: OverallCond, dtype: int64

19.YearBuilt:

Original construction date

column = 'YearBuilt'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
112
2006    67
2005    64
2004    54
2007    49
2003    45
1976    33
1977    32
1920    30
1959    26
1999    25
1998    25
1958    24
1965    24
1970    24
1954    24
2000    24
2002    23
2008    23
1972    23
1968    22
1971    22
1950    20
2001    20
1957    20
1962    19
1994    19
1966    18
2009    18
1995    18
1940    18
1910    17
1960    17
1993    17
1978    16
1955    16
1925    16
1963    16
1967    16
1996    15
1941    15
1964    15
1969    14
1956    14
1961    14
1997    14
1948    14
1992    13
1990    12
1953    12
1949    12
1988    11
1973    11
1915    10
1900    10
1980    10
1974    10
1979     9
1926     9
1930     9
1936     9
1984     9
1939     8
1922     8
1975     8
1916     8
1924     7
1928     7
1918     7
1914     7
1923     7
1946     7
1935     6
1945     6
1931     6
1982     6
1921     6
1951     6
1985     5
1937     5
1947     5
1991     5
1981     5
1986     5
1952     5
1880     4
1929     4
1932     4
1938     4
1983     4
1927     3
1919     3
1934     3
1989     3
1987     3
1912     3
1885     2
1892     2
1890     2
1942     2
1908     2
1882     1
1875     1
1893     1
2010     1
1898     1
1904     1
1905     1
1906     1
1911     1
1913     1
1917     1
1872     1
Name: YearBuilt, dtype: int64

column = 'YearBuilt'
lianxu_plot(column, data_train)

20.YearRemodAdd:

Remodel date (same as construction date if no remodeling or additions)

column = 'YearRemodAdd'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
61
1950    178
2006     97
2007     76
2005     73
2004     62
2000     55
2003     51
2002     48
2008     40
1996     36
1998     36
1995     31
1976     30
1999     30
1970     26
1997     25
1977     25
2009     23
1994     22
2001     21
1972     20
1965     19
1993     19
1971     18
1959     18
1968     17
1992     17
1978     16
1966     15
1958     15
1990     15
1962     14
1954     14
1969     14
1991     14
1963     13
1960     12
1967     12
1980     12
1973     11
1964     11
1989     11
1987     10
1975     10
1979     10
1956     10
1953     10
1957      9
1988      9
1955      9
1985      9
1961      8
1981      8
1974      7
1982      7
1984      7
2010      6
1983      5
1952      5
1986      5
1951      4
Name: YearRemodAdd, dtype: int64

column = 'YearRemodAdd'
lianxu_plot(column, data_train)

21.RoofStyle:

Type of roof

   Flat	Flat
   Gable	Gable
   Gambrel	Gambrel (Barn)
   Hip	Hip
   Mansard	Mansard
   Shed	Shed
column = 'RoofStyle'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
6
Gable      1141
Hip         286
Flat         13
Gambrel      11
Mansard       7
Shed          2
Name: RoofStyle, dtype: int64

22.RoofMatl:

Roof material

   ClyTile	Clay or Tile
   CompShg	Standard (Composite) Shingle
   Membran	Membrane
   Metal	Metal
   Roll	Roll
   Tar&Grv	Gravel & Tar
   WdShake	Wood Shakes
   WdShngl	Wood Shingles
column = 'RoofMatl'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
8
CompShg    1434
Tar&Grv      11
WdShngl       6
WdShake       5
Metal         1
Roll          1
Membran       1
ClyTile       1
Name: RoofMatl, dtype: int64

23.Exterior1st:

Exterior covering on house

   AsbShng	Asbestos Shingles
   AsphShn	Asphalt Shingles
   BrkComm	Brick Common
   BrkFace	Brick Face
   CBlock	Cinder Block
   CemntBd	Cement Board
   HdBoard	Hard Board
   ImStucc	Imitation Stucco
   MetalSd	Metal Siding
   Other	Other
   Plywood	Plywood
   PreCast	PreCast	
   Stone	Stone
   Stucco	Stucco
   VinylSd	Vinyl Siding
   Wd Sdng	Wood Siding
   WdShing	Wood Shingles
column = 'Exterior1st'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
15
VinylSd    515
HdBoard    222
MetalSd    220
Wd Sdng    206
Plywood    108
CemntBd     61
BrkFace     50
WdShing     26
Stucco      25
AsbShng     20
BrkComm      2
Stone        2
AsphShn      1
CBlock       1
ImStucc      1
Name: Exterior1st, dtype: int64

24.Exterior2nd:

Exterior covering on house (if more than one material)

   AsbShng	Asbestos Shingles
   AsphShn	Asphalt Shingles
   BrkComm	Brick Common
   BrkFace	Brick Face
   CBlock	Cinder Block
   CemntBd	Cement Board
   HdBoard	Hard Board
   ImStucc	Imitation Stucco
   MetalSd	Metal Siding
   Other	Other
   Plywood	Plywood
   PreCast	PreCast
   Stone	Stone
   Stucco	Stucco
   VinylSd	Vinyl Siding
   Wd Sdng	Wood Siding
   WdShing	Wood Shingles
column = 'Exterior2nd'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
16
VinylSd    504
MetalSd    214
HdBoard    207
Wd Sdng    197
Plywood    142
CmentBd     60
Wd Shng     38
Stucco      26
BrkFace     25
AsbShng     20
ImStucc     10
Brk Cmn      7
Stone        5
AsphShn      3
CBlock       1
Other        1
Name: Exterior2nd, dtype: int64
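
Three labels in this output (`CmentBd`, `Wd Shng`, `Brk Cmn`) do not match the spellings used in the description table and in `Exterior1st` (`CemntBd`, `WdShing`, `BrkComm`). The sketch below shows one way the two columns could be aligned with `Series.replace`. The mapping is an assumption inferred from the similar spellings, and the series is a toy stand-in for `data_train['Exterior2nd']`.

```python
import pandas as pd

# Assumed mapping from Exterior2nd spellings to Exterior1st spellings.
spelling_fix = {"CmentBd": "CemntBd", "Wd Shng": "WdShing", "Brk Cmn": "BrkComm"}

# Toy stand-in for data_train['Exterior2nd'].
exterior2nd = pd.Series(
    ["VinylSd", "CmentBd", "Wd Shng", "Brk Cmn", "Other"], name="Exterior2nd"
)

# replace() swaps exact matches and leaves all other values untouched.
exterior2nd_clean = exterior2nd.replace(spelling_fix)
print(exterior2nd_clean.tolist())
```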

25.MasVnrType:

Masonry veneer type

   BrkCmn	Brick Common
   BrkFace	Brick Face
   CBlock	Cinder Block
   None	None
   Stone	Stone
column = 'MasVnrType'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
5
None       864
BrkFace    445
Stone      128
BrkCmn      15
Name: MasVnrType, dtype: int64
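
The first number printed above is 5, yet `value_counts()` lists only four categories, and their counts sum to 1452 rather than the 1460 rows seen in complete columns such as `CentralAir`. The difference comes from missing values: `unique()` includes NaN, while `value_counts()` drops it by default. The sketch below illustrates this on a toy series (a stand-in, not the real column).

```python
import numpy as np
import pandas as pd

# Toy stand-in for data_train['MasVnrType'] with one missing value.
s = pd.Series(["None", "BrkFace", "Stone", "BrkCmn", np.nan, "None"], name="MasVnrType")

n_with_nan = len(s.unique())           # NaN counts as a distinct value here
n_without_nan = s.nunique()            # nunique() drops NaN by default
n_missing = int(s.isna().sum())        # number of missing rows
counts = s.value_counts(dropna=False)  # keeps NaN as its own row

print(n_with_nan, n_without_nan, n_missing)
print(counts)
```

Passing `dropna=False` to `value_counts()` in the cells of this section would make the missing rows visible directly.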

26.MasVnrArea:

Masonry veneer area in square feet

column = 'MasVnrArea'
print(len(data_train[column].unique()))
print('Min and max:', data_train[column].min(), data_train[column].max())
lianxu_plot(column, data_train)
328
Min and max: 0.0 1600.0

27.ExterQual:

Evaluates the quality of the material on the exterior

   Ex	Excellent
   Gd	Good
   TA	Average/Typical
   Fa	Fair
   Po	Poor
column = 'ExterQual'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
4
TA    906
Gd    488
Ex     52
Fa     14
Name: ExterQual, dtype: int64

28.ExterCond:

Evaluates the present condition of the material on the exterior

   Ex	Excellent
   Gd	Good
   TA	Average/Typical
   Fa	Fair
   Po	Poor
column = 'ExterCond'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
5
TA    1282
Gd     146
Fa      28
Ex       3
Po       1
Name: ExterCond, dtype: int64
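
The `Ex`/`Gd`/`TA`/`Fa`/`Po` scale recurs in several columns and is ordered, so it can be turned into integer scores. The following is only a sketch: the numeric values are an assumption (any increasing sequence would preserve the order), and the series is a toy stand-in for `data_train['ExterCond']`.

```python
import pandas as pd

# Assumed scores; the ordering follows the scale in the table above (Po lowest).
quality_order = {"Po": 1, "Fa": 2, "TA": 3, "Gd": 4, "Ex": 5}

# Toy stand-in for data_train['ExterCond'].
exter_cond = pd.Series(["TA", "Gd", "Fa", "Ex", "Po", "TA"], name="ExterCond")

# map() looks each label up in the dict; labels missing from the dict become NaN.
exter_cond_score = exter_cond.map(quality_order)
print(exter_cond_score.tolist())
```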

29.Foundation:

Type of foundation

   BrkTil	Brick & Tile
   CBlock	Cinder Block
   PConc	Poured Concrete
   Slab	Slab
   Stone	Stone
   Wood	Wood
column = 'Foundation'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
6
PConc     647
CBlock    634
BrkTil    146
Slab       24
Stone       6
Wood        3
Name: Foundation, dtype: int64

30.BsmtQual:

Evaluates the height of the basement

   Ex	Excellent (100+ inches)
   Gd	Good (90-99 inches)
   TA	Typical (80-89 inches)
   Fa	Fair (70-79 inches)
   Po	Poor (<70 inches)
   NA	No Basement
column = 'BsmtQual'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
5
TA    649
Gd    618
Ex    121
Fa     35
Name: BsmtQual, dtype: int64
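
The four counts above sum to 1423, so some rows are missing from the tally. With default settings `pd.read_csv` parses the string `NA` as a missing value, which means the documented `NA` (No Basement) category arrives as NaN. The sketch below shows the behaviour on a tiny in-memory CSV; the file content and the `NoBsmt` label are made up for illustration.

```python
import io
import pandas as pd

# Tiny made-up CSV in which one house has no basement ("NA").
csv_text = "Id,BsmtQual\n1,Gd\n2,NA\n3,TA\n"

# Default parsing: "NA" becomes NaN.
df_default = pd.read_csv(io.StringIO(csv_text))
n_missing = int(df_default["BsmtQual"].isna().sum())

# keep_default_na=False keeps "NA" as an ordinary string.
df_raw = pd.read_csv(io.StringIO(csv_text), keep_default_na=False)
raw_values = df_raw["BsmtQual"].tolist()

# Alternative: keep default parsing and give the category an explicit label.
bsmt_qual_filled = df_default["BsmtQual"].fillna("NoBsmt")

print(n_missing, raw_values, bsmt_qual_filled.tolist())
```

Which option fits depends on whether other columns in the same file contain genuinely missing values.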

31.BsmtCond:

Evaluates the general condition of the basement

   Ex	Excellent
   Gd	Good
   TA	Typical - slight dampness allowed
   Fa	Fair - dampness or some cracking or settling
   Po	Poor - Severe cracking, settling, or wetness
   NA	No Basement
column = 'BsmtCond'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
5
TA    1311
Gd      65
Fa      45
Po       2
Name: BsmtCond, dtype: int64

32.BsmtExposure:

Refers to walkout or garden level walls

   Gd	Good Exposure
   Av	Average Exposure (split levels or foyers typically score average or above)
   Mn	Minimum Exposure
   No	No Exposure
   NA	No Basement
column = 'BsmtExposure'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
5
No    953
Av    221
Gd    134
Mn    114
Name: BsmtExposure, dtype: int64

33.BsmtFinType1:

Rating of basement finished area

   GLQ	Good Living Quarters
   ALQ	Average Living Quarters
   BLQ	Below Average Living Quarters
   Rec	Average Rec Room
   LwQ	Low Quality
   Unf	Unfinished
   NA	No Basement
column = 'BsmtFinType1'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
7
Unf    430
GLQ    418
ALQ    220
BLQ    148
Rec    133
LwQ     74
Name: BsmtFinType1, dtype: int64

34.BsmtFinSF1:

Type 1 finished square feet

column = 'BsmtFinSF1'
print(len(data_train[column].unique()))
print('Min and max:', data_train[column].min(), data_train[column].max())
lianxu_plot(column, data_train)
637
Min and max: 0 5644

Outliers: BsmtFinSF1 > 5000 (the maximum is 5644)
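
A threshold like this can be applied with a boolean mask. The sketch below uses a toy frame whose last value is the maximum reported above; the other values are invented, and whether to drop or cap such rows is a modelling choice the source does not make here.

```python
import pandas as pd

# Toy stand-in for data_train; only 5644 is taken from the output above.
toy = pd.DataFrame({"BsmtFinSF1": [0, 706, 978, 1810, 5644]})

threshold = 5000
is_outlier = toy["BsmtFinSF1"] > threshold

# Keep only the rows at or below the threshold.
toy_clean = toy.loc[~is_outlier].reset_index(drop=True)
print(int(is_outlier.sum()), "row(s) flagged;", len(toy_clean), "kept")
```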

35.BsmtFinType2:

Rating of basement finished area (if multiple types)

   GLQ	Good Living Quarters
   ALQ	Average Living Quarters
   BLQ	Below Average Living Quarters
   Rec	Average Rec Room
   LwQ	Low Quality
   Unf	Unfinished
   NA	No Basement
column = 'BsmtFinType2'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
7
Unf    1256
Rec      54
LwQ      46
BLQ      33
ALQ      19
GLQ      14
Name: BsmtFinType2, dtype: int64

36.BsmtFinSF2:

Type 2 finished square feet

column = 'BsmtFinSF2'
print(len(data_train[column].unique()))
print('Min and max:', data_train[column].min(), data_train[column].max())
lianxu_plot(column, data_train)
144
Min and max: 0 1474

37.BsmtUnfSF:

Unfinished square feet of basement area

column = 'BsmtUnfSF'
print(len(data_train[column].unique()))
print('Min and max:', data_train[column].min(), data_train[column].max())
lianxu_plot(column, data_train)
780
Min and max: 0 2336

38.TotalBsmtSF:

Total square feet of basement area

column = 'TotalBsmtSF'
print(len(data_train[column].unique()))
print('Min and max:', data_train[column].min(), data_train[column].max())
lianxu_plot(column, data_train)
721
Min and max: 0 6110

Outliers: TotalBsmtSF > 5000 (the maximum is 6110)

39.Heating:

Type of heating

   Floor	Floor Furnace
   GasA	Gas forced warm air furnace
   GasW	Gas hot water or steam heat
   Grav	Gravity furnace	
   OthW	Hot water or steam heat other than gas
   Wall	Wall furnace
column = 'Heating'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
6
GasA     1428
GasW       18
Grav        7
Wall        4
OthW        2
Floor       1
Name: Heating, dtype: int64

40.HeatingQC:

Heating quality and condition

   Ex	Excellent
   Gd	Good
   TA	Average/Typical
   Fa	Fair
   Po	Poor
column = 'HeatingQC'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
5
Ex    741
TA    428
Gd    241
Fa     49
Po      1
Name: HeatingQC, dtype: int64

41.CentralAir:

Central air conditioning

   N	No
   Y	Yes
column = 'CentralAir'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
2
Y    1365
N      95
Name: CentralAir, dtype: int64

42.Electrical:

Electrical system

   SBrkr	Standard Circuit Breakers & Romex
   FuseA	Fuse Box over 60 AMP and all Romex wiring (Average)	
   FuseF	60 AMP Fuse Box and mostly Romex wiring (Fair)
   FuseP	60 AMP Fuse Box and mostly knob & tube wiring (poor)
   Mix	Mixed
column = 'Electrical'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
6
SBrkr    1334
FuseA      94
FuseF      27
FuseP       3
Mix         1
Name: Electrical, dtype: int64

43.1stFlrSF:

First Floor square feet

column = '1stFlrSF'
print(len(data_train[column].unique()))
print('Min and max:', data_train[column].min(), data_train[column].max())
lianxu_plot(column, data_train)
753
Min and max: 334 4692

Possible outliers: 1stFlrSF > 4000 (the maximum is 4692)

44.2ndFlrSF:

Second floor square feet

column = '2ndFlrSF'
print(len(data_train[column].unique()))
print('Min and max:', data_train[column].min(), data_train[column].max())
lianxu_plot(column, data_train)
417
Min and max: 0 2065

45.LowQualFinSF:

Low quality finished square feet (all floors)

column = 'LowQualFinSF'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
24
0      1434
80        3
360       2
528       1
53        1
120       1
144       1
156       1
205       1
232       1
234       1
371       1
572       1
390       1
392       1
397       1
420       1
473       1
479       1
481       1
513       1
514       1
515       1
384       1
Name: LowQualFinSF, dtype: int64

column = 'LowQualFinSF'
print('Min and max:', data_train[column].min(), data_train[column].max())
lianxu_plot(column, data_train)
Min and max: 0 572
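
Almost every house has a `LowQualFinSF` of 0, which is worth quantifying before deciding how much signal the column can carry. The sketch below rebuilds the column from the `value_counts()` output above (1434 zeros, three 80s, two 360s, and 21 values that occur once) and computes the share of zeros. The reconstruction is a stand-in for `data_train['LowQualFinSF']`.

```python
import pandas as pd

# Values that appear exactly once in the value_counts() output above.
singles = [528, 53, 120, 144, 156, 205, 232, 234, 371, 572, 390,
           392, 397, 420, 473, 479, 481, 513, 514, 515, 384]

# Stand-in series rebuilt from the printed counts.
low_qual = pd.Series([0] * 1434 + [80] * 3 + [360] * 2 + singles, name="LowQualFinSF")

# The mean of a boolean mask is the fraction of True values.
zero_share = (low_qual == 0).mean()
print(f"{len(low_qual)} rows, {low_qual.nunique()} distinct values, "
      f"{zero_share:.1%} equal to 0")
```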

46.GrLivArea:

Above grade (ground) living area square feet

column = 'GrLivArea'
print(len(data_train[column].unique()))
print('Min and max:', data_train[column].min(), data_train[column].max())
lianxu_plot(column, data_train)
861
Min and max: 334 5642

47.BsmtFullBath:

Basement full bathrooms

column = 'BsmtFullBath'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
4
0    856
1    588
2     15
3      1
Name: BsmtFullBath, dtype: int64

48.BsmtHalfBath:

Basement half bathrooms

column = 'BsmtHalfBath'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
3
0    1378
1      80
2       2
Name: BsmtHalfBath, dtype: int64

49.FullBath:

Full bathrooms above grade

column = 'FullBath'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
4
2    768
1    650
3     33
0      9
Name: FullBath, dtype: int64

50.HalfBath:

Half baths above grade

column = 'HalfBath'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
3
0    913
1    535
2     12
Name: HalfBath, dtype: int64

51.BedroomAbvGr:

Bedrooms above grade (does NOT include basement bedrooms)

column = 'BedroomAbvGr'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
8
3    804
2    358
4    213
1     50
5     21
6      7
0      6
8      1
Name: BedroomAbvGr, dtype: int64

52.KitchenAbvGr:

Kitchens above grade

column = 'KitchenAbvGr'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
4
1    1392
2      65
3       2
0       1
Name: KitchenAbvGr, dtype: int64

53.KitchenQual:

Kitchen quality

   Ex	Excellent
   Gd	Good
   TA	Typical/Average
   Fa	Fair
   Po	Poor
column = 'KitchenQual'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
4
TA    735
Gd    586
Ex    100
Fa     39
Name: KitchenQual, dtype: int64

54.TotRmsAbvGrd:

Total rooms above grade (does not include bathrooms)

column = 'TotRmsAbvGrd'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
12
6     402
7     329
5     275
8     187
4      97
9      75
10     47
11     18
3      17
12     11
14      1
2       1
Name: TotRmsAbvGrd, dtype: int64

55.Functional:

Home functionality (Assume typical unless deductions are warranted)

   Typ	Typical Functionality
   Min1	Minor Deductions 1
   Min2	Minor Deductions 2
   Mod	Moderate Deductions
   Maj1	Major Deductions 1
   Maj2	Major Deductions 2
   Sev	Severely Damaged
   Sal	Salvage only
column = 'Functional'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
7
Typ     1360
Min2      34
Min1      31
Mod       15
Maj1      14
Maj2       5
Sev        1
Name: Functional, dtype: int64

56.Fireplaces:

Number of fireplaces

column = 'Fireplaces'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
4
0    690
1    650
2    115
3      5
Name: Fireplaces, dtype: int64

57.FireplaceQu:

Fireplace quality

   Ex	Excellent - Exceptional Masonry Fireplace
   Gd	Good - Masonry Fireplace in main level
   TA	Average - Prefabricated Fireplace in main living area or Masonry Fireplace in basement
   Fa	Fair - Prefabricated Fireplace in basement
   Po	Poor - Ben Franklin Stove
   NA	No Fireplace
column = 'FireplaceQu'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
6
Gd    380
TA    313
Fa     33
Ex     24
Po     20
Name: FireplaceQu, dtype: int64

58.GarageType:

Garage location

   2Types	More than one type of garage
   Attchd	Attached to home
   Basment	Basement Garage
   BuiltIn	Built-In (Garage part of house - typically has room above garage)
   CarPort	Car Port
   Detchd	Detached from home
   NA	No Garage
column = 'GarageType'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
7
Attchd     870
Detchd     387
BuiltIn     88
Basment     19
CarPort      9
2Types       6
Name: GarageType, dtype: int64

59.GarageYrBlt:

Year garage was built

column = 'GarageYrBlt'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
98
2005.0    65
2006.0    59
2004.0    53
2003.0    50
2007.0    49
1977.0    35
1998.0    31
1999.0    30
2008.0    29
1976.0    29
2000.0    27
2002.0    26
1968.0    26
1950.0    24
1993.0    22
2009.0    21
1965.0    21
1966.0    21
1962.0    21
1958.0    21
2001.0    20
1996.0    20
1957.0    20
1970.0    20
1960.0    19
1997.0    19
1978.0    19
1954.0    19
1974.0    18
1994.0    18
1995.0    18
1964.0    18
1959.0    17
1963.0    16
1990.0    16
1956.0    16
1969.0    15
1979.0    15
1980.0    15
1967.0    15
1988.0    14
1973.0    14
1940.0    14
1920.0    14
1972.0    14
1961.0    13
1971.0    13
1955.0    13
1992.0    13
1953.0    12
1987.0    11
1948.0    11
1985.0    10
1981.0    10
1941.0    10
1925.0    10
1989.0    10
1975.0     9
1991.0     9
1939.0     9
1984.0     8
1949.0     8
1930.0     8
1983.0     7
1986.0     6
1951.0     6
1926.0     6
1922.0     5
1936.0     5
1916.0     5
1931.0     4
1945.0     4
1935.0     4
1928.0     4
1946.0     4
1982.0     4
1938.0     3
1921.0     3
1924.0     3
1910.0     3
1952.0     3
1932.0     3
2010.0     3
1923.0     3
1937.0     2
1934.0     2
1918.0     2
1947.0     2
1929.0     2
1914.0     2
1915.0     2
1942.0     2
1908.0     1
1927.0     1
1933.0     1
1900.0     1
1906.0     1
Name: GarageYrBlt, dtype: int64

column = 'GarageYrBlt'
print('Min and max:', data_train[column].min(), data_train[column].max())
lianxu_plot(column, data_train)
Min and max: 1900.0 2010.0

60.GarageFinish:

Interior finish of the garage

   Fin	Finished
   RFn	Rough Finished	
   Unf	Unfinished
   NA	No Garage
column = 'GarageFinish'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
4
Unf    605
RFn    422
Fin    352
Name: GarageFinish, dtype: int64

61.GarageCars:

Size of garage in car capacity

column = 'GarageCars'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
5
2    824
1    369
3    181
0     81
4      5
Name: GarageCars, dtype: int64

62.GarageArea:

Size of garage in square feet

column = 'GarageArea'
print(len(data_train[column].unique()))
print('Min and max:', data_train[column].min(), data_train[column].max())
lianxu_plot(column, data_train)
441
Min and max: 0 1418

63.GarageQual:

Garage quality

   Ex	Excellent
   Gd	Good
   TA	Typical/Average
   Fa	Fair
   Po	Poor
   NA	No Garage
column = 'GarageQual'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
6
TA    1311
Fa      48
Gd      14
Po       3
Ex       3
Name: GarageQual, dtype: int64

64.GarageCond:

Garage condition

   Ex	Excellent
   Gd	Good
   TA	Typical/Average
   Fa	Fair
   Po	Poor
   NA	No Garage
column = 'GarageCond'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
6
TA    1326
Fa      35
Gd       9
Po       7
Ex       2
Name: GarageCond, dtype: int64

65.PavedDrive:

Paved driveway

   Y	Paved 
   P	Partial Pavement
   N	Dirt/Gravel
column = 'PavedDrive'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
3
Y    1340
N      90
P      30
Name: PavedDrive, dtype: int64

66.WoodDeckSF:

Wood deck area in square feet

column = 'WoodDeckSF'
print(len(data_train[column].unique()))
print('Min and max:', data_train[column].min(), data_train[column].max())
lianxu_plot(column, data_train)
274
Min and max: 0 857

67.OpenPorchSF:

Open porch area in square feet

column = 'OpenPorchSF'
print(len(data_train[column].unique()))
print('Min and max:', data_train[column].min(), data_train[column].max())
lianxu_plot(column, data_train)
202
Min and max: 0 547

68.EnclosedPorch:

Enclosed porch area in square feet

column = 'EnclosedPorch'
print(len(data_train[column].unique()))
print('Min and max:', data_train[column].min(), data_train[column].max())
lianxu_plot(column, data_train)
120
Min and max: 0 552

69.3SsnPorch:

Three season porch area in square feet

column = '3SsnPorch'
print(len(data_train[column].unique()))
print('Min and max:', data_train[column].min(), data_train[column].max())
lianxu_plot(column, data_train)
20
Min and max: 0 508

70.ScreenPorch:

Screen porch area in square feet

column = 'ScreenPorch'
print(len(data_train[column].unique()))
print('Min and max:', data_train[column].min(), data_train[column].max())
lianxu_plot(column, data_train)
76
Min and max: 0 480

71.PoolArea:

Pool area in square feet

column = 'PoolArea'
print(len(data_train[column].unique()))
print('Min and max:', data_train[column].min(), data_train[column].max())
lianxu_plot(column, data_train)
8
Min and max: 0 738

72.PoolQC:

Pool quality

   Ex	Excellent
   Gd	Good
   TA	Average/Typical
   Fa	Fair
   NA	No Pool
column = 'PoolQC'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
4
Gd    3
Fa    2
Ex    2
Name: PoolQC, dtype: int64

73.Fence:

Fence quality

   GdPrv	Good Privacy
   MnPrv	Minimum Privacy
   GdWo	Good Wood
   MnWw	Minimum Wood/Wire
   NA	No Fence
column = 'Fence'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
5
MnPrv    157
GdPrv     59
GdWo      54
MnWw      11
Name: Fence, dtype: int64

74.MiscFeature:

Miscellaneous feature not covered in other categories

   Elev	Elevator
   Gar2	2nd Garage (if not described in garage section)
   Othr	Other
   Shed	Shed (over 100 SF)
   TenC	Tennis Court
   NA	None
column = 'MiscFeature'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
5
Shed    49
Othr     2
Gar2     2
TenC     1
Name: MiscFeature, dtype: int64

75.MiscVal:

$Value of miscellaneous feature

column = 'MiscVal'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
21
0        1408
400        11
500         8
700         5
450         4
2000        4
600         4
1200        2
480         2
1150        1
800         1
15500       1
620         1
3500        1
560         1
2500        1
1300        1
1400        1
350         1
8300        1
54          1
Name: MiscVal, dtype: int64

column = 'MiscVal'
print('Min and max:', data_train[column].min(), data_train[column].max())
lianxu_plot(column, data_train)
Min and max: 0 15500

76.MoSold:

Month Sold (MM)

column = 'MoSold'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
12
6     253
7     234
5     204
4     141
8     122
3     106
10     89
11     79
9      63
12     59
1      58
2      52
Name: MoSold, dtype: int64

77.YrSold:

Year Sold (YYYY)

column = 'YrSold'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
5
2009    338
2007    329
2006    314
2008    304
2010    175
Name: YrSold, dtype: int64

78.SaleType:

Type of sale

   WD 	Warranty Deed - Conventional
   CWD	Warranty Deed - Cash
   VWD	Warranty Deed - VA Loan
   New	Home just constructed and sold
   COD	Court Officer Deed/Estate
   Con	Contract 15% Down payment regular terms
   ConLw	Contract Low Down payment and low interest
   ConLI	Contract Low Interest
   ConLD	Contract Low Down
   Oth	Other
column = 'SaleType'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
9
WD       1267
New       122
COD        43
ConLD       9
ConLI       5
ConLw       5
CWD         4
Oth         3
Con         2
Name: SaleType, dtype: int64

79.SaleCondition:

Condition of sale

   Normal	Normal Sale
   Abnorml	Abnormal Sale -  trade, foreclosure, short sale
   AdjLand	Adjoining Land Purchase
   Alloca	Allocation - two linked properties with separate deeds, typically condo with a garage unit
   Family	Sale between family members
   Partial	Home was not completed when last assessed (associated with New Homes)
column = 'SaleCondition'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
6
Normal     1198
Partial     125
Abnorml     101
Family       20
Alloca       12
AdjLand       4
Name: SaleCondition, dtype: int64

80.SalePrice:

Sale price of the house; this is the target variable to predict.

column = 'SalePrice'
print(len(data_train[column].unique()))
print('Min and max:', data_train[column].min(), data_train[column].max())
lianxu_plot(column, data_train)
663
Min and max: 34900 755000
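
Because submissions are scored by the RMSE between the logarithm of the predicted price and the logarithm of the observed price, the same relative error costs the same at both ends of this price range. The sketch below makes that concrete; `rmse_of_logs` is a hypothetical helper written for this illustration, not part of the original notebook or of scikit-learn.

```python
import numpy as np

def rmse_of_logs(y_true, y_pred):
    """RMSE between log(y_pred) and log(y_true); both must be positive."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((np.log(y_pred) - np.log(y_true)) ** 2)))

# A 10% over-prediction on the cheapest and on the most expensive house
# (34900 and 755000 are the min and max printed above).
err_cheap = rmse_of_logs([34900], [34900 * 1.1])
err_expensive = rmse_of_logs([755000], [755000 * 1.1])
print(err_cheap, err_expensive)  # both are approximately log(1.1)
```

This is also why many solutions train on `np.log(SalePrice)` and convert predictions back with `np.exp` before submitting.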
