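Kaggle House Price Prediction Project: Part 1
Project Introduction
Project Goal
Data Description
Goal: Your job is to predict the sale price of each house. For each Id in the test set, you must predict the value of the SalePrice variable.
Evaluation Metric
Submissions are evaluated on the root mean squared error (RMSE) between the logarithm of the predicted value and the logarithm of the observed sale price. (Taking logs means that errors in predicting expensive houses and cheap houses affect the result equally.)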
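The metric described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the competition's official scoring code; the function name rmse_of_logs and the sample prices are hypothetical.

```python
import numpy as np

def rmse_of_logs(y_true, y_pred):
    """RMSE between log(predicted) and log(observed) prices."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((np.log(y_pred) - np.log(y_true)) ** 2)))

# Hypothetical prices: a cheap house and an expensive one, each over-predicted by 10%.
cheap = rmse_of_logs([100_000], [110_000])
expensive = rmse_of_logs([500_000], [550_000])
print(cheap, expensive)  # both are log(1.1): only the relative error matters
```

Because the error is measured on the log scale, a 10% miss costs the same whether the house sells for 100,000 or 500,000.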
Loading the Dataset
Import libraries and read the data
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import cross_val_score
import warnings
warnings.filterwarnings('ignore')
# Show all columns
pd.set_option('display.max_columns', None)
# Show all rows
pd.set_option('display.max_rows', None)
# Set the maximum displayed column width to 100 characters (default is 50)
pd.set_option('display.max_colwidth', 100)
data_sample_submission = pd.read_csv('./data/sample_submission.csv')
data_train = pd.read_csv('./data/train.csv')
data_test = pd.read_csv('./data/test.csv')
Basic Information
data_sample_submission.head()
| | Id | SalePrice |
---|---|---|
0 | 1461 | 169277.052498 |
1 | 1462 | 187758.393989 |
2 | 1463 | 183583.683570 |
3 | 1464 | 179317.477511 |
4 | 1465 | 150730.079977 |
data_sample_submission.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1459 entries, 0 to 1458
Data columns (total 2 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Id 1459 non-null int64
1 SalePrice 1459 non-null float64
dtypes: float64(1), int64(1)
memory usage: 22.9 KB
data_train.head()
| | Id | MSSubClass | MSZoning | LotFrontage | LotArea | Street | Alley | LotShape | LandContour | Utilities | LotConfig | LandSlope | Neighborhood | Condition1 | Condition2 | BldgType | HouseStyle | OverallQual | OverallCond | YearBuilt | YearRemodAdd | RoofStyle | RoofMatl | Exterior1st | Exterior2nd | MasVnrType | MasVnrArea | ExterQual | ExterCond | Foundation | BsmtQual | BsmtCond | BsmtExposure | BsmtFinType1 | BsmtFinSF1 | BsmtFinType2 | BsmtFinSF2 | BsmtUnfSF | TotalBsmtSF | Heating | HeatingQC | CentralAir | Electrical | 1stFlrSF | 2ndFlrSF | LowQualFinSF | GrLivArea | BsmtFullBath | BsmtHalfBath | FullBath | HalfBath | BedroomAbvGr | KitchenAbvGr | KitchenQual | TotRmsAbvGrd | Functional | Fireplaces | FireplaceQu | GarageType | GarageYrBlt | GarageFinish | GarageCars | GarageArea | GarageQual | GarageCond | PavedDrive | WoodDeckSF | OpenPorchSF | EnclosedPorch | 3SsnPorch | ScreenPorch | PoolArea | PoolQC | Fence | MiscFeature | MiscVal | MoSold | YrSold | SaleType | SaleCondition | SalePrice |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 60 | RL | 65.0 | 8450 | Pave | NaN | Reg | Lvl | AllPub | Inside | Gtl | CollgCr | Norm | Norm | 1Fam | 2Story | 7 | 5 | 2003 | 2003 | Gable | CompShg | VinylSd | VinylSd | BrkFace | 196.0 | Gd | TA | PConc | Gd | TA | No | GLQ | 706 | Unf | 0 | 150 | 856 | GasA | Ex | Y | SBrkr | 856 | 854 | 0 | 1710 | 1 | 0 | 2 | 1 | 3 | 1 | Gd | 8 | Typ | 0 | NaN | Attchd | 2003.0 | RFn | 2 | 548 | TA | TA | Y | 0 | 61 | 0 | 0 | 0 | 0 | NaN | NaN | NaN | 0 | 2 | 2008 | WD | Normal | 208500 |
1 | 2 | 20 | RL | 80.0 | 9600 | Pave | NaN | Reg | Lvl | AllPub | FR2 | Gtl | Veenker | Feedr | Norm | 1Fam | 1Story | 6 | 8 | 1976 | 1976 | Gable | CompShg | MetalSd | MetalSd | None | 0.0 | TA | TA | CBlock | Gd | TA | Gd | ALQ | 978 | Unf | 0 | 284 | 1262 | GasA | Ex | Y | SBrkr | 1262 | 0 | 0 | 1262 | 0 | 1 | 2 | 0 | 3 | 1 | TA | 6 | Typ | 1 | TA | Attchd | 1976.0 | RFn | 2 | 460 | TA | TA | Y | 298 | 0 | 0 | 0 | 0 | 0 | NaN | NaN | NaN | 0 | 5 | 2007 | WD | Normal | 181500 |
2 | 3 | 60 | RL | 68.0 | 11250 | Pave | NaN | IR1 | Lvl | AllPub | Inside | Gtl | CollgCr | Norm | Norm | 1Fam | 2Story | 7 | 5 | 2001 | 2002 | Gable | CompShg | VinylSd | VinylSd | BrkFace | 162.0 | Gd | TA | PConc | Gd | TA | Mn | GLQ | 486 | Unf | 0 | 434 | 920 | GasA | Ex | Y | SBrkr | 920 | 866 | 0 | 1786 | 1 | 0 | 2 | 1 | 3 | 1 | Gd | 6 | Typ | 1 | TA | Attchd | 2001.0 | RFn | 2 | 608 | TA | TA | Y | 0 | 42 | 0 | 0 | 0 | 0 | NaN | NaN | NaN | 0 | 9 | 2008 | WD | Normal | 223500 |
3 | 4 | 70 | RL | 60.0 | 9550 | Pave | NaN | IR1 | Lvl | AllPub | Corner | Gtl | Crawfor | Norm | Norm | 1Fam | 2Story | 7 | 5 | 1915 | 1970 | Gable | CompShg | Wd Sdng | Wd Shng | None | 0.0 | TA | TA | BrkTil | TA | Gd | No | ALQ | 216 | Unf | 0 | 540 | 756 | GasA | Gd | Y | SBrkr | 961 | 756 | 0 | 1717 | 1 | 0 | 1 | 0 | 3 | 1 | Gd | 7 | Typ | 1 | Gd | Detchd | 1998.0 | Unf | 3 | 642 | TA | TA | Y | 0 | 35 | 272 | 0 | 0 | 0 | NaN | NaN | NaN | 0 | 2 | 2006 | WD | Abnorml | 140000 |
4 | 5 | 60 | RL | 84.0 | 14260 | Pave | NaN | IR1 | Lvl | AllPub | FR2 | Gtl | NoRidge | Norm | Norm | 1Fam | 2Story | 8 | 5 | 2000 | 2000 | Gable | CompShg | VinylSd | VinylSd | BrkFace | 350.0 | Gd | TA | PConc | Gd | TA | Av | GLQ | 655 | Unf | 0 | 490 | 1145 | GasA | Ex | Y | SBrkr | 1145 | 1053 | 0 | 2198 | 1 | 0 | 2 | 1 | 4 | 1 | Gd | 9 | Typ | 1 | TA | Attchd | 2000.0 | RFn | 3 | 836 | TA | TA | Y | 192 | 84 | 0 | 0 | 0 | 0 | NaN | NaN | NaN | 0 | 12 | 2008 | WD | Normal | 250000 |
data_train.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1460 entries, 0 to 1459
Data columns (total 81 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Id 1460 non-null int64
1 MSSubClass 1460 non-null int64
2 MSZoning 1460 non-null object
3 LotFrontage 1201 non-null float64
4 LotArea 1460 non-null int64
5 Street 1460 non-null object
6 Alley 91 non-null object
7 LotShape 1460 non-null object
8 LandContour 1460 non-null object
9 Utilities 1460 non-null object
10 LotConfig 1460 non-null object
11 LandSlope 1460 non-null object
12 Neighborhood 1460 non-null object
13 Condition1 1460 non-null object
14 Condition2 1460 non-null object
15 BldgType 1460 non-null object
16 HouseStyle 1460 non-null object
17 OverallQual 1460 non-null int64
18 OverallCond 1460 non-null int64
19 YearBuilt 1460 non-null int64
20 YearRemodAdd 1460 non-null int64
21 RoofStyle 1460 non-null object
22 RoofMatl 1460 non-null object
23 Exterior1st 1460 non-null object
24 Exterior2nd 1460 non-null object
25 MasVnrType 1452 non-null object
26 MasVnrArea 1452 non-null float64
27 ExterQual 1460 non-null object
28 ExterCond 1460 non-null object
29 Foundation 1460 non-null object
30 BsmtQual 1423 non-null object
31 BsmtCond 1423 non-null object
32 BsmtExposure 1422 non-null object
33 BsmtFinType1 1423 non-null object
34 BsmtFinSF1 1460 non-null int64
35 BsmtFinType2 1422 non-null object
36 BsmtFinSF2 1460 non-null int64
37 BsmtUnfSF 1460 non-null int64
38 TotalBsmtSF 1460 non-null int64
39 Heating 1460 non-null object
40 HeatingQC 1460 non-null object
41 CentralAir 1460 non-null object
42 Electrical 1459 non-null object
43 1stFlrSF 1460 non-null int64
44 2ndFlrSF 1460 non-null int64
45 LowQualFinSF 1460 non-null int64
46 GrLivArea 1460 non-null int64
47 BsmtFullBath 1460 non-null int64
48 BsmtHalfBath 1460 non-null int64
49 FullBath 1460 non-null int64
50 HalfBath 1460 non-null int64
51 BedroomAbvGr 1460 non-null int64
52 KitchenAbvGr 1460 non-null int64
53 KitchenQual 1460 non-null object
54 TotRmsAbvGrd 1460 non-null int64
55 Functional 1460 non-null object
56 Fireplaces 1460 non-null int64
57 FireplaceQu 770 non-null object
58 GarageType 1379 non-null object
59 GarageYrBlt 1379 non-null float64
60 GarageFinish 1379 non-null object
61 GarageCars 1460 non-null int64
62 GarageArea 1460 non-null int64
63 GarageQual 1379 non-null object
64 GarageCond 1379 non-null object
65 PavedDrive 1460 non-null object
66 WoodDeckSF 1460 non-null int64
67 OpenPorchSF 1460 non-null int64
68 EnclosedPorch 1460 non-null int64
69 3SsnPorch 1460 non-null int64
70 ScreenPorch 1460 non-null int64
71 PoolArea 1460 non-null int64
72 PoolQC 7 non-null object
73 Fence 281 non-null object
74 MiscFeature 54 non-null object
75 MiscVal 1460 non-null int64
76 MoSold 1460 non-null int64
77 YrSold 1460 non-null int64
78 SaleType 1460 non-null object
79 SaleCondition 1460 non-null object
80 SalePrice 1460 non-null int64
dtypes: float64(3), int64(35), object(43)
memory usage: 924.0+ KB
data_test.head()
| | Id | MSSubClass | MSZoning | LotFrontage | LotArea | Street | Alley | LotShape | LandContour | Utilities | LotConfig | LandSlope | Neighborhood | Condition1 | Condition2 | BldgType | HouseStyle | OverallQual | OverallCond | YearBuilt | YearRemodAdd | RoofStyle | RoofMatl | Exterior1st | Exterior2nd | MasVnrType | MasVnrArea | ExterQual | ExterCond | Foundation | BsmtQual | BsmtCond | BsmtExposure | BsmtFinType1 | BsmtFinSF1 | BsmtFinType2 | BsmtFinSF2 | BsmtUnfSF | TotalBsmtSF | Heating | HeatingQC | CentralAir | Electrical | 1stFlrSF | 2ndFlrSF | LowQualFinSF | GrLivArea | BsmtFullBath | BsmtHalfBath | FullBath | HalfBath | BedroomAbvGr | KitchenAbvGr | KitchenQual | TotRmsAbvGrd | Functional | Fireplaces | FireplaceQu | GarageType | GarageYrBlt | GarageFinish | GarageCars | GarageArea | GarageQual | GarageCond | PavedDrive | WoodDeckSF | OpenPorchSF | EnclosedPorch | 3SsnPorch | ScreenPorch | PoolArea | PoolQC | Fence | MiscFeature | MiscVal | MoSold | YrSold | SaleType | SaleCondition |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1461 | 20 | RH | 80.0 | 11622 | Pave | NaN | Reg | Lvl | AllPub | Inside | Gtl | NAmes | Feedr | Norm | 1Fam | 1Story | 5 | 6 | 1961 | 1961 | Gable | CompShg | VinylSd | VinylSd | None | 0.0 | TA | TA | CBlock | TA | TA | No | Rec | 468.0 | LwQ | 144.0 | 270.0 | 882.0 | GasA | TA | Y | SBrkr | 896 | 0 | 0 | 896 | 0.0 | 0.0 | 1 | 0 | 2 | 1 | TA | 5 | Typ | 0 | NaN | Attchd | 1961.0 | Unf | 1.0 | 730.0 | TA | TA | Y | 140 | 0 | 0 | 0 | 120 | 0 | NaN | MnPrv | NaN | 0 | 6 | 2010 | WD | Normal |
1 | 1462 | 20 | RL | 81.0 | 14267 | Pave | NaN | IR1 | Lvl | AllPub | Corner | Gtl | NAmes | Norm | Norm | 1Fam | 1Story | 6 | 6 | 1958 | 1958 | Hip | CompShg | Wd Sdng | Wd Sdng | BrkFace | 108.0 | TA | TA | CBlock | TA | TA | No | ALQ | 923.0 | Unf | 0.0 | 406.0 | 1329.0 | GasA | TA | Y | SBrkr | 1329 | 0 | 0 | 1329 | 0.0 | 0.0 | 1 | 1 | 3 | 1 | Gd | 6 | Typ | 0 | NaN | Attchd | 1958.0 | Unf | 1.0 | 312.0 | TA | TA | Y | 393 | 36 | 0 | 0 | 0 | 0 | NaN | NaN | Gar2 | 12500 | 6 | 2010 | WD | Normal |
2 | 1463 | 60 | RL | 74.0 | 13830 | Pave | NaN | IR1 | Lvl | AllPub | Inside | Gtl | Gilbert | Norm | Norm | 1Fam | 2Story | 5 | 5 | 1997 | 1998 | Gable | CompShg | VinylSd | VinylSd | None | 0.0 | TA | TA | PConc | Gd | TA | No | GLQ | 791.0 | Unf | 0.0 | 137.0 | 928.0 | GasA | Gd | Y | SBrkr | 928 | 701 | 0 | 1629 | 0.0 | 0.0 | 2 | 1 | 3 | 1 | TA | 6 | Typ | 1 | TA | Attchd | 1997.0 | Fin | 2.0 | 482.0 | TA | TA | Y | 212 | 34 | 0 | 0 | 0 | 0 | NaN | MnPrv | NaN | 0 | 3 | 2010 | WD | Normal |
3 | 1464 | 60 | RL | 78.0 | 9978 | Pave | NaN | IR1 | Lvl | AllPub | Inside | Gtl | Gilbert | Norm | Norm | 1Fam | 2Story | 6 | 6 | 1998 | 1998 | Gable | CompShg | VinylSd | VinylSd | BrkFace | 20.0 | TA | TA | PConc | TA | TA | No | GLQ | 602.0 | Unf | 0.0 | 324.0 | 926.0 | GasA | Ex | Y | SBrkr | 926 | 678 | 0 | 1604 | 0.0 | 0.0 | 2 | 1 | 3 | 1 | Gd | 7 | Typ | 1 | Gd | Attchd | 1998.0 | Fin | 2.0 | 470.0 | TA | TA | Y | 360 | 36 | 0 | 0 | 0 | 0 | NaN | NaN | NaN | 0 | 6 | 2010 | WD | Normal |
4 | 1465 | 120 | RL | 43.0 | 5005 | Pave | NaN | IR1 | HLS | AllPub | Inside | Gtl | StoneBr | Norm | Norm | TwnhsE | 1Story | 8 | 5 | 1992 | 1992 | Gable | CompShg | HdBoard | HdBoard | None | 0.0 | Gd | TA | PConc | Gd | TA | No | ALQ | 263.0 | Unf | 0.0 | 1017.0 | 1280.0 | GasA | Ex | Y | SBrkr | 1280 | 0 | 0 | 1280 | 0.0 | 0.0 | 2 | 0 | 2 | 1 | Gd | 5 | Typ | 0 | NaN | Attchd | 1992.0 | RFn | 2.0 | 506.0 | TA | TA | Y | 0 | 82 | 0 | 0 | 144 | 0 | NaN | NaN | NaN | 0 | 1 | 2010 | WD | Normal |
data_test.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1459 entries, 0 to 1458
Data columns (total 80 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Id 1459 non-null int64
1 MSSubClass 1459 non-null int64
2 MSZoning 1455 non-null object
3 LotFrontage 1232 non-null float64
4 LotArea 1459 non-null int64
5 Street 1459 non-null object
6 Alley 107 non-null object
7 LotShape 1459 non-null object
8 LandContour 1459 non-null object
9 Utilities 1457 non-null object
10 LotConfig 1459 non-null object
11 LandSlope 1459 non-null object
12 Neighborhood 1459 non-null object
13 Condition1 1459 non-null object
14 Condition2 1459 non-null object
15 BldgType 1459 non-null object
16 HouseStyle 1459 non-null object
17 OverallQual 1459 non-null int64
18 OverallCond 1459 non-null int64
19 YearBuilt 1459 non-null int64
20 YearRemodAdd 1459 non-null int64
21 RoofStyle 1459 non-null object
22 RoofMatl 1459 non-null object
23 Exterior1st 1458 non-null object
24 Exterior2nd 1458 non-null object
25 MasVnrType 1443 non-null object
26 MasVnrArea 1444 non-null float64
27 ExterQual 1459 non-null object
28 ExterCond 1459 non-null object
29 Foundation 1459 non-null object
30 BsmtQual 1415 non-null object
31 BsmtCond 1414 non-null object
32 BsmtExposure 1415 non-null object
33 BsmtFinType1 1417 non-null object
34 BsmtFinSF1 1458 non-null float64
35 BsmtFinType2 1417 non-null object
36 BsmtFinSF2 1458 non-null float64
37 BsmtUnfSF 1458 non-null float64
38 TotalBsmtSF 1458 non-null float64
39 Heating 1459 non-null object
40 HeatingQC 1459 non-null object
41 CentralAir 1459 non-null object
42 Electrical 1459 non-null object
43 1stFlrSF 1459 non-null int64
44 2ndFlrSF 1459 non-null int64
45 LowQualFinSF 1459 non-null int64
46 GrLivArea 1459 non-null int64
47 BsmtFullBath 1457 non-null float64
48 BsmtHalfBath 1457 non-null float64
49 FullBath 1459 non-null int64
50 HalfBath 1459 non-null int64
51 BedroomAbvGr 1459 non-null int64
52 KitchenAbvGr 1459 non-null int64
53 KitchenQual 1458 non-null object
54 TotRmsAbvGrd 1459 non-null int64
55 Functional 1457 non-null object
56 Fireplaces 1459 non-null int64
57 FireplaceQu 729 non-null object
58 GarageType 1383 non-null object
59 GarageYrBlt 1381 non-null float64
60 GarageFinish 1381 non-null object
61 GarageCars 1458 non-null float64
62 GarageArea 1458 non-null float64
63 GarageQual 1381 non-null object
64 GarageCond 1381 non-null object
65 PavedDrive 1459 non-null object
66 WoodDeckSF 1459 non-null int64
67 OpenPorchSF 1459 non-null int64
68 EnclosedPorch 1459 non-null int64
69 3SsnPorch 1459 non-null int64
70 ScreenPorch 1459 non-null int64
71 PoolArea 1459 non-null int64
72 PoolQC 3 non-null object
73 Fence 290 non-null object
74 MiscFeature 51 non-null object
75 MiscVal 1459 non-null int64
76 MoSold 1459 non-null int64
77 YrSold 1459 non-null int64
78 SaleType 1458 non-null object
79 SaleCondition 1459 non-null object
dtypes: float64(11), int64(26), object(43)
memory usage: 912.0+ KB
data_train.describe()
| | Id | MSSubClass | LotFrontage | LotArea | OverallQual | OverallCond | YearBuilt | YearRemodAdd | MasVnrArea | BsmtFinSF1 | BsmtFinSF2 | BsmtUnfSF | TotalBsmtSF | 1stFlrSF | 2ndFlrSF | LowQualFinSF | GrLivArea | BsmtFullBath | BsmtHalfBath | FullBath | HalfBath | BedroomAbvGr | KitchenAbvGr | TotRmsAbvGrd | Fireplaces | GarageYrBlt | GarageCars | GarageArea | WoodDeckSF | OpenPorchSF | EnclosedPorch | 3SsnPorch | ScreenPorch | PoolArea | MiscVal | MoSold | YrSold | SalePrice |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
count | 1460.000000 | 1460.000000 | 1201.000000 | 1460.000000 | 1460.000000 | 1460.000000 | 1460.000000 | 1460.000000 | 1452.000000 | 1460.000000 | 1460.000000 | 1460.000000 | 1460.000000 | 1460.000000 | 1460.000000 | 1460.000000 | 1460.000000 | 1460.000000 | 1460.000000 | 1460.000000 | 1460.000000 | 1460.000000 | 1460.000000 | 1460.000000 | 1460.000000 | 1379.000000 | 1460.000000 | 1460.000000 | 1460.000000 | 1460.000000 | 1460.000000 | 1460.000000 | 1460.000000 | 1460.000000 | 1460.000000 | 1460.000000 | 1460.000000 | 1460.000000 |
mean | 730.500000 | 56.897260 | 70.049958 | 10516.828082 | 6.099315 | 5.575342 | 1971.267808 | 1984.865753 | 103.685262 | 443.639726 | 46.549315 | 567.240411 | 1057.429452 | 1162.626712 | 346.992466 | 5.844521 | 1515.463699 | 0.425342 | 0.057534 | 1.565068 | 0.382877 | 2.866438 | 1.046575 | 6.517808 | 0.613014 | 1978.506164 | 1.767123 | 472.980137 | 94.244521 | 46.660274 | 21.954110 | 3.409589 | 15.060959 | 2.758904 | 43.489041 | 6.321918 | 2007.815753 | 180921.195890 |
std | 421.610009 | 42.300571 | 24.284752 | 9981.264932 | 1.382997 | 1.112799 | 30.202904 | 20.645407 | 181.066207 | 456.098091 | 161.319273 | 441.866955 | 438.705324 | 386.587738 | 436.528436 | 48.623081 | 525.480383 | 0.518911 | 0.238753 | 0.550916 | 0.502885 | 0.815778 | 0.220338 | 1.625393 | 0.644666 | 24.689725 | 0.747315 | 213.804841 | 125.338794 | 66.256028 | 61.119149 | 29.317331 | 55.757415 | 40.177307 | 496.123024 | 2.703626 | 1.328095 | 79442.502883 |
min | 1.000000 | 20.000000 | 21.000000 | 1300.000000 | 1.000000 | 1.000000 | 1872.000000 | 1950.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 334.000000 | 0.000000 | 0.000000 | 334.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 2.000000 | 0.000000 | 1900.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 | 2006.000000 | 34900.000000 |
25% | 365.750000 | 20.000000 | 59.000000 | 7553.500000 | 5.000000 | 5.000000 | 1954.000000 | 1967.000000 | 0.000000 | 0.000000 | 0.000000 | 223.000000 | 795.750000 | 882.000000 | 0.000000 | 0.000000 | 1129.500000 | 0.000000 | 0.000000 | 1.000000 | 0.000000 | 2.000000 | 1.000000 | 5.000000 | 0.000000 | 1961.000000 | 1.000000 | 334.500000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 5.000000 | 2007.000000 | 129975.000000 |
50% | 730.500000 | 50.000000 | 69.000000 | 9478.500000 | 6.000000 | 5.000000 | 1973.000000 | 1994.000000 | 0.000000 | 383.500000 | 0.000000 | 477.500000 | 991.500000 | 1087.000000 | 0.000000 | 0.000000 | 1464.000000 | 0.000000 | 0.000000 | 2.000000 | 0.000000 | 3.000000 | 1.000000 | 6.000000 | 1.000000 | 1980.000000 | 2.000000 | 480.000000 | 0.000000 | 25.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 6.000000 | 2008.000000 | 163000.000000 |
75% | 1095.250000 | 70.000000 | 80.000000 | 11601.500000 | 7.000000 | 6.000000 | 2000.000000 | 2004.000000 | 166.000000 | 712.250000 | 0.000000 | 808.000000 | 1298.250000 | 1391.250000 | 728.000000 | 0.000000 | 1776.750000 | 1.000000 | 0.000000 | 2.000000 | 1.000000 | 3.000000 | 1.000000 | 7.000000 | 1.000000 | 2002.000000 | 2.000000 | 576.000000 | 168.000000 | 68.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 8.000000 | 2009.000000 | 214000.000000 |
max | 1460.000000 | 190.000000 | 313.000000 | 215245.000000 | 10.000000 | 9.000000 | 2010.000000 | 2010.000000 | 1600.000000 | 5644.000000 | 1474.000000 | 2336.000000 | 6110.000000 | 4692.000000 | 2065.000000 | 572.000000 | 5642.000000 | 3.000000 | 2.000000 | 3.000000 | 2.000000 | 8.000000 | 3.000000 | 14.000000 | 3.000000 | 2010.000000 | 4.000000 | 1418.000000 | 857.000000 | 547.000000 | 552.000000 | 508.000000 | 480.000000 | 738.000000 | 15500.000000 | 12.000000 | 2010.000000 | 755000.000000 |
data_train.shape
(1460, 81)
data_test.shape
(1459, 80)
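The two shapes above differ by one column. A set difference on the column names is a quick way to confirm which one. The sketch below uses two tiny stand-in frames (toy_train and toy_test, hypothetical names, with values copied from the head() outputs above) so that it runs without the CSV files; with the real data one would pass data_train and data_test instead.

```python
import pandas as pd

# Stand-ins for data_train / data_test, reduced to a few columns.
toy_train = pd.DataFrame({'Id': [1, 2], 'LotArea': [8450, 9600], 'SalePrice': [208500, 181500]})
toy_test = pd.DataFrame({'Id': [1461, 1462], 'LotArea': [11622, 14267]})

# Columns present in one frame but not the other.
only_in_train = sorted(set(toy_train.columns) - set(toy_test.columns))
only_in_test = sorted(set(toy_test.columns) - set(toy_train.columns))
print(only_in_train, only_in_test)
```

On the real files this should report SalePrice as the only column missing from the test set, which matches the info() listings.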
Exploratory Data Analysis (EDA)
Missing Data
# Count and percentage of missing values per column, sorted from most to least missing
def missing_data(data):
    total = data.isnull().sum().sort_values(ascending=False)
    percent = (data.isnull().sum() / data.isnull().count() * 100).sort_values(ascending=False)
    return pd.concat([total, percent], axis=1, keys=['Total', 'Percent'])
missing_data(data_train)
| | Total | Percent |
---|---|---|
PoolQC | 1453 | 99.520548 |
MiscFeature | 1406 | 96.301370 |
Alley | 1369 | 93.767123 |
Fence | 1179 | 80.753425 |
FireplaceQu | 690 | 47.260274 |
LotFrontage | 259 | 17.739726 |
GarageCond | 81 | 5.547945 |
GarageType | 81 | 5.547945 |
GarageYrBlt | 81 | 5.547945 |
GarageFinish | 81 | 5.547945 |
GarageQual | 81 | 5.547945 |
BsmtExposure | 38 | 2.602740 |
BsmtFinType2 | 38 | 2.602740 |
BsmtFinType1 | 37 | 2.534247 |
BsmtCond | 37 | 2.534247 |
BsmtQual | 37 | 2.534247 |
MasVnrArea | 8 | 0.547945 |
MasVnrType | 8 | 0.547945 |
Electrical | 1 | 0.068493 |
Utilities | 0 | 0.000000 |
YearRemodAdd | 0 | 0.000000 |
MSSubClass | 0 | 0.000000 |
Foundation | 0 | 0.000000 |
ExterCond | 0 | 0.000000 |
ExterQual | 0 | 0.000000 |
Exterior2nd | 0 | 0.000000 |
Exterior1st | 0 | 0.000000 |
RoofMatl | 0 | 0.000000 |
RoofStyle | 0 | 0.000000 |
YearBuilt | 0 | 0.000000 |
LotConfig | 0 | 0.000000 |
OverallCond | 0 | 0.000000 |
OverallQual | 0 | 0.000000 |
HouseStyle | 0 | 0.000000 |
BldgType | 0 | 0.000000 |
Condition2 | 0 | 0.000000 |
BsmtFinSF1 | 0 | 0.000000 |
MSZoning | 0 | 0.000000 |
LotArea | 0 | 0.000000 |
Street | 0 | 0.000000 |
Condition1 | 0 | 0.000000 |
Neighborhood | 0 | 0.000000 |
LotShape | 0 | 0.000000 |
LandContour | 0 | 0.000000 |
LandSlope | 0 | 0.000000 |
SalePrice | 0 | 0.000000 |
HeatingQC | 0 | 0.000000 |
BsmtFinSF2 | 0 | 0.000000 |
EnclosedPorch | 0 | 0.000000 |
Fireplaces | 0 | 0.000000 |
GarageCars | 0 | 0.000000 |
GarageArea | 0 | 0.000000 |
PavedDrive | 0 | 0.000000 |
WoodDeckSF | 0 | 0.000000 |
OpenPorchSF | 0 | 0.000000 |
3SsnPorch | 0 | 0.000000 |
BsmtUnfSF | 0 | 0.000000 |
ScreenPorch | 0 | 0.000000 |
PoolArea | 0 | 0.000000 |
MiscVal | 0 | 0.000000 |
MoSold | 0 | 0.000000 |
YrSold | 0 | 0.000000 |
SaleType | 0 | 0.000000 |
Functional | 0 | 0.000000 |
TotRmsAbvGrd | 0 | 0.000000 |
KitchenQual | 0 | 0.000000 |
KitchenAbvGr | 0 | 0.000000 |
BedroomAbvGr | 0 | 0.000000 |
HalfBath | 0 | 0.000000 |
FullBath | 0 | 0.000000 |
BsmtHalfBath | 0 | 0.000000 |
BsmtFullBath | 0 | 0.000000 |
GrLivArea | 0 | 0.000000 |
LowQualFinSF | 0 | 0.000000 |
2ndFlrSF | 0 | 0.000000 |
1stFlrSF | 0 | 0.000000 |
CentralAir | 0 | 0.000000 |
SaleCondition | 0 | 0.000000 |
Heating | 0 | 0.000000 |
TotalBsmtSF | 0 | 0.000000 |
Id | 0 | 0.000000 |
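Most rows in the table above are zeros. A variant that keeps only the columns with at least one missing value is easier to read. The following is a sketch on a small hypothetical frame; missing_only is a name introduced here and is not part of the original notebook.

```python
import numpy as np
import pandas as pd

def missing_only(data):
    """Count and percentage of missing values, restricted to columns that have any."""
    total = data.isnull().sum()
    percent = total / len(data) * 100
    table = pd.concat([total, percent], axis=1, keys=['Total', 'Percent'])
    return table[table['Total'] > 0].sort_values('Total', ascending=False)

# Hypothetical toy data: PoolQC is mostly missing, LotArea is complete.
toy = pd.DataFrame({
    'PoolQC': [np.nan, np.nan, np.nan, 'Ex'],
    'LotFrontage': [65.0, np.nan, 68.0, 60.0],
    'LotArea': [8450, 9600, 11250, 9550],
})
result = missing_only(toy)
print(result)
```

Calling missing_only(data_train) would be expected to return just the 19 columns with a non-zero Total in the table above.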
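Exploring the Features
# Discrete (categorical) features: category counts on the left, SalePrice box plot on the right
# ("lisan" is pinyin for "discrete")
def lisan_plot(column, data):
    fig = plt.figure(figsize=(10, 4))
    plt.subplot2grid((1, 2), (0, 0))
    counts = data[column].value_counts()
    sns.barplot(x=counts.index, y=counts.values)
    plt.title(column)
    plt.ylabel('Count')
    plt.subplot2grid((1, 2), (0, 1))
    sns.boxplot(x=column, y='SalePrice', data=data)
    plt.show()
# Continuous features: distribution on the left, scatter plot against SalePrice on the right
# ("lianxu" is pinyin for "continuous")
def lianxu_plot(column, data):
    fig = plt.figure(figsize=(10, 4))
    plt.subplot2grid((1, 2), (0, 0))
    # histplot replaces distplot, which is deprecated in recent seaborn releases
    sns.histplot(data[column].dropna(), kde=True)
    plt.xlabel(column)
    plt.ylabel('Count')
    plt.subplot2grid((1, 2), (0, 1))
    # x and y are passed by keyword, which newer seaborn versions require
    sns.scatterplot(x=column, y='SalePrice', data=data)
    plt.show()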
1.MSSubClass:
Identifies the type of dwelling involved in the sale.
column = 'MSSubClass'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
15
20 536
60 299
50 144
120 87
30 69
160 63
70 60
80 58
90 52
190 30
85 20
75 16
45 12
180 10
40 4
Name: MSSubClass, dtype: int64
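The box plot drawn by lisan_plot shows how SalePrice varies across categories. A grouped summary gives the same picture in numbers. The sketch below is an illustration only: it uses the five rows visible in the data_train.head() output so that it runs without the CSV files, and summary is a name introduced here.

```python
import pandas as pd

# First five training rows, copied from the data_train.head() output.
head = pd.DataFrame({
    'MSSubClass': [60, 20, 60, 70, 60],
    'SalePrice': [208500, 181500, 223500, 140000, 250000],
})

# Number of houses and median price per category, most expensive first.
summary = (head.groupby('MSSubClass')['SalePrice']
               .agg(['count', 'median'])
               .sort_values('median', ascending=False))
print(summary)
```

Replacing head with data_train gives the per-class medians for the full training set; categories with very few rows (such as class 40 with 4 houses) deserve a cautious reading.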
2.MSZoning:
Identifies the general zoning classification of the sale.
column = 'MSZoning'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
5
RL 1151
RM 218
FV 65
RH 16
C (all) 10
Name: MSZoning, dtype: int64
3.LotFrontage:
Linear feet of street connected to property.
column = 'LotFrontage'
print(len(data_train[column].unique()))
print('Max and min:', data_train[column].max(), data_train[column].min())
print(data_train[column].unique())
lianxu_plot(column, data_train)
111
Max and min: 313.0 21.0
[ 65. 80. 68. 60. 84. 85. 75. nan 51. 50. 70. 91. 72. 66.
101. 57. 44. 110. 98. 47. 108. 112. 74. 115. 61. 48. 33. 52.
100. 24. 89. 63. 76. 81. 95. 69. 21. 32. 78. 121. 122. 40.
105. 73. 77. 64. 94. 34. 90. 55. 88. 82. 71. 120. 107. 92.
134. 62. 86. 141. 97. 54. 41. 79. 174. 99. 67. 83. 43. 103.
93. 30. 129. 140. 35. 37. 118. 87. 116. 150. 111. 49. 96. 59.
36. 56. 102. 58. 38. 109. 130. 53. 137. 45. 106. 104. 42. 39.
144. 114. 128. 149. 313. 168. 182. 138. 160. 152. 124. 153. 46.]
Possible outliers: data_train[data_train['LotFrontage'] > 300]
4.LotArea:
Lot size in square feet.
column = 'LotArea'
print(len(data_train[column].unique()))
print('Max and min:', data_train[column].max(), data_train[column].min())
print(data_train[column].unique())
lianxu_plot(column, data_train)
1073
Max and min: 215245 1300
[ 8450 9600 11250 ... 17217 13175 9717]
Possible outliers: data_train[data_train['LotArea'] > 100000]
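The two outlier notes so far (LotFrontage > 300 and LotArea > 100000) both come down to a boolean mask. The sketch below shows the pattern on a hypothetical toy frame. The thresholds and the two extreme values (313 and 215245) come from the output above; the SalePrice values for those rows are made up for illustration. Whether such rows should be dropped is left open here.

```python
import pandas as pd

# Hypothetical toy frame; rows 1 and 2 carry the extreme values seen above.
toy = pd.DataFrame({
    'LotFrontage': [65.0, 313.0, 80.0, None],
    'LotArea': [8450, 9600, 215245, 11250],
    'SalePrice': [208500, 160000, 375000, 223500],
})

# Thresholds taken from the notes above.
thresholds = {'LotFrontage': 300, 'LotArea': 100000}

# A row is flagged if it exceeds the threshold in any listed column.
# Comparisons with NaN evaluate to False, so missing values are never flagged.
mask = pd.Series(False, index=toy.index)
for col, limit in thresholds.items():
    mask |= toy[col] > limit

flagged = toy[mask]   # rows to inspect
kept = toy[~mask]     # rows that pass every threshold
print(flagged)
```

Keeping the thresholds in one dictionary makes it easy to add further features as the exploration continues.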
5.Street:
Type of road access to property.
column = 'Street'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
2
Pave 1454
Grvl 6
Name: Street, dtype: int64
6.Alley:
Type of alley access to property.
column = 'Alley'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
3
Grvl 50
Pave 41
Name: Alley, dtype: int64
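The count printed above is 3 even though value_counts() lists only two categories. The reason is that unique() includes NaN as a distinct value, while value_counts() drops it by default. A small sketch with a hypothetical Series:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the Alley column, which is mostly missing.
alley = pd.Series(['Grvl', np.nan, 'Pave', np.nan, 'Grvl'])

n_with_nan = len(alley.unique())     # NaN counts as one distinct value
n_without_nan = alley.nunique()      # NaN is excluded by default
counts_incl_nan = alley.value_counts(dropna=False)  # makes the missing rows visible
print(n_with_nan, n_without_nan)
print(counts_incl_nan)
```

The same effect explains why any feature with missing values reports one more unique value than it has categories.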
7.LotShape:
General shape of property.
Reg Regular
IR1 Slightly irregular
IR2 Moderately irregular
IR3 Irregular
column = 'LotShape'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
4
Reg 925
IR1 484
IR2 41
IR3 10
Name: LotShape, dtype: int64
8.LandContour:
Flatness of the property.
Lvl Near Flat/Level
Bnk Banked - Quick and significant rise from street grade to building
HLS Hillside - Significant slope from side to side
Low Depression
column = 'LandContour'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
4
Lvl 1311
Bnk 63
HLS 50
Low 36
Name: LandContour, dtype: int64
9.Utilities:
Type of utilities available.
AllPub All public utilities (E, G, W, & S)
NoSewr Electricity, Gas, and Water (Septic Tank)
NoSeWa Electricity and Gas Only
ELO Electricity only
column = 'Utilities'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
2
AllPub 1459
NoSeWa 1
Name: Utilities, dtype: int64
10.LotConfig:
Lot configuration.
Inside Inside lot
Corner Corner lot
CulDSac Cul-de-sac
FR2 Frontage on 2 sides of property
FR3 Frontage on 3 sides of property
column = 'LotConfig'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
5
Inside 1052
Corner 263
CulDSac 94
FR2 47
FR3 4
Name: LotConfig, dtype: int64
11.LandSlope:
Slope of property.
Gtl Gentle slope
Mod Moderate slope
Sev Severe slope
column = 'LandSlope'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
3
Gtl 1382
Mod 65
Sev 13
Name: LandSlope, dtype: int64
12.Neighborhood:
Physical locations within Ames city limits.
Blmngtn Bloomington Heights
Blueste Bluestem
BrDale Briardale
BrkSide Brookside
ClearCr Clear Creek
CollgCr College Creek
Crawfor Crawford
Edwards Edwards
Gilbert Gilbert
IDOTRR Iowa DOT and Rail Road
MeadowV Meadow Village
Mitchel Mitchell
NAmes North Ames
NoRidge Northridge
NPkVill Northpark Villa
NridgHt Northridge Heights
NWAmes Northwest Ames
OldTown Old Town
SWISU South & West of Iowa State University
Sawyer Sawyer
SawyerW Sawyer West
Somerst Somerset
StoneBr Stone Brook
Timber Timberland
Veenker Veenker
column = 'Neighborhood'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
25
NAmes 225
CollgCr 150
OldTown 113
Edwards 100
Somerst 86
Gilbert 79
NridgHt 77
Sawyer 74
NWAmes 73
SawyerW 59
BrkSide 58
Crawfor 51
Mitchel 49
NoRidge 41
Timber 38
IDOTRR 37
ClearCr 28
StoneBr 25
SWISU 25
Blmngtn 17
MeadowV 17
BrDale 16
Veenker 11
NPkVill 9
Blueste 2
Name: Neighborhood, dtype: int64
13.Condition1:
Proximity to various conditions.
Artery Adjacent to arterial street
Feedr Adjacent to feeder street
Norm Normal
RRNn Within 200' of North-South Railroad
RRAn Adjacent to North-South Railroad
PosN Near positive off-site feature--park, greenbelt, etc.
PosA Adjacent to positive off-site feature
RRNe Within 200' of East-West Railroad
RRAe Adjacent to East-West Railroad
column = 'Condition1'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
9
Norm 1260
Feedr 81
Artery 48
RRAn 26
PosN 19
RRAe 11
PosA 8
RRNn 5
RRNe 2
Name: Condition1, dtype: int64
14.Condition2:
Proximity to various conditions (if more than one is present)
Artery Adjacent to arterial street
Feedr Adjacent to feeder street
Norm Normal
RRNn Within 200' of North-South Railroad
RRAn Adjacent to North-South Railroad
PosN Near positive off-site feature--park, greenbelt, etc.
PosA Adjacent to positive off-site feature
RRNe Within 200' of East-West Railroad
RRAe Adjacent to East-West Railroad
column = 'Condition2'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
8
Norm 1445
Feedr 6
Artery 2
PosN 2
RRNn 2
PosA 1
RRAn 1
RRAe 1
Name: Condition2, dtype: int64
15.BldgType:
Type of dwelling
1Fam Single-family Detached
2FmCon Two-family Conversion; originally built as one-family dwelling (appears as 2fmCon in the data)
Duplx Duplex (appears as Duplex in the data)
TwnhsE Townhouse End Unit
TwnhsI Townhouse Inside Unit (presumably the Twnhs code in the data)
column = 'BldgType'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
5
1Fam 1220
TwnhsE 114
Duplex 52
Twnhs 43
2fmCon 31
Name: BldgType, dtype: int64
16.HouseStyle:
Style of dwelling
1Story One story
1.5Fin One and one-half story: 2nd level finished
1.5Unf One and one-half story: 2nd level unfinished
2Story Two story
2.5Fin Two and one-half story: 2nd level finished
2.5Unf Two and one-half story: 2nd level unfinished
SFoyer Split Foyer
SLvl Split Level
column = 'HouseStyle'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
8
1Story 726
2Story 445
1.5Fin 154
SLvl 65
SFoyer 37
1.5Unf 14
2.5Unf 11
2.5Fin 8
Name: HouseStyle, dtype: int64
17.OverallQual:
Rates the overall material and finish of the house
10 Very Excellent
9 Excellent
8 Very Good
7 Good
6 Above Average
5 Average
4 Below Average
3 Fair
2 Poor
1 Very Poor
column = 'OverallQual'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
10
5 397
6 374
7 319
8 168
4 116
9 43
3 20
10 18
2 3
1 2
Name: OverallQual, dtype: int64
18.OverallCond:
Rates the overall condition of the house
10 Very Excellent
9 Excellent
8 Very Good
7 Good
6 Above Average
5 Average
4 Below Average
3 Fair
2 Poor
1 Very Poor
column = 'OverallCond'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
9
5 821
6 252
7 205
8 72
4 57
3 25
9 22
2 5
1 1
Name: OverallCond, dtype: int64
19.YearBuilt:
Original construction date
column = 'YearBuilt'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
112
2006 67
2005 64
2004 54
2007 49
2003 45
1976 33
1977 32
1920 30
1959 26
1999 25
1998 25
1958 24
1965 24
1970 24
1954 24
2000 24
2002 23
2008 23
1972 23
1968 22
1971 22
1950 20
2001 20
1957 20
1962 19
1994 19
1966 18
2009 18
1995 18
1940 18
1910 17
1960 17
1993 17
1978 16
1955 16
1925 16
1963 16
1967 16
1996 15
1941 15
1964 15
1969 14
1956 14
1961 14
1997 14
1948 14
1992 13
1990 12
1953 12
1949 12
1988 11
1973 11
1915 10
1900 10
1980 10
1974 10
1979 9
1926 9
1930 9
1936 9
1984 9
1939 8
1922 8
1975 8
1916 8
1924 7
1928 7
1918 7
1914 7
1923 7
1946 7
1935 6
1945 6
1931 6
1982 6
1921 6
1951 6
1985 5
1937 5
1947 5
1991 5
1981 5
1986 5
1952 5
1880 4
1929 4
1932 4
1938 4
1983 4
1927 3
1919 3
1934 3
1989 3
1987 3
1912 3
1885 2
1892 2
1890 2
1942 2
1908 2
1882 1
1875 1
1893 1
2010 1
1898 1
1904 1
1905 1
1906 1
1911 1
1913 1
1917 1
1872 1
Name: YearBuilt, dtype: int64
column = 'YearBuilt'
lianxu_plot(column, data_train)
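With 112 distinct years, the raw count table above is hard to scan. One option, sketched here on a toy Series rather than the real `data_train['YearBuilt']`, is to group the years by decade before counting. The sample values and the names `year_built`, `decade` and `decade_counts` are illustrative assumptions.

```python
import pandas as pd

# Toy stand-in for data_train['YearBuilt']; the values are illustrative only.
year_built = pd.Series([1872, 1920, 1925, 1976, 1977, 2005, 2006, 2006])

# Floor each year to the start of its decade, then count houses per decade.
decade = (year_built // 10) * 10
decade_counts = decade.value_counts().sort_index()
print(decade_counts)
```

On the real data the same two lines would apply to `data_train['YearBuilt']` directly.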
20.YearRemodAdd:
Remodel date (same as construction date if no remodeling or additions)
column = 'YearRemodAdd'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
61
1950 178
2006 97
2007 76
2005 73
2004 62
2000 55
2003 51
2002 48
2008 40
1996 36
1998 36
1995 31
1976 30
1999 30
1970 26
1997 25
1977 25
2009 23
1994 22
2001 21
1972 20
1965 19
1993 19
1971 18
1959 18
1968 17
1992 17
1978 16
1966 15
1958 15
1990 15
1962 14
1954 14
1969 14
1991 14
1963 13
1960 12
1967 12
1980 12
1973 11
1964 11
1989 11
1987 10
1975 10
1979 10
1956 10
1953 10
1957 9
1988 9
1955 9
1985 9
1961 8
1981 8
1974 7
1982 7
1984 7
2010 6
1983 5
1952 5
1986 5
1951 4
Name: YearRemodAdd, dtype: int64
column = 'YearRemodAdd'
lianxu_plot(column, data_train)
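Since the description says the remodel date equals the construction date when nothing was changed, comparing the two columns gives a rough "was this house remodeled" flag. This is a minimal sketch on toy values; the `Remodeled` column is a hypothetical helper and not part of the dataset.

```python
import pandas as pd

# Toy frame with the two year columns; values are illustrative only.
df = pd.DataFrame({
    "YearBuilt":    [1950, 1976, 2005, 1920],
    "YearRemodAdd": [1950, 1998, 2006, 1950],
})

# Per the description, an untouched house has YearRemodAdd == YearBuilt.
# 'Remodeled' is a hypothetical helper column, not part of the dataset.
df["Remodeled"] = df["YearRemodAdd"] != df["YearBuilt"]
print(df)
```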
21.RoofStyle:
Type of roof
Flat Flat
Gable Gable
Gambrel Gambrel (Barn)
Hip Hip
Mansard Mansard
Shed Shed
column = 'RoofStyle'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
6
Gable 1141
Hip 286
Flat 13
Gambrel 11
Mansard 7
Shed 2
Name: RoofStyle, dtype: int64
22.RoofMatl:
Roof material
ClyTile Clay or Tile
CompShg Standard (Composite) Shingle
Membran Membrane
Metal Metal
Roll Roll
Tar&Grv Gravel & Tar
WdShake Wood Shakes
WdShngl Wood Shingles
column = 'RoofMatl'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
8
CompShg 1434
Tar&Grv 11
WdShngl 6
WdShake 5
Metal 1
Roll 1
Membran 1
ClyTile 1
Name: RoofMatl, dtype: int64
23.Exterior1st:
Exterior covering on house
AsbShng Asbestos Shingles
AsphShn Asphalt Shingles
BrkComm Brick Common
BrkFace Brick Face
CBlock Cinder Block
CemntBd Cement Board
HdBoard Hard Board
ImStucc Imitation Stucco
MetalSd Metal Siding
Other Other
Plywood Plywood
PreCast PreCast
Stone Stone
Stucco Stucco
VinylSd Vinyl Siding
Wd Sdng Wood Siding
WdShing Wood Shingles
column = 'Exterior1st'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
15
VinylSd 515
HdBoard 222
MetalSd 220
Wd Sdng 206
Plywood 108
CemntBd 61
BrkFace 50
WdShing 26
Stucco 25
AsbShng 20
BrkComm 2
Stone 2
AsphShn 1
CBlock 1
ImStucc 1
Name: Exterior1st, dtype: int64
24.Exterior2nd:
Exterior covering on house (if more than one material)
AsbShng Asbestos Shingles
AsphShn Asphalt Shingles
BrkComm Brick Common
BrkFace Brick Face
CBlock Cinder Block
CemntBd Cement Board
HdBoard Hard Board
ImStucc Imitation Stucco
MetalSd Metal Siding
Other Other
Plywood Plywood
PreCast PreCast
Stone Stone
Stucco Stucco
VinylSd Vinyl Siding
Wd Sdng Wood Siding
WdShing Wood Shingles
column = 'Exterior2nd'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
16
VinylSd 504
MetalSd 214
HdBoard 207
Wd Sdng 197
Plywood 142
CmentBd 60
Wd Shng 38
Stucco 26
BrkFace 25
AsbShng 20
ImStucc 10
Brk Cmn 7
Stone 5
AsphShn 3
CBlock 1
Other 1
Name: Exterior2nd, dtype: int64
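Some codes in the output above (`CmentBd`, `Wd Shng`, `Brk Cmn`) are spelled differently from the description (`CemntBd`, `WdShing`, `BrkComm`). A quick way to surface such mismatches is a set difference between the observed and the documented codes. The sketch below uses a shortened documented list and a toy Series; both are assumptions for illustration.

```python
import pandas as pd

# Codes listed in the description above (subset shown for brevity).
documented = {"VinylSd", "MetalSd", "HdBoard", "Wd Sdng", "Plywood",
              "CemntBd", "WdShing", "BrkComm"}

# Toy stand-in for data_train['Exterior2nd'], using spellings from the output above.
observed = pd.Series(["VinylSd", "CmentBd", "Wd Shng", "Brk Cmn", "Plywood"])

# Codes present in the data but absent from the description.
undocumented = sorted(set(observed.dropna().unique()) - documented)
print(undocumented)
```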
25.MasVnrType:
Masonry veneer type
BrkCmn Brick Common
BrkFace Brick Face
CBlock Cinder Block
None None
Stone Stone
column = 'MasVnrType'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
5
None 864
BrkFace 445
Stone 128
BrkCmn 15
Name: MasVnrType, dtype: int64
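The unique count printed above (5) is one more than the four categories shown by `value_counts()`. This is because `unique()` treats NaN as a value of its own, while `value_counts()` drops NaN by default. A minimal sketch on a toy Series (the sample values are illustrative):

```python
import numpy as np
import pandas as pd

# Toy stand-in for a column with a missing entry; values are illustrative only.
s = pd.Series(["None", "BrkFace", "Stone", "BrkCmn", np.nan, "None"])

n_with_nan = len(s.unique())            # NaN counts as its own value
n_without_nan = s.nunique()             # NaN is excluded by default
counts = s.value_counts(dropna=False)   # adds an explicit row for NaN

print(n_with_nan, n_without_nan)
print(counts)
```

Passing `dropna=False` makes the count table line up with the `unique()` total.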
26.MasVnrArea:
Masonry veneer area in square feet
column = 'MasVnrArea'
print(len(data_train[column].unique()))
print('Min and max:', data_train[column].min(), data_train[column].max())
lianxu_plot(column, data_train)
328
Min and max: 0.0 1600.0
27.ExterQual:
Evaluates the quality of the material on the exterior
Ex Excellent
Gd Good
TA Average/Typical
Fa Fair
Po Poor
column = 'ExterQual'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
4
TA 906
Gd 488
Ex 52
Fa 14
Name: ExterQual, dtype: int64
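The Ex/Gd/TA/Fa/Po scale above is ordered, so one option for later modeling is to map it to integers. The sketch below is only an illustration: the 1 to 5 values and the name `quality_order` are assumptions, and the source does not prescribe any encoding.

```python
import pandas as pd

# Hypothetical ordinal mapping for the five-level quality scale (Po lowest, Ex highest).
quality_order = {"Po": 1, "Fa": 2, "TA": 3, "Gd": 4, "Ex": 5}

# Toy stand-in for data_train['ExterQual']; values are illustrative only.
exter_qual = pd.Series(["TA", "Gd", "Ex", "Fa", "TA"])

# Replace each code with its rank; codes missing from the mapping would become NaN.
exter_qual_num = exter_qual.map(quality_order)
print(exter_qual_num.tolist())
```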
28.ExterCond:
Evaluates the present condition of the material on the exterior
Ex Excellent
Gd Good
TA Average/Typical
Fa Fair
Po Poor
column = 'ExterCond'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
5
TA 1282
Gd 146
Fa 28
Ex 3
Po 1
Name: ExterCond, dtype: int64
29.Foundation:
Type of foundation
BrkTil Brick & Tile
CBlock Cinder Block
PConc Poured Concrete
Slab Slab
Stone Stone
Wood Wood
column = 'Foundation'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
6
PConc 647
CBlock 634
BrkTil 146
Slab 24
Stone 6
Wood 3
Name: Foundation, dtype: int64
30.BsmtQual:
Evaluates the height of the basement
Ex Excellent (100+ inches)
Gd Good (90-99 inches)
TA Typical (80-89 inches)
Fa Fair (70-79 inches)
Po Poor (<70 inches)
NA No Basement
column = 'BsmtQual'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
5
TA 649
Gd 618
Ex 121
Fa 35
Name: BsmtQual, dtype: int64
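The description lists `NA` as "No Basement", yet no `NA` row shows up in the counts. With default settings `pd.read_csv` parses the string `NA` as a missing value, so these houses end up as NaN. The sketch below shows that behavior on an in-memory CSV and two ways to handle it. The label `NoBsmt` is a hypothetical placeholder.

```python
import io
import pandas as pd

# Tiny in-memory CSV; the rows are illustrative only.
csv_text = "Id,BsmtQual\n1,Gd\n2,NA\n3,TA\n"

# Default parsing: the string "NA" is read as NaN.
default = pd.read_csv(io.StringIO(csv_text))
print(default["BsmtQual"].isna().tolist())

# Option 1: keep the default parsing and fill with an explicit label.
filled = default["BsmtQual"].fillna("NoBsmt")

# Option 2: turn off the default NA strings so "NA" stays a plain string.
raw = pd.read_csv(io.StringIO(csv_text), keep_default_na=False)
print(filled.tolist(), raw["BsmtQual"].tolist())
```

Option 2 affects every column, including numeric ones with truly missing entries, so filling selected columns is often the safer choice.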
31.BsmtCond:
Evaluates the general condition of the basement
Ex Excellent
Gd Good
TA Typical - slight dampness allowed
Fa Fair - dampness or some cracking or settling
Po Poor - Severe cracking, settling, or wetness
NA No Basement
column = 'BsmtCond'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
5
TA 1311
Gd 65
Fa 45
Po 2
Name: BsmtCond, dtype: int64
32.BsmtExposure:
Refers to walkout or garden level walls
Gd Good Exposure
Av Average Exposure (split levels or foyers typically score average or above)
Mn Minimum Exposure
No No Exposure
NA No Basement
column = 'BsmtExposure'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
5
No 953
Av 221
Gd 134
Mn 114
Name: BsmtExposure, dtype: int64
33.BsmtFinType1:
Rating of basement finished area
GLQ Good Living Quarters
ALQ Average Living Quarters
BLQ Below Average Living Quarters
Rec Average Rec Room
LwQ Low Quality
Unf Unfinished
NA No Basement
column = 'BsmtFinType1'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
7
Unf 430
GLQ 418
ALQ 220
BLQ 148
Rec 133
LwQ 74
Name: BsmtFinType1, dtype: int64
34.BsmtFinSF1:
Type 1 finished square feet
column = 'BsmtFinSF1'
print(len(data_train[column].unique()))
print('Min and max:', data_train[column].min(), data_train[column].max())
lianxu_plot(column, data_train)
637
Min and max: 0 5644
Outlier: >5000
35.BsmtFinType2:
Rating of basement finished area (if multiple types)
GLQ Good Living Quarters
ALQ Average Living Quarters
BLQ Below Average Living Quarters
Rec Average Rec Room
LwQ Low Quality
Unf Unfinished
NA No Basement
column = 'BsmtFinType2'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
7
Unf 1256
Rec 54
LwQ 46
BLQ 33
ALQ 19
GLQ 14
Name: BsmtFinType2, dtype: int64
36.BsmtFinSF2:
Type 2 finished square feet
column = 'BsmtFinSF2'
print(len(data_train[column].unique()))
print('Min and max:', data_train[column].min(), data_train[column].max())
lianxu_plot(column, data_train)
144
Min and max: 0 1474
37.BsmtUnfSF:
Unfinished square feet of basement area
column = 'BsmtUnfSF'
print(len(data_train[column].unique()))
print('Min and max:', data_train[column].min(), data_train[column].max())
lianxu_plot(column, data_train)
780
Min and max: 0 2336
38.TotalBsmtSF:
Total square feet of basement area
column = 'TotalBsmtSF'
print(len(data_train[column].unique()))
print('Min and max:', data_train[column].min(), data_train[column].max())
lianxu_plot(column, data_train)
721
Min and max: 0 6110
Outlier: >5000
39.Heating:
Type of heating
Floor Floor Furnace
GasA Gas forced warm air furnace
GasW Gas hot water or steam heat
Grav Gravity furnace
OthW Hot water or steam heat other than gas
Wall Wall furnace
column = 'Heating'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
6
GasA 1428
GasW 18
Grav 7
Wall 4
OthW 2
Floor 1
Name: Heating, dtype: int64
40.HeatingQC:
Heating quality and condition
Ex Excellent
Gd Good
TA Average/Typical
Fa Fair
Po Poor
column = 'HeatingQC'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
5
Ex 741
TA 428
Gd 241
Fa 49
Po 1
Name: HeatingQC, dtype: int64
41.CentralAir:
Central air conditioning
N No
Y Yes
column = 'CentralAir'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
2
Y 1365
N 95
Name: CentralAir, dtype: int64
42.Electrical:
Electrical system
SBrkr Standard Circuit Breakers & Romex
FuseA Fuse Box over 60 AMP and all Romex wiring (Average)
FuseF 60 AMP Fuse Box and mostly Romex wiring (Fair)
FuseP 60 AMP Fuse Box and mostly knob & tube wiring (poor)
Mix Mixed
column = 'Electrical'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
6
SBrkr 1334
FuseA 94
FuseF 27
FuseP 3
Mix 1
Name: Electrical, dtype: int64
43.1stFlrSF:
First Floor square feet
column = '1stFlrSF'
print(len(data_train[column].unique()))
print('Min and max:', data_train[column].min(), data_train[column].max())
lianxu_plot(column, data_train)
753
Min and max: 334 4692
Possible outlier: >4000
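The thresholds noted so far (BsmtFinSF1 > 5000, TotalBsmtSF > 5000, 1stFlrSF > 4000) can be collected in one place and used to list the suspect rows. This is a sketch on a toy frame; the `thresholds` dict and variable names are assumptions, and whether to drop, cap or keep such rows is left open here.

```python
import pandas as pd

# Toy frame; only the last row exceeds the thresholds noted in the text.
df = pd.DataFrame({
    "BsmtFinSF1":  [706, 978, 5644],
    "TotalBsmtSF": [856, 1262, 6110],
    "1stFlrSF":    [856, 1262, 4692],
})

# Upper limits taken from the notes above.
thresholds = {"BsmtFinSF1": 5000, "TotalBsmtSF": 5000, "1stFlrSF": 4000}

# A row is flagged if any of its columns exceeds that column's limit.
mask = pd.Series(False, index=df.index)
for col, limit in thresholds.items():
    mask |= (df[col] > limit)

outliers = df[mask]
print(outliers)
```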
44.2ndFlrSF:
Second floor square feet
column = '2ndFlrSF'
print(len(data_train[column].unique()))
print('Min and max:', data_train[column].min(), data_train[column].max())
lianxu_plot(column, data_train)
417
Min and max: 0 2065
45.LowQualFinSF:
Low quality finished square feet (all floors)
column = 'LowQualFinSF'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
24
0 1434
80 3
360 2
528 1
53 1
120 1
144 1
156 1
205 1
232 1
234 1
371 1
572 1
390 1
392 1
397 1
420 1
473 1
479 1
481 1
513 1
514 1
515 1
384 1
Name: LowQualFinSF, dtype: int64
column = 'LowQualFinSF'
print('Min and max:', data_train[column].min(), data_train[column].max())
lianxu_plot(column, data_train)
Min and max: 0 572
46.GrLivArea:
Above grade (ground) living area square feet
column = 'GrLivArea'
print(len(data_train[column].unique()))
print('Min and max:', data_train[column].min(), data_train[column].max())
lianxu_plot(column, data_train)
861
Min and max: 334 5642
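Several area columns look like sums of others. As a sanity check, one can test whether GrLivArea equals 1stFlrSF + 2ndFlrSF + LowQualFinSF. That relationship is an assumption here (the description does not state it), and the toy values below are illustrative. The same pattern could be tried for TotalBsmtSF against the three basement area columns.

```python
import pandas as pd

# Toy frame; values are illustrative only.
df = pd.DataFrame({
    "1stFlrSF":     [856, 1262, 920],
    "2ndFlrSF":     [854, 0, 866],
    "LowQualFinSF": [0, 0, 0],
    "GrLivArea":    [1710, 1262, 1786],
})

# Sum the assumed components row by row and compare with the reported total.
parts = df[["1stFlrSF", "2ndFlrSF", "LowQualFinSF"]].sum(axis=1)
mismatch = df[parts != df["GrLivArea"]]
print(len(mismatch))
```

An empty `mismatch` frame on the real data would suggest the total is redundant with its components.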
47.BsmtFullBath:
Basement full bathrooms
column = 'BsmtFullBath'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
4
0 856
1 588
2 15
3 1
Name: BsmtFullBath, dtype: int64
48.BsmtHalfBath:
Basement half bathrooms
column = 'BsmtHalfBath'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
3
0 1378
1 80
2 2
Name: BsmtHalfBath, dtype: int64
49.FullBath:
Full bathrooms above grade
column = 'FullBath'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
4
2 768
1 650
3 33
0 9
Name: FullBath, dtype: int64
50.HalfBath:
Half baths above grade
column = 'HalfBath'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
3
0 913
1 535
2 12
Name: HalfBath, dtype: int64
51.BedroomAbvGr:
Bedrooms above grade (does NOT include basement bedrooms)
column = 'BedroomAbvGr'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
8
3 804
2 358
4 213
1 50
5 21
6 7
0 6
8 1
Name: BedroomAbvGr, dtype: int64
52.KitchenAbvGr:
Kitchens above grade
column = 'KitchenAbvGr'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
4
1 1392
2 65
3 2
0 1
Name: KitchenAbvGr, dtype: int64
53.KitchenQual:
Kitchen quality
Ex Excellent
Gd Good
TA Typical/Average
Fa Fair
Po Poor
column = 'KitchenQual'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
4
TA 735
Gd 586
Ex 100
Fa 39
Name: KitchenQual, dtype: int64
54.TotRmsAbvGrd:
Total rooms above grade (does not include bathrooms)
column = 'TotRmsAbvGrd'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
12
6 402
7 329
5 275
8 187
4 97
9 75
10 47
11 18
3 17
12 11
14 1
2 1
Name: TotRmsAbvGrd, dtype: int64
55.Functional:
Home functionality (Assume typical unless deductions are warranted)
Typ Typical Functionality
Min1 Minor Deductions 1
Min2 Minor Deductions 2
Mod Moderate Deductions
Maj1 Major Deductions 1
Maj2 Major Deductions 2
Sev Severely Damaged
Sal Salvage only
column = 'Functional'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
7
Typ 1360
Min2 34
Min1 31
Mod 15
Maj1 14
Maj2 5
Sev 1
Name: Functional, dtype: int64
56.Fireplaces:
Number of fireplaces
column = 'Fireplaces'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
4
0 690
1 650
2 115
3 5
Name: Fireplaces, dtype: int64
57.FireplaceQu:
Fireplace quality
Ex Excellent - Exceptional Masonry Fireplace
Gd Good - Masonry Fireplace in main level
TA Average - Prefabricated Fireplace in main living area or Masonry Fireplace in basement
Fa Fair - Prefabricated Fireplace in basement
Po Poor - Ben Franklin Stove
NA No Fireplace
column = 'FireplaceQu'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
6
Gd 380
TA 313
Fa 33
Ex 24
Po 20
Name: FireplaceQu, dtype: int64
58.GarageType:
Garage location
2Types More than one type of garage
Attchd Attached to home
Basment Basement Garage
BuiltIn Built-In (Garage part of house - typically has room above garage)
CarPort Car Port
Detchd Detached from home
NA No Garage
column = 'GarageType'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
7
Attchd 870
Detchd 387
BuiltIn 88
Basment 19
CarPort 9
2Types 6
Name: GarageType, dtype: int64
59.GarageYrBlt:
Year garage was built
column = 'GarageYrBlt'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
98
2005.0 65
2006.0 59
2004.0 53
2003.0 50
2007.0 49
1977.0 35
1998.0 31
1999.0 30
2008.0 29
1976.0 29
2000.0 27
2002.0 26
1968.0 26
1950.0 24
1993.0 22
2009.0 21
1965.0 21
1966.0 21
1962.0 21
1958.0 21
2001.0 20
1996.0 20
1957.0 20
1970.0 20
1960.0 19
1997.0 19
1978.0 19
1954.0 19
1974.0 18
1994.0 18
1995.0 18
1964.0 18
1959.0 17
1963.0 16
1990.0 16
1956.0 16
1969.0 15
1979.0 15
1980.0 15
1967.0 15
1988.0 14
1973.0 14
1940.0 14
1920.0 14
1972.0 14
1961.0 13
1971.0 13
1955.0 13
1992.0 13
1953.0 12
1987.0 11
1948.0 11
1985.0 10
1981.0 10
1941.0 10
1925.0 10
1989.0 10
1975.0 9
1991.0 9
1939.0 9
1984.0 8
1949.0 8
1930.0 8
1983.0 7
1986.0 6
1951.0 6
1926.0 6
1922.0 5
1936.0 5
1916.0 5
1931.0 4
1945.0 4
1935.0 4
1928.0 4
1946.0 4
1982.0 4
1938.0 3
1921.0 3
1924.0 3
1910.0 3
1952.0 3
1932.0 3
2010.0 3
1923.0 3
1937.0 2
1934.0 2
1918.0 2
1947.0 2
1929.0 2
1914.0 2
1915.0 2
1942.0 2
1908.0 1
1927.0 1
1933.0 1
1900.0 1
1906.0 1
Name: GarageYrBlt, dtype: int64
print('Min and max:', data_train[column].min(), data_train[column].max())
lianxu_plot(column, data_train)
Min and max: 1900.0 2010.0
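The garage years print as floats (`2005.0`) because the column contains missing values, presumably for houses without a garage, and a plain integer column cannot hold NaN. If whole-number years are preferred, pandas' nullable `Int64` dtype is one option. A minimal sketch on toy values:

```python
import numpy as np
import pandas as pd

# Toy stand-in for data_train['GarageYrBlt']; NaN marks a missing year.
garage_year = pd.Series([2005.0, np.nan, 1977.0])

# The nullable integer dtype keeps the missing entry as <NA> instead of forcing floats.
garage_year_int = garage_year.astype("Int64")
print(garage_year_int)
```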
60.GarageFinish:
Interior finish of the garage
Fin Finished
RFn Rough Finished
Unf Unfinished
NA No Garage
column = 'GarageFinish'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
4
Unf 605
RFn 422
Fin 352
Name: GarageFinish, dtype: int64
61.GarageCars:
Size of garage in car capacity
column = 'GarageCars'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
5
2 824
1 369
3 181
0 81
4 5
Name: GarageCars, dtype: int64
62.GarageArea:
Size of garage in square feet
column = 'GarageArea'
print(len(data_train[column].unique()))
print('Min and max:', data_train[column].min(), data_train[column].max())
lianxu_plot(column, data_train)
441
Min and max: 0 1418
63.GarageQual:
Garage quality
Ex Excellent
Gd Good
TA Typical/Average
Fa Fair
Po Poor
NA No Garage
column = 'GarageQual'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
6
TA 1311
Fa 48
Gd 14
Po 3
Ex 3
Name: GarageQual, dtype: int64
64.GarageCond:
Garage condition
Ex Excellent
Gd Good
TA Typical/Average
Fa Fair
Po Poor
NA No Garage
column = 'GarageCond'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
6
TA 1326
Fa 35
Gd 9
Po 7
Ex 2
Name: GarageCond, dtype: int64
65.PavedDrive:
Paved driveway
Y Paved
P Partial Pavement
N Dirt/Gravel
column = 'PavedDrive'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
3
Y 1340
N 90
P 30
Name: PavedDrive, dtype: int64
66.WoodDeckSF:
Wood deck area in square feet
column = 'WoodDeckSF'
print(len(data_train[column].unique()))
print('Min and max:', data_train[column].min(), data_train[column].max())
lianxu_plot(column, data_train)
274
Min and max: 0 857
67.OpenPorchSF:
Open porch area in square feet
column = 'OpenPorchSF'
print(len(data_train[column].unique()))
print('Min and max:', data_train[column].min(), data_train[column].max())
lianxu_plot(column, data_train)
202
Min and max: 0 547
68.EnclosedPorch:
Enclosed porch area in square feet
column = 'EnclosedPorch'
print(len(data_train[column].unique()))
print('Min and max:', data_train[column].min(), data_train[column].max())
lianxu_plot(column, data_train)
120
Min and max: 0 552
69.3SsnPorch:
Three season porch area in square feet
column = '3SsnPorch'
print(len(data_train[column].unique()))
print('Min and max:', data_train[column].min(), data_train[column].max())
lianxu_plot(column, data_train)
20
Min and max: 0 508
70.ScreenPorch:
Screen porch area in square feet
column = 'ScreenPorch'
print(len(data_train[column].unique()))
print('Min and max:', data_train[column].min(), data_train[column].max())
lianxu_plot(column, data_train)
76
Min and max: 0 480
71.PoolArea:
Pool area in square feet
column = 'PoolArea'
print(len(data_train[column].unique()))
print('Min and max:', data_train[column].min(), data_train[column].max())
lianxu_plot(column, data_train)
8
Min and max: 0 738
72.PoolQC:
Pool quality
Ex Excellent
Gd Good
TA Average/Typical
Fa Fair
NA No Pool
column = 'PoolQC'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
4
Gd 3
Fa 2
Ex 2
Name: PoolQC, dtype: int64
73.Fence:
Fence quality
GdPrv Good Privacy
MnPrv Minimum Privacy
GdWo Good Wood
MnWw Minimum Wood/Wire
NA No Fence
column = 'Fence'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
5
MnPrv 157
GdPrv 59
GdWo 54
MnWw 11
Name: Fence, dtype: int64
74.MiscFeature:
Miscellaneous feature not covered in other categories
Elev Elevator
Gar2 2nd Garage (if not described in garage section)
Othr Other
Shed Shed (over 100 SF)
TenC Tennis Court
NA None
column = 'MiscFeature'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
5
Shed 49
Othr 2
Gar2 2
TenC 1
Name: MiscFeature, dtype: int64
75.MiscVal:
$Value of miscellaneous feature
column = 'MiscVal'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
21
0 1408
400 11
500 8
700 5
450 4
2000 4
600 4
1200 2
480 2
1150 1
800 1
15500 1
620 1
3500 1
560 1
2500 1
1300 1
1400 1
350 1
8300 1
54 1
Name: MiscVal, dtype: int64
print('Min and max:', data_train[column].min(), data_train[column].max())
lianxu_plot(column, data_train)
Min and max: 0 15500
76.MoSold:
Month Sold (MM)
column = 'MoSold'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
12
6 253
7 234
5 204
4 141
8 122
3 106
10 89
11 79
9 63
12 59
1 58
2 52
Name: MoSold, dtype: int64
77.YrSold:
Year Sold (YYYY)
column = 'YrSold'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
5
2009 338
2007 329
2006 314
2008 304
2010 175
Name: YrSold, dtype: int64
78.SaleType:
Type of sale
WD Warranty Deed - Conventional
CWD Warranty Deed - Cash
VWD Warranty Deed - VA Loan
New Home just constructed and sold
COD Court Officer Deed/Estate
Con Contract 15% Down payment regular terms
ConLw Contract Low Down payment and low interest
ConLI Contract Low Interest
ConLD Contract Low Down
Oth Other
column = 'SaleType'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
9
WD 1267
New 122
COD 43
ConLD 9
ConLI 5
ConLw 5
CWD 4
Oth 3
Con 2
Name: SaleType, dtype: int64
79.SaleCondition:
Condition of sale
Normal Normal Sale
Abnorml Abnormal Sale - trade, foreclosure, short sale
AdjLand Adjoining Land Purchase
Alloca Allocation - two linked properties with separate deeds, typically condo with a garage unit
Family Sale between family members
Partial Home was not completed when last assessed (associated with New Homes)
column = 'SaleCondition'
print(len(data_train[column].unique()))
print(data_train[column].value_counts())
lisan_plot(column, data_train)
6
Normal 1198
Partial 125
Abnorml 101
Family 20
Alloca 12
AdjLand 4
Name: SaleCondition, dtype: int64
80.SalePrice:
column = 'SalePrice'
print(len(data_train[column].unique()))
print('Min and max:', data_train[column].min(), data_train[column].max())
lianxu_plot(column, data_train)
663
Min and max: 34900 755000
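Since the competition scores submissions by the RMSE between the log of the predicted price and the log of the observed price, it helps to see what that metric does with prices in this range. The function below is a minimal sketch; the name `rmse_log` and the sample prices are assumptions for illustration.

```python
import numpy as np

def rmse_log(y_true, y_pred):
    """RMSE between log(predicted) and log(observed); all prices must be positive."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((np.log(y_pred) - np.log(y_true)) ** 2)))

# Perfect predictions score zero.
score_same = rmse_log([34900, 755000], [34900, 755000])

# A 10% overestimate gives the same error for a cheap and an expensive house.
score_ratio = rmse_log([100000, 200000], [110000, 220000])
print(score_same, score_ratio)
```

Because the error depends on the price ratio rather than the difference, cheap and expensive houses contribute on the same scale.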