Output
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 768 entries, 0 to 767
Data columns (total 9 columns):
 #   Column                    Non-Null Count  Dtype
---  ------                    --------------  -----
 0   Pregnancies               768 non-null    int64
 1   Glucose                   768 non-null    int64
 2   BloodPressure             768 non-null    int64
 3   SkinThickness             768 non-null    int64
 4   Insulin                   768 non-null    int64
 5   BMI                       768 non-null    float64
 6   DiabetesPedigreeFunction  768 non-null    float64
 7   Age                       768 non-null    int64
 8   Outcome                   768 non-null    int64
dtypes: float64(2), int64(7)
memory usage: 54.1 KB
None
   Pregnancies  Glucose  BloodPressure  SkinThickness   BMI  Outcome
0            6      148             72             35  33.6        1
1            1       85             66             29  26.6        0
2            8      183             64              0  23.3        1
3            1       89             66             23  28.1        0
4            0      137             40             35  43.1        1
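Implementation code
# ML, DS: feature concatenation in feature engineering (commonly used to join the independent-variable features and the dependent-variable feature column-wise)
import pandas as pd

# Forward slashes avoid the invalid '\d' escape sequences of the Windows-style path
data_frame = pd.read_csv('data_csv_xls/diabetes/diabetes.csv')
# info() prints its summary and returns None, so print() also emits a trailing "None"
print(data_frame.info())

col_label = 'Outcome'  # dependent variable (label)
cols_other = ['Pregnancies', 'Glucose', 'BloodPressure', 'SkinThickness', 'BMI']  # selected features
data_X = data_frame[cols_other]
data_y_label_μ = data_frame[col_label]

# axis=1 concatenates column-wise, aligning rows on the index
data_dall = pd.concat([data_X, data_y_label_μ], axis=1)
print(data_dall.head())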
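
The concatenation step can be tried without the CSV file. The following is a minimal sketch under stated assumptions: the `toy` frame is a hypothetical stand-in for diabetes.csv (the first three rows of a few columns, plus a made-up `Extra` column that is deliberately left out), and the names `X`, `y`, `direct`, and `y_shifted` are illustrative only. It also shows one thing worth knowing about `pd.concat(..., axis=1)`: rows are matched by index label, not by position, so a label column whose index has changed will not line up.

```python
import pandas as pd

# Hypothetical toy data standing in for diabetes.csv
toy = pd.DataFrame({
    "Pregnancies": [6, 1, 8],
    "Glucose": [148, 85, 183],
    "Extra": [10, 20, 30],      # made-up column that is not selected
    "BMI": [33.6, 26.6, 23.3],
    "Outcome": [1, 0, 1],
})

col_label = "Outcome"
cols_other = ["Pregnancies", "Glucose", "BMI"]

X = toy[cols_other]   # feature columns (DataFrame)
y = toy[col_label]    # label column (Series)

# Column-wise concatenation: features first, label last
combined = pd.concat([X, y], axis=1)
print(combined)

# When X and y come from the same frame, direct column selection
# produces the same columns in the same order
direct = toy[cols_other + [col_label]]
print(combined.equals(direct))

# Pitfall: concat aligns on index labels. If the label's index differs,
# the result is the union of both indexes, with NaN where no match exists.
y_shifted = y.copy()
y_shifted.index = [1, 2, 3]
misaligned = pd.concat([X, y_shifted], axis=1)
print(misaligned)
```

If the features and the label come from different sources, calling `reset_index(drop=True)` on both before concatenating is one common way to force positional alignment.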