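Problem Description
In practice, datasets often contain missing values, outliers, and other complications. How can we standardize and normalize the data while also handling these issues?
Advanced Code Example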
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.impute import SimpleImputer
# Create a sample dataset with a missing value (None becomes NaN in the DataFrame)
data = {'Age': [20, 30, 40, 50, 60, None],
        'Income': [20000, 30000, 40000, 50000, 80000, 70000]}
df = pd.DataFrame(data)
# Handle missing values: fill each NaN with its column mean
imputer = SimpleImputer(strategy='mean')
df_imputed = pd.DataFrame(imputer.fit_transform(df), columns=df.columns)
# Standardization: rescale each column to zero mean and unit variance
scaler = StandardScaler()
df_standardized = pd.DataFrame(scaler.fit_transform(df_imputed), columns=df.columns)
print("Standardized data after handling missing values:\n", df_standardized)
# Normalization: rescale each column to the [0, 1] range
scaler = MinMaxScaler()
df_normalized = pd.DataFrame(scaler.fit_transform(df_imputed), columns=df.columns)
print("Normalized data after handling missing values:\n", df_normalized)
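
The code above deals with the missing value but not with the outliers mentioned in the problem description. One possible way to cover that case is sketched below. It is an illustration rather than a definitive recipe: the dataset `df_outlier`, its extreme Income value, and the names `robust_pipeline` and `df_robust` are hypothetical and not part of the original example. The sketch swaps mean imputation for median imputation and uses scikit-learn's `RobustScaler`, which centers each column on its median and scales by the interquartile range, so a single extreme value has less influence on the result.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import RobustScaler

# Hypothetical data: one missing Age and one extreme Income value (an outlier)
df_outlier = pd.DataFrame({
    'Age': [20, 30, 40, 50, 60, np.nan],
    'Income': [20000, 30000, 40000, 50000, 1000000, 70000],
})

# Median imputation is less sensitive to outliers than mean imputation.
# RobustScaler subtracts the median and divides by the interquartile range (IQR).
robust_pipeline = Pipeline([
    ('impute', SimpleImputer(strategy='median')),
    ('scale', RobustScaler()),
])

df_robust = pd.DataFrame(
    robust_pipeline.fit_transform(df_outlier),
    columns=df_outlier.columns,
)
print("Robustly scaled data after median imputation:\n", df_robust)
```

Chaining the two steps in a `Pipeline` keeps imputation and scaling together, so the same fitted statistics can later be applied to new data with `robust_pipeline.transform(...)`. The outlier is still present after scaling; it simply no longer dominates the scale of the other values.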