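WeChat official account: 尤而小屋
Editor: Peter
Author: Peter
Hi everyone, I'm Peter~
Today I'll walk through a hands-on face recognition example that combines a support vector machine (SVM) with PCA dimensionality reduction. It covers:
- Downloading the LFW face dataset
- PCA dimensionality reduction
- Building an SVM-based classification model
- Visualizing the model's predictions
The result is shown in the figure below:
Face recognition based on SVM and PCA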
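The data comes from the Labeled Faces in the Wild (LFW) dataset, loaded with fetch_lfw_people. Each image shows the face of one person and is labeled with that person's identity; every person in the dataset has at least one image. The goal is to train a classifier that identifies the person in each image.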
Official documentation: https://scikit-learn.org/1.5/modules/generated/sklearn.datasets.fetch_lfw_people.html
sklearn.datasets.fetch_lfw_people(
data_home=None,
funneled=True,
resize=0.5,
min_faces_per_person=0,
color=False,
slice_=(slice(70, 195, None), slice(78, 172, None)),
download_if_missing=True,
return_X_y=False)
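To make the min_faces_per_person parameter more concrete, here is a minimal sketch of the filtering idea: keep only the people who have at least a given number of images. This is an illustration in plain NumPy, not scikit-learn's actual implementation; the helper keep_frequent_classes and the toy labels are hypothetical.

```python
import numpy as np

def keep_frequent_classes(labels, min_count):
    """Return a boolean mask selecting samples whose class has >= min_count samples."""
    classes, counts = np.unique(labels, return_counts=True)
    frequent = classes[counts >= min_count]
    return np.isin(labels, frequent)

# Hypothetical toy labels: person 0 has 4 images, person 1 has 1, person 2 has 3
labels = np.array([0, 0, 1, 0, 2, 2, 0, 2])
mask = keep_frequent_classes(labels, min_count=3)
filtered = labels[mask]
print(filtered)  # person 1 is dropped because it has only one image
```

With min_faces_per_person=70, as used later in this article, only the people with at least 70 images remain, which is why the classification task ends up with just 7 classes.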
Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
from sklearn.model_selection import train_test_split
from sklearn.model_selection import GridSearchCV
from sklearn.datasets import fetch_lfw_people
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
from sklearn.decomposition import PCA
from sklearn.svm import SVC
Load the data
lfw_people = fetch_lfw_people(min_faces_per_person=70, resize=0.4)
type(lfw_people)
sklearn.utils._bunch.Bunch
Basic information about the data:
n_samples, h, w = lfw_people.images.shape
n_samples, h, w # number of samples, image height, image width
(1288, 50, 37)
Separate the features X and the target y
1. Get the features
The feature matrix X and the number of features:
X = lfw_people.data
n_features = X.shape[1] # number of features
X[:2]
array([[0.99607843, 0.9973857 , 0.9908497 , ..., 0.37908497, 0.38823533,0.38169935],
[0.1503268 , 0.19607843, 0.1764706 , ..., 0.45882353, 0.44313726,0.53594774]], dtype=float32)
n_features
1850
len(X) # number of samples
1288
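The 1850 features are simply the 50 x 37 pixels of each image flattened into one row. The sketch below shows that relationship on a small random stand-in array (the real lfw_people.images is not downloaded here; the array images and the sample count of 4 are assumptions for illustration).

```python
import numpy as np

h, w = 50, 37  # image height and width reported above

# Hypothetical stand-in for lfw_people.images: 4 random grayscale images
rng = np.random.default_rng(0)
images = rng.random((4, h, w), dtype=np.float32)

# Flatten each image into one feature row, like lfw_people.data
data = images.reshape(len(images), -1)
print(data.shape)  # (4, 1850)

# A feature row can be reshaped back into an image for plotting
restored = data[0].reshape(h, w)
```

This is also why the plotting helper later in the article calls reshape((h, w)) on each row before passing it to imshow.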
2. Separate the target variable y
y = lfw_people.target
target_names = lfw_people.target_names
y
array([5, 6, 3, ..., 5, 3, 5], dtype=int64)
target_names
array(['Ariel Sharon', 'Colin Powell', 'Donald Rumsfeld', 'George W Bush',
'Gerhard Schroeder', 'Hugo Chavez', 'Tony Blair'], dtype='<U17')
# total number of classes
n_classes = target_names.shape[0]
n_classes
7
print("Overall dataset information:")
print("Samples: %d" % n_samples)
print("Features: %d" % n_features)
print("Classes: %d" % n_classes)
Overall dataset information:
Samples: 1288
Features: 1850
Classes: 7
Splitting the data with train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)
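The classes in this dataset are imbalanced (George W Bush has far more images than the others), so one optional variation is a stratified split, which keeps the class proportions the same in the training and test sets. The sketch below uses hypothetical toy labels rather than the LFW data; the article's own split above does not use stratify.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced labels: 80 samples of class 0, 20 of class 1
y_toy = np.array([0] * 80 + [1] * 20)
X_toy = np.arange(100).reshape(-1, 1)

# stratify=y_toy preserves the 80/20 class ratio in both subsets
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.25, random_state=42, stratify=y_toy
)
print(np.bincount(y_tr), np.bincount(y_te))
```

For the face data, the equivalent call would pass stratify=y alongside the existing arguments.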
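Dimensionality reduction with PCA
Each sample has 1850 features (one per pixel), which is a lot for only 1288 samples, so PCA is used to reduce the dimensionality:
n_components = 150
pca = PCA(n_components=n_components, # keep 150 principal components: 1850 -> 150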
svd_solver='randomized',
whiten=True).fit(X_train)
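Reshape the principal components into images (the eigenfaces), then project the training and test sets onto them to get the reduced data: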
new_lfw_people = pca.components_.reshape((n_components,h,w))
new_lfw_people.shape
(150, 50, 37)
X_train_pca = pca.transform(X_train)
X_test_pca = pca.transform(X_test)
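A common sanity check after fitting PCA is to look at how much of the variance the kept components retain. The sketch below shows how that could be inspected; it runs on random toy data (X_toy, pca_toy and the sizes are assumptions), so the printed ratio says nothing about the face data itself.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical toy data: 200 samples with 50 features
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(200, 50))

pca_toy = PCA(n_components=10, svd_solver='randomized',
              whiten=True, random_state=0).fit(X_toy)

# One row per principal component, one column per original feature
print(pca_toy.components_.shape)

# Cumulative share of variance explained by the first k components
cumulative = np.cumsum(pca_toy.explained_variance_ratio_)
print(cumulative[-1])

# Projected data has n_components columns
X_toy_pca = pca_toy.transform(X_toy)
print(X_toy_pca.shape)
```

On the face data, np.cumsum(pca.explained_variance_ratio_)[-1] would give the fraction of variance kept by the 150 components.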
SVM model
Training
# hyperparameter grid
param_grid = {'C': [1e3, 5e3, 1e4, 5e4, 1e5],
'gamma': [0.0001, 0.0005, 0.001, 0.005, 0.01, 0.1], }
# grid search
clf = GridSearchCV(
SVC(kernel='rbf', class_weight='balanced'),
param_grid
)
# train the model
clf = clf.fit(X_train_pca,y_train)
The best parameter combination:
print(clf.best_estimator_)
SVC(C=1000.0, class_weight='balanced', gamma=0.005)
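In the workflow above, PCA is fitted once on the full training set and the grid search then cross-validates on the already-projected data. One possible variation is to wrap PCA and SVC in a Pipeline, so that PCA is refitted inside each cross-validation fold. The sketch below is an assumption-laden illustration on synthetic data from make_classification; the grid values and component count are illustrative, not the article's.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Hypothetical synthetic data standing in for the face features
X_toy, y_toy = make_classification(n_samples=300, n_features=40,
                                   n_informative=10, n_classes=3,
                                   random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_toy, y_toy,
                                          test_size=0.25, random_state=42)

# PCA and SVC chained, so each CV fold fits PCA on its own training part
pipe = Pipeline([
    ("pca", PCA(n_components=15, whiten=True, random_state=0)),
    ("svc", SVC(kernel="rbf", class_weight="balanced")),
])

# Step parameters are addressed as <step name>__<parameter>
param_grid = {"svc__C": [1, 10, 100], "svc__gamma": [0.001, 0.01, 0.1]}

search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X_tr, y_tr)
print(search.best_params_)

# The fitted search applies PCA and SVC together at prediction time
test_accuracy = search.score(X_te, y_te)
print(test_accuracy)
```

With a pipeline, the raw X_test can be passed directly to predict or score, since the projection step is part of the fitted estimator.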
Prediction
y_pred = clf.predict(X_test_pca)
y_pred[:10]
array([3, 3, 6, 3, 3, 3, 4, 1, 3, 3], dtype=int64)
Model evaluation
Evaluate the classifier's performance:
# classification report
print(classification_report(y_test,y_pred,target_names=target_names))
                   precision    recall  f1-score   support

     Ariel Sharon       0.88      0.54      0.67        13
     Colin Powell       0.75      0.88      0.81        60
  Donald Rumsfeld       0.85      0.63      0.72        27
    George W Bush       0.86      0.97      0.91       146
Gerhard Schroeder       0.95      0.80      0.87        25
      Hugo Chavez       1.00      0.47      0.64        15
       Tony Blair       0.97      0.81      0.88        36

         accuracy                           0.85       322
        macro avg       0.89      0.73      0.79       322
     weighted avg       0.86      0.85      0.85       322
# confusion matrix
print(confusion_matrix(y_test, y_pred, labels=range(n_classes)))
The result is:
[[  7   2   0   4   0   0   0]
 [  1  53   2   4   0   0   0]
 [  0   4  17   6   0   0   0]
 [  0   4   0 142   0   0   0]
 [  0   1   0   3  20   0   1]
 [  0   5   0   2   1   7   0]
 [  0   2   1   4   0   0  29]]
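The numbers in the classification report can be recovered from this confusion matrix, where rows are the true classes and columns are the predicted classes. The sketch below copies the matrix from the output above and derives per-class precision, recall, and overall accuracy with NumPy.

```python
import numpy as np

# Confusion matrix copied from the output above
# (rows: true class, columns: predicted class)
cm = np.array([
    [7,  2,  0,   4,  0, 0,  0],
    [1, 53,  2,   4,  0, 0,  0],
    [0,  4, 17,   6,  0, 0,  0],
    [0,  4,  0, 142,  0, 0,  0],
    [0,  1,  0,   3, 20, 0,  1],
    [0,  5,  0,   2,  1, 7,  0],
    [0,  2,  1,   4,  0, 0, 29],
])

true_positives = np.diag(cm)                 # correctly classified samples per class
recall = true_positives / cm.sum(axis=1)     # divide by row sums (true counts)
precision = true_positives / cm.sum(axis=0)  # divide by column sums (predicted counts)
accuracy = true_positives.sum() / cm.sum()

print(np.round(recall, 2))
print(np.round(precision, 2))
print(round(accuracy, 2))
```

For example, Hugo Chavez has 7 correct predictions out of 15 true samples (recall 0.47), but nobody else was predicted as Hugo Chavez, so his precision is 1.00. The George W Bush column collects most of the misclassifications from the other classes.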
Visualization
def plot_gallery(images, titles, h, w, n_row=3, n_col=4):
    """
    images: image data, one flattened or 2D image per entry
    titles: list of subplot titles
    h: image height in pixels
    w: image width in pixels
    n_row=3, n_col=4: number of rows and columns in the subplot grid
    """
    plt.figure(figsize=(1.8 * n_col, 2.4 * n_row))  # figure size
    plt.subplots_adjust(bottom=0, left=.01, right=.99, top=.90, hspace=.35)  # adjust spacing between subplots
    for i in range(n_row * n_col):  # iterate over all subplots
        plt.subplot(n_row, n_col, i + 1)
        plt.imshow(images[i].reshape((h, w)), cmap=plt.cm.gray)
        plt.title(titles[i], size=12)
        plt.xticks(())
        plt.yticks(())
def title(y_pred, y_test, target_names, i):
    """
    y_pred: predicted labels
    y_test: true labels
    target_names: array of class names
    i: index of the sample
    """
    # keep only the last word of each name, i.e. the surname
    pred_name = target_names[y_pred[i]].rsplit(' ', 1)[-1]
    true_name = target_names[y_test[i]].rsplit(' ', 1)[-1]
    return f'predicted: {pred_name} \ntrue: {true_name}'
target_names
array(['Ariel Sharon', 'Colin Powell', 'Donald Rumsfeld', 'George W Bush',
'Gerhard Schroeder', 'Hugo Chavez', 'Tony Blair'], dtype='<U17')
y_pred[:10]
array([3, 3, 6, 3, 3, 3, 4, 1, 3, 3], dtype=int64)
The list of prediction titles:
prediction_titles = [title(y_pred, y_test, target_names, i) for i in range(y_pred.shape[0])]
prediction_titles[:5]
['predicted: Bush \ntrue: Bush',
 'predicted: Bush \ntrue: Bush',
 'predicted: Blair \ntrue: Blair',
 'predicted: Bush \ntrue: Bush',
 'predicted: Bush \ntrue: Bush']
Visualizing the predictions:
plot_gallery(X_test, prediction_titles, h, w)
eigenface_titles = ["eigenface %d" % i for i in range(new_lfw_people.shape[0])]
eigenface_titles[:10]
['eigenface 0',
 'eigenface 1',
 'eigenface 2',
 'eigenface 3',
 'eigenface 4',
 'eigenface 5',
 'eigenface 6',
 'eigenface 7',
 'eigenface 8',
 'eigenface 9']
plot_gallery(new_lfw_people, eigenface_titles, h, w)