Face Recognition in Practice with Support Vector Machines and PCA Dimensionality Reduction

2024-10-19 18:36:28

WeChat official account: 尤而小屋
Editor: Peter
Author: Peter

Hi everyone, I'm Peter~

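Today I'll walk through a hands-on face recognition example that combines a support vector machine (SVM) with PCA dimensionality reduction. It covers:

Downloading the LFW face dataset
PCA dimensionality reduction
Building an SVM classification model
Visualizing the model's predictions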

The result looks like this:

Face recognition based on SVM and PCA

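The data is the LFW (Labeled Faces in the Wild) face dataset, loaded with fetch_lfw_people. Each image is a face labeled with the name of the person shown, and one person can appear in many images. The goal is to train a classifier that recognizes which person is in a given image.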

Official documentation: https://scikit-learn.org/1.5/modules/generated/sklearn.datasets.fetch_lfw_people.html

sklearn.datasets.fetch_lfw_people(
    data_home=None, 
    funneled=True, 
    resize=0.5, 
    min_faces_per_person=0, 
    color=False, 
    slice_=(slice(70, 195, None), slice(78, 172, None)), 
    download_if_missing=True, 
    return_X_y=False)
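The min_faces_per_person parameter keeps only people who have at least that many images. A minimal sketch of that filtering idea on a hypothetical label array (keep_frequent_people is an illustrative helper, not the loader's actual source code):

```python
import numpy as np

def keep_frequent_people(labels, min_faces):
    """Hypothetical helper: boolean mask selecting samples whose person
    has at least min_faces images (the idea behind min_faces_per_person)."""
    names, counts = np.unique(labels, return_counts=True)
    return np.isin(labels, names[counts >= min_faces])

# Hypothetical labels: "A" has 3 images, "B" has 1, "C" has 2
labels = np.array(["A", "B", "A", "C", "A", "C"])
mask = keep_frequent_people(labels, min_faces=2)
print(labels[mask])  # only A and C survive
```

The article uses min_faces_per_person=70, which is why only 7 people remain in the data.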

Import libraries

import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
%matplotlib inline

from sklearn.model_selection import train_test_split
from sklearn.model_selection import GridSearchCV
from sklearn.datasets import fetch_lfw_people
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
from sklearn.decomposition import PCA
from sklearn.svm import SVC

Load the data

lfw_people = fetch_lfw_people(min_faces_per_person=70, resize=0.4)
type(lfw_people)

sklearn.utils._bunch.Bunch

Basic information about the data:

n_samples, h, w = lfw_people.images.shape

n_samples, h, w  # number of samples, image height, image width

(1288, 50, 37)
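lfw_people.data holds the same pixels as lfw_people.images, with each h × w image flattened into one row, so the number of features is 50 × 37 = 1850. A small check on a random stand-in array (hypothetical data with the same image size as the real one):

```python
import numpy as np

n_samples, h, w = 4, 50, 37            # same image size as the LFW data above
rng = np.random.default_rng(0)
images = rng.random((n_samples, h, w), dtype=np.float32)  # stand-in for lfw_people.images

data = images.reshape(n_samples, -1)    # flatten each image into one row
print(data.shape)                       # (4, 1850)

# Reshaping a row back to (h, w) recovers the original image
print(np.array_equal(data[1].reshape(h, w), images[1]))  # True
```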

Separate the features X and the target y

1. Get the features

The feature matrix X and the number of features:

X = lfw_people.data
n_features = X.shape[1]  # number of features
X[:2]


array([[0.99607843, 0.9973857 , 0.9908497 , ..., 0.37908497, 0.38823533,0.38169935],
[0.1503268 , 0.19607843, 0.1764706 , ..., 0.45882353, 0.44313726,0.53594774]], dtype=float32)

n_features

1850

len(X)  # number of samples

1288

2. Separate the target variable y

y = lfw_people.target
target_names = lfw_people.target_names
y

array([5, 6, 3, …, 5, 3, 5], dtype=int64)

target_names


array(['Ariel Sharon', 'Colin Powell', 'Donald Rumsfeld', 'George W Bush',
'Gerhard Schroeder', 'Hugo Chavez', 'Tony Blair'], dtype='<U17')

# total number of classes:
n_classes = target_names.shape[0]
n_classes

print("Basic information of the full dataset:")
print("Samples: %d" % n_samples)
print("Features: %d" % n_features)
print("Classes: %d" % n_classes)

Basic information of the full dataset:
Samples: 1288
Features: 1850
Classes: 7

Split the data with train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)
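test_size=0.25 puts a quarter of the 1288 samples in the test set, which matches the total support of 322 in the classification report further down. A quick check on stand-in arrays (hypothetical zeros with the same number of rows) shows the resulting sizes; passing stratify=y is an option if you want each person's share to be the same in both sets.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in arrays with the same number of rows as the LFW subset (1288)
X_demo = np.zeros((1288, 3))
y_demo = np.arange(1288) % 7

X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.25, random_state=42)
print(len(X_tr), len(X_te))  # 966 322
```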

Dimensionality reduction with PCA

Each sample has 1850 features (one per pixel), which is a lot, so we use PCA to reduce the dimensionality:

n_components = 150

pca = PCA(n_components=n_components, # keep 150 principal components: 1850 ---> 150
          svd_solver='randomized',
          whiten=True).fit(X_train)
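Two properties are worth knowing here: the rows of components_ are orthonormal principal axes, and whiten=True rescales each projected coordinate to unit variance. explained_variance_ratio_ tells how much of the total variance the kept components retain, which is worth printing for the real pca above. A minimal check on random stand-in data (hypothetical; svd_solver='full' is swapped in so the checks are exact rather than approximate):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 50))   # hypothetical stand-in data

# svd_solver='full' (instead of 'randomized') so the checks below hold exactly
pca_demo = PCA(n_components=10, svd_solver='full', whiten=True).fit(X_demo)

# Principal axes are orthonormal rows of components_
gram = pca_demo.components_ @ pca_demo.components_.T
print(np.allclose(gram, np.eye(10)))   # True

# Fraction of the total variance kept by the 10 components
print(pca_demo.explained_variance_ratio_.sum())

# With whiten=True each projected component has unit variance on the training data
Z = pca_demo.transform(X_demo)
print(Z.std(axis=0, ddof=1))           # approximately 1.0 for every component
```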

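Reshape the principal components back into images; each one is an "eigenface" of size h × w:

new_lfw_people = pca.components_.reshape((n_components, h, w))  # 150 eigenfaces, not the reduced data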
new_lfw_people.shape

(150, 50, 37)

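Project the training and test sets onto these 150 components to obtain the reduced data (the PCA was fit on the training set only):

X_train_pca = pca.transform(X_train)  # shape (966, 150)
X_test_pca = pca.transform(X_test)    # shape (322, 150)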
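pca.transform centers the data with the training mean (mean_), projects it onto components_, and, because whiten=True, divides each coordinate by the square root of explained_variance_. A sketch reproducing that by hand on stand-in data (hypothetical arrays; this mirrors the documented behavior rather than copying scikit-learn's source):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
X_train_demo = rng.normal(size=(120, 30))   # hypothetical training data
X_test_demo = rng.normal(size=(40, 30))     # hypothetical test data

pca_demo = PCA(n_components=5, svd_solver='full', whiten=True).fit(X_train_demo)

# Manual whitened projection: center with the TRAINING mean, project, rescale
manual = (X_test_demo - pca_demo.mean_) @ pca_demo.components_.T
manual /= np.sqrt(pca_demo.explained_variance_)

print(np.allclose(manual, pca_demo.transform(X_test_demo)))  # True
```

Because the test set is centered with the training mean, no information from X_test leaks into the fitted PCA.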

SVM model

Training

# hyperparameter grid
param_grid = {'C': [1e3, 5e3, 1e4, 5e4, 1e5],
             'gamma': [0.0001, 0.0005, 0.001, 0.005, 0.01, 0.1], }

# grid search
clf = GridSearchCV(
    SVC(kernel='rbf', class_weight='balanced'), 
    param_grid
)

# train the model
clf = clf.fit(X_train_pca,y_train)
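class_weight='balanced' matters here because the classes are imbalanced (George W Bush has far more images than Hugo Chavez, as the support column of the classification report shows). scikit-learn documents the balanced weight of class c as n_samples / (n_classes * count_c). A sketch computing it by hand on hypothetical labels and comparing it with sklearn.utils.class_weight.compute_class_weight:

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Hypothetical imbalanced labels: class 0 is common, class 2 is rare
y_demo = np.array([0] * 6 + [1] * 3 + [2] * 1)
classes = np.unique(y_demo)

# Documented formula for 'balanced': n_samples / (n_classes * np.bincount(y))
manual = len(y_demo) / (len(classes) * np.bincount(y_demo))
sk = compute_class_weight(class_weight='balanced', classes=classes, y=y_demo)

print(manual)                   # rarer classes get larger weights
print(np.allclose(manual, sk))  # True
```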

The best parameter combination found by the search:

print(clf.best_estimator_)

SVC(C=1000.0, class_weight='balanced', gamma=0.005)
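GridSearchCV tries every combination in param_grid: 5 values of C × 6 values of gamma = 30 candidates. With the default cv=None (5-fold, stratified for classifiers) that is 150 fits, plus one final refit of the best model on the whole training set. The candidate count can be checked with ParameterGrid:

```python
from sklearn.model_selection import ParameterGrid

# Same grid as in the training step above
param_grid = {'C': [1e3, 5e3, 1e4, 5e4, 1e5],
              'gamma': [0.0001, 0.0005, 0.001, 0.005, 0.01, 0.1]}

n_candidates = len(ParameterGrid(param_grid))
n_folds = 5  # GridSearchCV's default number of folds when cv=None
print(n_candidates, n_candidates * n_folds)  # 30 150
```

On the fitted search, clf.best_params_ gives the winning combination and clf.best_score_ its mean cross-validated accuracy.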

Prediction

y_pred = clf.predict(X_test_pca)
y_pred[:10]

array([3, 3, 6, 3, 3, 3, 4, 1, 3, 3], dtype=int64)
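The predictions are integer class indices; indexing target_names with them turns them into names. A short sketch using the values printed above:

```python
import numpy as np

# Values copied from the outputs above
target_names = np.array(['Ariel Sharon', 'Colin Powell', 'Donald Rumsfeld', 'George W Bush',
                         'Gerhard Schroeder', 'Hugo Chavez', 'Tony Blair'])
y_pred_head = np.array([3, 3, 6, 3, 3, 3, 4, 1, 3, 3])

# Integer labels index directly into target_names
predicted_names = target_names[y_pred_head]
print(predicted_names)
```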

Model evaluation

Evaluate how well the classifier performs:

# classification report
print(classification_report(y_test,y_pred,target_names=target_names))

                   precision    recall  f1-score   support

     Ariel Sharon       0.88      0.54      0.67        13
     Colin Powell       0.75      0.88      0.81        60
  Donald Rumsfeld       0.85      0.63      0.72        27
    George W Bush       0.86      0.97      0.91       146
Gerhard Schroeder       0.95      0.80      0.87        25
      Hugo Chavez       1.00      0.47      0.64        15
       Tony Blair       0.97      0.81      0.88        36

         accuracy                           0.85       322
        macro avg       0.89      0.73      0.79       322
     weighted avg       0.86      0.85      0.85       322

# confusion matrix
print(confusion_matrix(y_test, y_pred, labels=range(n_classes)))

Output:


[[  7   2   0   4   0   0   0]
 [  1  53   2   4   0   0   0]
 [  0   4  17   6   0   0   0]
 [  0   4   0 142   0   0   0]
 [  0   1   0   3  20   0   1]
 [  0   5   0   2   1   7   0]
 [  0   2   1   4   0   0  29]]
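In scikit-learn's confusion matrix, rows are true classes and columns are predicted classes, so the classification report can be recomputed from it: precision divides the diagonal by column sums, recall by row sums (the support). A sketch using the matrix printed above; the results match the precision, recall, accuracy and macro-average columns of the report:

```python
import numpy as np

# Confusion matrix copied from the output above (rows: true class, columns: predicted class)
cm = np.array([
    [  7,   2,   0,   4,   0,   0,   0],
    [  1,  53,   2,   4,   0,   0,   0],
    [  0,   4,  17,   6,   0,   0,   0],
    [  0,   4,   0, 142,   0,   0,   0],
    [  0,   1,   0,   3,  20,   0,   1],
    [  0,   5,   0,   2,   1,   7,   0],
    [  0,   2,   1,   4,   0,   0,  29],
])

tp = np.diag(cm)                    # correctly classified samples per class
precision = tp / cm.sum(axis=0)     # correct / everything predicted as that class
recall = tp / cm.sum(axis=1)        # correct / everything truly in that class
f1 = 2 * precision * recall / (precision + recall)
accuracy = tp.sum() / cm.sum()

print(np.round(precision, 2))
print(np.round(recall, 2))
print(np.round(f1, 2))
print(round(accuracy, 2), round(precision.mean(), 2), round(recall.mean(), 2))
```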

Visualization

def plot_gallery(images, titles, h, w, n_row=3, n_col=4):
    """
    images: image data
    titles: list of titles
    h: image height
    w: image width
    n_row=3, n_col=4: number of rows and columns in the grid
    """
    plt.figure(figsize=(1.8 * n_col, 2.4 * n_row))  # figure size
    plt.subplots_adjust(bottom=0, left=.01, right=.99, top=.90, hspace=.35)  # adjust spacing between subplots

    for i in range(n_row * n_col):  # loop over all subplots
        plt.subplot(n_row, n_col, i + 1)
        plt.imshow(images[i].reshape((h, w)), cmap=plt.cm.gray)
        plt.title(titles[i], size=12)
        plt.xticks(())
        plt.yticks(())

def title(y_pred, y_test, target_names, i):
    """
    y_pred: predicted labels
    y_test: true labels
    target_names: list of person names
    i: sample index
    """
    pred_name = target_names[y_pred[i]].rsplit(' ', 1)[-1]
    true_name = target_names[y_test[i]].rsplit(' ', 1)[-1]
    return f'predicted: {pred_name} \ntrue: {true_name}'
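rsplit(' ', 1)[-1] splits the name once from the right and keeps the last piece, so 'George W Bush' becomes 'Bush'. A standalone check with the helper copied from above and hypothetical label arrays:

```python
import numpy as np

def title(y_pred, y_test, target_names, i):
    """Same helper as above: keep only the surname (last word) of each name."""
    pred_name = target_names[y_pred[i]].rsplit(' ', 1)[-1]
    true_name = target_names[y_test[i]].rsplit(' ', 1)[-1]
    return f'predicted: {pred_name} \ntrue: {true_name}'

target_names = np.array(['Ariel Sharon', 'Colin Powell', 'Donald Rumsfeld', 'George W Bush',
                         'Gerhard Schroeder', 'Hugo Chavez', 'Tony Blair'])
# Hypothetical labels: sample 0 is correct, sample 1 is misclassified
y_pred_demo = np.array([3, 1])
y_test_demo = np.array([3, 5])

print(title(y_pred_demo, y_test_demo, target_names, 1))
```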

Build the list of prediction titles:

prediction_titles = [title(y_pred, y_test, target_names, i) for i in range(y_pred.shape[0])]
prediction_titles[:5]

['predicted: Bush \ntrue: Bush',
 'predicted: Bush \ntrue: Bush',
 'predicted: Blair \ntrue: Blair',
 'predicted: Bush \ntrue: Bush',
 'predicted: Bush \ntrue: Bush']

Plot the first 12 test images (a 3 × 4 grid) with their predicted and true names:

plot_gallery(X_test, prediction_titles, h, w)

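Next, plot the eigenfaces (the principal components reshaped into h × w images). First build their titles:

eigenface_titles = ["eigenface %d" % i for i in range(new_lfw_people.shape[0])]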

eigenface_titles[:10]

['eigenface 0',
 'eigenface 1',
 'eigenface 2',
 'eigenface 3',
 'eigenface 4',
 'eigenface 5',
 'eigenface 6',
 'eigenface 7',
 'eigenface 8',
 'eigenface 9']

plot_gallery(new_lfw_people, eigenface_titles, h, w)
