Done:
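- Retrained the 3second-15fps+CNN network on the newly organized data. Testing shows that the current model detects attacks well, but it may misclassify real faces as attacks.
- Tried a new method, Multi-Scale Frame + CNN, which takes multiple scales of a single frame as input and can effectively learn the texture information around the face. So far it works acceptably on known attack scenarios.
Summary of the current state of the paper implementations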
1.texture_analysis
Inspired by the aforementioned observations, we propose,
in this work, a new face anti-spoofing method based
on color texture analysis. The color Local Binary Patterns
(LBP) descriptor proposed in [7] is used to extract the joint
color-texture information from the face images. In this descriptor,
the uniform LBP histograms are extracted from the
individual image bands. Subsequently, these histograms are
concatenated to form the final descriptor. To gain insight
into which color space is more discriminative to distinguish
real faces from fake ones, we considered three color spaces,
namely RGB, HSV and YCbCr. Extensive experiments
on two challenging benchmark databases, namely the CASIA
face anti-spoofing and Replay-Attack databases, clearly indicate
that the color texture based method outperforms its gray-scale
counterparts in detecting various types of spoofing attacks.
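The descriptor described above (uniform LBP histogram per image band, then concatenation) can be sketched as follows. This is a minimal pure-Python illustration, not the paper's implementation: the 8-neighbour, radius-1 sampling, the `>=` comparison, the per-band normalization, and all function names are assumptions. The 59 bins per band match the histogram size quoted later in these notes (58 uniform patterns plus one shared bin for everything else).

```python
def _transitions(code):
    """Count 0/1 transitions in the circular 8-bit pattern."""
    bits = [(code >> i) & 1 for i in range(8)]
    return sum(bits[i] != bits[(i + 1) % 8] for i in range(8))

# The 58 uniform patterns (at most two transitions) each get their own bin;
# all non-uniform patterns share the last bin, giving 59 bins in total.
_UNIFORM = [c for c in range(256) if _transitions(c) <= 2]
_BIN = {c: i for i, c in enumerate(_UNIFORM)}
N_BINS = len(_UNIFORM) + 1

# 8 neighbours at radius 1, in circular order (assumed sampling pattern).
_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
            (1, 1), (1, 0), (1, -1), (0, -1)]

def lbp_histogram(channel):
    """Normalized uniform LBP histogram of one band (a list of rows)."""
    height, width = len(channel), len(channel[0])
    hist = [0] * N_BINS
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            center = channel[y][x]
            code = 0
            for bit, (dy, dx) in enumerate(_OFFSETS):
                if channel[y + dy][x + dx] >= center:
                    code |= 1 << bit
            hist[_BIN.get(code, N_BINS - 1)] += 1
    total = sum(hist) or 1
    return [v / total for v in hist]

def color_lbp_descriptor(image):
    """Concatenate the per-band histograms.

    `image` is a list of rows of 3-tuples, already converted to the
    chosen colour space (RGB, HSV or YCbCr).
    """
    descriptor = []
    for band in range(3):
        channel = [[px[band] for px in row] for row in image]
        descriptor.extend(lbp_histogram(channel))
    return descriptor
```

For real images a vectorized library routine would be far faster; the loop form is only meant to make the per-band histogram and the concatenation explicit.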
3.2. Results
Table 1 and Table 2 present the results of different LBP based
color texture descriptions and their gray-scale counterparts.
From these results, we can clearly see that the color texture
features significantly improve the performance compared to
the gray-scale LBP-based countermeasure. When comparing
the different color spaces, the YCbCr based representation yields
the best overall performance. The color LBP features extracted
from the YCbCr space improve the performance on the
CASIA-FA and Replay-Attack databases by 64.5% and 81.4%,
respectively, compared to the gray-scale LBP features.
From Table 1, we can also observe that the features extracted
from the HSV color space seem to be more effective
against video attacks than those extracted from the YCbCr
color space. Thus, we studied the benefits of combining the
two color texture representations by fusing them at feature
level. The color LBP descriptions from the two color spaces
were concatenated, thus the size of the resulting histogram
is 59 × 3 × 2. The results in Table 1 and Table 2 indicate
that a significant performance enhancement is obtained, thus
confirming the benefits of combining the different facial color
texture representations.
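A rough sketch of the feature-level fusion described above: convert the RGB face image to HSV and to YCbCr, describe each, and concatenate. The full-range BT.601 (JPEG-style) YCbCr coefficients, the rescaling of HSV to 0..255, and every function name here are assumptions; `describe` is a hypothetical stand-in for whatever per-colour-space descriptor is used (59 × 3 values in the excerpt).

```python
import colorsys

def _clamp8(value):
    """Round to the nearest integer and clamp to the 8-bit range 0..255."""
    return max(0, min(255, int(round(value))))

def rgb_to_hsv_pixel(r, g, b):
    """8-bit RGB to HSV, with each HSV component rescaled to 0..255."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return _clamp8(h * 255), _clamp8(s * 255), _clamp8(v * 255)

def rgb_to_ycbcr_pixel(r, g, b):
    """8-bit RGB to full-range BT.601 YCbCr (assumed convention)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return _clamp8(y), _clamp8(cb), _clamp8(cr)

def convert(image, pixel_fn):
    """Apply a per-pixel colour conversion to a list-of-rows image."""
    return [[pixel_fn(*px) for px in row] for row in image]

def fused_descriptor(image_rgb, describe):
    """Concatenate the HSV descriptor and the YCbCr descriptor."""
    hsv = convert(image_rgb, rgb_to_hsv_pixel)
    ycbcr = convert(image_rgb, rgb_to_ycbcr_pixel)
    return describe(hsv) + describe(ycbcr)
```

With a 59 × 3 descriptor per colour space, the fused vector has 59 × 3 × 2 = 354 entries, which is the histogram size stated in the excerpt.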
2.Motion Magnification
How the optical flow vectors are computed
3.Multi-Scale CNN
Scale [1.4, 1.8, 2.2, 2.6]
As shown in Fig. 3, we prepare the input images with
five scales. Images corresponding to the first scale merely
contain face region. With the increase of scale, images
contain more background regions. As for the CASIA-FASD
dataset, we can easily find that fake images at large scales
contain boundaries of photographs compared with genuine
images, which should be exploited as discriminative cues for
anti-spoofing. In other cases, such as the REPLAY-ATTACK dataset,
though fake images have no boundary cues, they contain
blurred edges and probable abnormal specular reflections
caused by re-capturing compared with genuine samples in
whole images [35].
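To make the multi-scale input concrete, here is a small sketch of how the crop regions could be derived from a detected face box. It assumes the scale multiplies the box width and height about the box centre and that crops are clipped at the image border; the paper excerpt does not spell this out, and the function names are hypothetical. The default scales are the ones listed in these notes.

```python
def scaled_box(box, scale, image_size):
    """Enlarge a face box about its centre by `scale`, clipped to the image.

    box: (x, y, w, h) of the detected face.
    image_size: (width, height) of the frame.
    Returns (x0, y0, x1, y1) with x1 and y1 exclusive.
    """
    x, y, w, h = box
    img_w, img_h = image_size
    cx, cy = x + w / 2.0, y + h / 2.0
    half_w, half_h = w * scale / 2.0, h * scale / 2.0
    x0 = max(0, int(round(cx - half_w)))
    y0 = max(0, int(round(cy - half_h)))
    x1 = min(img_w, int(round(cx + half_w)))
    y1 = min(img_h, int(round(cy + half_h)))
    return x0, y0, x1, y1

def multi_scale_boxes(box, image_size, scales=(1.4, 1.8, 2.2, 2.6)):
    """One crop region per scale; larger scales include more background."""
    return [scaled_box(box, s, image_size) for s in scales]
```

Each crop would then be resized to the network's input resolution. Clipping at the border means faces near the frame edge get less background than the nominal scale suggests, which may be worth handling with padding instead.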
3) temporal augmentation (note: adds image features along the time axis): Besides spatial augmentations
above, we also propose to augment the data temporally.
Multiple frames are expected to improve the anti-spoofing
performance due to more informative data. This has been
proved by [11] to some extent, in which a spatial-temporal
texture feature was extracted from consecutive frames. When
fed more than one frame, the CNN can not only learn the
spatial features, but also temporal features for anti-spoofing.
In this paper, we train CNN model using both single frame
and multiple frames, and figure out whether multiple frames
are helpful for CNN to learn more discriminative features.
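One simple way to build multi-frame samples is a sliding window over the decoded frames of a video. This is only a sketch: `frame_windows` and its `stride` parameter are hypothetical, and how the frames in a window are combined before entering the CNN (for example, stacking along the channel axis) is an assumption the excerpt does not confirm.

```python
def frame_windows(frames, num_frames, stride=1):
    """Group consecutive frames into samples of `num_frames` frames each.

    frames: sequence of per-frame images (any type).
    num_frames: frames per sample (1 reproduces single-frame training).
    stride: offset between the starts of neighbouring windows.
    """
    if num_frames < 1 or stride < 1:
        raise ValueError("num_frames and stride must be positive")
    last_start = len(frames) - num_frames
    return [list(frames[i:i + num_frames])
            for i in range(0, last_start + 1, stride)]
```

A video with fewer frames than `num_frames` yields no samples, so short clips would need padding or a smaller window.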
1) Test on CASIA dataset: We test our method on CASIA
dataset in five different spatial scales from one frame to three
frames. In Table I, the HTERs on development set and test
set are shown. In the table, the average performance over
scales and frames is presented as well. As we can see,
with the increase of spatial scale, the anti-spoofing model
performs better than at the original scale, and achieves the
best average performance when the scale is equal to 3. These results
indicate the positive effect of background region on face
anti-spoofing task. Actually, a similar claim has been proved
in [35]. However, the difference is that images corresponding
to the best scale in this paper are larger than those in [35],
which shows the CNN can extract more useful information
from the background region compared with the hand-crafted
features. However, when the scale reaches 5, the performance
degrades slightly. One possible reason is that the diversity
of background region weakens the positive effect. As for
the number of frames used, the model trained using one
frame slightly outperforms the models trained with more than
one frame on average. However, when reviewing the results
closely, we find the best performance is obtained by using
two frames with scale 2. This specific result indicates that multi-frame
input is beneficial in certain cases.
For details, we show the corresponding ROC curves in
Fig. 5. From the results, we can find that input data with scales
2, 3 and 4 improve the anti-spoofing performance consistently over different
numbers of frames. These results further show that the background
region is useful for distinguishing genuine and fake face
images. However, the improvement may diminish when the input contains
too much background.
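For reference when reproducing the HTER numbers: HTER is the mean of the false acceptance rate (FAR) and the false rejection rate (FRR) at a fixed threshold. The sketch below assumes higher scores mean "more likely genuine" and that the threshold is chosen on the development set where FAR and FRR are closest, which is the usual protocol but is not stated in the excerpt; the function names are hypothetical.

```python
def far_frr(genuine_scores, attack_scores, threshold):
    """FAR and FRR when a sample is accepted if score >= threshold."""
    false_accepts = sum(s >= threshold for s in attack_scores)
    false_rejects = sum(s < threshold for s in genuine_scores)
    return (false_accepts / len(attack_scores),
            false_rejects / len(genuine_scores))

def hter(genuine_scores, attack_scores, threshold):
    """Half Total Error Rate: the mean of FAR and FRR."""
    far, frr = far_frr(genuine_scores, attack_scores, threshold)
    return (far + frr) / 2.0

def eer_threshold(dev_genuine, dev_attack):
    """Candidate threshold on the dev set where |FAR - FRR| is smallest."""
    candidates = sorted(set(dev_genuine) | set(dev_attack))

    def gap(t):
        far, frr = far_frr(dev_genuine, dev_attack, t)
        return abs(far - frr)

    return min(candidates, key=gap)
```

Typical use: compute `t = eer_threshold(dev_genuine, dev_attack)` once, then report `hter(test_genuine, test_attack, t)` on the test set.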
s and f are parameters
4.optical flow