Datasets:
HMDB51 contains 51 action classes, each with at least 101 video clips; resolution is about 320x240 (frame height is fixed at 240 pixels).
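The actions are grouped into categories, including:
1) General facial actions: smile, laugh, chew, talk.
2) Facial actions with object manipulation: smoke, eat, drink.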
http://serre-lab.clps.brown.edu/resource/hmdb-a-large-human-motion-database/#Downloads
Kinetics
https://deepmind.com/research/open-source/open-source-datasets/kinetics/
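Deep learning methods:
1) 3D convolution: computationally expensive, because each kernel also spans the temporal dimension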
[12]. D. Tran, L. Bourdev, R. Fergus, L. Torresani, M. Paluri, "Learning spatiotemporal features with 3D convolutional networks", ICCV, pp. 4489-4497, 2015.
[13]. K. Hara, H. Kataoka, Y. Satoh, "Can spatiotemporal 3D CNNs retrace the history of 2D CNNs and ImageNet?", CVPR, pp. 6546-6555, 2018.
[14]. Z. Qiu, T. Yao, T. Mei, "Learning spatio-temporal representation with pseudo-3D residual networks", ICCV, pp. 5534-5542, 2017.
[15]. J. Carreira, A. Zisserman, "Quo vadis, action recognition? A new model and the Kinetics dataset", CVPR, 2017.
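To make the cost of 3D convolution concrete, the sketch below counts weights and multiply-accumulates (MACs) for a single layer. The layer sizes (64 channels, 56x56 feature map, 16 frames) are hypothetical values chosen for illustration, and the formulas assume stride 1, "same" padding, and no bias.

```python
def conv_cost(c_in, c_out, k, kt, h, w, frames):
    """Return (weights, MACs) for a kt x k x k convolution.

    kt=1 gives a per-frame 2D convolution; kt>1 gives a 3D convolution.
    Assumes stride 1, 'same' padding, no bias (illustrative simplification).
    """
    params = c_in * c_out * kt * k * k
    macs = params * h * w * frames  # one kernel application per output position
    return params, macs


# Hypothetical layer: 64 -> 64 channels, 56x56 feature map, 16-frame clip.
p2d, m2d = conv_cost(64, 64, k=3, kt=1, h=56, w=56, frames=16)  # 2D, 3x3
p3d, m3d = conv_cost(64, 64, k=3, kt=3, h=56, w=56, frames=16)  # 3D, 3x3x3

# Factorized variant in the spirit of pseudo-3D [14]: 1x3x3 spatial + 3x1x1 temporal.
p_sp, m_sp = conv_cost(64, 64, k=3, kt=1, h=56, w=56, frames=16)
p_tm, m_tm = conv_cost(64, 64, k=1, kt=3, h=56, w=56, frames=16)

print("2D weights:", p2d)                  # 36864
print("3D weights:", p3d)                  # 110592
print("3D / 2D MACs:", m3d / m2d)          # 3.0
print("factorized weights:", p_sp + p_tm)  # 49152
```

With a temporal kernel size of 3, the 3D layer needs three times the weights and MACs of its 2D counterpart on the same clip, while the factorized version sits in between.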
2) RNN/LSTM
[16]. J. Donahue et al., "Long-term recurrent convolutional networks for visual recognition and description", CVPR, 2015.
3) Two-stream methods:
[9]. K. Simonyan, A. Zisserman, "Two-stream convolutional networks for action recognition in videos", NIPS, pp. 568-576, 2014.
[10]. L. Wang et al., "Temporal segment networks: Towards good practices for deep action recognition", ECCV, pp. 20-36, 2016.
[11]. B. Xu et al., "Dense dilated network for video action recognition", IEEE Transactions on Image Processing, 2019.
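A two-stream model runs one network on RGB frames (spatial stream) and another on optical flow (temporal stream), then combines their class scores. The sketch below shows one simple late-fusion option, a weighted average of softmax scores. The function names, the `flow_weight` parameter, and the example logits are hypothetical, and this is not the exact procedure of any one cited paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_two_streams(rgb_logits, flow_logits, flow_weight=1.0):
    """Weighted average of per-class softmax scores from the two streams.

    flow_weight is a hypothetical knob; 1.0 gives a plain average.
    """
    rgb = softmax(rgb_logits)
    flow = softmax(flow_logits)
    return [(r + flow_weight * f) / (1.0 + flow_weight)
            for r, f in zip(rgb, flow)]


# Made-up logits for three classes: the spatial stream prefers class 0,
# the temporal stream prefers class 1 more strongly.
rgb_logits = [2.0, 1.0, 0.1]
flow_logits = [0.5, 2.5, 0.3]

fused = fuse_two_streams(rgb_logits, flow_logits)
predicted = max(range(len(fused)), key=fused.__getitem__)
print(predicted)  # 1
```

Fusing at the score level keeps the two networks independent, so each stream can be trained and evaluated separately before combining them.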