Project Link
https://aistudio.baidu.com/aistudio/projectdetail/3389378?contributionType=1
The project can be forked and run with one click.
Competition Overview
In many large-scale video analysis scenarios, localizing and recognizing short human actions in long, untrimmed videos has become a topic of wide interest. Current human action detection solutions struggle on large-scale video collections, and efficiently processing massive video data remains a challenging problem in computer vision. The core difficulty has two parts: first, action recognition algorithms are still computationally expensive; second, there is a lack of methods that generate fewer video proposals (proposals focused on the short actions themselves).
A video action proposal here means a candidate video segment that contains a specific action. To suit large-scale video analysis, temporal action proposals should satisfy two requirements as far as possible:
(1) Higher processing efficiency, e.g. mechanisms that make encoding and scoring temporal video segments more efficient;
(2) Stronger discriminative power, e.g. accurately localizing the time interval in which an action occurs.
This competition aims to encourage more developers and researchers to engage with research on video action localization and to build action localization models with better performance.
Dataset
The competition dataset contains features extracted from standard single-camera HD broadcast footage of table tennis matches from the 2019–2021 seasons, covering international events (World Cup, World Championships, Asian Championships, Olympics) and domestic events (National Games, China Table Tennis Super League). It comprises 912 video feature files; each video is 0–6 minutes long, the feature dimension is 2048, and the features are saved in pkl format. The in-rally swing actions of the player facing the camera were annotated; each action lasts 0–2 seconds. The training set contains 729 annotated videos, the A-test set 91 videos, and the B-test set 92 videos. Training labels are provided in json format.
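Before preprocessing, it can help to inspect one feature file to confirm its layout. A minimal loader sketch (the `image_feature` key is the one used by the splitting scripts later in this post; the file path is a placeholder to substitute with a real file name):

```python
import pickle
import numpy as np

def load_feature(path):
    """Load one competition feature pkl and return the (num_frames, 2048) array."""
    with open(path, "rb") as f:
        video_feat = pickle.load(f)
    # 'image_feature' holds the per-frame ppTSM features
    return np.asarray(video_feat["image_feature"])

# Example (substitute a real file from the extracted dataset):
# feats = load_feature("/home/aistudio/data/Features_competition_train/xxx.pkl")
# print(feats.shape)  # (num_of_frames, 2048); num_of_frames varies per video
```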
Dataset Preprocessing
This solution uses the BMN model from PaddleVideo. BMN, developed by Baidu, was the winning solution of the 2019 ActivityNet challenge; it provides an efficient way to generate proposals for video action localization and was first open-sourced in PaddlePaddle. The model introduces a Boundary-Matching (BM) mechanism to evaluate proposal confidence: all possible proposals are arranged into a 2D BM confidence map according to their start position and length, and the value at each point is the confidence score of the corresponding proposal. The network consists of three modules: a base module serves as the backbone and processes the input feature sequence, the TEM module predicts the probability that each temporal position is an action start or end, and the PEM module generates the BM confidence map.
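A toy illustration of the BM confidence map's layout (this is not PaddleVideo code; the `tscale`/`dscale` values of 100 match the bmn.yaml config used later, and the random map stands in for real PEM output):

```python
import numpy as np

tscale, dscale = 100, 100
# bm_map[d, t]: confidence of the proposal starting at position t with duration d+1
bm_map = np.random.rand(dscale, tscale)

# Enumerate valid proposals: a proposal must end inside the clip
proposals = []
for d in range(dscale):
    for t in range(tscale):
        if t + d + 1 <= tscale:
            proposals.append((t, t + d + 1, bm_map[d, t]))
print(len(proposals))  # 5050 valid (start, end) pairs for tscale = 100
```

Scoring all of these proposals in one forward pass (rather than one at a time) is what makes the BM mechanism efficient.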
The competition data consists of 912 video features extracted with ppTSM, saved in pkl format with file names matching the video names. After loading a pkl file, a single video's features form a (num_of_frames, 2048) array. Since num_of_frames varies and can be quite large, the pkl files cannot be used for training directly. Moreover, because each table tennis action is very short, the data is split into clips so the model can recognize actions better.
- First, extract the dataset.
Run the following commands to extract the dataset, then delete the archives to keep the project space under 100 GB; otherwise the project will be terminated.
%cd /home/aistudio/data/
!tar xf data122998/Features_competition_train.tar.gz
!tar xf data123004/Features_competition_test_A.tar.gz
!cp data122998/label_cls14_train.json .
!rm -rf data12*
/home/aistudio/data
- After extraction, first split the label annotation file by running the following script.
import json
import random

random.seed(0)

source_path = "/home/aistudio/data/label_cls14_train.json"
annos = json.load(open(source_path))
fps = annos['fps']
annos = annos['gts']

new_annos = {}
max_frames = 0
for anno in annos:
    if anno['total_frames'] > max_frames:
        max_frames = anno['total_frames']  # track the longest video (9000 frames)
    # Split each video into 90 clips of 4 seconds (100 frames at 25 fps)
    for i in range(9000 // 100):
        subset = 'training'
        clip_start = i * 4        # clip boundaries in seconds
        clip_end = (i + 1) * 4
        video_name = anno['url'].split('.')[0] + f"_{i}"
        new_annos[video_name] = {
            'duration_second': 100 / fps,
            'subset': subset,
            'duration_frame': 100,
            'annotations': [],
            'feature_frame': -1
        }
        actions = anno['actions']
        for act in actions:
            start_id = act['start_id']   # action boundaries in seconds
            end_id = act['end_id']
            new_start_id = -1
            new_end_id = -1
            if start_id > clip_start and end_id < clip_end:
                # action lies entirely inside this clip
                new_start_id = start_id - clip_start
                new_end_id = end_id - clip_start
            elif start_id < clip_start < end_id < clip_end:
                # action starts before the clip and ends inside it
                new_start_id = 0
                new_end_id = end_id - clip_start
            elif clip_start < start_id < clip_end < end_id:
                # action starts inside the clip and ends after it
                new_start_id = start_id - clip_start
                new_end_id = 4
            elif start_id < clip_start < clip_end < end_id:
                # action spans the whole clip
                new_start_id = 0
                new_end_id = 4
            else:
                continue
            new_annos[video_name]['annotations'].append({
                'segment': [round(new_start_id, 2), round(new_end_id, 2)],
                'label': str(act['label_ids'][0])
            })
        # drop clips that contain no action at all
        if len(new_annos[video_name]['annotations']) == 0:
            new_annos.pop(video_name)

json.dump(new_annos, open('new_label_cls14_train.json', 'w+'))
print(len(list(new_annos.keys())))
12597
After the script finishes, a new annotation file new_label_cls14_train.json is generated in the data directory. Next, split the training and test data themselves.
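An optional sanity check on the regenerated annotations can catch splitting mistakes early. A small helper sketch (`check_clip_labels` is my own name, not part of the original pipeline; it assumes the 4-second clip layout produced above):

```python
import json

def check_clip_labels(path, clip_seconds=4):
    """Verify the regenerated annotation file: every kept clip should have at
    least one annotation, and every segment should lie inside the clip window."""
    annos = json.load(open(path))
    for name, v in annos.items():
        assert len(v['annotations']) > 0, f"{name} has no annotations"
        for a in v['annotations']:
            s, e = a['segment']
            assert 0 <= s <= e <= clip_seconds, f"bad segment in {name}"
    return len(annos)

# check_clip_labels('/home/aistudio/data/new_label_cls14_train.json')  # 12597 in this run
```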
- Run the following script to split the training set.
import os
import os.path as osp
import glob
import pickle
import paddle
import numpy as np

file_list = glob.glob("/home/aistudio/data/Features_competition_train/*.pkl")
max_frames = 9000
npy_path = "/home/aistudio/data/Features_competition_train/npy/"
if not osp.exists(npy_path):
    os.makedirs(npy_path)

for f in file_list:
    video_feat = pickle.load(open(f, 'rb'))
    tensor = paddle.to_tensor(video_feat['image_feature'])
    pad_num = max_frames - tensor.shape[0]
    pad1d = paddle.nn.Pad1D([0, pad_num])
    # Pad1D pads the last axis, so move the frame axis last, pad, then restore
    tensor = paddle.transpose(tensor, [1, 0])
    tensor = paddle.unsqueeze(tensor, axis=0)
    tensor = pad1d(tensor)
    tensor = paddle.squeeze(tensor, axis=0)
    tensor = paddle.transpose(tensor, [1, 0])
    # Split the padded (9000, 2048) features into 90 clips of 100 frames each
    sps = paddle.split(tensor, num_or_sections=90, axis=0)
    for i, s in enumerate(sps):
        file_name = osp.join(npy_path, f.split('/')[-1].split('.')[0] + f"_{i}.npy")
        np.save(file_name, s.detach().numpy())
W0107 21:28:29.299958 141 device_context.cc:447] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 10.1, Runtime API Version: 10.1
W0107 21:28:29.305644 141 device_context.cc:465] device: 0, cuDNN Version: 7.6.
!rm /home/aistudio/data/Features_competition_train/*.pkl
After it runs, the numpy data for training is generated under data/Features_competition_train/npy.
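A quick shape check on the generated clips can confirm the split worked. A helper sketch (`check_npy_dir` is a hypothetical name; it assumes every clip should be a (100, 2048) window as produced above):

```python
import glob
import os.path as osp
import numpy as np

def check_npy_dir(npy_dir, expected=(100, 2048)):
    """Return (n_files, n_bad): how many split feature files exist and how
    many deviate from the expected (frames, feat_dim) window shape."""
    files = glob.glob(osp.join(npy_dir, '*.npy'))
    bad = sum(1 for f in files if np.load(f).shape != expected)
    return len(files), bad

# n, bad = check_npy_dir('/home/aistudio/data/Features_competition_train/npy')
# expect bad == 0 and n == 90 * number_of_training_videos
```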
- Run the following script to split the A-test set in the same way.
import os
import os.path as osp
import glob
import pickle
import json
import numpy as np
import paddle

file_list = glob.glob("/home/aistudio/data/Features_competition_test_A/*.pkl")
max_frames = 9000
npy_path = "/home/aistudio/data/Features_competition_test_A/npy/"
if not osp.exists(npy_path):
    os.makedirs(npy_path)

for f in file_list:
    video_feat = pickle.load(open(f, 'rb'))
    tensor = paddle.to_tensor(video_feat['image_feature'])
    pad_num = max_frames - tensor.shape[0]
    pad1d = paddle.nn.Pad1D([0, pad_num])
    # Pad1D pads the last axis, so move the frame axis last, pad, then restore
    tensor = paddle.transpose(tensor, [1, 0])
    tensor = paddle.unsqueeze(tensor, axis=0)
    tensor = pad1d(tensor)
    tensor = paddle.squeeze(tensor, axis=0)
    tensor = paddle.transpose(tensor, [1, 0])
    # Split the padded (9000, 2048) features into 90 clips of 100 frames each
    sps = paddle.split(tensor, num_or_sections=90, axis=0)
    for i, s in enumerate(sps):
        file_name = osp.join(npy_path, f.split('/')[-1].split('.')[0] + f"_{i}.npy")
        np.save(file_name, s.detach().numpy())
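The train and test scripts above are identical except for the paths. The pad/transpose/split sequence could be factored into one helper; a pure-numpy equivalent sketch (assuming no video exceeds 9000 frames and that zero padding matches `Pad1D`'s default):

```python
import numpy as np

def pad_and_split(features, max_frames=9000, clip_frames=100):
    """Zero-pad a (num_frames, 2048) feature array up to max_frames, then
    split it into max_frames // clip_frames equal clips along the frame axis."""
    pad_num = max_frames - features.shape[0]
    padded = np.pad(features, ((0, pad_num), (0, 0)))
    return np.split(padded, max_frames // clip_frames, axis=0)

# clips = pad_and_split(load_feature(some_pkl))
# len(clips) == 90; each clip has shape (100, 2048)
```

This avoids a GPU round-trip entirely, since the operation is just padding and slicing.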
Model Training
Once the dataset has been split, training can begin with the following commands. First install PaddleVideo's dependencies.
%cd /home/aistudio/PaddleVideo/
!pip install -r requirements.txt
Start training the model.
%cd /home/aistudio/PaddleVideo/
!python main.py -c configs/localization/bmn.yaml
/home/aistudio/PaddleVideo
[01/07 21:42:50] DALI is not installed, you can improve performance if use DALI
[01/07 21:42:50] DATASET :
[01/07 21:42:50]     batch_size : 16
[01/07 21:42:50]     num_workers : 8
[01/07 21:42:50]     test :
[01/07 21:42:50]         file_path : /home/aistudio/data/new_label_cls14_train.json
[01/07 21:42:50]         format : BMNDataset
[01/07 21:42:50]         subset : validation
[01/07 21:42:50]         test_mode : True
[01/07 21:42:50]     test_batch_size : 1
[01/07 21:42:50]     train :
[01/07 21:42:50]         file_path : /home/aistudio/data/new_label_cls14_train.json
[01/07 21:42:50]         format : BMNDataset
[01/07 21:42:50]         subset : train
[01/07 21:42:50]     valid :
[01/07 21:42:50]         file_path : /home/aistudio/data/new_label_cls14_train.json
[01/07 21:42:50]         format : BMNDataset
[01/07 21:42:50]         subset : validation
[01/07 21:42:50] ------------------------------------------------------------
[01/07 21:42:50] INFERENCE :
[01/07 21:42:50]     dscale : 100
[01/07 21:42:50]     feat_dim : 2048
[01/07 21:42:50]     name : BMN_Inference_helper
[01/07 21:42:50]     result_path : data/bmn/BMN_INFERENCE_results
[01/07 21:42:50]     tscale : 100
[01/07 21:42:50] ------------------------------------------------------------
[01/07 21:42:50] METRIC :
[01/07 21:42:50]     dscale : 100
[01/07 21:42:50]     file_path : data/bmn_data/activitynet_1.3_annotations.json
[01/07 21:42:50]     ground_truth_filename : data/bmn_data/activity_net_1_3_new.json
[01/07 21:42:50]     name : BMNMetric
[01/07 21:42:50]     output_path : data/bmn/BMN_Test_output
[01/07 21:42:50]     result_path : data/bmn/BMN_Test_results
[01/07 21:42:50]     subset : validation
[01/07 21:42:50]     tscale : 100
[01/07 21:42:50] ------------------------------------------------------------
[01/07 21:42:50] MODEL :
[01/07 21:42:50]     backbone :
[01/07 21:42:50]         dscale : 100
[01/07 21:42:50]         feat_dim : 2048
[01/07 21:42:50]         name : BMN
[01/07 21:42:50]         num_sample : 32
[01/07 21:42:50]         num_sample_perbin : 3
[01/07 21:42:50]         prop_boundary_ratio : 0.5
[01/07 21:42:50]         tscale : 100
[01/07 21:42:50]     framework : BMNLocalizer
[01/07 21:42:50]     loss :
[01/07 21:42:50]         dscale : 100
[01/07 21:42:50]         name : BMNLoss
[01/07 21:42:50]         tscale : 100
[01/07 21:42:50] ------------------------------------------------------------
[01/07 21:42:50] OPTIMIZER :
[01/07 21:42:50]     learning_rate :
[01/07 21:42:50]         boundaries : [39000]
[01/07 21:42:50]         iter_step : True
[01/07 21:42:50]         name : CustomPiecewiseDecay
[01/07 21:42:50]         values : [0.001, 0.0001]
[01/07 21:42:50]     name : Adam
[01/07 21:42:50]     weight_decay :
[01/07 21:42:50]         name : L2
[01/07 21:42:50]         value : 0.0001
[01/07 21:42:50] ------------------------------------------------------------
[01/07 21:42:50] PIPELINE :
[01/07 21:42:50]     test :
[01/07 21:42:50]         load_feat :
[01/07 21:42:50]             feat_path : /home/aistudio/data/Features_competition_train/npy
[01/07 21:42:50]             name : LoadFeat
[01/07 21:42:50]         transform :
[01/07 21:42:50]             GetMatchMap :
[01/07 21:42:50]                 tscale : 100
[01/07 21:42:50]             GetVideoLabel :
[01/07 21:42:50]                 dscale : 100
[01/07 21:42:50]                 tscale : 100
[01/07 21:42:50]     train :
[01/07 21:42:50]         load_feat :
[01/07 21:42:50]             feat_path : /home/aistudio/data/Features_competition_train/npy
[01/07 21:42:50]             name : LoadFeat
[01/07 21:42:50]         transform :
[01/07 21:42:50]             GetMatchMap :
[01/07 21:42:50]                 tscale : 100
[01/07 21:42:50]             GetVideoLabel :
[01/07 21:42:50]                 dscale : 100
[01/07 21:42:50]                 tscale : 100
[01/07 21:42:50]     valid :
[01/07 21:42:50]         load_feat :
[01/07 21:42:50]             feat_path : /home/aistudio/data/Features_competition_train/npy
[01/07 21:42:50]             name : LoadFeat
[01/07 21:42:50]         transform :
[01/07 21:42:50]             GetMatchMap :
[01/07 21:42:50]                 tscale : 100
[01/07 21:42:50]             GetVideoLabel :
[01/07 21:42:50]                 dscale : 100
[01/07 21:42:50]                 tscale : 100
[01/07 21:42:50] ------------------------------------------------------------
[01/07 21:42:50] epochs : 100
[01/07 21:42:50] log_level : INFO
[01/07 21:42:50] model_name : BMN
[01/07 21:42:50] resume_from :
W0107 21:42:50.985046  3073 device_context.cc:447] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 10.1, Runtime API Version: 10.1
W0107 21:42:50.990319  3073 device_context.cc:465] device: 0, cuDNN Version: 7.6.
[01/07 21:42:57] train subset video numbers: 12597
[01/07 21:43:03] epoch:[  1/100] train step:0  loss: 2.58411 lr: 0.001000 batch_cost: 5.10899 sec, reader_cost: 2.53376 sec, ips: 3.13173 instance/sec.
[01/07 21:43:27] epoch:[  1/100] train step:10 loss: 2.23687 lr: 0.001000 batch_cost: 2.50506 sec, reader_cost: 0.00167 sec, ips: 6.38707 instance/sec.
[01/07 21:43:53] epoch:[  1/100] train step:20 loss: 2.30660 lr: 0.001000 batch_cost: 2.50880 sec, reader_cost: 0.00028 sec, ips: 6.37755 instance/sec.
[01/07 21:44:18] epoch:[  1/100] train step:30 loss: 2.01538 lr: 0.001000 batch_cost: 2.52706 sec, reader_cost: 0.00153 sec, ips: 6.33146 instance/sec.
[01/07 21:44:43] epoch:[  1/100] train step:40 loss: 2.03807 lr: 0.001000 batch_cost: 2.52628 sec, reader_cost: 0.00032 sec, ips: 6.33342 instance/sec.
[01/07 21:45:08] epoch:[  1/100] train step:50 loss: 1.40200 lr: 0.001000 batch_cost: 2.54893 sec, reader_cost: 0.00157 sec, ips: 6.27714 instance/sec.
^C
[01/07 21:45:13] main proc 3139 exit, kill process group 3073
[01/07 21:45:13] main proc 3138 exit, kill process group 3073
[01/07 21:45:13] main proc 3140 exit, kill process group 3073
[01/07 21:45:13] main proc 3141 exit, kill process group 3073
[01/07 21:45:13] main proc 3135 exit, kill process group 3073
[01/07 21:45:13] main proc 3142 exit, kill process group 3073
[01/07 21:45:13] main proc 3137 exit, kill process group 3073
[01/07 21:45:13] main proc 3136 exit, kill process group 3073
For demonstration purposes, training was stopped after one epoch and the model exported. In practice you can train for more epochs to improve accuracy.
Model Export
Export the trained model for inference by running the following script.
%cd /home/aistudio/PaddleVideo/
!python tools/export_model.py -c configs/localization/bmn.yaml -p output/BMN/BMN_epoch_00001.pdparams -o inference/BMN
/home/aistudio/PaddleVideo
Building model(BMN)...
W0107 23:10:26.288929 9431 device_context.cc:447] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 10.1, Runtime API Version: 10.1
W0107 23:10:26.295006 9431 device_context.cc:465] device: 0, cuDNN Version: 7.6.
Loading params from (output/BMN/BMN_epoch_00001.pdparams)...
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/layers/utils.py:77: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
return (isinstance(seq, collections.Sequence) and
model (BMN) has been already saved in (inference/BMN).
Inference
Run inference with the exported model using the following command.
%cd /home/aistudio/PaddleVideo/
!python tools/predict.py --input_file /home/aistudio/data/Features_competition_test_A/npy \
--config configs/localization/bmn.yaml \
--model_file inference/BMN/BMN.pdmodel \
--params_file inference/BMN/BMN.pdiparams \
--use_gpu=True \
--use_tensorrt=False
The json files output above are predictions on the split clips; they still need to be merged back per video. Run the following script:
import os
import json
import glob

json_path = "/home/aistudio/data/Features_competition_test_A/npy"
json_files = glob.glob(os.path.join(json_path, '*_*.json'))

submit_dic = {
    "version": None,
    "results": {},
    "external_data": {}
}
results = submit_dic['results']

for json_file in json_files:
    j = json.load(open(json_file, 'r'))
    old_video_name = list(j.keys())[0]
    video_name = list(j.keys())[0].split('/')[-1].split('.')[0]
    # rsplit keeps any underscores in the original video name intact
    video_name, video_no = video_name.rsplit('_', 1)
    # each clip is 4 seconds, so offset segments back to the full-video timeline
    start_id = int(video_no) * 4
    if len(j[old_video_name]) == 0:
        continue
    for top in j[old_video_name]:
        if video_name in results.keys():
            results[video_name].append({
                'score': round(top['score'], 2),
                'segment': [round(top['segment'][0] + start_id, 2),
                            round(top['segment'][1] + start_id, 2)]
            })
        else:
            results[video_name] = [{
                'score': round(top['score'], 2),
                'segment': [round(top['segment'][0] + start_id, 2),
                            round(top['segment'][1] + start_id, 2)]
            }]

json.dump(submit_dic, open('/home/aistudio/submission.json', 'w', encoding='utf-8'))
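Depending on how the evaluation treats low-scoring proposals, it may also help to sort each video's proposals by score and cap their number before submitting. A hypothetical post-processing helper (`top_k_proposals` and the cutoff `k=100` are my own choices, not part of the original pipeline):

```python
def top_k_proposals(results, k=100):
    """Keep only each video's k highest-scoring proposals, sorted by score."""
    return {v: sorted(props, key=lambda p: p['score'], reverse=True)[:k]
            for v, props in results.items()}

# submit_dic['results'] = top_k_proposals(submit_dic['results'])
```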
Finally, submission.json is generated in the home directory; compress it and download it for submission.
%cd /home/aistudio/
!zip submission.zip submission.json
/home/aistudio
updating: submission.json (deflated 91%)
With only one epoch of training, this scored 38 points. The score is not high; the solution simply shows how to run the pipeline end to end and one way to preprocess the data. Try better data processing schemes to achieve better results.
Ideas for Improvement
- Increase the number of training epochs.
- Tune the learning-rate schedule, e.g. warmup and cosine annealing.
- In my view the most important factor is data preprocessing. This solution simply splits every 4 seconds, which is not really reasonable: a single action may be split across two files. The splitting method in FootballAction can be used as a reference to further improve the training data.
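One way to mitigate actions being cut at clip boundaries is an overlapping sliding-window split, so that any action cut by one window is whole in a neighbouring one. A sketch of this idea (my own refinement of the fixed 4-second split, not the FootballAction scheme itself; short tail windows would still need the zero-padding used earlier):

```python
import numpy as np

def split_with_overlap(features, clip_frames=100, stride=50):
    """Split a (num_frames, feat_dim) array into windows of clip_frames frames
    taken every stride frames. With stride < clip_frames, consecutive windows
    overlap, so actions near a window edge appear whole in a neighbour."""
    clips, starts = [], []
    for start in range(0, max(1, features.shape[0] - clip_frames + 1), stride):
        clips.append(features[start:start + clip_frames])
        starts.append(start)
    return clips, starts
```

Predictions from overlapping windows would then need de-duplication (e.g. non-maximum suppression) when merged back to the full-video timeline.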
Good luck, and I hope everyone achieves a great score.
You are welcome to follow my WeChat official account: 人工智能研习社
For the latest competition baselines, reply with the competition name or URL in the backend and I will try to provide a baseline for you.