[Paper Notes] CVPR2019_Learning Context Graph for Person Search

2024-02-03 12:01:34

This paper was an oral at CVPR2019. It brings graph learning into the person search task. It is a good paper, so I am studying it here.

1. Introduction

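This paper digs further into the context information in an image and uses it to support person search decisions. The basic idea is to look for people who appear in both the query and the gallery image of a pair, not only the target person. Take the wedding scene in Figure 3: we want to decide whether the two grooms (red boxes) are the same person, and the bride and the flower girl (green boxes), who appear in both photos, clearly help with that decision. This is what the paper calls context information. In many cases, especially when several cameras capture a scene at the same time or the shots are taken close together in time, the two images contain several pairs of the same identities. Because these people appear in both images, they make the decision more reliable, particularly when the target identity's own features are not discriminative enough and the surrounding pedestrians can compensate. The drawback is that when the time gap is large and the target person is in a different environment, especially in outdoor surveillance video, this context information helps much less.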

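Given a probe-gallery image pair and a target person (the target person annotated in the query image?): ① a contextual instance expansion module first gathers context information, i.e., it collects all pedestrians in the scene as context candidates; ② a relative attention module filters them, considering all context candidates in both the probe and the gallery image and outputting matched pairs as informative context; ③ a context graph is built, and a graph learning framework computes the similarity of the target pair across query and gallery. The graph nodes consist of the target pair and the context pairs, and every context node is connected to the target node. In plainer terms, my understanding is: ① detect the pedestrians; ② coarsely match the pedestrians detected in the query and gallery images to obtain matched pairs (including the target person annotated in the query and the person matched to it in the gallery?); ③ build a graph from the target pair and the context pairs and judge the similarity of the target pair jointly.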
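To make the three steps concrete, here is a toy Python sketch. It assumes the person features from step ① already exist, and it replaces the learned relative attention of step ② with a hypothetical rule: keep a probe/gallery pair when the two people are mutual nearest neighbours by cosine similarity and that similarity reaches a threshold `thresh`. The function names and the threshold are illustrative, not from the paper; only the star-shaped graph of step ③ follows the description above.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length, non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=values.__getitem__)

def build_context_graph(probe_feats, gallery_feats, target_p, target_g, thresh=0.8):
    """Toy version of steps 1-3 (hypothetical helper, not the paper's code).

    probe_feats / gallery_feats: one feature per detected person (output of step 1).
    target_p / target_g: index of the target person in each image.
    Returns (nodes, edges); node 0 is the target pair, the rest are context pairs.
    """
    nodes = [(target_p, target_g)]
    for i, fp in enumerate(probe_feats):
        if i == target_p:
            continue
        # Step 2 stand-in: best gallery match for probe person i (target excluded).
        sims = [-1.0 if j == target_g else cosine(fp, fg) for j, fg in enumerate(gallery_feats)]
        j = argmax(sims)
        # Keep the pair only if i is also the best probe match for gallery person j.
        back = [-1.0 if k == target_p else cosine(fk, gallery_feats[j])
                for k, fk in enumerate(probe_feats)]
        if argmax(back) == i and sims[j] >= thresh:
            nodes.append((i, j))
    # Step 3: star-shaped graph, every context node linked to the target node 0.
    edges = [(0, k) for k in range(1, len(nodes))]
    return nodes, edges

# Wedding-style toy example: index 0 = groom (target), 1 = bride, 2 = flower girl.
probe = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
gallery = [[1, 0.1, 0], [0.1, 1, 0], [0, 0.2, 1]]
nodes, edges = build_context_graph(probe, gallery, target_p=0, target_g=0)
print(nodes, edges)  # [(0, 0), (1, 1), (2, 2)] [(0, 1), (0, 2)]
```

Raising `thresh` to 0.99 drops the flower-girl pair (similarity about 0.98), which mirrors the idea that only confidently matched context should enter the graph.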

The whole framework builds on the framework of the first paper on person search via deep learning, as shown in Figure 1.

3. Methodology

3.1 Overview

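The core idea is to extend the representational power of the instance feature: instead of extracting features only from the target person, the method also brings the surrounding pedestrians into feature learning.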

Instance Detection and Feature Learning

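This part modifies the Faster R-CNN framework to perform joint pedestrian detection and feature learning. It also incorporates the part-based feature learning framework from person re-ID to make the learned features more discriminative.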
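These notes do not say which part-based scheme the paper adopts. As one common example from person re-ID (horizontal-stripe pooling in the style of PCB), the NumPy sketch below splits a RoI feature map into `num_parts` horizontal stripes and average-pools each stripe, giving one vector per body region next to the usual global vector. `stripe_pool` and `num_parts` are hypothetical names chosen for illustration.

```python
import numpy as np

def stripe_pool(feat_map, num_parts=4):
    """Average-pool a (C, H, W) RoI feature map over horizontal stripes.

    Returns (parts, global_vec): parts has shape (num_parts, C), one vector per
    body region from top to bottom; global_vec is the (C,) global average pool.
    """
    _, h, _ = feat_map.shape
    if h < num_parts:
        raise ValueError("feature map must be at least num_parts rows tall")
    bounds = np.linspace(0, h, num_parts + 1).astype(int)  # stripe boundaries along H
    parts = np.stack([feat_map[:, bounds[k]:bounds[k + 1], :].mean(axis=(1, 2))
                      for k in range(num_parts)])
    global_vec = feat_map.mean(axis=(1, 2))
    return parts, global_vec

# Toy map with C=2 channels and a 4x2 spatial grid, split into 2 stripes.
fmap = np.arange(16, dtype=float).reshape(2, 4, 2)
parts, g = stripe_pool(fmap, num_parts=2)
print(parts.tolist())  # [[1.5, 9.5], [5.5, 13.5]]
print(g.tolist())      # [3.5, 11.5]
```

Each stripe vector would normally get its own embedding or classifier, so a local body region can stay discriminative even when the global vector is dominated by other regions.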

Contextual Instance Expansion

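All instance pairs across the query and gallery images are taken as context candidates. A relative attention layer measures the visual similarity of each context pair, and only the instance pairs with sufficiently high confidence are kept as informative contexts.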
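The relative attention layer is learned in the paper, and its exact form is not given in these notes. The NumPy sketch below uses a hypothetical stand-in that captures the "relative" idea: every cross-image pair gets a cosine similarity, each score is normalized against all competing candidates with a softmax in both directions (a dual softmax), and only pairs whose confidence exceeds `thresh` are kept. `relative_match`, `temperature`, and `thresh` are assumed names and values.

```python
import numpy as np

def relative_match(P, G, temperature=0.1, thresh=0.5):
    """Keep probe/gallery context pairs that are confident relative to all other candidates.

    P: (m, d) probe context features; G: (n, d) gallery context features.
    Returns a list of (i, j, confidence) with confidence > thresh.
    """
    P = P / np.linalg.norm(P, axis=1, keepdims=True)
    G = G / np.linalg.norm(G, axis=1, keepdims=True)
    S = (P @ G.T) / temperature                      # scaled cosine similarity, (m, n)
    # Row softmax: how much probe i prefers gallery j over the other gallery people.
    row = np.exp(S - S.max(axis=1, keepdims=True))
    row /= row.sum(axis=1, keepdims=True)
    # Column softmax: the same preference seen from gallery person j's side.
    col = np.exp(S - S.max(axis=0, keepdims=True))
    col /= col.sum(axis=0, keepdims=True)
    conf = row * col                                 # high only for unambiguous, mutual matches
    return [(int(i), int(j), float(conf[i, j])) for i, j in zip(*np.nonzero(conf > thresh))]

# Two clear matches: probe 0 <-> gallery 1 and probe 1 <-> gallery 0.
clear = relative_match(np.array([[1.0, 0.0], [0.0, 1.0]]),
                       np.array([[0.0, 1.0], [1.0, 0.0]]))
print([(i, j) for i, j, _ in clear])  # [(0, 1), (1, 0)]
# Ambiguous: two identical gallery candidates split the probe's attention, so nothing passes.
print(relative_match(np.array([[1.0, 0.0]]), np.array([[1.0, 0.0], [1.0, 0.0]])))  # []
```

Normalizing against the other candidates, rather than thresholding raw similarity, is what rejects the second case: each gallery person looks like a perfect match on its own, but neither is a confident one relative to the other.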

Contextual Graph Representation Learning

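For each probe-gallery image pair, a graph is built to compute the similarity of the target pair. The graph nodes are the target persons and their associated context pairs, connected by graph edges. A graph convolutional network then learns the similarity of the probe-gallery image pair.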
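The graph step can be sketched with a single graph convolution layer in NumPy. Several details are assumptions, not taken from the paper: each node's feature is the element-wise product of its pair's L2-normalized probe and gallery features (its sum equals their cosine similarity), the layer uses the symmetric normalization of Kipf and Welling's GCN, and the score is read out from the target node through a hypothetical weight vector `w_out` and a sigmoid. Only the star topology (every context node linked to the target node) follows the notes.

```python
import numpy as np

def star_adjacency(num_context):
    """Adjacency with self-loops: node 0 = target pair, nodes 1..num_context = context pairs."""
    n = num_context + 1
    A = np.eye(n)
    A[0, 1:] = 1.0   # target -> every context node
    A[1:, 0] = 1.0   # every context node -> target
    return A

def gcn_layer(A, H, W):
    """One graph convolution, ReLU(D^-1/2 A D^-1/2 H W), as in Kipf & Welling's GCN."""
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    A_hat = A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(A_hat @ H @ W, 0.0)

def pair_similarity(target_pair, context_pairs, W1, w_out):
    """Similarity score in (0, 1) for the target pair (hypothetical read-out).

    Each pair is (probe_feat, gallery_feat) with L2-normalized vectors of length d.
    """
    pairs = [target_pair] + list(context_pairs)
    H = np.stack([p * g for p, g in pairs])   # node features (assumed: element-wise product)
    A = star_adjacency(len(context_pairs))
    H1 = gcn_layer(A, H, W1)                  # context information flows into node 0
    logit = H1[0] @ w_out                     # read out the target node only
    return float(1.0 / (1.0 + np.exp(-logit)))

def unit(v):
    """Scale a vector to unit length."""
    return v / np.linalg.norm(v)

rng = np.random.default_rng(0)
d, h = 8, 4
target = (unit(rng.standard_normal(d)), unit(rng.standard_normal(d)))
contexts = [(unit(rng.standard_normal(d)), unit(rng.standard_normal(d))) for _ in range(2)]
score = pair_similarity(target, contexts, rng.standard_normal((d, h)), rng.standard_normal(h))
print(score)
```

Reading out only node 0 reflects the point in the notes that the graph exists to score the target pair; context pairs influence the result only through their edges to it.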

3.2 Instance Detection and Feature Learning

3.2.1 Pedestrian Detection

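The basic architecture of this part is shown in Figure 2. ResNet-50 serves as the backbone and is split into two parts. The image first passes through conv1 to conv4_3 for feature extraction, which outputs a 1024-channel feature map at 1/16 of the input size; a PPN (in effect an RPN) followed by NMS then produces the region proposals. Like the RPN, the PPN is trained with two losses: a binary softmax for person/non-person classification and a linear layer for bounding-box regression. All proposals go through ROI Pooling and then the second part of ResNet-50 (conv4_4 to conv5_3), after which Global Average Pooling yields a 2048-d feature. This feature is split into two FC branches: a binary softmax layer for person/non-person classification, and a 256-d FC layer whose output is further L2-normalized and used as the feature representation at inference. How the layers after ROI Pooling are trained, and with which loss, is not explained.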
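The tail of the network described above (Global Average Pooling, then a binary person/non-person softmax and a 256-d FC layer with L2 normalization) reduces to plain array operations. This NumPy sketch uses random weights and covers only the forward pass. The 7×7 RoI spatial size, the weight names, and the choice of column 1 as the person class are assumptions.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def roi_heads(roi_feats, W_cls, b_cls, W_emb, b_emb):
    """Heads on top of the conv4_4-conv5_3 output for R proposals (forward pass only).

    roi_feats: (R, 2048, h, w). W_cls: (2048, 2). W_emb: (2048, 256).
    Returns person probabilities (R,) and L2-normalized embeddings (R, 256).
    """
    pooled = roi_feats.mean(axis=(2, 3))                 # global average pooling -> (R, 2048)
    person_prob = softmax(pooled @ W_cls + b_cls)[:, 1]  # binary softmax; column 1 = person (assumed)
    emb = pooled @ W_emb + b_emb                         # 256-d FC layer
    emb /= np.linalg.norm(emb, axis=1, keepdims=True)    # L2 normalization
    return person_prob, emb

rng = np.random.default_rng(0)
R, h, w = 3, 7, 7                                        # RoI spatial size is an assumption
roi_feats = rng.standard_normal((R, 2048, h, w))
probs, emb = roi_heads(roi_feats,
                       0.01 * rng.standard_normal((2048, 2)), np.zeros(2),
                       0.01 * rng.standard_normal((2048, 256)), np.zeros(256))
# With unit-length embeddings, cosine similarity between boxes is a plain dot product.
sim = emb @ emb.T                                        # (R, R), diagonal = 1
print(probs.shape, emb.shape, sim.shape)  # (3,) (3, 256) (3, 3)
```

Because the embeddings are unit length, ranking gallery boxes against a query box at inference reduces to sorting by dot product.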
