The model is built around four points:
1. Enlarging the receptive field
5×5 convolutions replace 3×3 ones to enlarge the receptive field. In a depthwise-separable convolution, the ratio of pointwise (pw) to depthwise (dw) compute is d/k^2, where d is the number of output channels and k is the dw kernel size, so enlarging the dw kernel adds only a small amount of extra compute.
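As a quick sanity check of that ratio (my own back-of-the-envelope script, not from the paper; the layer sizes are made up for illustration):

```python
# Rough multiply-add count for one depthwise-separable conv layer.
# h, w: feature map size; c_in: input channels; d: output channels; k: dw kernel size.
def dw_sep_macs(h, w, c_in, d, k):
    dw = h * w * c_in * k * k   # depthwise part
    pw = h * w * c_in * d       # pointwise (1x1) part
    return dw, pw

for k in (3, 5):
    dw, pw = dw_sep_macs(56, 56, 64, 128, k)
    print(f"k={k}: dw={dw:,}  pw={pw:,}  pw/dw={pw/dw:.1f}")
# pw/dw = d/k^2: about 14.2 for 3x3 and 5.1 for 5x5 with d=128.
# The pointwise part dominates, so going 3x3 -> 5x5 adds only ~12% total MACs here.
```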
MobileNetV2 uses the inverted residual structure (narrow at both ends, wide in the middle), but the BlazeBlock goes back to a structure that is wide at both ends and narrow in the middle. The stated reason: "To accommodate for the fewer number of channels in the intermediate tensors, we swap these stages so that the residual connections in our bottlenecks operate in the “expanded” (increased) channel resolution." Fine, I'll take their word for it.
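My reading of that wide–narrow–wide block, as a minimal PyTorch sketch (the channel counts, BatchNorm placement and the max-pool/zero-pad shortcut are my assumptions, not the paper's exact configuration): the residual is added at the wide, i.e. "expanded", channel resolution, unlike MobileNetV2 where it is added at the narrow ends.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DoubleBlazeBlock(nn.Module):
    """Sketch of a wide-narrow-wide bottleneck:
    dw5x5 -> pw1x1 (project down to mid_ch) -> dw5x5 -> pw1x1 (expand to out_ch),
    with the residual added at the expanded channel resolution."""
    def __init__(self, in_ch, mid_ch, out_ch, stride=1):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, in_ch, 5, stride=stride, padding=2, groups=in_ch, bias=False),
            nn.BatchNorm2d(in_ch),
            nn.Conv2d(in_ch, mid_ch, 1, bias=False),   # project: wide -> narrow
            nn.BatchNorm2d(mid_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid_ch, mid_ch, 5, padding=2, groups=mid_ch, bias=False),
            nn.BatchNorm2d(mid_ch),
            nn.Conv2d(mid_ch, out_ch, 1, bias=False),  # expand: narrow -> wide
            nn.BatchNorm2d(out_ch),
        )
        # Shortcut: max-pool handles the spatial stride, zero-padding the extra channels.
        self.pool = nn.MaxPool2d(stride) if stride > 1 else nn.Identity()
        self.pad_ch = out_ch - in_ch
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        shortcut = self.pool(x)
        if self.pad_ch > 0:
            shortcut = F.pad(shortcut, (0, 0, 0, 0, 0, self.pad_ch))
        return self.act(self.body(x) + shortcut)
```

For example, DoubleBlazeBlock(48, 24, 96, stride=2) maps a 48-channel map to a 96-channel map at half the resolution, with the skip connection carried on the 96-channel ("expanded") side.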
2. Feature extraction
3. Anchor scheme
Anchors are generated from only two feature-map levels, and all anchors use a fixed 1:1 aspect ratio. The stated reasons are feature redundancy, and that on GPU the cost of computing the feature maps is essentially fixed (does that mean it isn't on CPU?).
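For concreteness, a small sketch of what a fixed 1:1 anchor layout over just two feature-map levels could look like (the per-cell counts, scales and the 128×128 input size are illustrative assumptions, not the paper's exact configuration):

```python
import numpy as np

def make_square_anchors(input_size, fmap_sizes, anchors_per_cell, scales):
    """Generate fixed 1:1 (square) anchors on a small number of feature maps.
    fmap_sizes: e.g. [16, 8]; anchors_per_cell: e.g. [2, 6]; scales: per-level side lengths."""
    anchors = []
    for fmap, n, sides in zip(fmap_sizes, anchors_per_cell, scales):
        stride = input_size / fmap
        for y in range(fmap):
            for x in range(fmap):
                cx, cy = (x + 0.5) * stride, (y + 0.5) * stride
                for i in range(n):
                    s = sides[i % len(sides)]
                    anchors.append([cx, cy, s, s])  # square: width == height
    return np.array(anchors)

# Example: 128x128 input, anchors only on the 16x16 and 8x8 maps.
a = make_square_anchors(128, [16, 8], [2, 6], [[16, 24], [32, 48, 64, 80, 96, 112]])
print(a.shape)  # (16*16*2 + 8*8*6, 4) = (896, 4)
```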
4. Post-processing
I didn't fully understand this part.
(1) "As our feature extractor is not reducing the resolution below 8×8, the number of anchors overlapping a given object significantly increases with the object size."
Why? What is the causal link? (My reading: because the anchors stay on a relatively fine grid, 8×8 at the coarsest, instead of large objects being handed off to ever-coarser pyramid levels, a bigger face simply covers more anchor centers, so more anchors overlap it.)
(2) With standard NMS, "the predictions tend to fluctuate between different anchors and exhibit temporal jitter."
Again, what is the causal link here? (Presumably: NMS keeps exactly one of the overlapping predictions and discards the rest, so on video the surviving anchor can switch from frame to frame, making the output box visibly jump.)
(3) So a different method replaces NMS: it "estimates the regression parameters of a bounding box as a weighted mean between the overlapping predictions."
I didn't quite get this; see the sketch below.
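My understanding, as a minimal sketch (the score-based weights and the IoU threshold for deciding which predictions "overlap" are my assumptions; the paper only states the weighted-mean idea): instead of keeping the single top-scoring box and throwing away its overlapping neighbours, the overlapping boxes are averaged, weighted by their scores, into one output box, which is what damps the frame-to-frame jitter.

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and an array of boxes, all in (x1, y1, x2, y2)."""
    x1 = np.maximum(box[0], boxes[:, 0]); y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2]); y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_a + area_b - inter + 1e-9)

def blend_nms(boxes, scores, iou_thr=0.5):
    """Blending alternative to NMS (sketch): for each cluster of overlapping
    predictions, output the score-weighted mean of their coordinates instead of
    keeping only the top-scoring one."""
    order = np.argsort(scores)[::-1]
    blended = []
    while order.size > 0:
        top = order[0]
        overlap = iou(boxes[top], boxes[order]) >= iou_thr  # includes `top` itself
        group = order[overlap]
        w = scores[group][:, None]
        blended.append(((boxes[group] * w).sum(axis=0) / w.sum(), scores[top]))
        order = order[~overlap]
    return blended  # list of (blended box, representative score)
```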