PointPillars整体网络结构有三个部分构成,他们分别是PFN(Pillar Feature Net),Backbone(2D CNN)和Detection Head(SSD)。其中PFN是PointPillars中最重要也是最具创新性的部分。
在正式讨论PFN之前我们可以幻想一下自己变成一个小蜜蜂,飞翔在3D点云空间中,俯瞰整个场景中的点云。它们在各个区域的分布或相对密集,或稀疏,或者几乎就不可见。在PFN中会将我们俯瞰到的X-Y平面划分成固定尺寸的网格,每一个网格单元在Z轴方向延伸就形成了一个个的柱子,也就是所谓的Pillar。因为点云在空间中分布的不均匀,不同Pillar中的点云数量也就不一样。而且由于点云分布的高度稀疏性,有很大一部分Pillar里面是没有点云的。下图可以比较直观地表
示这一点。基于以上论序,对于PointPillars而言,我们应该有以下几点认识:(1).同一帧点云数据中不同Pillar的点云数量是非固定的;(2).不同帧点云数据中非空Pillar的数量也是动态变化的。PointPillars对于非空Pillar会设定一个采样阈值(默认是100)。当Pillar中的点云数量超过阈值则会采样到该阈值,而点云数量小于该阈值的Pillar则使用0进行填充。说了这么多,PointPillars中的点云特征提取具体是怎么做的呢?原始的点云数据有4个维度 (x,y,z,r)
, r
代表反射率,论文中将其扩展为9个维度 (x,y,z,r,x_c,y_c,z_c,x_p, y_p),
带 c
下标的是Pillar中的点相对于Pillar内左右点算术中心的偏移量,带 p
下标的是点相对于Pillar网格中心(x,y center)的偏移量。于是就形成了维度为为(D,P,N)
的张量,其中D=9, N为每个 Pillar
的采样点数, P
为非空的 Pillar
数目。
然后就是学习点云特征,具体来说就是用一个简化的 PointNet
从 D
维中学出 C
个channel来得到一个(C,P,N)
的张量。然后在N
这个维度上做max operation,得到 (C,P)
的张量.最后得到 (C,H,W)
的伪图像。
class PFNLayer(nn.Module):
def __init__(self,
in_channels,
out_channels,
use_norm=True,
last_layer=False):
"""
Pillar Feature Net Layer.
The Pillar Feature Net could be composed of a series of these layers, but the PointPillars paper results only
used a single PFNLayer. This layer performs a similar role as second.pytorch.voxelnet.VFELayer.
:param in_channels: <int>. Number of input channels.
:param out_channels: <int>. Number of output channels.
:param use_norm: <bool>. Whether to include BatchNorm.
:param last_layer: <bool>. If last_layer, there is no concatenation of features.
"""
"""
对于PointPillars而言,它只用到了一层PFNLayer
"""
super().__init__()
self.name = 'PFNLayer'
self.last_vfe = last_layer
if not self.last_vfe:
out_channels = out_channels // 2
self.units = out_channels
self.in_channels = in_channels
# if use_norm:
# BatchNorm1d = change_default_args(eps=1e-3, momentum=0.01)(nn.BatchNorm1d)
# Linear = change_default_args(bias=False)(nn.Linear)
# else:
# BatchNorm1d = Empty
# Linear = change_default_args(bias=True)(nn.Linear)
self.linear= nn.Linear(self.in_channels, self.units, bias = False)
self.norm = nn.BatchNorm2d(self.units, eps=1e-3, momentum=0.01)
# kernel=>1x1x9x64
self.conv1 = nn.Conv2d(in_channels=self.in_channels, out_channels=self.units, kernel_size=1, stride=1)
self.conv2 = nn.Conv2d(in_channels=100, out_channels=1, kernel_size=1, stride=1)
self.t_conv = nn.ConvTranspose2d(100, 1, (1,8), stride=(1,7))
"""
这个操作nb,这里不是用的max pooling,而是用的带孔卷积,
34 + 33*2 = 100
用了一个w方向kernel size为34,w方向dilation为3的带孔卷及
"""
self.conv3 = nn.Conv2d(64, 64, kernel_size=(1, 34), stride=(1, 1), dilation=(1,3))
def forward(self, input):
"""
(Pdb) masked_features.shape
torch.Size([1, 9, 6815, 100])
"""
"""
(Pdb) input.shape
torch.Size([1, 9, 6815, 100])
"""
x = self.conv1(input) #kernel=>1x1x9x64?
x = self.norm(x) #=>shape[1,64,6815,100]
x = F.relu(x) #
x = self.conv3(x) #=>shape[1,64,6815,1]
return x
class PillarFeatureNet(nn.Module):
def __init__(self,
num_input_features=4,
use_norm=True,
num_filters=(64,),
with_distance=False,
voxel_size=(0.2, 0.2, 4),
pc_range=(0, -40, -3, 70.4, 40, 1)):
"""
Pillar Feature Net.
The network prepares the pillar features and performs forward pass through PFNLayers. This net performs a
similar role to SECOND's second.pytorch.voxelnet.VoxelFeatureExtractor.
:param num_input_features: <int>. Number of input features, either x, y, z or x, y, z, r.
:param use_norm: <bool>. Whether to include BatchNorm.
:param num_filters: (<int>: N). Number of features in each of the N PFNLayers.
:param with_distance: <bool>. Whether to include Euclidean distance to points.
:param voxel_size: (<float>: 3). Size of voxels, only utilize x and y size.
:param pc_range: (<float>: 6). Point cloud range, only utilize x and y min.
"""
super().__init__()
self.name = 'PillarFeatureNet'
assert len(num_filters) > 0
num_input_features += 5
if with_distance:
num_input_features += 1
self._with_distance = with_distance
# Create PillarFeatureNet layers
num_filters = [num_input_features] + list(num_filters) #[9,64]
pfn_layers = []
for i in range(len(num_filters) - 1):
in_filters = num_filters[i] #输入通道,9
out_filters = num_filters[i + 1] #输出通道,64
if i < len(num_filters) - 2:
last_layer = False
else:
last_layer = True
pfn_layers.append(PFNLayer(in_filters, out_filters, use_norm, last_layer=last_layer))
self.pfn_layers = nn.ModuleList(pfn_layers)
# Need pillar (voxel) size and x/y offset in order to calculate pillar offset
self.vx = voxel_size[0]
self.vy = voxel_size[1]
self.x_offset = self.vx / 2 + pc_range[0]
self.y_offset = self.vy / 2 + pc_range[1]
def forward(self, pillar_x, pillar_y, pillar_z, pillar_i,
num_voxels, #num_points,名字为啥总是在变呢??
x_sub_shaped, y_sub_shaped,
mask):
# Find distance of x, y, and z from cluster center
# pillar_xyz = torch.cat((pillar_x, pillar_y, pillar_z), 3)
pillar_xyz = torch.cat((pillar_x, pillar_y, pillar_z), 1)
# points_mean = pillar_xyz.sum(dim=2, keepdim=True) / num_voxels.view(1,-1, 1, 1)
#算pillar算术均值
points_mean = pillar_xyz.sum(dim=3, keepdim=True) / num_voxels.view(1, 1, -1, 1)
f_cluster = pillar_xyz - points_mean
# Find distance of x, y, and z from pillar center
f_center_offset_0 = pillar_x - x_sub_shaped
f_center_offset_1 = pillar_y - y_sub_shaped
"""
(Pdb) f_center_concat.shape
torch.Size([1, 2, 6815, 100])
"""
f_center_concat = torch.cat((f_center_offset_0, f_center_offset_1), 1)
pillar_xyzi = torch.cat((pillar_x, pillar_y, pillar_z, pillar_i), 1)
features_list = [pillar_xyzi, f_cluster, f_center_concat]
features = torch.cat(features_list, dim=1)
masked_features = features * mask
pillar_feature = self.pfn_layers[0](masked_features)
#return=>shape[1,64,6815,1]
return pillar_feature