LiDAR Point Cloud Semantic Segmentation: Bird's-Eye-View Series



1. My Own Research Direction

Use the PointPillars bird's-eye-view (BEV) encoding; then segment with U-Net (U-Net can resolve very fine line structures in cell segmentation, and point clouds in the BEV are likewise fine line structures); finally map the predicted labels back onto the 3D points by reverse lookup; additionally refine with KNN or a CRF. A minimal sketch of the reverse-lookup step follows.
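
The reverse lookup can reuse the same grid indices that built the BEV image. A minimal sketch, assuming a NumPy point array and a label image from the 2D network; the grid ranges and cell size here are placeholders, not values from any of the papers below:

```python
import numpy as np

def bev_labels_to_points(points, bev_labels, x_range=(0.0, 50.0),
                         y_range=(-25.0, 25.0), cell=0.2):
    """Map per-cell BEV labels back to 3D points via grid indices.

    points:     (N, 3+) array with x, y (and z, ...) in the LiDAR frame
    bev_labels: (H, W) label image from the 2D segmentation network
    """
    ix = ((points[:, 0] - x_range[0]) / cell).astype(np.int64)
    iy = ((points[:, 1] - y_range[0]) / cell).astype(np.int64)
    h, w = bev_labels.shape
    valid = (ix >= 0) & (ix < h) & (iy >= 0) & (iy < w)
    labels = np.full(len(points), -1, dtype=np.int64)   # -1: outside the grid
    labels[valid] = bev_labels[ix[valid], iy[valid]]
    return labels
```

All points falling into the same cell inherit that cell's label, which is exactly where a KNN or CRF refinement step can help.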

2. SalsaNet: Fast Road and Vehicle Segmentation in LiDAR Point Clouds for Autonomous Driving

| Author | Date | Affiliation | Venue | Dataset | Performance |
| --- | --- | --- | --- | --- | --- |
| Eren Erdal Aksoy | 2019-09-18 | - | CoRR 2019 | KITTI | SalsaNet: 93.75 (background IoU), 73.72 (road IoU), 71.44 (vehicle IoU), 79.74 (avg) |

1. Core Idea

Road masks annotated on camera images are projected onto the ground points of the point cloud to generate a training set; the point cloud is then projected into front-view and bird's-eye-view images and segmented with a U-Net-like network. Results:
(figure: qualitative road/vehicle segmentation results)

2. Method

2.1 Data preparation

  • The paper's pipeline figure illustrates the whole data-preparation process.
  • 1) First, MultiNet performs road segmentation on the KITTI images. MultiNet was chosen because, per the authors, it is the only model whose parameters were trained on the KITTI road images; the image-level segmentation is then projected onto the point cloud.
  • 2) Then Mask R-CNN performs vehicle segmentation, which is likewise projected onto the 3D point cloud.
  • As the middle of the figure shows, since the bird's-eye view loses height information, two projection schemes are considered: the point cloud is mapped into both a front view and a bird's-eye view, and the network learns to predict on each.

Bird's-eye view (BEV)

  • 1) BEV range: W = [-6, 12] m, L = [0, 50] m, mapped to a 256x64 2D grid; cell size (0.2 m, 0.3 m). A minimal encoding sketch follows this list.
    • Per-cell encoding: similar to [4], in each grid cell the mean and maximum elevation, average reflectivity (i.e. intensity) value, and number of projected points are computed.
    • Compared to [4], the minimum and standard deviation of the height are not used as additional features, since the authors' experiments showed no significant contribution from those channels.
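
A minimal sketch of this per-cell encoding, assuming a NumPy (N, 4) point array of x, y, z, intensity; mapping x along the 256-cell dimension is an assumption:

```python
import numpy as np

def encode_bev(points, x_range=(0.0, 50.0), y_range=(-6.0, 12.0), grid=(256, 64)):
    """SalsaNet-style BEV tensor: per-cell mean elevation, max elevation,
    mean intensity, and point count. points: (N, 4) = x, y, z, intensity."""
    h, w = grid
    ix = ((points[:, 0] - x_range[0]) / (x_range[1] - x_range[0]) * h).astype(int)
    iy = ((points[:, 1] - y_range[0]) / (y_range[1] - y_range[0]) * w).astype(int)
    keep = (ix >= 0) & (ix < h) & (iy >= 0) & (iy < w)
    flat, pts = ix[keep] * w + iy[keep], points[keep]
    count = np.bincount(flat, minlength=h * w)
    sum_z = np.bincount(flat, weights=pts[:, 2], minlength=h * w)
    sum_i = np.bincount(flat, weights=pts[:, 3], minlength=h * w)
    max_z = np.full(h * w, -np.inf)
    np.maximum.at(max_z, flat, pts[:, 2])               # max elevation per cell
    nonzero = count > 0
    max_z[~nonzero] = 0.0                               # empty cells -> 0
    mean_z = np.where(nonzero, sum_z / np.maximum(count, 1), 0.0)
    mean_i = np.where(nonzero, sum_i / np.maximum(count, 1), 0.0)
    return np.stack([mean_z, max_z, mean_i, count]).reshape(4, h, w)
```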

Front view

  • 1) Same spherical projection as SqueezeSeg.
  • 2) An unfavorable property of the front view is occlusion, bending, and deformation: "Although SFV returns more dense representation compared to BEV, SFV has certain distortion and deformation effects on small objects, e.g. vehicles. It is also more likely that objects in SFV tend to occlude each other. We, therefore, employ BEV representation as the main input to our network."
    (figure: front-view vs. bird's-eye-view projections)

2.2 Model

  • Dropout and pooling are added after the last layer of each conv block; dropout may help with data as noisy as 3D point clouds.
  • 16x downsampling.
  • The authors explain where dropout is placed, citing [26]: "We here emphasize that dropout needs to be placed right after batch normalization. As shown in [26], an early application of dropout can otherwise lead to a shift in the weight distribution and thus minimize the effect of batch normalization during training."
    (figure: SalsaNet network architecture)

2.3 Class imbalance

  • Loss weights are derived from the square root of the class frequencies, so that rare classes are up-weighted (see the sketch below).
    (figure: class-weight formula)
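
A minimal sketch of such a weighting, assuming the common inverse-square-root form; the exact formula should be checked against the paper:

```python
import numpy as np

def class_weights(label_image, num_classes):
    """Per-class loss weights proportional to 1 / sqrt(class frequency)."""
    counts = np.bincount(label_image.ravel(), minlength=num_classes).astype(float)
    freq = counts / counts.sum()
    w = 1.0 / np.sqrt(freq + 1e-12)     # rare classes get larger weights
    return w / w.sum() * num_classes    # normalize around 1.0 on average
```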

2.4 Training hyperparameters

  • Notable augmentations: adding random pixel noise with a probability of 0.5, and random rotation in [-5°, 5°] (a sketch follows).
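
A minimal sketch of these two augmentations, assuming a (C, H, W) BEV tensor; the Gaussian noise scale is an assumption since the paper does not specify it here:

```python
import numpy as np
from scipy.ndimage import rotate

def augment_bev(bev, noise_prob=0.5, noise_std=0.01, max_deg=5.0):
    """Random additive pixel noise (p = 0.5) plus a random rotation in
    [-5, 5] degrees around the grid centre."""
    if np.random.rand() < noise_prob:
        bev = bev + np.random.normal(0.0, noise_std, bev.shape)
    angle = np.random.uniform(-max_deg, max_deg)
    # rotate the H x W plane of every channel; reshape=False keeps the size
    return rotate(bev, angle, axes=(1, 2), reshape=False, order=1, mode="nearest")
```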

3. Experiments

  • Performance comparison against external baselines
    (figure: comparison with prior methods)

  • BEV vs. front-view performance comparison (internal ablation)
    (figures: BEV vs. front-view results)

  • Runtime comparison
    (figures: runtime comparison)

4. Takeaways

  • Reference [11] below is worth studying for how to annotate semi-automatically.
  • This paper shows that we do not necessarily have to label all ground points; labeling only the free space can be enough.
  • The loss-weight design in this paper is a useful reference.
  • Performance is compared not only on IoU but also on precision and recall.

3. Online Inference and Detection of Curbs in Partially Occluded Scenes with Sparse LIDAR

| Author | Date | Affiliation | Venue | Dataset | Performance |
| --- | --- | --- | --- | --- | --- |
| Tarlan Suleymanov | 2019-07-11 | - | ITSC 2019 | Oxford RobotCar dataset | 81-91 (F1 score) |

1. Core Idea

  • Proposes a framework for annotating 3D point clouds: the clouds are projected into a bird's-eye view and curb masks are annotated, covering both occluded and non-occluded curbs.
  • The authors mainly build their own curb-detection dataset and then learn to predict occluded and non-occluded curbs (the curb-fit lines are presumably obtained with classical algorithms). The proposed network regresses occluded and non-occluded curbs separately and uses an anchor-line mechanism (a set of prior lines, the same idea as anchors in object detection), which improves prediction accuracy.

2. Method

Related work

  • [11] uses range and intensity information from 3D LIDAR to detect visible curbs on elevation data, which fails in the presence of occluding obstacles.
  • [12] presents a LIDAR-based method to detect visible curbs using sliding-beam segmentation followed by segment-specific curb detection, but fails to detect curbs behind obstacles.

How the curb curves are generated

  • In this work, we used images acquired by a Point Grey Bumblebee XB3 camera, mounted on the front of the platform facing towards the direction of motion. In particular, our implementation of VO uses FAST corners [16] combined with BRIEF descriptors [17], RANSAC [18] for outlier rejection, and nonlinear least-squares refinement.

  • Point heights are limited to within 3.55 m, preventing water on the ground from producing extremely low point returns.

Separating visible and occluded lines

To determine which points are visible and which are occluded we use the hidden point removal operator as described in [20]. The operator determines all visible points in a pointcloud when observed from a given viewpoint. This is achieved by extracting all points residing on the convex hull of a transformed pointcloud. These points resemble the visible points; all other (labeled) points are considered as hidden (or occluded). We take the previously trimmed pointclouds and create binary bird's-eye view images by taking the height of points from the ground into account. The points that are within a predefined height difference from the LIDAR roughly correspond to the points (obstacles) that are blocking the view. By putting together raw labels and binary masks of obstacles, obtained by running the hidden point removal algorithm, we obtain separate masks for visible and occluded road boundaries. (To study further; a sketch of the hidden-point-removal operator follows.)
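
A minimal sketch of the hidden point removal operator of [20] (Katz et al.: spherical flipping followed by a convex hull); the flipping-radius factor is an assumption:

```python
import numpy as np
from scipy.spatial import ConvexHull

def hidden_point_removal(points, viewpoint, radius_factor=100.0):
    """Return indices of points visible from `viewpoint` (Katz-style HPR)."""
    p = points - viewpoint                          # centre on the viewpoint
    d = np.linalg.norm(p, axis=1, keepdims=True)
    r = d.max() * radius_factor                     # flipping-sphere radius
    flipped = p + 2.0 * (r - d) * (p / d)           # spherical flipping
    # hull of flipped points plus the viewpoint (origin); hull vertices
    # other than the viewpoint correspond to visible points
    hull = ConvexHull(np.vstack([flipped, np.zeros((1, points.shape[1]))]))
    return hull.vertices[hull.vertices < len(points)]
```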

Network architecture

  • Why a plain U-Net-like model cannot reliably detect occluded curbs: "first, the network's limited receptive field, which is not big enough to capture context around large obstacles to estimate the position of curbs behind them, and second, the lack of structure (model-free) which prevents the network to infer very thin curves of occluded road boundaries within an image."
    (figure: visible vs. occluded curb predictions)
  • Visible curbs are detected directly by the U-Net-like network.
  • Occluded curbs (which have no actual point evidence; the label only marks a point as belonging to an occluded curb) are handled with anchor lines (a set of prior lines): as shown in the figure below, four prior lines at different angles are defined, and the closest one is used to regress the target line.
  • How lines are predicted in each grid cell: lines are parameterised in a discrete-continuous form. First, fitted lines are assigned to one of four types of anchor lines; second, offsets between fitted and anchor lines are calculated. Anchor lines pass through the centre of a grid cell at different angles (22.5°, 67.5°, 112.5°, 157.5°). During fitting, lines are assigned to the closest anchor line. Once a fitted line is discretised, two continuous parameters are calculated: (1) an angle offset between the fitted line and the respective anchor line ($w^k_{i,j,gt}$), and (2) a distance from the centre of the cell to the fitted line ($\beta^k_{i,j,gt}$). As a result, we obtain 16 numbers for each grid cell: 4 numbers ($w$, $\beta$, and the is-a-line classification) for each of the four line categories. A minimal encoding sketch follows this list.
    (figure: anchor lines at 22.5°/67.5°/112.5°/157.5° with angle and distance offsets)
  • To increase the receptive field of the model we added intra-layer convolutions [23] before the multi-scale parameter estimation layers. Traditional layer-by-layer convolutions are applied between feature maps, but intra-layer convolutions are slice-by-slice convolutions within feature maps. Hence, intra-layer convolutions capture aspects across the whole image and can thereby capture spatial relationships over longer distances. For example, there is a strong correlation between the length of the occluded curbs and the size of objects which are obstructing the view (ranging from 10-15 pixels through occlusions by traffic cones to 200-300 pixels through occlusions by several parked cars).
  • Cross-entropy loss for the is-a-curb classification; smooth L1 loss for regressing $w$ and $\beta$.
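
A minimal sketch of the discrete-continuous target encoding for one fitted line in a cell; the meaning of the fourth number per category is not fully spelled out above, so it is left as a placeholder:

```python
import numpy as np

ANCHOR_ANGLES = np.deg2rad([22.5, 67.5, 112.5, 157.5])   # prior line angles

def encode_line(theta, dist):
    """Encode a fitted line (angle `theta` in radians, distance `dist` from
    the cell centre) against its closest anchor line."""
    # wrap angle differences to (-pi/2, pi/2]: a line has 180-degree symmetry
    diff = (theta - ANCHOR_ANGLES + np.pi / 2) % np.pi - np.pi / 2
    k = int(np.argmin(np.abs(diff)))        # discrete: closest anchor line
    target = np.zeros((4, 4))               # 16 numbers per grid cell
    target[k] = [1.0, diff[k], dist, 0.0]   # [is-a-line, w, beta, placeholder]
    return target
```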

Post-processing

  • Temporal information is used, i.e. detections are tracked across consecutive frames, with two benefits: filtering out false positives and tracking true positives.
  • VO: the relation between curb lines in consecutive frames is expressed as a rotation and translation; via visual odometry (VO), the previous frame's result is mapped into the current frame and fused with the current frame's detections.
  • Filtering: we transform the last three output masks of detected road boundaries into a common reference attached to the current frame. Then we construct a histogram of output mask size (480x960) by counting the number of overlapping pixels with a value greater than a threshold of 0.7 (determined experimentally). Presumably, if all three frames have a response at the same location, the histogram count is high and the point is kept; otherwise it is discarded.
  • Tracking: in the second step, we perform a similar procedure as outlined above. However, this time we consider road boundary masks from the last three frames that were generated by the first step (as shown in Figure 9). By taking the union of these masks we track the detected road boundaries over time. Integrating temporal information helps to close gaps between boundary segments.
    (figure: temporal filtering and tracking pipeline)
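
A minimal sketch of the filtering step, assuming a hypothetical `warp_mask(mask, T)` helper that applies the VO transform to a mask; the 0.7 threshold follows the text above:

```python
import numpy as np

def temporal_filter(masks, transforms, thresh=0.7, min_votes=3):
    """Keep pixels detected in all of the last three warped output masks."""
    votes = np.zeros(masks[0].shape, dtype=np.int32)
    for mask, T in zip(masks, transforms):
        warped = warp_mask(mask, T)          # hypothetical VO-based warp
        votes += (warped > thresh).astype(np.int32)
    return votes >= min_votes                # consistently detected pixels
```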

3. Experiments

  • Summary performance figure
    (figure: overall quantitative results)
  • Comparison of visible vs. occluded curbs
  • Results after adding the post-processing
    (figure: effect of post-processing)

4. Takeaways

  • Directly predicting curb lines as a method.
  • Using anchor lines to predict occluded lines.
  • Line-fitting pipeline: FAST corners [16] -> BRIEF descriptors [17] -> RANSAC [18] for outlier rejection, plus nonlinear least-squares refinement.
  • Splitting curbs into visible and occluded categories is a good idea.
  • Predicting visible and occluded curbs separately is also worth borrowing.

4. Alibaba Competition

4.1 Fourth-place solution from the Alibaba competition

4.1.1 Competition requirements

Entries had to reach at least 10 frames per second on an i7 CPU + GTX 1080 GPU. Consequently, all entries projected the point cloud onto a plane: boxes are detected in the bird's-eye view, and the points inside each box are then gathered.

4.1.2 Key techniques and pipeline

  • Complex-YOLO-style data projection onto a grid map
  • ROI: Z in [-2 m, 2 m]
  • Grid cell size: 10 cm x 10 cm
  • Grid-cell class decision order: non-motorized vehicle -> pedestrian -> motor vehicle

4.1.3 Data augmentation

  • Random rotation by multiples of 30°
  • Random horizontal flips
  • Random translation
  • RGB value changes
  • ROI size changes
  • Max/min height maps

Bird's-eye-view images have a distinctive property: multi-directionality. In conventional image datasets, road objects usually appear in similar poses without large tilts. In BEV datasets this issue is pronounced: objects can face any direction, so the training set must cover diverse orientations, otherwise the learned model will not generalize. To address this we adopted the following augmentations: random rotation by multiples of 30°, random horizontal flips, and random translation (all applied online). This approach has advantages, but it may also manufacture abnormal samples; for example, flipping swaps a car's left and right sides, which may not be physically plausible.

The organizers provided three kinds of point cloud data: pts, intensity, and category. Following the idea of Complex-YOLO: Real-time 3D Object Detection on Point Clouds, we converted the pts and intensity data into maximum reflectivity, maximum height, and normalized density, normalized each channel to the 0~1 range, and stacked them into a three-channel image array as our training images. A minimal sketch follows.
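
A minimal sketch of this three-channel encoding, assuming a NumPy (N, 4) array of x, y, z, intensity with intensity already in [0, 1]; the x/y ranges are assumptions, while the 10 cm cell and Z in [-2, 2] m come from the pipeline above:

```python
import numpy as np

def complex_yolo_channels(points, x_range=(0.0, 40.0), y_range=(-20.0, 20.0),
                          z_range=(-2.0, 2.0), cell=0.1):
    """3-channel BEV image: max intensity, max height, normalized density."""
    h = int((x_range[1] - x_range[0]) / cell)
    w = int((y_range[1] - y_range[0]) / cell)
    ix = ((points[:, 0] - x_range[0]) / cell).astype(int)
    iy = ((points[:, 1] - y_range[0]) / cell).astype(int)
    keep = ((ix >= 0) & (ix < h) & (iy >= 0) & (iy < w)
            & (points[:, 2] >= z_range[0]) & (points[:, 2] <= z_range[1]))
    ix, iy, pts = ix[keep], iy[keep], points[keep]
    img = np.zeros((3, h, w))
    np.maximum.at(img[0], (ix, iy), pts[:, 3])           # max reflectivity
    np.maximum.at(img[1], (ix, iy), pts[:, 2])           # max height
    np.add.at(img[2], (ix, iy), 1.0)                     # raw point count
    occupied = img[2] > 0
    img[1] = (img[1] - z_range[0]) / (z_range[1] - z_range[0]) * occupied
    img[2] = np.minimum(1.0, np.log1p(img[2]) / np.log(64.0))  # density in [0, 1]
    return img
```

The log-based density normalization mirrors the formula used in the Complex-YOLO paper.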

4.1.4 Network models

Preliminary round:

YOLOv3: 0.09
RetinaNet: 0.18

Final round:

Cascade R-CNN: far over the time limit
Faster R-CNN ResNet-101 + ROI Align + FPN: 0.2

4.1.5 tricks

Model ensembling
Soft-NMS
Attention on the final feature map
Focal loss

Final round:

Rebuilt the training set at sizes consistent with the image size used at inference
Tuned NMS
Tuned per-class confidence thresholds (0.24)

4.1.6 Aside

The top three finishers were all practicing engineers (in China and abroad); fourth and fifth place went to graduate students. The presenter for the first-place team was a Kaggle Master from Belarus, whose score led the Chinese teams, ours included, by a wide margin, and whose runtime beat the rest of us handily.

The blogger's comment: the top four solutions all used grid maps with essentially the same approach, and all met the hard 10 fps real-time requirement. In summary, PointNet and its derivatives are unsuitable for this task; personally I think PointNet fits indoor scenes better: small spaces, information-dense, and without strict real-time requirements.
The main reasons are:

a) PointNet-style methods analyze every single point, whereas autonomous-driving point clouds are rather sparse;

b) Autonomous driving demands real-time performance, yet the captured data spans a very wide area, up to forty meters front to back; using all the points would very likely exceed the time budget, and by the time an obstacle ahead is detected, the car could already have crashed.

4.1.7 Lessons learned

1. We did not do enough data cleaning and analysis. In fact the dataset's class distribution is very imbalanced, which calls for per-class confidence tuning; some annotations are also problematic, so part of the data should have been filtered out.
2. The validation split was done poorly: ideally about 5% of the data should be held out as a validation set, but ours was too small to be representative.
3. Image segmentation would have achieved better accuracy than object detection on this task, with a sizable speed advantage as well.
4. With segmentation's shorter inference time, one could try test-time augmentation (TTA) at inference to reduce false positives; a sketch follows.
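
A minimal sketch of such a TTA step, assuming a hypothetical `seg_model` callable that maps a (C, H, W) image to per-class probability maps:

```python
import numpy as np

def tta_predict(seg_model, image):
    """Average predictions over the identity and a horizontal flip."""
    probs = seg_model(image)
    flipped = seg_model(image[:, :, ::-1].copy())[:, :, ::-1]  # undo the flip
    return (probs + flipped) / 2.0
```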

The takeaway for us: study the data carefully before anything else, especially the class distribution.
