4.SceneCode: Monocular Dense Semantic Reconstruction using Learned Encoded Scene Representations.

目录

 

Abstract :


Abstract :

Systems which incrementally create 3D semantic maps from image sequences must store and update representa- tions of both geometry and semantic entities. However, while there has been much work on the correct formulation for geometrical estimation, state-of-the-art systems usually rely on simple semantic representations which store and up- date independent label estimates for each surface element (depth pixels, surfels, or voxels). Spatial correlation is dis- carded, and fused label maps are incoherent and noisy. We introduce a new compact and optimisable semantic representation by training a variational auto-encoder that is conditioned on a colour image. Using this learned latent space, we can tackle semantic label fusion by jointly opti- mising the low-dimenional codes associated with each of a set of overlapping images, producing consistent fused label maps which preserve spatial correlation. We also show how this approach can be used within a monocular keyframe based semantic mapping system where a similar code ap- proach is used for geometry. The probabilistic formulation allows a flexible formulation where we can jointly estimate motion, geometry and semantics in a unified optimisation.

文摘:系统逐步从图像序列必须创建3 d语义地图存储和更新意味着,几何和语义实体。然而,尽管在几何估计的正确公式方面有很多工作要做,但最先进的系统通常依赖于简单的语义表示,这些语义表示为每个表面元素(深度像素、surfels或体素)存储和最新的独立标签估计。空间相关是离散的,融合的标签映射是不连贯的和有噪声的。通过训练一种基于彩色图像的变分自动编码器,提出了一种新的紧凑、优化的语义表示方法。利用这一学习的潜在空间,我们可以通过联合操作与每一组重叠图像相关联的低维码来处理语义标签融合,生成一致的融合标签映射,从而保持空间相关性。我们还展示了如何在基于单目关键帧的语义映射系统中使用这种方法,其中几何图形使用了类似的代码ap- proach。概率公式允许一个灵活的公式,我们可以联合估计运动,几何和语义在一个统一的优化。

上一篇:DeepLiDARFlow: A Deep Learning Architecture For Scene Flow Estimation Using Monocular Camera and Spa


下一篇:Unsupervised Monocular Depth Estimation with Left-Right Consistency代码详解(model.py)