Manhattan-world Stereo摘要和简介翻译

Manhattan-world Stereo

曼哈顿世界立体视觉

Abstract

摘要

       Multi-view stereo (MVS) algorithms now produce reconstructions that rival laser range scanner accuracy. However, stereo algorithms require textured surfaces, and therefore work poorly for many architectural scenes (e.g., building interiors with textureless, painted walls). This paper presents a novel MVS approach to overcome these limitations for Manhattan World scenes, i.e., scenes that consist of piece-wise planar surfaces with dominant directions. Given a set of calibrated photographs, we first reconstruct textured regions using an existing MVS algorithm, then extract dominant plane directions, generate plane hypotheses, and recover per-view depth maps using Markov random fields. We have tested our algorithm on several datasets ranging from office interiors to outdoor buildings, and demonstrate results that outperform the current state of the art for such texture-poor scenes.

        多视角立体视觉(MVS)算法目前的重建精度已经可以与激光测距扫描仪相媲美。然而,立体视觉算法依赖有纹理的表面,因此在许多建筑场景中表现不佳(例如墙面为无纹理涂漆的建筑室内)。这篇论文提出了一种新颖的多视角立体视觉方法,针对曼哈顿世界场景克服了这些局限,曼哈顿世界场景是指由带有主要方向的分段平面组成的场景。给定一组标定过的照片,我们首先使用已有的多视角立体视觉算法重建有纹理的区域,接下来提取主要的平面方向,生成平面假设,并使用马尔科夫随机场恢复每个视角的深度图。我们在从办公室室内到室外建筑的多个数据集上测试了该算法,结果表明,在这类缺少纹理的场景中,其表现优于当前最先进的方法。

1. Introduction

1. 简介

       The 3D reconstruction of architectural scenes is an important research problem, with large-scale efforts underway to recover models of cities at a global scale (e.g., Google Earth, Virtual Earth). Architectural scenes often exhibit strong structural regularities, including flat, texture-poor walls, sharp corners, and axis-aligned geometry, as shown in Figure 1. The presence of such structures suggests opportunities for constraining and therefore simplifying the reconstruction task. Paradoxically, however, these properties are problematic for traditional computer vision methods and greatly complicate the reconstruction problem. The lack of texture leads to ambiguities in matching, whereas the sharp angles and non-fronto-parallel geometry defeat the smoothness assumptions used in dense reconstruction algorithms.

        建筑场景的三维重建是一个重要的研究问题,目前已有大规模的工作致力于在全球范围内恢复城市模型(例如 Google Earth、Virtual Earth)。建筑场景往往表现出很强的结构规律性,包括平坦且缺少纹理的墙面、尖锐的转角以及轴对齐的几何结构,正如图1所示。这类结构的存在为约束并进而简化重建任务提供了机会。然而矛盾的是,这些性质对于传统的计算机视觉方法却是难题,在很大程度上让重建问题变得更加复杂。缺少纹理会导致匹配时出现歧义,而尖锐的夹角和非正面平行的几何结构则违背了稠密重建算法所使用的平滑性假设。

       In this paper, we propose a multi-view stereo (MVS) approach specifically designed to exploit properties of architectural scenes. We focus on the problem of recovering depth maps, as opposed to full object models. The key idea is to replace the smoothness prior used in traditional methods with priors that are more appropriate. To this end we invoke the so-called Manhattan-world assumption [10], which states that all surfaces in the world are aligned with three dominant directions, typically corresponding to the X, Y, and Z axes: i.e., the world is piecewise-axis-aligned-planar. We call the resulting approach Manhattan-world stereo. While this assumption may seem to be overly restrictive, note that any scene can be arbitrarily-well approximated (to first order) by axis-aligned geometry, as in the case of a high resolution voxel grid [14, 17]. While the Manhattan-world model may be reminiscent of blocks-world models from the 70’s and 80’s, we demonstrate state-of-the-art results on very complex environments.

       在这篇论文中,我们提出了一种多视角立体视觉(MVS)方法,专门用于利用建筑场景的特性。我们关注的是恢复深度图的问题,而不是完整的物体模型。关键的想法是用更合适的先验来替换传统方法中使用的平滑性先验。为此,我们引入所谓的曼哈顿世界假设【10】,该假设认为世界上所有的表面都与三个主要方向对齐,通常对应于X、Y、Z轴,也就是说,世界是分段的轴对齐平面。我们将由此得到的方法称为曼哈顿世界立体视觉(Manhattan-world stereo)。尽管这个假设可能看起来限制过强,但请注意,任何场景都可以用轴对齐的几何结构(在一阶意义上)任意精确地近似,高分辨率体素网格就是一个例子【14, 17】。尽管曼哈顿世界模型可能让人联想到上世纪七八十年代的积木世界(blocks-world)模型,但我们在非常复杂的环境上展示了达到当前最先进水平的结果。

       Our approach, within the constrained space of Manhattan-world scenes, offers the following advantages: 1) it is remarkably robust to lack of texture, and able to model flat painted walls, and 2) it produces remarkably clean, simple models as outputs. Our approach operates as follows. We identify dominant orientations in the scene, as well as a set of candidate planes on which most of the geometry lies. These steps are enabled by first running an existing MVS method to reconstruct the portion of the scene that contains texture, and analyzing the recovered geometry. We then recover a depth map for each pixel in the image. This step is posed as a Markov random field (MRF) and solved with graph cuts [4, 5, 13] (Fig. 2).

       我们的方法,在受约束的曼哈顿世界场景空间中,提供了如下的优点:1)该方法对于缺少纹理的情况有显著的鲁棒性,并且能够对平坦的涂漆墙面进行建模;2)该方法输出的模型非常干净、简洁。我们的方法按如下方式工作。我们确定场景中的主要方向,以及一组包含大部分几何结构的候选平面。实现这些步骤的方式是,首先运行一个已有的多视角立体视觉方法来重建场景中包含纹理的部分,然后分析恢复出的几何结构。接下来,我们为图像中的每个像素恢复深度,得到深度图。这一步被表述为马尔科夫随机场(MRF)问题,并使用图割(graph cuts)求解【4, 5, 13】(见图2)。
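The candidate-plane step can be illustrated with a minimal sketch. Given reconstructed 3D points and one dominant axis, each point is projected onto the axis, the signed offsets are binned, and well-supported local maxima of the histogram become plane offsets. The fixed-width histogram, the function name `plane_offsets_along_axis`, and the parameters `bin_width` and `min_count` are assumptions made for illustration; the text above only says that candidate planes are found by analyzing the recovered geometry.

```python
from collections import Counter


def plane_offsets_along_axis(points, axis, bin_width=0.05, min_count=5):
    """Return offsets o such that the plane {x : axis . x = o} is a hypothesis.

    Hypothetical helper: `axis` is assumed to be a unit-length 3-tuple and
    `points` a list of 3-tuples in the same coordinate frame.
    """
    # Histogram of signed offsets along the axis, keyed by integer bin index.
    counts = Counter(
        round((p[0] * axis[0] + p[1] * axis[1] + p[2] * axis[2]) / bin_width)
        for p in points
    )

    offsets = []
    for b, c in sorted(counts.items()):
        # Keep bins with enough support that are local maxima of the histogram.
        is_peak = c > counts.get(b - 1, 0) and c >= counts.get(b + 1, 0)
        if c >= min_count and is_peak:
            offsets.append(b * bin_width)
    return offsets
```

Running this once per dominant axis yields three families of axis-aligned plane hypotheses. The `min_count` threshold is what keeps sparse clutter from generating spurious planes; a smoother density estimate could replace the histogram when the point cloud is noisy.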

Figure 1. Increasingly ubiquitous on the Internet are images of architectural scenes with texture-poor but highly structured surfaces.

图1. 互联网上越来越常见的建筑场景图像,其表面缺少纹理,但具有很强的结构性。

Oriented points reconstructed by MVS. 通过多视角立体视觉重建的有向点。

Dominant axes extracted from points. 从点中获取的主要轴。

Point density on d1, with peaks. d1方向上的点密度及其峰值。

Plane hypotheses generated from peaks. 从峰值生成的平面假设。

Reconstruction by labeling hypotheses to pixels (MRF). 通过为像素标注平面假设来进行重建(马尔科夫随机场)。

Figure 2. Our reconstruction pipeline. From a set of input images, an MVS algorithm reconstructs oriented points. We estimate dominant axes d1, d2, d3. Hypothesis planes are found by finding point density peaks along each axis di. These planes are then used as per-pixel labels in an MRF.

图2. 我们的重建流程。给定一组输入图像,多视角立体视觉算法重建出有向点。我们估计主要轴d1、d2、d3。通过寻找沿每个轴di的点密度峰值得到假设平面。这些平面随后在马尔科夫随机场中用作逐像素的标签。
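The last stage of the pipeline assigns one plane hypothesis to every pixel by minimizing an MRF energy. The paper solves this with graph cuts on the full image; the sketch below swaps in a much simpler setting, a single scanline with a Potts-style smoothness term (a constant penalty whenever neighbouring labels differ), where dynamic programming finds the exact minimum. The function name `label_scanline`, the `data_cost` layout, and the Potts term are assumptions chosen for illustration, not the energy used in the paper.

```python
def label_scanline(data_cost, smooth_weight=1.0):
    """Exact minimum of a 1D labeling energy by dynamic programming.

    data_cost[i][k] is the (hypothetical) cost of giving pixel i the plane
    hypothesis k. The energy is the sum of data costs plus `smooth_weight`
    for every pair of neighbouring pixels with different labels.
    """
    num_labels = len(data_cost[0])
    best = list(data_cost[0])  # best[k]: minimum energy ending in label k
    back = []                  # back[i-1][k]: label of pixel i-1 on that path

    for costs in data_cost[1:]:
        cheapest = min(range(num_labels), key=best.__getitem__)
        new_best, choice = [], []
        for k in range(num_labels):
            # Either keep label k, or switch from the cheapest previous label.
            stay = best[k]
            switch = best[cheapest] + smooth_weight
            if stay <= switch:
                new_best.append(stay + costs[k])
                choice.append(k)
            else:
                new_best.append(switch + costs[k])
                choice.append(cheapest)
        best = new_best
        back.append(choice)

    # Trace the optimal path backwards from the cheapest final label.
    label = min(range(num_labels), key=best.__getitem__)
    labels = [label]
    for choice in reversed(back):
        label = choice[label]
        labels.append(label)
    labels.reverse()
    return labels
```

On a real image the neighbourhood is a 2D grid, so this chain solver no longer applies directly; that is the setting where graph-cut solvers are used. The sketch only shows how the smoothness term overrides weak, noisy data costs.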
