Manhattan-world Stereo: Translation of the Abstract and Introduction

Manhattan-world Stereo




       Multi-view stereo (MVS) algorithms now produce reconstructions that rival laser range scanner accuracy. However, stereo algorithms require textured surfaces, and therefore work poorly for many architectural scenes (e.g., building interiors with textureless, painted walls). This paper presents a novel MVS approach to overcome these limitations for Manhattan World scenes, i.e., scenes that consist of piece-wise planar surfaces with dominant directions. Given a set of calibrated photographs, we first reconstruct textured regions using an existing MVS algorithm, then extract dominant plane directions, generate plane hypotheses, and recover per-view depth maps using Markov random fields. We have tested our algorithm on several datasets ranging from office interiors to outdoor buildings, and demonstrate results that outperform the current state of the art for such texture-poor scenes.


1. Introduction


       The 3D reconstruction of architectural scenes is an important research problem, with large-scale efforts underway to recover models of cities at a global scale (e.g., Google Earth, Virtual Earth). Architectural scenes often exhibit strong structural regularities, including flat, texture-poor walls, sharp corners, and axis-aligned geometry, as shown in Figure 1. The presence of such structures suggests opportunities for constraining and therefore simplifying the reconstruction task. Paradoxically, however, these properties are problematic for traditional computer vision methods and greatly complicate the reconstruction problem. The lack of texture leads to ambiguities in matching, whereas the sharp angles and non-fronto-parallel geometry defeat the smoothness assumptions used in dense reconstruction algorithms.


       In this paper, we propose a multi-view stereo (MVS) approach specifically designed to exploit properties of architectural scenes. We focus on the problem of recovering depth maps, as opposed to full object models. The key idea is to replace the smoothness prior used in traditional methods with priors that are more appropriate. To this end we invoke the so-called Manhattan-world assumption [10], which states that all surfaces in the world are aligned with three dominant directions, typically corresponding to the X, Y, and Z axes: i.e., the world is piecewise-axis-aligned-planar. We call the resulting approach Manhattan-world stereo. While this assumption may seem to be overly restrictive, note that any scene can be arbitrarily-well approximated (to first order) by axis-aligned geometry, as in the case of a high resolution voxel grid [14, 17]. While the Manhattan-world model may be reminiscent of blocks-world models from the 70’s and 80’s, we demonstrate state-of-the-art results on very complex environments.
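The voxel-grid remark can be checked numerically. The following is a minimal sketch, not taken from the paper: it samples a slanted plane, snaps each sample to the center of its axis-aligned voxel, and measures the positional error. The function name `snap_to_voxels`, the plane coefficients, and the voxel sizes are all illustrative assumptions.

```python
import numpy as np


def snap_to_voxels(points, voxel_size):
    """Move each 3D point to the center of the axis-aligned voxel containing it."""
    points = np.asarray(points, dtype=float)
    return (np.floor(points / voxel_size) + 0.5) * voxel_size


# Sample a slanted (non-axis-aligned) plane z = 0.5 x + 0.25 y on a regular grid.
x, y = np.meshgrid(np.linspace(0.0, 1.0, 41), np.linspace(0.0, 1.0, 41))
plane = np.column_stack((x.ravel(), y.ravel(), (0.5 * x + 0.25 * y).ravel()))

# Largest distance between a sample and its voxel center, per voxel size.
max_error = {}
for h in (0.1, 0.01):  # hypothetical voxel sizes
    err = np.linalg.norm(snap_to_voxels(plane, h) - plane, axis=1)
    max_error[h] = err.max()
```

Each coordinate moves by at most half a voxel, so the positional error is bounded by (sqrt(3) / 2) * h and shrinks as the voxel size h decreases.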


       Our approach, within the constrained space of Manhattan-world scenes, offers the following advantages: 1) it is remarkably robust to lack of texture, and able to model flat painted walls, and 2) it produces remarkably clean, simple models as outputs. Our approach operates as follows. We identify dominant orientations in the scene, as well as a set of candidate planes on which most of the geometry lies. These steps are enabled by first running an existing MVS method to reconstruct the portion of the scene that contains texture, and analyzing the recovered geometry. We then recover a depth map for each image by assigning one of the candidate planes to each pixel. This step is posed as a Markov random field (MRF) and solved with graph cuts [4, 5, 13] (Fig. 2).
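To make the labeling step concrete, here is a toy sketch of MRF inference over plane labels. It is not the paper's formulation: the paper solves a 2D MRF with graph cuts, whereas this sketch substitutes exact dynamic programming on a single 1D scanline with a Potts smoothness penalty. The function name `label_scanline`, the cost values, and `smooth_cost` are assumptions made for illustration.

```python
import numpy as np


def label_scanline(unary, smooth_cost=1.0):
    """Minimize sum_i unary[i, l_i] + smooth_cost * [l_i != l_(i+1)] along a chain.

    unary[i, l] is the (hypothetical) cost of giving pixel i plane label l.
    """
    unary = np.asarray(unary, dtype=float)
    n_pixels, n_labels = unary.shape
    potts = smooth_cost * (1.0 - np.eye(n_labels))  # penalty for changing label

    cost = unary[0].copy()                     # best cost of a path ending in each label
    back = np.zeros((n_pixels, n_labels), dtype=int)
    for i in range(1, n_pixels):
        trans = cost[:, None] + potts          # trans[p, q]: previous label p, current label q
        back[i] = trans.argmin(axis=0)
        cost = trans.min(axis=0) + unary[i]

    # Trace the optimal labels back from the last pixel.
    labels = np.empty(n_pixels, dtype=int)
    labels[-1] = cost.argmin()
    for i in range(n_pixels - 1, 0, -1):
        labels[i - 1] = back[i, labels[i]]
    return labels


# Six pixels, two candidate planes. Pixel 1 weakly prefers the wrong plane (noise).
unary = [[0.0, 2.0], [0.6, 0.4], [0.0, 2.0], [2.0, 0.0], [2.0, 0.0], [2.0, 0.0]]
labels = label_scanline(unary, smooth_cost=1.0)
independent = np.argmin(unary, axis=1)  # per-pixel choice without smoothing
```

In this example the per-pixel minimum picks the wrong plane at pixel 1, while the smoothness term keeps that pixel on the same plane as its neighbors.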



Figure 1. Increasingly ubiquitous on the Internet are images of architectural scenes with texture-poor but highly structured surfaces.



Oriented points reconstructed by MVS.

Dominant axes extracted from points.

Point density on d1, with peaks.

Plane hypotheses generated from peaks.

Reconstruction by labeling pixels with hypotheses (MRF).






Figure 2. Our reconstruction pipeline. From a set of input images, an MVS algorithm reconstructs oriented points. We estimate dominant axes d1, d2, d3. Hypothesis planes are found by finding point density peaks along each axis di. These planes are then used as per-pixel labels in an MRF.
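The peak-finding step in the caption can be sketched as a histogram over point offsets along one axis. This is only an illustration under stated assumptions: the function name `plane_hypotheses`, the bin width, the `min_fraction` threshold, and the simple local-maximum rule are hypothetical choices, and the paper's actual peak detection may differ.

```python
import numpy as np


def plane_hypotheses(points, axis, bin_width=0.05, min_fraction=0.05):
    """Return offsets along `axis` at which the point density peaks.

    Each returned offset d stands for a candidate plane {x : axis . x = d}.
    """
    axis = np.asarray(axis, dtype=float)
    axis = axis / np.linalg.norm(axis)
    # Signed position of every point along the axis direction.
    offsets = np.asarray(points, dtype=float) @ axis

    lo, hi = offsets.min(), offsets.max()
    n_bins = max(1, int(np.ceil((hi - lo) / bin_width)))
    counts, edges = np.histogram(offsets, bins=n_bins, range=(lo, hi))
    centers = 0.5 * (edges[:-1] + edges[1:])

    # Pad with zeros so the first and last bins can also be peaks.
    padded = np.concatenate(([0], counts, [0]))
    left, mid, right = padded[:-2], padded[1:-1], padded[2:]
    # A peak is a local maximum holding at least min_fraction of all points.
    is_peak = (mid >= left) & (mid > right) & (mid >= min_fraction * len(offsets))
    return centers[is_peak]


# Synthetic points: two wall-like clusters at x = 1 and x = 3, plus two stray points.
xs = np.concatenate((np.linspace(0.99, 1.01, 100),
                     np.linspace(2.99, 3.01, 100),
                     [0.0, 4.0]))
points = np.column_stack((xs, np.zeros_like(xs), np.zeros_like(xs)))
peaks = plane_hypotheses(points, axis=(1.0, 0.0, 0.0))
```

The returned offsets are bin centers, so each is accurate only to within one bin width. Running the same function once per axis d1, d2, d3 would give the full set of candidate planes.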

