Towards real-time unsupervised monocular depth estimation on CPU
Matteo Poggi , Filippo Aleotti , Fabio Tosi , Stefano Mattoccia
在CPU上进行实时无监督单目深度估计
Abstract— Unsupervised depth estimation from a single image is a very attractive technique with several implications in robotic, autonomous navigation, augmented reality and so on.This topic represents a very challenging task and the advent of deep learning enabled to tackle this problem with excellent results. However, these architectures are extremely deep and complex. Thus, real-time performance can be achieved only by leveraging power-hungry GPUs that do not allow to infer depth maps in application fields characterized by low-power constraints. To tackle this issue, in this paper we propose a novel architecture capable to quickly infer an accurate depth map on a CPU, even of an embedded system, using a pyramid of features extracted from a single input image. Similarly to state-of-the-art, we train our network in an unsupervised manner casting depth estimation as an image reconstruction problem.Extensive experimental results on the KITTI dataset show that compared to the top performing approach our network has similar accuracy but a much lower complexity (about 6% of parameters) enabling to infer a depth map for a KITTI image in about 1.7 s on the Raspberry Pi 3 and at more than 8 Hz on a standard CPU. Moreover, by trading accuracy for efficiency, our network allows to infer maps at about 2 Hz and 40 Hz respectively, still being more accurate than most state-of-the-art slower methods. To the best of our knowledge, it is the first method enabling such performance on CPUs paving the way for effective deployment of unsupervised monocular depth estimation even on embedded systems.
单个图像的无监督深度估计是一种非常有吸引力的技术,在机器人,自主导航,增强现实等方面具有多种意义。本主题代表了一项非常具有挑战性的任务,深度学习的出现使得能够以优异的成绩解决这一问题。但是,这些架构非常深刻和复杂。 因此,仅通过利用耗电量大的GPU可以实现实时性能,所述GPU不允许在以低功率约束为特征的应用领域中推断深度图。为了解决这个问题,在本文中,我们提出了一种新颖的架构,能够使用从单个输入图像中提取的特征金字塔,在CPU甚至是嵌入式系统上快速推断出精确的深度图。与现有技术类似,我们以无人监督的方式训练我们的网络,将深度估计作为图像重建问题。此外,通过交易效率的准确性,我们的网络允许分别推断大约2 Hz和40 Hz的地图,仍然比大多数最先进的慢速方法更准确。据我们所知,这是第一种在CPU上实现这种性能的方法,即使在嵌入式系统上也能为有效部署无监督单眼深度估计铺平道路。