Paper Notes: Optical Flow Estimation using a Spatial Pyramid Network

2022-11-19 15:50:52

　　Optical Flow Estimation using a Spatial Pyramid Network


　　This paper combines the classical spatial-pyramid formulation with deep learning to compute optical flow in a coarse-to-fine manner. It estimates large motions by warping one image of a pair at each pyramid level with the current flow estimate and then computing an update to the flow.

　　A CNN computes the flow update at each pyramid level, replacing the objective-function minimization used in traditional methods. Unlike FlowNet, the networks do not need to handle large motions themselves; the pyramid takes care of those. The main advantages are:

　　1. our Spatial Pyramid Network is much simpler and 96% smaller than FlowNet in terms of model parameters.

　　2. since the flow at each pyramid level is small (< 1 pixel), a convolutional approach applied to pairs of warped images is appropriate.

　　3. unlike FlowNet, the learned convolution filters appear similar to classical spatio-temporal filters, giving insight into the method and how to improve it.
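　　To make the size claim in point 1 concrete, here is a rough parameter count. The layer layout is an assumption taken from the paper's architecture description rather than from these notes: five pyramid levels, each a stack of five 7x7 convolutions with 32, 64, 32, 16, and 2 output channels, fed an 8-channel input (first frame, warped second frame, upsampled flow). The names `conv_params` and `level_params` are illustrative only.

```python
# Assumed layout of one pyramid-level network (from the paper's description,
# not from these notes): five 7x7 convolutions.
KERNEL = 7
IN_CHANNELS = 8                     # RGB frame 1 + warped RGB frame 2 + 2-channel flow
OUT_CHANNELS = [32, 64, 32, 16, 2]  # last layer outputs the 2-channel residual flow
NUM_LEVELS = 5                      # assumed number of pyramid levels


def conv_params(c_in, c_out, k=KERNEL):
    """Weights plus biases of a single k x k convolution layer."""
    return k * k * c_in * c_out + c_out


def level_params():
    """Total learnable parameters of one pyramid-level network."""
    total, c_in = 0, IN_CHANNELS
    for c_out in OUT_CHANNELS:
        total += conv_params(c_in, c_out)
        c_in = c_out
    return total


per_level = level_params()
total = NUM_LEVELS * per_level
print(per_level, total)  # 240050 1200250
```

　　Under these assumptions the whole model has about 1.2 million parameters, which is the scale at which a "96% smaller" comparison against a network with tens of millions of parameters is plausible.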

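　　Main problem with existing methods:

　　Existing methods simply stack the two images together and feed them to a CNN. When the motion between the two frames exceeds one or a few pixels, the spatio-temporal convolutional filters do not produce a meaningful response. In other words, if a convolutional window in one image does not overlap with related image pixels at the next time instant, no meaningful temporal filter can be learned.

　　Two key problems need to be solved: 1. long-range dependencies (large motions); 2. detailed, sub-pixel optical flow and precise motion boundaries. FlowNet tries to solve both within a single network, whereas this method uses a CNN for the second problem and an existing technique, the spatial pyramid, for the first.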

　　Approach：

　　The paper handles large motions with a spatial pyramid, working from coarse to fine.
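　　A quick bit of arithmetic shows why the pyramid helps. The sketch below assumes each pyramid level halves the resolution, so the apparent motion halves as well; the function names and the 16-pixel example are hypothetical. It is a simplification: at finer levels the network only sees the residual motion left after warping, not the full displacement.

```python
def displacement_at_level(full_res_pixels, levels_down):
    """Apparent motion after `levels_down` successive 2x downsamplings."""
    return full_res_pixels / (2 ** levels_down)


def levels_needed(full_res_pixels, max_pixels=1.0):
    """Smallest number of 2x downsamplings that brings the motion
    down to at most `max_pixels`."""
    levels = 0
    while displacement_at_level(full_res_pixels, levels) > max_pixels:
        levels += 1
    return levels


# A 16-pixel motion at full resolution shrinks level by level.
for k in range(5):
    print(k, displacement_at_level(16.0, k))

print(levels_needed(16.0))  # 4
```

　　With four downsamplings, a 16-pixel motion is about one pixel at the coarsest level, which a small convolutional window can cover.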

　　The overall pipeline is as follows:
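　　As a rough sketch of that pipeline, the coarse-to-fine loop can be written in a few lines of NumPy. This is not the authors' implementation: the per-level networks are passed in as plain callables (hypothetical stand-ins for the trained CNNs), downsampling is a 2x2 average, and warping uses nearest-neighbour sampling instead of bilinear interpolation to keep the code short. At each level the flow from the coarser level is upsampled, the second image is warped by it, and the network's residual is added.

```python
import numpy as np


def downsample(img):
    """Halve resolution by averaging 2x2 blocks; img is (H, W, C), H and W even."""
    h, w = img.shape[:2]
    return img.reshape(h // 2, 2, w // 2, 2, -1).mean(axis=(1, 3))


def upsample_flow(flow):
    """Double the resolution (nearest neighbour) and double the vectors,
    because one coarse pixel spans two fine pixels."""
    return 2.0 * flow.repeat(2, axis=0).repeat(2, axis=1)


def warp(img, flow):
    """Backward-warp img by flow (dx, dy) with nearest-neighbour sampling."""
    h, w = img.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.clip(np.rint(xs + flow[..., 0]).astype(int), 0, w - 1)
    src_y = np.clip(np.rint(ys + flow[..., 1]).astype(int), 0, h - 1)
    return img[src_y, src_x]


def spynet_inference(img1, img2, nets):
    """nets[0] is the coarsest level. Each net maps
    (img1_k, warped_img2_k, upsampled_flow) -> residual flow of shape (H_k, W_k, 2)."""
    pyr1, pyr2 = [img1], [img2]
    for _ in range(len(nets) - 1):
        pyr1.append(downsample(pyr1[-1]))
        pyr2.append(downsample(pyr2[-1]))
    pyr1, pyr2 = pyr1[::-1], pyr2[::-1]  # coarsest level first

    flow = np.zeros(pyr1[0].shape[:2] + (2,))  # zero flow at the coarsest level
    for k, net in enumerate(nets):
        if k > 0:
            flow = upsample_flow(flow)
        warped = warp(pyr2[k], flow)
        flow = flow + net(pyr1[k], warped, flow)  # add the predicted residual
    return flow


# Placeholder "network" that predicts no residual (hypothetical stand-in).
def zero_net(img1_k, warped_k, flow_k):
    return np.zeros_like(flow_k)


rng = np.random.default_rng(0)
a, b = rng.random((8, 8, 3)), rng.random((8, 8, 3))
print(spynet_inference(a, b, [zero_net] * 3).shape)  # (8, 8, 2)
```

　　Swapping `zero_net` for trained per-level models would give the real estimator; the loop itself stays the same.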

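　　Training the network G at a given level requires the initial flow produced by the levels below it (the coarser ones). The ground truth used for training is a residual: the difference between the ground-truth flow at that level and the upsampled flow from the level below. The parameters of the current network are trained against this residual.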
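　　The residual target can be sketched as follows. The upsampling scheme (nearest neighbour with the vectors doubled) and the helper names are assumptions made for illustration, and the end-point-error loss reflects my reading of the paper rather than anything stated in these notes.

```python
import numpy as np


def upsample_flow(flow):
    """Double the resolution (nearest neighbour) and double the vectors."""
    return 2.0 * flow.repeat(2, axis=0).repeat(2, axis=1)


def residual_target(gt_flow_k, prev_flow):
    """Training target for the level-k network: ground-truth flow at level k
    minus the upsampled estimate from the coarser level k-1."""
    return gt_flow_k - upsample_flow(prev_flow)


def epe_loss(pred_residual, target_residual):
    """Average end-point error between predicted and target residual flow."""
    diff = pred_residual - target_residual
    return float(np.mean(np.linalg.norm(diff, axis=-1)))


# Toy example: the ground truth at level k is (3, -1) everywhere, and the
# coarser level estimated (1, -0.5), which upsamples to (2, -1).
gt_k = np.tile(np.array([3.0, -1.0]), (4, 4, 1))
prev = np.tile(np.array([1.0, -0.5]), (2, 2, 1))
target = residual_target(gt_k, prev)
print(target[0, 0])                             # [1. 0.]
print(epe_loss(np.zeros_like(target), target))  # 1.0
```

　　Because the coarser levels already explain most of the motion, the target each network has to regress stays small.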
