DeepLabV3+ Decoder Code Explained


Decoder characteristics in the paper

  • Two inputs: the ASPP output and a low-level DCNN feature map.
  • Two upsampling steps, so both 1/16 and 1/4 feature maps are involved (see the sketch below).
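
A minimal sketch of that data flow, assuming DeepLab's default 513×513 crop (my own illustration, not code from model.py; the shapes match the tensor printouts later in this post):

import tensorflow as tf

aspp = tf.placeholder(tf.float32, [None, 33, 33, 256])         # ASPP output, 1/16
low_level = tf.placeholder(tf.float32, [None, 129, 129, 256])  # DCNN feature, 1/4

proj = tf.layers.conv2d(low_level, 48, 1)                # 1x1 projection -> 48 channels
up = tf.image.resize_bilinear(aspp, [129, 129],
                              align_corners=True)        # upsampling #1: 1/16 -> 1/4
fused = tf.concat([up, proj], axis=3)                    # (?, 129, 129, 304)
fused = tf.layers.conv2d(fused, 256, 3, padding='same')  # refine at 1/4 resolution
# Upsampling #2 (1/4 -> full resolution) happens after the decoder, on the logits.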

Now to the code, model.py: refine_by_decoder

The function definition and docstring:

def refine_by_decoder(features,
                      end_points,
                      crop_size=None,
                      decoder_output_stride=None,
                      decoder_use_separable_conv=False,
                      model_variant=None,
                      weight_decay=0.0001,
                      reuse=None,
                      is_training=False,
                      fine_tune_batch_norm=False,
                      use_bounded_activation=False):
  """Adds the decoder to obtain sharper segmentation results.

  Args:
    features: A tensor of size [batch, features_height, features_width,
      features_channels].
    end_points: A dictionary from components of the network to the corresponding
      activation.
    crop_size: A tuple [crop_height, crop_width] specifying whole patch crop
      size.
    decoder_output_stride: A list of integers specifying the output stride of
      low-level features used in the decoder module.
    decoder_use_separable_conv: Employ separable convolution for decoder or not.
    model_variant: Model variant for feature extraction.
    weight_decay: The weight decay for model variables.
    reuse: Reuse the model variables or not.
    is_training: Is training or not.
    fine_tune_batch_norm: Fine-tune the batch norm parameters or not.
    use_bounded_activation: Whether or not to use bounded activations. Bounded
      activations better lend themselves to quantized inference.

  Returns:
    Decoder output with size [batch, decoder_height, decoder_width,
      decoder_channels].

  Raises:
    ValueError: If crop_size is None.
  """

Here is what the input data concretely looks like.
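
To make those shapes concrete, here is a hypothetical invocation (my own example built from the signature above, with shapes for a 513×513 crop and xception_65; it assumes the deeplab repo is on the path):

import tensorflow as tf

features = tf.placeholder(tf.float32, [None, 33, 33, 256])     # ASPP output, 1/16
low_level = tf.placeholder(tf.float32, [None, 129, 129, 256])  # DCNN feature, 1/4
end_points = {
    'xception_65/entry_flow/block2/unit_1/xception_module/'
    'separable_conv2_pointwise': low_level,
}

decoder_features = refine_by_decoder(
    features,
    end_points,
    crop_size=[513, 513],
    decoder_output_stride=[4],       # decode back to 1/4 resolution
    decoder_use_separable_conv=True,
    model_variant='xception_65',
    weight_decay=0.0001,
    is_training=True,
    fine_tune_batch_norm=True)
# decoder_features: (?, 129, 129, 256)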


Reading some of the input parameters:

decoder_output_stride: means the decoded output is at 1/4 resolution, i.e. a tensor of size (129, 129, channels).

end_points: a dictionary; tensor_features = end_points[feature_name] pulls out the corresponding feature.

features: the original image crop is 513, so the features fed in here are the 1/16 feature map (the ASPP output).

fine_tune_batch_norm: whether to fine-tune the BN (batch normalization) parameters.

BN (batch normalization) configuration. Note that 'is_training' is only True when both is_training and fine_tune_batch_norm are set, so the BN statistics stay frozen unless you explicitly fine-tune them:

  batch_norm_params = {
      'is_training': is_training and fine_tune_batch_norm,
      'decay': 0.9997,
      'epsilon': 1e-5,
      'scale': True,
  }

After this, the arguments shared by the whole decoder are configured, which means every decode op has to obey the code below. (slim.arg_scope and tf.variable_scope both have tutorials that are short and simple, if you need a refresher.)

  with slim.arg_scope(
      [slim.conv2d, slim.separable_conv2d],
      weights_regularizer=slim.l2_regularizer(weight_decay),
      activation_fn=tf.nn.relu6 if use_bounded_activation else tf.nn.relu,
      normalizer_fn=slim.batch_norm,
      padding='SAME',
      stride=1,
      reuse=reuse):
    with slim.arg_scope([slim.batch_norm], **batch_norm_params):
      with tf.variable_scope(DECODER_SCOPE, DECODER_SCOPE, [features]):

As an aside, a quick tutorial on passing a dict with **:

>>> def test_args_kwargs(arg1, arg2, arg3):
...     print("arg1:", arg1)
...     print("arg2:", arg2)
...     print("arg3:", arg3)
... 
>>> kwargs = {"arg3": 3, "arg2": "two", "arg1": 5}
>>> test_args_kwargs(**kwargs)
arg1: 5
arg2: two
arg3: 3
>>> kwargs = {"arg3": 3, "arg2": "two", "arg": 5}
>>> test_args_kwargs(**kwargs)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: test_args_kwargs() got an unexpected keyword argument 'arg'
>>> 

So the batch-norm parameters here are simply passed in as keywords; that is,

with slim.arg_scope([slim.batch_norm], **batch_norm_params)

is equivalent to:

with slim.arg_scope([slim.batch_norm], is_training=is_training and fine_tune_batch_norm, decay=0.9997, epsilon=1e-5, scale=True)

arg_scope takes however many keyword arguments you give it, without distinguishing them: it never reads them at all, it just hands them wholesale to the concrete ops underneath.
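
A toy illustration of that forwarding (my own example, not code from model.py):

import tensorflow as tf
import tensorflow.contrib.slim as slim

inputs = tf.placeholder(tf.float32, [None, 129, 129, 256])

# arg_scope records the defaults and re-applies them to every listed op:
with slim.arg_scope([slim.conv2d], padding='SAME', stride=1):
    net = slim.conv2d(inputs, 48, [1, 1])         # inherits padding='SAME', stride=1
    net = slim.conv2d(net, 48, [3, 3], stride=2)  # an explicit kwarg still overrides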

The operations

First, based on the name of the backbone in use (for example xception_65 or mobilenet_v2), the network's designated low-level feature map is looked up; this tensor is then combined with the features coming out of the feature extractor.

That low-level tensor is taken straight from the encoder, without going through ASPP; in the architecture diagram it is the input path from the DCNN into the Decoder.

(Figure: DeepLabV3+ encoder-decoder architecture; the relevant part is the skip connection from the DCNN into the Decoder.)

Fetching the DCNN tensor used to refine the segmentation result

The code:

          feature_list = feature_extractor.networks_to_feature_maps[
              model_variant][
                  feature_extractor.DECODER_END_POINTS][output_stride]

The code above is really just a nested dictionary lookup, so feature_list is a <class 'list'> whose value is:

['entry_flow/block2/unit_1/xception_module/separable_conv2_pointwise']

feature_extractor.DECODER_END_POINTS = 'decoder_end_points'
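
Conceptually, the nested structure being indexed looks like this (a trimmed toy sketch, not the real feature_extractor module):

networks_to_feature_maps = {
    'xception_65': {
        'decoder_end_points': {
            4: ['entry_flow/block2/unit_1/xception_module/'
                'separable_conv2_pointwise'],
        },
    },
}

feature_list = networks_to_feature_maps['xception_65']['decoder_end_points'][4]
# ['entry_flow/block2/unit_1/xception_module/separable_conv2_pointwise']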

Given the current backbone, xception_65, the full name of the tensor from before the ASPP stage is built as:

              feature_name = '{}/{}'.format(
                  feature_extractor.name_scope[model_variant], name)

feature_name='xception_65/entry_flow/block2/unit_1/xception_module/separable_conv2_pointwise'

The code below then retrieves the tensor by this name and projects it to 48 channels with a 1×1 convolution:

            decoder_features_list.append(
                slim.conv2d(
                    end_points[feature_name],
                    48,
                    1,
                    scope='feature_projection' + str(i) + scope_suffix))

Here end_points[feature_name] is exactly the tensor that the DCNN output.

Look at its shape: before the convolution it is (?, 129, 129, 256). Per the code above, after the 1×1 convolution with 48 filters it must become (?, 129, 129, 48):

Tensor("xception_65/entry_flow/block2/unit_1/xception_module/separable_conv2_pointwise/BatchNorm/FusedBatchNorm:0", shape=(?, 129, 129, 256), dtype=float32, device=/device:GPU:0)

This is because, as the architecture diagram shows, it is the 1/4-resolution feature map that gets taken to refine the result.

Merging the two tensors

Call the feature obtained from the DCNN above the refinement tensor. Recap: the (129, 129, 256) tensor was convolved with 48 kernels of size 1, so it became (129, 129, 48).

Call the final feature obtained after ASPP the semantic tensor. Recap: the (33, 33, 256) tensor was upsampled to (129, 129, 256).
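
That upsampling is a bilinear resize to the decoder resolution; a minimal sketch of the step (my own illustration, with shapes for a 513×513 crop):

import tensorflow as tf

aspp_features = tf.placeholder(tf.float32, [None, 33, 33, 256])
decoder_height, decoder_width = 129, 129  # 513 crop at output stride 4

# Bilinear upsampling from 1/16 to 1/4 resolution:
semantic = tf.image.resize_bilinear(
    aspp_features, [decoder_height, decoder_width], align_corners=True)
# semantic: (?, 129, 129, 256)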

These two features now have to be combined, which is done with utils.split_separable_conv2d.

Depthwise separable convolution is a general-purpose method that is very popular at the moment, so I won't explain the idea itself; straight to the code.

First, the split separable convolution helper:


def split_separable_conv2d(inputs,
                           filters,
                           kernel_size=3,
                           rate=1,
                           weight_decay=0.00004,
                           depthwise_weights_initializer_stddev=0.33,
                           pointwise_weights_initializer_stddev=0.06,
                           scope=None):
  """Splits a separable conv2d into depthwise and pointwise conv2d.

  This operation differs from `tf.layers.separable_conv2d` as this operation
  applies activation function between depthwise and pointwise conv2d.

  Args:
    inputs: Input tensor with shape [batch, height, width, channels].
    filters: Number of filters in the 1x1 pointwise convolution.
    kernel_size: A list of length 2: [kernel_height, kernel_width] of
      of the filters. Can be an int if both values are the same.
    rate: Atrous convolution rate for the depthwise convolution.
    weight_decay: The weight decay to use for regularizing the model.
    depthwise_weights_initializer_stddev: The standard deviation of the
      truncated normal weight initializer for depthwise convolution.
    pointwise_weights_initializer_stddev: The standard deviation of the
      truncated normal weight initializer for pointwise convolution.
    scope: Optional scope for the operation.

  Returns:
    Computed features after split separable conv2d.
  """

Note that the default kernel size here is 3.
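
In plain TF ops, the computation amounts to roughly this (my own sketch; as the docstring says, the "split" is that an activation is applied BETWEEN the depthwise and pointwise convolutions, which also matches the decoder_conv0_pointwise scope names printed below):

import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 129, 129, 304])

# Depthwise 3x3: one filter per input channel; channel count stays at 304.
dw_filter = tf.get_variable('dw', [3, 3, 304, 1])
y = tf.nn.depthwise_conv2d(x, dw_filter, strides=[1, 1, 1, 1], padding='SAME')
y = tf.nn.relu(y)  # the activation between the two convolutions

# Pointwise 1x1: mixes channels, 304 -> 256.
pw_filter = tf.get_variable('pw', [1, 1, 304, 256])
z = tf.nn.conv2d(y, pw_filter, strides=[1, 1, 1, 1], padding='SAME')
z = tf.nn.relu(z)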

Now the fusion of the refinement tensor and the semantic tensor through these convolutions:

            decoder_depth = 256
            if decoder_use_separable_conv:
              decoder_features = split_separable_conv2d(
                  tf.concat(decoder_features_list, 3),
                  filters=decoder_depth,
                  rate=1,
                  weight_decay=weight_decay,
                  scope='decoder_conv0' + scope_suffix)
              decoder_features = split_separable_conv2d(
                  decoder_features,
                  filters=decoder_depth,
                  rate=1,
                  weight_decay=weight_decay,
                  scope='decoder_conv1' + scope_suffix)

First the two tensors are concatenated along axis 3, the channel dimension (48 + 256 = 304 channels):

Tensor("decoder/concat:0", shape=(?, 129, 129, 304), dtype=float32, device=/device:GPU:0)

The first split separable convolution gives:

Tensor("decoder/decoder_conv0_pointwise/Relu:0", shape=(?, 129, 129, 256), dtype=float32, device=/device:GPU:0)

A second split separable convolution of the same form (256 channels, 3×3 kernel) gives:

Tensor("decoder/decoder_conv1_pointwise/Relu:0", shape=(?, 129, 129, 256), dtype=float32, device=/device:GPU:0)

This tensor can now be returned, and the decode stage is finished. (The second upsampling, from 1/4 back to the full crop resolution, happens afterwards, on the logits computed from this map.)
