Decoder characteristics in the paper
- Two inputs.
- Upsampling happens in two stages, so feature maps exist at both 1/16 and 1/4 resolution (see the shape roadmap right after this list).
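As a roadmap, here is the shape flow of the whole decoder. The shapes are the ones traced later in this post; they assume a 513x513 crop and an encoder output_stride of 16:
# ASPP output (semantic tensor)       : (?,  33,  33, 256)   -- 1/16 resolution
# bilinear upsample to 1/4            : (?, 129, 129, 256)
# low-level DCNN feature at 1/4       : (?, 129, 129, 256)
# 1x1 conv projection to 48 channels  : (?, 129, 129,  48)
# concat along the channel axis       : (?, 129, 129, 304)
# two 3x3 (separable) convs           : (?, 129, 129, 256)   -- decoder output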
Now look at the code, model.py: refine_by_decoder.
The definition and docstring:
def refine_by_decoder(features,
end_points,
crop_size=None,
decoder_output_stride=None,
decoder_use_separable_conv=False,
model_variant=None,
weight_decay=0.0001,
reuse=None,
is_training=False,
fine_tune_batch_norm=False,
use_bounded_activation=False):
"""Adds the decoder to obtain sharper segmentation results.
Args:
features: A tensor of size [batch, features_height, features_width,
features_channels].
end_points: A dictionary from components of the network to the corresponding
activation.
crop_size: A tuple [crop_height, crop_width] specifying whole patch crop
size.
decoder_output_stride: A list of integers specifying the output stride of
low-level features used in the decoder module.
decoder_use_separable_conv: Employ separable convolution for decoder or not.
model_variant: Model variant for feature extraction.
weight_decay: The weight decay for model variables.
reuse: Reuse the model variables or not.
is_training: Is training or not.
fine_tune_batch_norm: Fine-tune the batch norm parameters or not.
use_bounded_activation: Whether or not to use bounded activations. Bounded
activations better lend themselves to quantized inference.
Returns:
Decoder output with size [batch, decoder_height, decoder_width,
decoder_channels].
Raises:
ValueError: If crop_size is None.
"""
Here is what the input data concretely looks like. Interpreting some of the input parameters:
decoder_output_stride: here [4], meaning the decoded output is at 1/4 resolution, i.e. a tensor of size (129, 129, channels).
end_points: a dictionary; tensor_features = end_points[feature_name] fetches the corresponding feature.
features: the input crop is actually 513 on a side (513, not 512; see the size arithmetic right below), so the features fed in here are the encoder output downsampled to 1/16.
fine_tune_batch_norm: whether or not to fine-tune the BN (batch normalization) parameters.
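This is where the 33 and 129 come from: DeepLab keeps the "aligned corners" convention of subtracting one, scaling, then adding one. model.py has a scale_dimension helper for exactly this; the sketch below reproduces its arithmetic:
def scale_dimension(dim, scale):
    """Scales a spatial dimension with the 'subtract 1, scale, add 1' rule."""
    return int((float(dim) - 1.0) * scale + 1.0)

scale_dimension(513, 1.0 / 16)  # -> 33, the 1/16 feature-map side
scale_dimension(513, 1.0 / 4)   # -> 129, the 1/4 decoder-output side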
BN (batch normalization) configuration:
batch_norm_params = {
'is_training': is_training and fine_tune_batch_norm,
'decay': 0.9997,
'epsilon': 1e-5,
'scale': True,
}
Next, the parameters shared by the whole decoder are configured; every decode op must obey the arg_scope below.
(slim.arg_scope and tf.variable_scope both have tutorials; they are short and simple.)
with slim.arg_scope(
[slim.conv2d, slim.separable_conv2d],
weights_regularizer=slim.l2_regularizer(weight_decay),
activation_fn=tf.nn.relu6 if use_bounded_activation else tf.nn.relu,
normalizer_fn=slim.batch_norm,
padding='SAME',
stride=1,
reuse=reuse):
with slim.arg_scope([slim.batch_norm], **batch_norm_params):
with tf.variable_scope(DECODER_SCOPE, DECODER_SCOPE, [features]):
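If arg_scope is unfamiliar, here is a minimal toy example (my own sketch, not DeepLab code) showing how it injects default keyword arguments into the listed ops:
import tensorflow as tf
slim = tf.contrib.slim

inputs = tf.placeholder(tf.float32, [None, 129, 129, 256])
with slim.arg_scope([slim.conv2d],
                    padding='SAME', stride=1, activation_fn=tf.nn.relu):
    # Both calls inherit padding/stride/activation_fn from the scope.
    net = slim.conv2d(inputs, 48, 1, scope='proj')          # -> (?, 129, 129, 48)
    # An explicit argument still beats the scope's default:
    net = slim.conv2d(net, 256, 3, stride=2, scope='down')  # -> (?, 65, 65, 256)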
A quick aside on passing parameters with **:
>>> def test_args_kwargs(arg1, arg2, arg3):
... print("arg1:", arg1)
... print("arg2:", arg2)
... print("arg3:", arg3)
...
>>> kwargs = {"arg3": 3, "arg2": "two", "arg1": 5}
>>> test_args_kwargs(**kwargs)
arg1: 5
arg2: two
arg3: 3
>>> kwargs = {"arg3": 3, "arg2": "two", "arg": 5}
>>> test_args_kwargs(**kwargs)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: test_args_kwargs() got an unexpected keyword argument 'arg'
>>>
So the fine-tune batch-norm parameters here are simply passed in as keywords; this call:
with slim.arg_scope([slim.batch_norm], **batch_norm_params)
is equivalent to:
with slim.arg_scope([slim.batch_norm], is_training=is_training and fine_tune_batch_norm, decay=0.9997, epsilon=1e-5, scale=True)
arg_scope takes however many parameters it is given without inspecting them; it never reads them at all, it just hands every one of them down to the concrete ops below.
The operations
First, from the name of the network variant in use (e.g. xception_65, mobilenet_v2), look up the endpoint name of the low-level feature designated for the decoder; that tensor is then combined with the tensor produced by feature_extractor.
This grabs an intermediate encoder output directly, bypassing ASPP; in the paper's figure it is the arrow from DCNN into the Decoder.
Fetching the DCNN tensor used to refine the segmentation result
The code:
feature_list = feature_extractor.networks_to_feature_maps[
model_variant][
feature_extractor.DECODER_END_POINTS][output_stride]
The code above just pulls strings out of a nested dictionary. feature_list is a <class 'list'> whose value is:
['entry_flow/block2/unit_1/xception_module/separable_conv2_pointwise']
where feature_extractor.DECODER_END_POINTS = 'decoder_end_points'. The dictionary's structure is sketched below.
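networks_to_feature_maps is a nested dictionary mapping variant -> 'decoder_end_points' -> output stride -> list of endpoint names. A sketch of that structure (xception_65 entry only; not a verbatim copy of feature_extractor.py):
networks_to_feature_maps = {
    'xception_65': {
        'decoder_end_points': {  # feature_extractor.DECODER_END_POINTS
            4: ['entry_flow/block2/unit_1/xception_module/separable_conv2_pointwise'],
        },
    },
    # other variants (mobilenet_v2, resnet_v1_*, ...) have their own entries
}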
For the current network, xception_65, the full name of the tensor taken before the ASPP stage is assembled as:
feature_name = '{}/{}'.format(
feature_extractor.name_scope[model_variant], name)
feature_name='xception_65/entry_flow/block2/unit_1/xception_module/separable_conv2_pointwise'
The tensor can then be fetched by name with the code below:
decoder_features_list.append(
slim.conv2d(
end_points[feature_name],
48,
1,
scope='feature_projection' + str(i) + scope_suffix))
Here end_points[feature_name] returns exactly the tensor output by the DCNN.
Now look at that tensor's shape: before this convolution it is (?, 129, 129, 256), and after the 1x1 conv with 48 filters above it must become (?, 129, 129, 48):
Tensor("xception_65/entry_flow/block2/unit_1/xception_module/separable_conv2_pointwise/BatchNorm/FusedBatchNorm:0", shape=(?, 129, 129, 256), dtype=float32, device=/device:GPU:0)
The spatial size is 129 because what gets fetched for refinement is the 1/4-resolution feature map.
Merging the two tensors
Call the DCNN feature obtained above the refinement tensor. Recap: (129, 129, 256) then went through a convolution with 48 kernels of size 1, so it became (129, 129, 48).
Call the final feature obtained after ASPP the semantic tensor. Recap: (33, 33, 256) was upsampled to (129, 129, 256); a sketch of that resize step follows.
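The upsampling in that recap happens inside refine_by_decoder itself: before the concat, every tensor in decoder_features_list is resized to the decoder resolution. A paraphrase of that step (not the verbatim repo code), reusing the scale_dimension arithmetic from earlier:
decoder_height = scale_dimension(crop_size[0], 1.0 / output_stride)  # 129 for 513, stride 4
decoder_width = scale_dimension(crop_size[1], 1.0 / output_stride)   # 129
for j, feature in enumerate(decoder_features_list):
    decoder_features_list[j] = tf.image.resize_bilinear(
        feature, [decoder_height, decoder_width], align_corners=True)
    # The ASPP branch goes (?, 33, 33, 256) -> (?, 129, 129, 256);
    # the 1/4 DCNN branch is already 129x129, so its resize is a no-op.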
The two features now have to be combined, using utils.split_separable_conv2d.
Separable convolution is a general-purpose technique and very popular at the moment, so I won't explain it here; straight to the code.
First, the split separable convolution function:
def split_separable_conv2d(inputs,
filters,
kernel_size=3,
rate=1,
weight_decay=0.00004,
depthwise_weights_initializer_stddev=0.33,
pointwise_weights_initializer_stddev=0.06,
scope=None):
"""Splits a separable conv2d into depthwise and pointwise conv2d.
This operation differs from `tf.layers.separable_conv2d` as this operation
applies activation function between depthwise and pointwise conv2d.
Args:
inputs: Input tensor with shape [batch, height, width, channels].
filters: Number of filters in the 1x1 pointwise convolution.
kernel_size: A list of length 2: [kernel_height, kernel_width] of
the filters. Can be an int if both values are the same.
rate: Atrous convolution rate for the depthwise convolution.
weight_decay: The weight decay to use for regularizing the model.
depthwise_weights_initializer_stddev: The standard deviation of the
truncated normal weight initializer for depthwise convolution.
pointwise_weights_initializer_stddev: The standard deviation of the
truncated normal weight initializer for pointwise convolution.
scope: Optional scope for the operation.
Returns:
Computed features after split separable conv2d.
"""
Note that the default kernel size here is 3.
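For reference, the body of this function boils down to a depthwise conv followed by a 1x1 pointwise conv. This is a sketch consistent with the docstring (the real utils.py body differs in detail); it relies on the documented slim behavior that num_outputs=None makes separable_conv2d skip its pointwise stage:
def split_separable_conv2d_sketch(inputs, filters, kernel_size=3, rate=1,
                                  weight_decay=0.00004, scope='conv'):
    # Depthwise pass: num_outputs=None means slim performs only the
    # depthwise convolution, no channel mixing.
    outputs = slim.separable_conv2d(
        inputs, None, kernel_size,
        depth_multiplier=1, rate=rate,
        weights_initializer=tf.truncated_normal_initializer(stddev=0.33),
        weights_regularizer=None,
        scope=scope + '_depthwise')
    # Pointwise pass: an ordinary 1x1 conv that mixes channels.
    return slim.conv2d(
        outputs, filters, 1,
        weights_initializer=tf.truncated_normal_initializer(stddev=0.06),
        weights_regularizer=slim.l2_regularizer(weight_decay),
        scope=scope + '_pointwise')
The ReLU between the two passes is supplied by the enclosing arg_scope (which sets activation_fn on both ops), which is exactly why the tensor traces below end in ..._pointwise/Relu.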
Now the convolution that merges the refinement tensor and the semantic tensor:
decoder_depth = 256
if decoder_use_separable_conv:
decoder_features = split_separable_conv2d(
tf.concat(decoder_features_list, 3),
filters=decoder_depth,
rate=1,
weight_decay=weight_decay,
scope='decoder_conv0' + scope_suffix)
decoder_features = split_separable_conv2d(
decoder_features,
filters=decoder_depth,
rate=1,
weight_decay=weight_decay,
scope='decoder_conv1' + scope_suffix)
First, the two tensors are concatenated along dimension 3, the channel axis (48 + 256 = 304):
Tensor("decoder/concat:0", shape=(?, 129, 129, 304), dtype=float32, device=/device:GPU:0)
The first split separable convolution gives:
Tensor("decoder/decoder_conv0_pointwise/Relu:0", shape=(?, 129, 129, 256), dtype=float32, device=/device:GPU:0)
A second identical split convolution (256 channels, 3x3 kernel) gives:
Tensor("decoder/decoder_conv1_pointwise/Relu:0", shape=(?, 129, 129, 256), dtype=float32, device=/device:GPU:0)
This tensor is returned, and the decode step is done.
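Putting it all together, a hypothetical call (argument values inferred from the shapes above, not copied from a training script) would look like:
decoder_output = model.refine_by_decoder(
    features,                  # ASPP output, (?, 33, 33, 256)
    end_points,                # endpoint name -> tensor, from the feature extractor
    crop_size=[513, 513],
    decoder_output_stride=[4],
    decoder_use_separable_conv=True,
    model_variant='xception_65',
    weight_decay=0.0001,
    is_training=True,
    fine_tune_batch_norm=True)
# decoder_output: (?, 129, 129, 256)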