TensorFlow Study Notes: ResNet Code (Detailed Walkthrough)

2024-03-21 22:21:04

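Reference link:

Thanks to the author of the blog below for the groundwork. This post only digs further into that code, with the goal of understanding it well enough to extend it.
http://blog.csdn.net/superman_xxx/article/details/65452735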

The code first:
# -*- coding: utf-8 -*-
"""
Created on Thu Aug 17 16:24:55 2017
Project: Residual Neural Network
E-mail: Eric2014_Lv@sjtu.edu.cn
Reference: 《Tensorflow实战》 (TensorFlow in Practice), pp. 143-156
@author: DidiLv
"""

"""

Typical use:

    from tensorflow.contrib.slim.nets import resnet_v2

ResNet-101 for image classification into 1000 classes:

    # inputs has shape [batch, 224, 224, 3]
    with slim.arg_scope(resnet_v2.resnet_arg_scope(is_training)):
        net, end_points = resnet_v2.resnet_v2_101(inputs, 1000)

ResNet-101 for semantic segmentation into 21 classes:

    # inputs has shape [batch, 513, 513, 3]
    with slim.arg_scope(resnet_v2.resnet_arg_scope(is_training)):
        net, end_points = resnet_v2.resnet_v2_101(inputs,
                                                  21,
                                                  global_pool=False,
                                                  output_stride=16)
"""
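import collections  # standard-library collections module
import tensorflow as tf
slim = tf.contrib.slim  # the contrib.slim library makes building ResNet convenient

# Note that the class body below contains only a docstring and defines no methods.
# This is what the book means by "only a data structure, no methods".
class Block(collections.namedtuple('Block', ['scope', 'unit_fn', 'args'])):
    '''
    A named tuple, built with collections.namedtuple, that describes one ResNet
    block (a group of residual units). It holds data only and defines no methods.
    A typical Block takes three fields:
        scope: the name of the Block
        unit_fn: the residual unit function of ResNet V2
        args: the arguments of the Block, one entry per residual unit.
    '''
    # Illustration: see Figure 1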

######## Subsampling helper ########
def subsample(inputs, factor, scope=None):
    """Subsamples the input along the spatial dimensions.
    Args:
        inputs: A `Tensor` of size [batch, height_in, width_in, channels].
        factor: The subsampling factor (sampling rate).
        scope: Optional variable_scope.

    Returns:
        output: If factor is 1, inputs is returned unchanged. Otherwise the
            input is subsampled with slim.max_pool2d, using a 1*1 pooling
            window and factor as the stride.
    """
    if factor == 1:
        return inputs
    else:
        return slim.max_pool2d(inputs, [1, 1], stride=factor, scope=scope)

######## Convolution layer with explicit 'SAME' padding ########
def conv2d_same(inputs, num_outputs, kernel_size, stride, scope=None):
    """
    Args:
        inputs: A 4-D tensor of size [batch, height_in, width_in, channels].
        num_outputs: An integer, the number of output filters.
        kernel_size: An int with the kernel_size of the filters.
        stride: An integer, the output stride.
        scope: Scope.

    Returns:
        output: A 4-D tensor of size [batch, height_out, width_out, channels]
            with the convolution output.
    """
    if stride == 1:
        return slim.conv2d(inputs, num_outputs, kernel_size, stride=1,
                           padding='SAME', scope=scope)
    else:
        pad_total = kernel_size - 1
        pad_beg = pad_total // 2
        pad_end = pad_total - pad_beg
        # Zero-pad the height and width dimensions of the input explicitly.
        inputs = tf.pad(inputs,
                        [[0, 0], [pad_beg, pad_end], [pad_beg, pad_end], [0, 0]])
        # The input is already zero-padded, so the convolution itself only
        # needs padding='VALID'. See Figure 2 for details.
        return slim.conv2d(inputs, num_outputs, kernel_size, stride=stride,
                           padding='VALID', scope=scope)

######## Function that stacks the Blocks ########
@slim.add_arg_scope
def stack_blocks_dense(net, blocks,
                       outputs_collections=None):
    """
    Args:
        net: A `Tensor` of size [batch, height, width, channels]. The input.
        blocks: A list of instances of the Block class defined above.
        outputs_collections: The collection that gathers the end_points.

    Returns:
        net: Output tensor

    """
    # Two nested loops stack the network one Residual Unit at a time.
    # The two tf.variable_scope calls name each unit in the form block1/unit_1.
    for block in blocks:
        with tf.variable_scope(block.scope, 'block', [net]) as sc:
            for i, unit in enumerate(block.args):

                with tf.variable_scope('unit_%d' % (i + 1), values=[net]):
                    # In the inner loop, take the args of each Residual Unit
                    # and unpack them into the three parameters below.
                    unit_depth, unit_depth_bottleneck, unit_stride = unit
                    # Create and chain the residual units in order.
                    net = block.unit_fn(net,
                                        depth=unit_depth,
                                        depth_bottleneck=unit_depth_bottleneck,
                                        stride=unit_stride)
            # Add the block's output net to the collection.
            net = slim.utils.collect_named_outputs(outputs_collections, sc.name, net)

    # Once every Residual Unit of every block is stacked, return the final net.
    return net

# Build the arg_scope shared by all ResNets. An arg_scope defines default
# argument values for the listed functions.
def resnet_arg_scope(is_training=True,         # training flag
                     weight_decay=0.0001,      # weight decay (L2) coefficient
                     batch_norm_decay=0.997,   # decay of the BN moving averages
                     batch_norm_epsilon=1e-5,  # BN epsilon, default 1e-5
                     batch_norm_scale=True):   # default for BN's scale option

    batch_norm_params = {  # parameter dictionary for batch normalization
        'is_training': is_training,
        'decay': batch_norm_decay,
        'epsilon': batch_norm_epsilon,
        'scale': batch_norm_scale,
        'updates_collections': tf.GraphKeys.UPDATE_OPS,
    }

    with slim.arg_scope(  # set several default arguments of slim.conv2d
            [slim.conv2d],
            weights_regularizer=slim.l2_regularizer(weight_decay),    # L2 regularizer
            weights_initializer=slim.variance_scaling_initializer(),  # weight initializer
            activation_fn=tf.nn.relu,       # activation function
            normalizer_fn=slim.batch_norm,  # batch norm as the normalizer
            normalizer_params=batch_norm_params):
        with slim.arg_scope([slim.batch_norm], **batch_norm_params):
            # The code accompanying the original ResNet paper uses VALID here;
            # SAME makes feature alignment simpler.
            with slim.arg_scope([slim.max_pool2d], padding='SAME') as arg_sc:
                return arg_sc  # return the nested arg_scopes as the result

# The core bottleneck residual unit
@slim.add_arg_scope
def bottleneck(inputs, depth, depth_bottleneck, stride,
               outputs_collections=None, scope=None):
    """
    Args:
        inputs: A tensor of size [batch, height, width, channels].
        depth, depth_bottleneck, stride: the three values of one entry in the
            args of the Block class defined above.
        outputs_collections: the collection that gathers the end_points.
        scope: the name of this unit.
    """
    with tf.variable_scope(scope, 'bottleneck_v2', [inputs]) as sc:
        # slim.utils.last_dimension returns the last dimension of the input,
        # i.e. the number of input channels. min_rank=4 requires at least
        # four dimensions.
        depth_in = slim.utils.last_dimension(inputs.get_shape(), min_rank=4)
        # Batch-normalize the input with slim.batch_norm and apply relu as
        # the pre-activation.
        preact = slim.batch_norm(inputs, activation_fn=tf.nn.relu, scope='preact')

        if depth == depth_in:
            # Input and output channel counts match: just subsample inputs
            # spatially according to stride.
            shortcut = subsample(inputs, stride, 'shortcut')
        else:
            # Otherwise use a 1*1 convolution with the given stride to change
            # the channel count so that shortcut and residual match.
            shortcut = slim.conv2d(preact, depth, [1, 1], stride=stride,
                                   normalizer_fn=None, activation_fn=None,
                                   scope='shortcut')

        # First a 1*1 convolution, stride 1, depth_bottleneck output channels.
        residual = slim.conv2d(preact, depth_bottleneck, [1, 1], stride=1,
                               scope='conv1')
        # Then a 3*3 convolution, stride `stride`, depth_bottleneck output channels.
        residual = conv2d_same(residual, depth_bottleneck, 3, stride,
                               scope='conv2')
        # Finally a 1*1 convolution, stride 1, depth output channels, which
        # gives the residual. This last layer has no normalizer (batch norm)
        # and no activation function.
        residual = slim.conv2d(residual, depth, [1, 1], stride=1,
                               normalizer_fn=None, activation_fn=None,
                               scope='conv3')

        output = shortcut + residual  # add the shortcut and the residual

        # Add output to the collection and return it as the function result.
        return slim.utils.collect_named_outputs(outputs_collections,
                                                sc.name,
                                                output)

######## Main function that builds resnet_v2 ########
def resnet_v2(inputs,                   # input tensor of size [batch, height_in, width_in, channels]
              blocks,                   # list of Block instances
              num_classes=None,         # number of output classes
              global_pool=True,         # whether to add the final global average pooling
              include_root_block=True,  # whether to add the usual leading 7*7 conv and max pool
              reuse=None,               # whether to reuse variables
              scope=None):              # name of the whole network
    # Define the variable_scope and the end_points_collection first.
    with tf.variable_scope(scope, 'resnet_v2', [inputs], reuse=reuse) as sc:
        end_points_collection = sc.original_name_scope + '_end_points'
        # Make end_points_collection the default outputs_collections of
        # these three functions.
        with slim.arg_scope([slim.conv2d, bottleneck,
                             stack_blocks_dense],
                            outputs_collections=end_points_collection):

            net = inputs
            if include_root_block:
                with slim.arg_scope([slim.conv2d],
                                    activation_fn=None, normalizer_fn=None):
                    # The leading 7*7 convolution: 64 output channels, stride 2.
                    net = conv2d_same(net, 64, 7, stride=2, scope='conv1')
                # Followed by max pooling.
                net = slim.max_pool2d(net, [3, 3], stride=2, scope='pool1')
                # After these two stride-2 layers each side of the image is 1/4 of its input size.
            net = stack_blocks_dense(net, blocks)  # build the groups of residual units
            net = slim.batch_norm(net, activation_fn=tf.nn.relu, scope='postnorm')

            if global_pool:
                # Global average pooling. tf.reduce_mean is more efficient than
                # avg_pool here. (Later TF 1.x releases spell the argument keepdims.)
                net = tf.reduce_mean(net, [1, 2], name='pool5', keep_dims=True)
            if num_classes is not None:  # was a number of classes given?
                # A 1*1 convolution with num_classes output channels,
                # without activation function or normalizer.
                net = slim.conv2d(net, num_classes, [1, 1], activation_fn=None,
                                  normalizer_fn=None, scope='logits')
            # Convert the collection into a Python dict.
            end_points = slim.utils.convert_collection_to_dict(end_points_collection)
            if num_classes is not None:
                # The network's class probabilities.
                end_points['predictions'] = slim.softmax(net, scope='predictions')
            return net, end_points
#------------------------------ The ResNet builder is now defined ------------------------------

def resnet_v2_50(inputs,  # the image is downsampled by a factor of 32 overall
                 num_classes=None,
                 global_pool=True,
                 reuse=None,  # whether to reuse variables
                 scope='resnet_v2_50'):
    blocks = [
        Block('block1', bottleneck, [(256, 64, 1)] * 2 + [(256, 64, 2)]),

        # Args:
        #   'block1': name (scope) of the Block
        #   bottleneck: the ResNet V2 residual unit
        #   [(256, 64, 1)] * 2 + [(256, 64, 2)]: the args of the Block, a list in
        #     which every element corresponds to one bottleneck unit. The first
        #     two elements are (256, 64, 1) and the last one is (256, 64, 2).
        #     Each element is a 3-tuple (depth, depth_bottleneck, stride).
        #   (256, 64, 2) means that in this bottleneck unit (each unit has three
        #     convolution layers) the third layer has depth = 256 output
        #     channels, the first two layers have depth_bottleneck = 64 output
        #     channels, and the middle layer has stride 2. Its structure is:
        #     [(1*1/s1, 64), (3*3/s2, 64), (1*1/s1, 256)]

        Block(
            'block2', bottleneck, [(512, 128, 1)] * 3 + [(512, 128, 2)]),
        Block(
            'block3', bottleneck, [(1024, 256, 1)] * 5 + [(1024, 256, 2)]),
        Block(
            'block4', bottleneck, [(2048, 512, 1)] * 3)]
    return resnet_v2(inputs, blocks, num_classes, global_pool,
                     include_root_block=True, reuse=reuse, scope=scope)

def resnet_v2_101(inputs,  # the extra units are added mainly in block3
                  num_classes=None,
                  global_pool=True,
                  reuse=None,
                  scope='resnet_v2_101'):
    """ResNet-101 model of [1]. See resnet_v2() for arg and return description."""
    blocks = [
        Block(
            'block1', bottleneck, [(256, 64, 1)] * 2 + [(256, 64, 2)]),
        Block(
            'block2', bottleneck, [(512, 128, 1)] * 3 + [(512, 128, 2)]),
        Block(
            'block3', bottleneck, [(1024, 256, 1)] * 22 + [(1024, 256, 2)]),
        Block(
            'block4', bottleneck, [(2048, 512, 1)] * 3)]
    return resnet_v2(inputs, blocks, num_classes, global_pool,
                     include_root_block=True, reuse=reuse, scope=scope)

def resnet_v2_152(inputs,  # the extra units are added mainly in block3
                  num_classes=None,
                  global_pool=True,
                  reuse=None,
                  scope='resnet_v2_152'):
    """ResNet-152 model of [1]. See resnet_v2() for arg and return description."""
    blocks = [
        Block(
            'block1', bottleneck, [(256, 64, 1)] * 2 + [(256, 64, 2)]),
        Block(
            'block2', bottleneck, [(512, 128, 1)] * 7 + [(512, 128, 2)]),
        Block(
            'block3', bottleneck, [(1024, 256, 1)] * 35 + [(1024, 256, 2)]),
        Block(
            'block4', bottleneck, [(2048, 512, 1)] * 3)]
    return resnet_v2(inputs, blocks, num_classes, global_pool,
                     include_root_block=True, reuse=reuse, scope=scope)

def resnet_v2_200(inputs,  # the extra units are added mainly in block2
                  num_classes=None,
                  global_pool=True,
                  reuse=None,
                  scope='resnet_v2_200'):
    """ResNet-200 model of [2]. See resnet_v2() for arg and return description."""
    blocks = [
        Block(
            'block1', bottleneck, [(256, 64, 1)] * 2 + [(256, 64, 2)]),
        Block(
            'block2', bottleneck, [(512, 128, 1)] * 23 + [(512, 128, 2)]),
        Block(
            'block3', bottleneck, [(1024, 256, 1)] * 35 + [(1024, 256, 2)]),
        Block(
            'block4', bottleneck, [(2048, 512, 1)] * 3)]
    return resnet_v2(inputs, blocks, num_classes, global_pool,
                     include_root_block=True, reuse=reuse, scope=scope)

from datetime import datetime
import math
import time

#------------------- Benchmark function ---------------------------------
# Measure the forward-pass performance of the 152-layer ResNet.
num_batches = 100  # number of timed batches; read as a global by time_tensorflow_run

def time_tensorflow_run(session, target, info_string):
    num_steps_burn_in = 10  # warm-up steps that are run but not timed
    total_duration = 0.0
    total_duration_squared = 0.0
    for i in range(num_batches + num_steps_burn_in):
        start_time = time.time()
        _ = session.run(target)
        duration = time.time() - start_time
        if i >= num_steps_burn_in:
            if not i % 10:
                print('%s: step %d, duration = %.3f' %
                      (datetime.now(), i - num_steps_burn_in, duration))
            total_duration += duration
            total_duration_squared += duration * duration
    mn = total_duration / num_batches                     # mean time per batch
    vr = total_duration_squared / num_batches - mn * mn   # variance
    sd = math.sqrt(vr)                                    # standard deviation
    print('%s: %s across %d steps, %.3f +/- %.3f sec / batch' %
          (datetime.now(), info_string, num_batches, mn, sd))

batch_size = 32
height, width = 224, 224
inputs = tf.random_uniform((batch_size, height, width, 3))
with slim.arg_scope(resnet_arg_scope(is_training=False)):  # is_training is set to False
    net, end_points = resnet_v2_152(inputs, 1000)

init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
time_tensorflow_run(sess, net, "Forward")

# The forward pass takes only about 50% longer than VGGNet and Inception V3, so ResNet-152 is a practical convolutional neural network.
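
The bookkeeping in the benchmark can be tried without TensorFlow. The following is a minimal sketch, assuming a hypothetical helper `summarize_durations` (not part of the original code) that receives per-step durations that were already recorded. It drops the warm-up steps and computes the mean and standard deviation with the same E[X^2] - E[X]^2 formula that `time_tensorflow_run` uses. The clamp to zero is an addition of this sketch, to guard against tiny negative values from rounding.

```python
import math

def summarize_durations(durations, num_steps_burn_in):
    """Hypothetical helper: mean and std of step times after the warm-up steps."""
    measured = durations[num_steps_burn_in:]
    n = len(measured)
    total_duration = sum(measured)
    total_duration_squared = sum(d * d for d in measured)
    mean = total_duration / n
    # Var[X] = E[X^2] - E[X]^2; clamp so rounding can never make it negative.
    variance = max(total_duration_squared / n - mean * mean, 0.0)
    return mean, math.sqrt(variance)

# Two slow warm-up steps followed by two measured steps (made-up numbers).
mean, std = summarize_durations([9.0, 9.0, 1.0, 3.0], num_steps_burn_in=2)
print('%.3f +/- %.3f sec / batch' % (mean, std))
```

The warm-up steps matter because the first runs of a graph typically include one-off setup costs that would otherwise inflate the average.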

Figures:
Figure 1 (Block):
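
How a `Block` is consumed can be sketched in plain Python. This is a minimal illustration, assuming a hypothetical stand-in `fake_unit` in place of the real `bottleneck` function; it only records the arguments it receives, so no TensorFlow is needed.

```python
import collections

class Block(collections.namedtuple('Block', ['scope', 'unit_fn', 'args'])):
    """Data-only description of one ResNet block."""

def fake_unit(net, depth, depth_bottleneck, stride):
    # Hypothetical stand-in for bottleneck(): it just records the call.
    return net + [(depth, depth_bottleneck, stride)]

block1 = Block('block1', fake_unit, [(256, 64, 1)] * 2 + [(256, 64, 2)])

# The same unpack-and-call loop that stack_blocks_dense runs for each block.
net = []
for unit_depth, unit_depth_bottleneck, unit_stride in block1.args:
    net = block1.unit_fn(net,
                         depth=unit_depth,
                         depth_bottleneck=unit_depth_bottleneck,
                         stride=unit_stride)

print(block1.scope, len(block1.args), net[-1])
```

Because `Block` is a namedtuple, its fields are read by name (`block1.scope`, `block1.args`) and the object itself cannot be modified after creation.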

Figure 2 (padding modes):

Because the code has already zero-padded the input explicitly, the convolution itself uses the VALID padding mode.
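
The padding arithmetic of `conv2d_same` can be checked with a few lines of plain Python. This is a sketch along one spatial dimension only; the helper names are made up for illustration, and the VALID output length uses the standard formula floor((size - kernel) / stride) + 1.

```python
def explicit_padding(kernel_size):
    """Padding amounts computed the same way as in conv2d_same."""
    pad_total = kernel_size - 1
    pad_beg = pad_total // 2
    pad_end = pad_total - pad_beg
    return pad_beg, pad_end

def valid_output_size(size, kernel_size, stride):
    """Output length of a VALID convolution along one dimension."""
    return (size - kernel_size) // stride + 1

def conv2d_same_output_size(size, kernel_size, stride):
    """Pad explicitly, then convolve with VALID padding."""
    pad_beg, pad_end = explicit_padding(kernel_size)
    return valid_output_size(size + pad_beg + pad_end, kernel_size, stride)

print(explicit_padding(7))                    # padding for the 7*7 root convolution
print(conv2d_same_output_size(224, 7, 2))     # 224-pixel input, stride 2
print(conv2d_same_output_size(56, 3, 2))      # a 3*3 stride-2 unit on a 56-pixel map
```

With a stride of 2 the output side is half the input side in both cases, which is the behaviour expected from 'SAME'-style padding.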

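This post is mainly about the ResNet code provided by TensorFlow, but I would rather not call it a code walkthrough, because code and method, practice and theory, always go together.
In the GitHub repository:

resnet_model.py implements the residual network model, including the residual modules, regularization, batch normalization, the optimization strategy, and so on;

resnet_main.py is the main program, which defines the evaluation, training, summary and printing code plus a few parameters;

cifar_input.py prepares the data: it decodes the bin files provided by CIFAR into image tensors and assembles them into batches.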

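To keep line numbers consistent, every line number mentioned below refers to the GitHub version. To save space, code quoted below has its comments removed. I suggest keeping the GitHub page open while reading, because there is not much code in what follows.

Since the topic is the residual model, the file to look at is resnet_model.py. The whole file declares a single class, ResNet:

Lines 38 to 55: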

class ResNet(object):

    def __init__(self, hps, images, labels, mode):
        self.hps = hps
        self._images = images
        self.labels = labels
        self.mode = mode

        self._extra_train_ops = []

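These are the four arguments of the constructor. Instantiating the object also initializes it: the arguments are stored in the data members of the class, and self._images is a private member. The constructor also defines a new private list member, self._extra_train_ops, which is used to run the moving-average operations.

The constructor's arguments are hps, images, labels and mode.

hps is initialized in resnet_main.py: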

hps = resnet_model.HParams(batch_size=batch_size,
                           num_classes=num_classes,
                           min_lrn_rate=0.0001,
                           lrn_rate=0.1,
                           num_residual_units=5,
                           use_bottleneck=False,
                           weight_decay_rate=0.0002,
                           relu_leakiness=0.1,
                           optimizer='mom')

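HParams is a namedtuple (not a dictionary) defined at line 32 of resnet_model.py. The meaning of each field is:

HParams = namedtuple('HParams',
                     'batch_size, '          # number of images in one batch
                     'num_classes, '         # number of classes in the classification task
                     'min_lrn_rate, '        # minimum learning rate
                     'lrn_rate, '            # learning rate
                     'num_residual_units, '  # number of residual units in one residual group
                     'use_bottleneck, '      # whether to use bottleneck units
                     'weight_decay_rate, '   # weight decay (L2) rate
                     'relu_leakiness, '      # leakiness of the relu
                     'optimizer')            # optimization strategy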
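
A standalone sketch of how such a hyperparameter tuple behaves, assuming the nine field names from the `hps` call above. The values `batch_size=128` and `num_classes=10` are placeholders chosen for this illustration, not values taken from resnet_main.py.

```python
from collections import namedtuple

# Field names follow the keyword arguments of the hps call shown earlier.
HParams = namedtuple('HParams',
                     'batch_size, num_classes, min_lrn_rate, lrn_rate, '
                     'num_residual_units, use_bottleneck, weight_decay_rate, '
                     'relu_leakiness, optimizer')

hps = HParams(batch_size=128,       # placeholder value
              num_classes=10,       # placeholder value
              min_lrn_rate=0.0001,
              lrn_rate=0.1,
              num_residual_units=5,
              use_bottleneck=False,
              weight_decay_rate=0.0002,
              relu_leakiness=0.1,
              optimizer='mom')

# A namedtuple is immutable; _replace returns a modified copy instead.
bottleneck_hps = hps._replace(use_bottleneck=True)
print(hps.num_residual_units, hps.use_bottleneck, bottleneck_hps.use_bottleneck)
```

Reading fields by name (`hps.lrn_rate`) is what the model code does throughout, which is why a namedtuple is a convenient container here.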

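images and labels are the values returned by cifar_input (line 115). Note that they are already batches, as the plural names suggest.
mode decides between training and evaluation. It is defined (line 29) and initialized (line 206) in resnet_main.py.

Besides the __init__ constructor, the class defines 12 more functions that modularize the pieces used to build the residual model. Twelve sounds like a lot, but each one does something simple, and some are a single line of code (see line 65, for example). They are split out because each is an independent piece of functionality or is used repeatedly. The code TensorFlow provides is quite clean and well organized.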

From top to bottom they are:

build_graph(self):
builds the TensorFlow graph

_stride_arr(self, stride):
defines the stride of a convolution

_build_model(self):
builds the residual model

_build_train_op(self):
builds the training/optimization strategy

_batch_norm(self, name, x):
batch normalization

_residual(self, x, in_filter, out_filter, stride, activate_before_residual=False):
residual module without bottleneck, also called a residual unit. Note that this is not a residual group!

_bottleneck_residual(self, x, in_filter, out_filter, stride, activate_before_residual=False):
residual module with bottleneck

_decay(self):
L2 regularization

_conv(self, name, x, filter_size, in_filters, out_filters, strides):
convolution

_relu(self, x, leakiness=0.0):
activation

_fully_connected(self, x, out_dim):
fully connected layer

_global_avg_pool(self, x):
global average pooling

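Notes:
1. In the code these 12 functions sit side by side, but they are not really at the same level, because some of them call others. Convolution, activation and stride setup are obviously the ones being called. Three functions matter most: build_graph(self), _build_model(self) and _build_train_op(self). The first matters because TensorFlow maintains a graph through which all data flows as tensors; the second defines the residual model; the third defines the optimization strategy.

2. In my opinion _stride_arr(self, stride) should not sit where it does (line 65). If it were moved further down, the first three functions would be build the graph, build the model, build the optimization strategy, which is a much clearer order.

3. This code has no conventional pooling. Partly that is because ResNet itself replaces pooling with stride-2 convolutions, but there should still be one conventional pooling layer before the residual groups, and this code does not have one.

4. One odd choice in this code: the first convolution layer uses a single 3*3 kernel, not 7*7 and not three 3*3 kernels (line 73).

5. This code reads CIFAR data packed in bin files, so to use your own dataset you need to replace the input part.

6. This code sets no stopping condition. It keeps training/evaluating until you stop it by hand.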

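That covers the structure of the code. With the notes above in mind, we can now read it.
There is not much to say about building the graph, so let us go straight to _build_model(self) (line 69):
Lines 71-73 define the first convolution layer of the residual network.

Lines 75-82 choose which residual unit to use (with or without bottleneck) and define the number of feature channels in the residual groups for each case.

Lines 90-109 build three residual groups. The number of units in each group is determined by the hps parameter num_residual_units (set to 5 in the hps shown above).

Lines 111-124 are the rest of the model after the residual groups (pooling + fully connected + softmax + loss function + L2). This part has nothing specific to residual networks; almost every convolutional neural network ends like this.

Line 126 adds the cost computed by the loss function to the summary.

So the most important part of the residual model, the part that best shows what makes it residual, is lines 90-109, although those lines of course call other functions. I will come back to that at the end of this post. To keep the code discussion continuous, let us first move on to _build_train_op(self) (line 128):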

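Lines 130-131 get the learning rate and add it to the summary.

Lines 133-134 compute the gradients of the cost with respect to the trainable weights.

Line 136 onward chooses between plain stochastic gradient descent and gradient descent with momentum.

Lines 141-143 apply the gradient-descent update.

Line 145 groups the gradient-descent op with the BN ops (variables with op in their name are operations).

Line 146 produces the final result. It defines a new member, self.train_op, which is eventually used in resnet_main.py (line 113):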

while not mon_sess.should_stop():
    mon_sess.run(model.train_op)
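As long as the stopping condition is not reached, the code keeps running the optimization op. model is an object instantiated from the class; model in resnet_main.py and self in resnet_model.py refer to the same thing.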
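
The two update rules that `_build_train_op` chooses between can be sketched in plain Python. This is a minimal illustration on lists of floats, not the TensorFlow implementation; the momentum coefficient 0.9 and the toy objective f(w) = w^2 are assumptions of this sketch.

```python
def sgd_step(weights, grads, lrn_rate):
    """Plain gradient descent: w <- w - lr * g."""
    return [w - lrn_rate * g for w, g in zip(weights, grads)]

def momentum_step(weights, grads, velocity, lrn_rate, momentum=0.9):
    """Momentum: v <- momentum * v + g, then w <- w - lr * v."""
    new_velocity = [momentum * v + g for v, g in zip(velocity, grads)]
    new_weights = [w - lrn_rate * v for w, v in zip(weights, new_velocity)]
    return new_weights, new_velocity

def grad_of_square(weights):
    # Gradient of the toy objective f(w) = sum(w_i ** 2).
    return [2.0 * w for w in weights]

optimizer = 'mom'          # 'sgd' or 'mom', mirroring the hps.optimizer field
weights, velocity = [1.0], [0.0]
for step in range(2):
    grads = grad_of_square(weights)
    if optimizer == 'sgd':
        weights = sgd_step(weights, grads, lrn_rate=0.1)
    else:
        weights, velocity = momentum_step(weights, grads, velocity, lrn_rate=0.1)
    print(step, weights)
```

With momentum, the second step moves further than plain gradient descent would, because the accumulated velocity still carries most of the first gradient.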

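That covers the important code. Finally, back to the core of residual networks: the two kinds of residual unit.
The structure of a residual network is very simple: residual groups chained one after another. Below is the structure diagram of a ResNet-50. Different network variants have different numbers of residual modules in each group, as shown in the figure below:

For example, in ResNet-50, groups 2 to 5 contain 3, 4, 6 and 3 residual modules respectively.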

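Plain residual module (without bottleneck):

On the left are two ordinary convolution layers; on the right, a shortcut connection is added around the two convolution layers. This shortcut is what makes the module residual: the output on the left is H(x)=F(x), and with the shortcut it becomes H(x)=F(x)+x. It is a very simple change, but it gives excellent results.
Why does the shortcut skip two convolution layers rather than one? This is an experimental finding: adding a shortcut around a single convolution layer does not improve performance much.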
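
The difference between H(x)=F(x) and H(x)=F(x)+x can be shown with a toy sketch. The functions below work on plain Python lists, and `zero_branch` is a made-up stand-in for a stack of convolution layers whose weights happen to be zero.

```python
def plain_block(x, f):
    """Ordinary stacked layers: H(x) = F(x)."""
    return f(x)

def residual_block(x, f):
    """Layers with a shortcut: H(x) = F(x) + x, added element-wise."""
    return [fx + xi for fx, xi in zip(f(x), x)]

def zero_branch(x):
    # Hypothetical branch F that outputs all zeros.
    return [0.0 for _ in x]

x = [1.0, 2.0, 3.0]
print(plain_block(x, zero_branch))     # the signal is lost
print(residual_block(x, zero_branch))  # the input passes through unchanged
```

When F outputs zero the residual block is exactly the identity, so an extra block only has to learn a correction to its input rather than the whole mapping.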

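Bottleneck residual module:
The bottleneck residual module lets residual networks go deeper. For the same number of channels, a bottleneck module needs far fewer parameters than a plain module, and with fewer parameters per unit a correspondingly deeper structure becomes feasible.

The figure above shows the difference between the two. The channel count on the left is 64 (common in residual structures with fewer than 50 layers) and on the right it is 256 (common in residual structures with 50 layers or more). As the right-hand side shows, the bottleneck module replaces the two 3*3 convolutions with a 1*1, 3*3, 1*1 sequence: the first 1*1 reduces the number of channels, the 3*3 convolves over the reduced features, and the second 1*1 expands the channels again. The parameter savings come from the first 1*1 reducing the channel count. An example to verify this: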

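Suppose the plain residual module and the bottleneck residual module both have 256 channels. Then:

Number of parameters in the plain residual module:
3*3*256*256+3*3*256*256 = 1179648
Number of parameters in the bottleneck residual module:
1*1*256*64+3*3*64*64+1*1*64*256 = 69632
The reduction in parameters is clearly substantial.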
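
The two counts can be reproduced with a short calculation. This sketch counts convolution weights only (no biases or batch-norm parameters), and the helper name `conv_params` is made up for illustration.

```python
def conv_params(kernel, in_channels, out_channels):
    """Weights of one kernel*kernel convolution, ignoring biases."""
    return kernel * kernel * in_channels * out_channels

# Plain module: two 3*3 convolutions at 256 channels.
plain = conv_params(3, 256, 256) + conv_params(3, 256, 256)

# Bottleneck module: 1*1 down to 64, 3*3 at 64, 1*1 back up to 256.
bottleneck = (conv_params(1, 256, 64)
              + conv_params(3, 64, 64)
              + conv_params(1, 64, 256))

print(plain, bottleneck, round(plain / bottleneck, 1))
```

At 256 channels the plain module needs roughly 17 times as many weights as the bottleneck module.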

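Back to the figure above:

The number of modules in each group is the same for ResNet-34 and ResNet-50; the layer count rises because each module's two convolution layers become three. The figure lists about 3.6×10^9 for the former and about 3.8×10^9 for the latter; these values are FLOPs (floating-point operations), not parameter counts. Why does the cost go up rather than down? Because the channel counts inside the groups changed: the former uses [64, 128, 256, 512] channels per group while the latter uses [256, 512, 1024, 2048]. This is a design characteristic of residual networks: with bottleneck modules, the channel counts inside the groups are markedly higher than with plain modules.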
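
The layer counts follow directly from the number of modules per group. The sketch below uses a made-up helper `network_depth` and counts the first convolution and the final fully connected layer as the two extra layers.

```python
def network_depth(units_per_group, convs_per_unit):
    """Weighted layers: all convolutions in the residual units, plus the
    first convolution and the final fully connected layer."""
    return sum(units_per_group) * convs_per_unit + 2

units = [3, 4, 6, 3]                 # modules in groups 2 to 5
print(network_depth(units, 2))       # plain modules, two convolutions each
print(network_depth(units, 3))       # bottleneck modules, three convolutions each
print(network_depth([3, 4, 23, 3], 3))
```

The same 3, 4, 6, 3 layout gives 34 layers with plain modules and 50 with bottleneck modules, and growing the fourth group to 23 modules gives 101.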

The code TensorFlow provides does the same; see line 77:

if self.hps.use_bottleneck:
    res_func = self._bottleneck_residual
    filters = [16, 64, 128, 256]
else:
    res_func = self._residual
    filters = [16, 16, 32, 64]

With the theory above in hand, you can go back and read the _residual() and _bottleneck_residual() functions in the code.
