深度剖析卷积神经网络

2022-05-22 06:06:22

最先进的图像识别体系结构采用了许多附加组件来补充卷积操作。在这篇文章中，你将了解到一些能够提高现代卷积神经网络速度和精度的重要的组件。

使CNN非常高效的第一个秘密就是Pooling。Pooling是像卷积一样，用于对图像的每个局部区域标量变换的一种矢量。与卷积不同的是它们没有filters，也不用局部区域计算点积，而是计算平均值（Average Pooling）中的像素，或者只是选择强度最高的像素并丢弃剩余的像素（Max Pooling）。

Max Pooling近年来效果最好，它以某个地区的最大像素代表该地区最重要的特征为理论基础。通常我们希望分类的物体图像可能包含许多其他物体，例如，出现在汽车图像某处的猫可能会误导分类器，Pooling有助于缓解这种影响。

同时，它也大大降低了计算成本。通常，网络中每层的图像大小与每层的计算成本（触发器）成正比。分段卷积有时用作pooling的代替物,随着图层变得更深，Pooling会减少图像的尺寸，因此，它有助于防止网络需要的触发器数量激增。

Dropout

过度拟合是指由于过度依赖训练集中的某些特定功能导致的网络在训练集上运行良好，但在测试集上表现不佳的一种现象。Dropout是一种对抗过度拟合的技术。它可以随机地将一些激活值设置为0，迫使网络探索更多的方式来分类图像，而不是过度依赖一些功能。它也是AlesNet中的关键元素之一。

Batch Normalization

神经网络的一个主要问题是梯度消失。来自Google Brain的Ioffe和Szegedy发现，这主要是由于内部协变量变化导致的信息通过网络传播而引起的的变化数据分布的现象。他们所做的是称为Batch Normalization的技术，通过将每批图像标准化为具有零均值和单位差异来工作。

它通常放置在cnns的非线性（relu）之前。它极大地提高了准确性，同时极大地加快了训练过程。

数据增强

人类视觉系统在适应图像平移，旋转和其他形式的扭曲方面非常出色，拍摄图像并翻转它，大多数人仍然可以识别它。然而，covnets并不善于处理这种扭曲，他们可能会由于小的翻转而失败。但是通过随机扭曲图像训练，使用水平翻转，垂直翻转，旋转，移位和其他扭曲，将会使covnets学会如何处理这种扭曲。

另一种常用方法是从每幅图像中减去平均图像，然后除以标准差。

接下来将解释如何理由Keras来实现它们。

在这篇文章中，所有的实验都将在一个包含60000个32*32GB图像的CIFAR10数据集上进行。它分为50000个训练图像和10000个测试图像，为了让事情更加模块化，我们为每个图层创建一个简单的函数：

def Unit(x,filters):
    out = BatchNormalization()(x)
    out = Activation("relu")(out)
    out = Conv2D(filters=filters, kernel_size=[3, 3], strides=[1, 1], padding="same")(out)
    return out

这里是我们的代码最重要的方面，单元函数定义了一个简单的图层，其中包含三个层次，第一个是我先前解释的Batch Normalization，接下来我们添加RELU激活，最后添加卷积，注意我是怎样用“预激活”的方式把RELU放在conv之前的。

现在我们将把这些单位层组合成一个模型：

def MiniModel(input_shape):
    images = Input(input_shape)
    net = Unit(images,64)
    net = Unit(net,64)
    net = Unit(net,64)
    net = MaxPooling2D(pool_size=(2,2))(net)
    net = Unit(net,128)
    net = Unit(net,128)
    net = Unit(net,128)
    net = MaxPooling2D(pool_size=(2, 2))(net)
    net = Unit(net,256)
    net = Unit(net,256)
    net = Unit(net,256)
    net = Dropout(0.5)(net)
    net = AveragePooling2D(pool_size=(8,8))(net)
    net = Flatten()(net)
    net = Dense(units=10,activation="softmax")(net)
    model = Model(inputs=images,outputs=net)
    return model

在这里，我们使用功能性API来定义我们的模型，我们从三个单元格开始，每个单元格有64个过滤器，后面是一个Max Pooling layer，将32*32图像缩小到16*16。接下来是3128个过滤器单元，然后是Pooling,在这里，我们的图像变成8*8。最后，我们有另外3个单元，256通道。请注意，每当我们将图像尺寸缩小2倍时，我们将通道数增加一倍。

我们添加0.5的dropout，这将随机取消50%的参数，正如我之前解释的那样，它可以避免过度拟合。

接下来，我们需要加载cifar10数据集并执行一些数据进行增强：

#load the cifar10 dataset
(train_x, train_y) , (test_x, test_y) = cifar10.load_data()
#normalize the data
train_x = train_x.astype('float32') / 255
test_x = test_x.astype('float32') / 255
#Subtract the mean image from both train and test set
train_x = train_x - train_x.mean()
test_x = test_x - test_x.mean()
#Divide by the standard deviation
train_x = train_x / train_x.std(axis=0)
test_x = test_x / test_x.std(axis=0)

在上面的代码中，加载训练和测试数据后，我们从每幅图像中减去平均图像并除以标准偏差，这是一种基本的数据增加技术。对于更先进的数据增强，我们的图像加载过程会略有改变，Keras有一个非常有用的数据增强程序，可以简化整个过程。

下面的代码可以做到这一点：

 datagen = ImageDataGenerator(rotation_range=10,
                             width_shift_range=5. / 32,
                             height_shift_range=5. / 32,
                             horizontal_flip=True)
# Compute quantities required for featurewise normalization
# (std, mean, and principal components if ZCA whitening is applied).
datagen.fit(train_x)

在上面，首先，我们指定了10度的旋转角度，高度和宽度都是5/32的偏移，最后是水平翻转，所有这些变换都会随机应用于训练集中的图像上。当然，还有更多转换存在，你可以查看可以为该类指定的所有参数。请记住，过度使用数据增强可能是有害的。

接下来，我们必须将标签转换为One热编码：

#Encode the labels to vectors
train_y = keras.utils.to_categorical(train_y,10)
test_y = keras.utils.to_categorical(test_y,10)

事实上，构成训练过程的几乎所有其他内容都与我之前的教程相同，这里是完整的代码：

#import needed classes
import keras
from keras.datasets import cifar10
from keras.layers import Dense,Conv2D,MaxPooling2D,Flatten,AveragePooling2D,Dropout,BatchNormalization,Activation
from keras.models import Model,Input
from keras.optimizers import Adam
from keras.callbacks import LearningRateScheduler
from keras.callbacks import ModelCheckpoint
from math import ceil
import os
from keras.preprocessing.image import ImageDataGenerator
def Unit(x,filters):
    out = BatchNormalization()(x)
    out = Activation("relu")(out)
    out = Conv2D(filters=filters, kernel_size=[3, 3], strides=[1, 1], padding="same")(out)
    return out
#Define the model
def MiniModel(input_shape):
    images = Input(input_shape)
    net = Unit(images,64)
    net = Unit(net,64)
    net = Unit(net,64)
    net = MaxPooling2D(pool_size=(2,2))(net)
    net = Unit(net,128)
    net = Unit(net,128)
    net = Unit(net,128)
    net = MaxPooling2D(pool_size=(2, 2))(net)
    net = Unit(net,256)
    net = Unit(net,256)
    net = Unit(net,256)
    net = Dropout(0.25)(net)
    net = AveragePooling2D(pool_size=(8,8))(net)
    net = Flatten()(net)
    net = Dense(units=10,activation="softmax")(net)
    model = Model(inputs=images,outputs=net)
    return model
#load the cifar10 dataset
(train_x, train_y) , (test_x, test_y) = cifar10.load_data()
#normalize the data
train_x = train_x.astype('float32') / 255
test_x = test_x.astype('float32') / 255
#Subtract the mean image from both train and test set
train_x = train_x - train_x.mean()
test_x = test_x - test_x.mean()
#Divide by the standard deviation
train_x = train_x / train_x.std(axis=0)
test_x = test_x / test_x.std(axis=0)
datagen = ImageDataGenerator(rotation_range=10,
                             width_shift_range=5. / 32,
                             height_shift_range=5. / 32,
                             horizontal_flip=True)
# Compute quantities required for featurewise normalization
# (std, mean, and principal components if ZCA whitening is applied).
datagen.fit(train_x)
#Encode the labels to vectors
train_y = keras.utils.to_categorical(train_y,10)
test_y = keras.utils.to_categorical(test_y,10)
#define a common unit
input_shape = (32,32,3)
model = MiniModel(input_shape)
#Print a Summary of the model
model.summary()
#Specify the training components
model.compile(optimizer=Adam(0.001),loss="categorical_crossentropy",metrics=["accuracy"])
epochs = 20
steps_per_epoch = ceil(50000/128)
# Fit the model on the batches generated by datagen.flow().
model.fit_generator(datagen.flow(train_x, train_y, batch_size=128),
                    validation_data=[test_x,test_y],                epochs=epochs,steps_per_epoch=steps_per_epoch, verbose=1, workers=4)
#Evaluate the accuracy of the test dataset
accuracy = model.evaluate(x=test_x,y=test_y,batch_size=128)
model.save("cifar10model.h5")

首先，这里有一些不同之处

input_shape = (32,32,3)
model = MiniModel(input_shape)
#Print a Summary of the model
model.summary()

正如前面说的，cifar 10数据集由32*32的RGB图像组成，由此输入形状有3个通道。

下一行创建我们定义的模型的实例，并传入输入形状，最后一行将打印出我们网络的完整摘要，包括参数的数量。

最后需要解释的是：

epochs = 20
steps_per_epoch = ceil(50000/128)
# Fit the model on the batches generated by datagen.flow().
model.fit_generator(datagen.flow(train_x, train_y, batch_size=128),
                    validation_data=[test_x,test_y],                  epochs=epochs,steps_per_epoch=steps_per_epoch, verbose=1, workers=4)
#Evaluate the accuracy of the test dataset
accuracy = model.evaluate(x=test_x,y=test_y,batch_size=128)
model.save("cifar10model.h5")

首先我们要定义运行的时期的数量和每个时期的步骤数量。

steps_per_epoch = ceil(50000/128)

50000是总共训练图像的数量，我们使用一个批处理大小为128。

接下来是拟合函数，这与我在之前的教程中解释的拟合函数明显不同。

下面的第二个看看会有所帮助：

Fit the model on the batches generated by datagen.flow().
model.fit_generator(datagen.flow(train_x, train_y, batch_size=128),
                    validation_data=[test_x,test_y],
                    epochs=epochs,steps_per_epoch=steps_per_epoch, verbose=1, workers=4)

本教程主要是向你介绍基本组件，你也可以自己尝试调整参数和网络，以了解你可以提高的准确度。

数十款阿里云产品限时折扣中，赶紧点击领券开始云上实践吧！

本文由北邮@爱可可-爱生活老师推荐，阿里云云栖社区组织翻译。

文章原标题《Components of convoutional neural networks》

作者：John Olafenwa

译者：乌拉乌拉，审校：袁虎。

文章为简译，更为详细的内容，请查看原文文章。

码农公寓

相关文章