Software Engineering: Convolutional Neural Networks

I. MNIST Dataset Classification

Deep convolutional neural networks have the following properties:

  • Many layers: compositionality
  • Convolution: locality + stationarity of images
  • Pooling: invariance of object class to translations

Also worth noting: DataLoader is an important class. Its commonly used options are batch_size (the size of each batch), shuffle (whether to randomly shuffle the sample order), and num_workers (how many subprocesses to use when loading data).
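A minimal sketch of these options (a hypothetical toy dataset, not the MNIST loader defined below):

import torch
from torch.utils.data import DataLoader, TensorDataset

toy = TensorDataset(torch.arange(10).float().unsqueeze(1), torch.arange(10))
loader = DataLoader(toy, batch_size=4, shuffle=True, num_workers=0)
for xb, yb in loader:
    print(xb.shape, yb.shape)   # three batches: 4, 4, and 2 samples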
    import torch
    import torch.nn as nn
    import torch.nn.functional as F
    import torch.optim as optim
    from torchvision import datasets, transforms
    import matplotlib.pyplot as plt
    import numpy
    
    # a helper that counts the number of parameters in a model
    def get_n_params(model):
        np=0
        for p in list(model.parameters()):
            np += p.nelement()
        return np
    
    # train on the GPU; in Colab, set this under "Runtime" -> "Change runtime type"
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

1. Loading the data (MNIST)

PyTorch includes common datasets such as MNIST and CIFAR10; calling torchvision.datasets downloads them from a remote server to the local machine. The usage for MNIST is shown below:

    torchvision.datasets.MNIST(root, train=True, transform=None, target_transform=None, download=False)

  • root: the root directory where the dataset is stored locally, containing the training.pt and test.pt files
  • train: if True, build the dataset from training.pt, otherwise from test.pt.
  • download: if True, download the data from the internet and place it under the root directory
  • transform: a function/transform that takes a PIL image and returns the transformed data.
  • target_transform: a function/transform that takes the target and transforms it.
    input_size  = 28*28   # MNIST images are 28x28
    output_size = 10      # ten classes, the digits 0 through 9
    
    train_loader = torch.utils.data.DataLoader(
        datasets.MNIST('./data', train=True, download=True,
            transform=transforms.Compose(
                [transforms.ToTensor(),
                 transforms.Normalize((0.1307,), (0.3081,))])),
        batch_size=64, shuffle=True)
    
    test_loader = torch.utils.data.DataLoader(
        datasets.MNIST('./data', train=False, transform=transforms.Compose([
                 transforms.ToTensor(),
                 transforms.Normalize((0.1307,), (0.3081,))])),
        batch_size=1000, shuffle=True)

The output is:

Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to ./data/MNIST/raw/train-images-idx3-ubyte.gz
Extracting ./data/MNIST/raw/train-images-idx3-ubyte.gz to ./data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to ./data/MNIST/raw/train-labels-idx1-ubyte.gz
Extracting ./data/MNIST/raw/train-labels-idx1-ubyte.gz to ./data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to ./data/MNIST/raw/t10k-images-idx3-ubyte.gz
Extracting ./data/MNIST/raw/t10k-images-idx3-ubyte.gz to ./data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to ./data/MNIST/raw/t10k-labels-idx1-ubyte.gz
Extracting ./data/MNIST/raw/t10k-labels-idx1-ubyte.gz to ./data/MNIST/raw

/usr/local/lib/python3.7/dist-packages/torchvision/datasets/mnist.py:498: UserWarning: The given NumPy array is not writeable, and PyTorch does not support non-writeable tensors. This means you can write to the underlying (supposedly non-writeable) NumPy array using the tensor. You may want to copy the array to protect its data or make it writeable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at /pytorch/torch/csrc/utils/tensor_numpy.cpp:180.)
  return torch.from_numpy(parsed.astype(m[2], copy=False)).view(*s)

Display some of the images in the dataset:

plt.figure(figsize=(8, 5))
for i in range(20):
    plt.subplot(4, 5, i + 1)
    image, _ = train_loader.dataset[i]  # indexing the dataset returns an (image, label) pair
    plt.imshow(image.squeeze().numpy(),'gray')
    plt.axis('off');

The output is:

[Figure: the first 20 MNIST training images]

2. Building the network

When defining a network, subclass nn.Module and implement its forward method, putting the layers with learnable parameters in the constructor __init__.

As long as forward is defined in an nn.Module subclass, the backward function is implemented automatically (via autograd).
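A minimal sketch of this behavior (a toy module for illustration, not one of this lesson's models):

import torch
import torch.nn as nn

class TinyNet(nn.Module):
    def __init__(self):
        super(TinyNet, self).__init__()
        self.fc = nn.Linear(4, 2)        # a learnable layer, registered in __init__
    def forward(self, x):                # only forward is written by hand
        return self.fc(x)

tiny = TinyNet()
loss = tiny(torch.randn(3, 4)).sum()     # forward pass
loss.backward()                          # backward is derived automatically by autograd
print(tiny.fc.weight.grad.shape)         # torch.Size([2, 4])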

class FC2Layer(nn.Module):
    def __init__(self, input_size, n_hidden, output_size):
        # a subclass of nn.Module must call the parent constructor in its own constructor
        # the line below is equivalent to nn.Module.__init__(self)
        super(FC2Layer, self).__init__()
        self.input_size = input_size
        # here the network is defined directly with Sequential; note the contrast with the CNN code below
        self.network = nn.Sequential(
            nn.Linear(input_size, n_hidden), 
            nn.ReLU(), 
            nn.Linear(n_hidden, n_hidden), 
            nn.ReLU(), 
            nn.Linear(n_hidden, output_size), 
            nn.LogSoftmax(dim=1)
        )
    def forward(self, x):
        # view usually appears in a model's forward function to reshape the input or output
        # x.view(-1, self.input_size) flattens multi-dimensional data into two dimensions
        # the number of columns is fixed to input_size=784; -1 for the rows tells
        # PyTorch to infer that dimension automatically
        # in the DataLoader section batch_size is 64, so x ends up with 64 rows
        # you can add a line here: print(x.cpu().numpy().shape)
        # during training you will see (64, 784) printed, matching this expectation

        # forward specifies how the network runs; for this fully-connected network it
        # may not look very meaningful, but the CNN below shows what forward is for
        x = x.view(-1, self.input_size)
        return self.network(x)
    


class CNN(nn.Module):
    def __init__(self, input_size, n_feature, output_size):
        # call the parent constructor; every network definition must do this
        super(CNN, self).__init__()
        # below are the typical building blocks of a network, usually convolutions and fully-connected layers
        # pooling, ReLU and the like do not need to be defined here
        self.n_feature = n_feature
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=n_feature, kernel_size=5)
        self.conv2 = nn.Conv2d(n_feature, n_feature, kernel_size=5)
        self.fc1 = nn.Linear(n_feature*4*4, 50)
        self.fc2 = nn.Linear(50, 10)    
    
    # the forward function below defines the structure of the network, wiring the
    # pieces built above together in order; conv1, conv2, etc. can be reused several times
    def forward(self, x, verbose=False):
        x = self.conv1(x)
        x = F.relu(x)
        x = F.max_pool2d(x, kernel_size=2)
        x = self.conv2(x)
        x = F.relu(x)
        x = F.max_pool2d(x, kernel_size=2)
        x = x.view(-1, self.n_feature*4*4)
        x = self.fc1(x)
        x = F.relu(x)
        x = self.fc2(x)
        x = F.log_softmax(x, dim=1)
        return x

Defining the training and test functions

# training function
def train(model):
    model.train()
    # draw samples from train_loader, one batch of 64 at a time
    for batch_idx, (data, target) in enumerate(train_loader):
        # move the data to the GPU
        data, target = data.to(device), target.to(device)

        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(output, target)
        loss.backward()
        optimizer.step()
        if batch_idx % 100 == 0:
            print('Train: [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                batch_idx * len(data), len(train_loader.dataset),
                100. * batch_idx / len(train_loader), loss.item()))


def test(model):
    model.eval()
    test_loss = 0
    correct = 0
    for data, target in test_loader:
        # move the data to the GPU
        data, target = data.to(device), target.to(device)
        # feed the data into the model to get the predictions
        output = model(data)
        # compute this batch's loss and add it to test_loss
        test_loss += F.nll_loss(output, target, reduction='sum').item()
        # get the index of the max log-probability: the last layer outputs 10 values,
        # and the largest one corresponds to the predicted class, stored in pred
        pred = output.data.max(1, keepdim=True)[1]
        # compare pred with target to count the correct predictions, accumulated in correct
        # note view_as: it reshapes target to the same shape as pred
        correct += pred.eq(target.data.view_as(pred)).cpu().sum().item()

    test_loss /= len(test_loader.dataset)
    accuracy = 100. * correct / len(test_loader.dataset)
    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        accuracy))

3. Training a small fully-connected network

n_hidden = 8 # number of hidden units

model_fnn = FC2Layer(input_size, n_hidden, output_size)
model_fnn.to(device)
optimizer = optim.SGD(model_fnn.parameters(), lr=0.01, momentum=0.5)
print('Number of parameters: {}'.format(get_n_params(model_fnn)))

train(model_fnn)
test(model_fnn)

The output is:

Number of parameters: 6442
Train: [0/60000 (0%)] Loss: 2.337591
Train: [6400/60000 (11%)] Loss: 1.948347
Train: [12800/60000 (21%)] Loss: 1.346948
Train: [19200/60000 (32%)] Loss: 0.865751
Train: [25600/60000 (43%)] Loss: 0.688250
Train: [32000/60000 (53%)] Loss: 0.756100
Train: [38400/60000 (64%)] Loss: 0.862340
Train: [44800/60000 (75%)] Loss: 0.509505
Train: [51200/60000 (85%)] Loss: 0.516737
Train: [57600/60000 (96%)] Loss: 0.541380

Test set: Average loss: 0.4560, Accuracy: 8693/10000 (87%)

4. Training the convolutional neural network

# Training settings 
n_features = 6 # number of feature maps

model_cnn = CNN(input_size, n_features, output_size)
model_cnn.to(device)
optimizer = optim.SGD(model_cnn.parameters(), lr=0.01, momentum=0.5)
print('Number of parameters: {}'.format(get_n_params(model_cnn)))

train(model_cnn)
test(model_cnn)

The output is:

Number of parameters: 6422
Train: [0/60000 (0%)] Loss: 2.312314
/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py:718: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at /pytorch/c10/core/TensorImpl.h:1156.)
  return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)
Train: [6400/60000 (11%)] Loss: 1.234897
Train: [12800/60000 (21%)] Loss: 0.575131
Train: [19200/60000 (32%)] Loss: 0.386014
Train: [25600/60000 (43%)] Loss: 0.386902
Train: [32000/60000 (53%)] Loss: 0.276137
Train: [38400/60000 (64%)] Loss: 0.332431
Train: [44800/60000 (75%)] Loss: 0.423124
Train: [51200/60000 (85%)] Loss: 0.083104
Train: [57600/60000 (96%)] Loss: 0.092805

Test set: Average loss: 0.1637, Accuracy: 9526/10000 (95%)

These results show that, with roughly the same number of parameters, the CNN clearly outperforms the simple fully-connected network, because a CNN extracts the information in images more effectively, mainly through two mechanisms (a quick parameter-count check follows the list):

  • Convolution: locality and stationarity in images
  • Pooling: builds in some translation invariance
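To back up the "same number of parameters" comparison with arithmetic, here is a small sketch recomputing both totals by hand; it reproduces the 6442 and 6422 reported by get_n_params:

# fully-connected: Linear(784, 8) -> Linear(8, 8) -> Linear(8, 10), weights plus biases
fc_params = (784*8 + 8) + (8*8 + 8) + (8*10 + 10)
print(fc_params)    # 6442

# CNN: two 5x5 convolutions with 6 feature maps, then Linear(6*4*4, 50) -> Linear(50, 10)
cnn_params = (5*5*1 + 1)*6 + (5*5*6 + 1)*6 + (6*4*4*50 + 50) + (50*10 + 10)
print(cnn_params)   # 6422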

5. Shuffling the pixel order and training/testing both networks again

Given how much the CNN relies on convolution and pooling, shuffling the order of the pixels in an image should make it hard for convolution and pooling to do their job. To verify this idea, let's shuffle the pixels and try again.

First, the code below shows what the images look like after the pixel order is randomly shuffled:


# a note on torch.randperm: given an integer n, it returns a random permutation of the integers 0 to n-1
perm = torch.randperm(784)
plt.figure(figsize=(8, 4))
for i in range(10):
    image, _ = train_loader.dataset[i]
    # permute pixels
    image_perm = image.view(-1, 28*28).clone()
    image_perm = image_perm[:, perm]
    image_perm = image_perm.view(-1, 1, 28, 28)
    plt.subplot(4, 5, i + 1)
    plt.imshow(image.squeeze().numpy(), 'gray')
    plt.axis('off')
    plt.subplot(4, 5, i + 11)
    plt.imshow(image_perm.squeeze().numpy(), 'gray')
    plt.axis('off')

[Figure: the original digits (top rows) and the same digits with pixels shuffled (bottom rows)]

We redefine the training and test functions as train_perm and test_perm, which add the pixel-shuffling step.

They are essentially identical to the earlier training and test functions, except that the shuffling is applied to data.

# a function that shuffles the pixel order of the data in each batch
def perm_pixel(data, perm):
    # flatten into a 2-D matrix
    data_new = data.view(-1, 28*28)
    # shuffle the pixel order
    data_new = data_new[:, perm]
    # restore the original 4-D tensor shape
    data_new = data_new.view(-1, 1, 28, 28)
    return data_new

# training function
def train_perm(model, perm):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to(device), target.to(device)
        # shuffle the pixel order
        data = perm_pixel(data, perm)

        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(output, target)
        loss.backward()
        optimizer.step()
        if batch_idx % 100 == 0:
            print('Train: [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                batch_idx * len(data), len(train_loader.dataset),
                100. * batch_idx / len(train_loader), loss.item()))

# test function
def test_perm(model, perm):
    model.eval()
    test_loss = 0
    correct = 0
    for data, target in test_loader:
        data, target = data.to(device), target.to(device)

        # shuffle the pixel order
        data = perm_pixel(data, perm)

        output = model(data)
        test_loss += F.nll_loss(output, target, reduction='sum').item()
        pred = output.data.max(1, keepdim=True)[1]                                            
        correct += pred.eq(target.data.view_as(pred)).cpu().sum().item()

    test_loss /= len(test_loader.dataset)
    accuracy = 100. * correct / len(test_loader.dataset)
    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        accuracy))

Training and testing on the fully-connected network:

perm = torch.randperm(784)
n_hidden = 8 # number of hidden units

model_fnn = FC2Layer(input_size, n_hidden, output_size)
model_fnn.to(device)
optimizer = optim.SGD(model_fnn.parameters(), lr=0.01, momentum=0.5)
print('Number of parameters: {}'.format(get_n_params(model_fnn)))

train_perm(model_fnn, perm)
test_perm(model_fnn, perm)

The output is:

Number of parameters: 6442
Train: [0/60000 (0%)] Loss: 2.319843
Train: [6400/60000 (11%)] Loss: 1.820002
Train: [12800/60000 (21%)] Loss: 1.077188
Train: [19200/60000 (32%)] Loss: 0.675928
Train: [25600/60000 (43%)] Loss: 0.658187
Train: [32000/60000 (53%)] Loss: 0.682825
Train: [38400/60000 (64%)] Loss: 0.629946
Train: [44800/60000 (75%)] Loss: 0.398080
Train: [51200/60000 (85%)] Loss: 0.268625
Train: [57600/60000 (96%)] Loss: 0.600681

Test set: Average loss: 0.4063, Accuracy: 8846/10000 (88%)

Training and testing on the convolutional neural network:

perm = torch.randperm(784)
n_features = 6 # number of feature maps

model_cnn = CNN(input_size, n_features, output_size)
model_cnn.to(device)
optimizer = optim.SGD(model_cnn.parameters(), lr=0.01, momentum=0.5)
print('Number of parameters: {}'.format(get_n_params(model_cnn)))

train_perm(model_cnn, perm)
test_perm(model_cnn, perm)

The output is:

Number of parameters: 6422
Train: [0/60000 (0%)] Loss: 2.327404
Train: [6400/60000 (11%)] Loss: 2.251524
Train: [12800/60000 (21%)] Loss: 2.113517
Train: [19200/60000 (32%)] Loss: 1.622411
Train: [25600/60000 (43%)] Loss: 1.146309
Train: [32000/60000 (53%)] Loss: 0.975707
Train: [38400/60000 (64%)] Loss: 0.841636
Train: [44800/60000 (75%)] Loss: 0.623049
Train: [51200/60000 (85%)] Loss: 0.595479
Train: [57600/60000 (96%)] Loss: 0.610131

Test set: Average loss: 0.5359, Accuracy: 8294/10000 (83%)

With the pixels shuffled, the fully-connected network performs about as well as before (88% vs. 87%), while the CNN drops noticeably (83% vs. 95%): once pixel locality is destroyed, convolution and pooling lose their advantage.

II. CIFAR10 Classification

For vision data, PyTorch provides a package called torchvision, which contains torchvision.datasets, a module with loaders for common datasets such as ImageNet, CIFAR10 and MNIST; it is used together with torch.utils.data.DataLoader for batching image data.

We will use the CIFAR10 dataset. It has ten classes: 'airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck'. The images in CIFAR-10 are of size 3x32x32, i.e. three RGB color channels of 32x32 pixels each.

[Figure: example CIFAR-10 images from the ten classes]

First, load and normalize CIFAR10 using torchvision. The output of the torchvision datasets are PILImages with values in the range [0, 1]; we convert them to tensors normalized to the range [-1, 1].

You may wonder how the 0.5 in the code below maps the data to [-1, 1]. The PyTorch source does the following:

input[channel] = (input[channel] - mean[channel]) / std[channel]

That gives ((0, 1) - 0.5) / 0.5 = (-1, 1).
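A quick sketch to confirm this mapping (a hypothetical check, not part of the lesson's pipeline):

import torch
import torchvision.transforms as transforms

norm = transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
x = torch.rand(3, 4, 4)                # fake image tensor with values in [0, 1]
y = norm(x)                            # computes (x - 0.5) / 0.5 per channel
print(y.min().item(), y.max().item())  # both fall inside [-1, 1]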

import torch
import torchvision
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
import numpy as np
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

# train on the GPU; in Colab, set this under "Runtime" -> "Change runtime type"
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

# note in the code below: shuffle is True for training and False for testing
# shuffling adds variety during training; for testing it is unnecessary
trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=64,
                                          shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=8,
                                         shuffle=False, num_workers=2)

classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

Some of the images in CIFAR10 are shown below:

def imshow(img):
    plt.figure(figsize=(8,8))
    img = img / 2 + 0.5     # unnormalize back to [0, 1]
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))
    plt.show()

# get one batch of images
images, labels = next(iter(trainloader))  # next(iter(...)) instead of the removed .next() method
# display the images
imshow(torchvision.utils.make_grid(images))
# print the labels of the first row of images
for j in range(8):
    print(classes[labels[j]])

The output is:

[Figure: a grid of CIFAR-10 training images]

bird bird horse cat plane dog ship truck

Next, define the network, loss function and optimizer:

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

# move the network to the GPU
net = Net().to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(net.parameters(), lr=0.001)

Train the network:

for epoch in range(10):  # train for several epochs
    for i, (inputs, labels) in enumerate(trainloader):
        inputs = inputs.to(device)
        labels = labels.to(device)
        # zero the optimizer's gradients
        optimizer.zero_grad()
        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        # print statistics
        if i % 100 == 0:   
            print('Epoch: %d Minibatch: %5d loss: %.3f' %(epoch + 1, i + 1, loss.item()))

print('Finished Training')

The output is:

Epoch: 1 Minibatch: 1 loss: 2.307
Epoch: 1 Minibatch: 101 loss: 1.804
Epoch: 1 Minibatch: 201 loss: 1.876
Epoch: 1 Minibatch: 301 loss: 1.775
Epoch: 1 Minibatch: 401 loss: 1.470
Epoch: 1 Minibatch: 501 loss: 1.459
Epoch: 1 Minibatch: 601 loss: 1.551
Epoch: 1 Minibatch: 701 loss: 1.424
Epoch: 2 Minibatch: 1 loss: 1.269
Epoch: 2 Minibatch: 101 loss: 1.443
Epoch: 2 Minibatch: 201 loss: 1.369
Epoch: 2 Minibatch: 301 loss: 1.482
Epoch: 2 Minibatch: 401 loss: 1.345
Epoch: 2 Minibatch: 501 loss: 1.452
Epoch: 2 Minibatch: 601 loss: 1.517
Epoch: 2 Minibatch: 701 loss: 1.415
Epoch: 3 Minibatch: 1 loss: 1.137
Epoch: 3 Minibatch: 101 loss: 1.478
Epoch: 3 Minibatch: 201 loss: 1.223
Epoch: 3 Minibatch: 301 loss: 1.140
Epoch: 3 Minibatch: 401 loss: 1.206
Epoch: 3 Minibatch: 501 loss: 1.295
Epoch: 3 Minibatch: 601 loss: 1.085
Epoch: 3 Minibatch: 701 loss: 1.080
Epoch: 4 Minibatch: 1 loss: 1.052
Epoch: 4 Minibatch: 101 loss: 1.137
Epoch: 4 Minibatch: 201 loss: 1.170
Epoch: 4 Minibatch: 301 loss: 1.413
Epoch: 4 Minibatch: 401 loss: 1.017
Epoch: 4 Minibatch: 501 loss: 1.138
Epoch: 4 Minibatch: 601 loss: 1.326
Epoch: 4 Minibatch: 701 loss: 1.070
Epoch: 5 Minibatch: 1 loss: 0.977
Epoch: 5 Minibatch: 101 loss: 1.108
Epoch: 5 Minibatch: 201 loss: 1.245
Epoch: 5 Minibatch: 301 loss: 0.882
Epoch: 5 Minibatch: 401 loss: 1.041
Epoch: 5 Minibatch: 501 loss: 1.032
Epoch: 5 Minibatch: 601 loss: 0.952
Epoch: 5 Minibatch: 701 loss: 0.944
Epoch: 6 Minibatch: 1 loss: 0.953
Epoch: 6 Minibatch: 101 loss: 1.337
Epoch: 6 Minibatch: 201 loss: 1.218
Epoch: 6 Minibatch: 301 loss: 1.098
Epoch: 6 Minibatch: 401 loss: 1.133
Epoch: 6 Minibatch: 501 loss: 1.069
Epoch: 6 Minibatch: 601 loss: 1.080
Epoch: 6 Minibatch: 701 loss: 0.819
Epoch: 7 Minibatch: 1 loss: 1.136
Epoch: 7 Minibatch: 101 loss: 0.975
Epoch: 7 Minibatch: 201 loss: 0.960
Epoch: 7 Minibatch: 301 loss: 0.874
Epoch: 7 Minibatch: 401 loss: 1.104
Epoch: 7 Minibatch: 501 loss: 0.838
Epoch: 7 Minibatch: 601 loss: 1.322
Epoch: 7 Minibatch: 701 loss: 0.850
Epoch: 8 Minibatch: 1 loss: 1.035
Epoch: 8 Minibatch: 101 loss: 1.030
Epoch: 8 Minibatch: 201 loss: 1.284
Epoch: 8 Minibatch: 401 loss: 0.905
Epoch: 8 Minibatch: 501 loss: 1.103
Epoch: 8 Minibatch: 601 loss: 1.027
Epoch: 8 Minibatch: 701 loss: 0.934
Epoch: 9 Minibatch: 1 loss: 0.813
Epoch: 9 Minibatch: 101 loss: 0.858
Epoch: 9 Minibatch: 201 loss: 1.065
Epoch: 9 Minibatch: 301 loss: 0.759
Epoch: 9 Minibatch: 401 loss: 1.160
Epoch: 9 Minibatch: 501 loss: 1.243
Epoch: 9 Minibatch: 601 loss: 1.130
Epoch: 9 Minibatch: 701 loss: 0.973
Epoch: 10 Minibatch: 1 loss: 1.055
Epoch: 10 Minibatch: 101 loss: 0.929
Epoch: 10 Minibatch: 201 loss: 0.896
Epoch: 10 Minibatch: 301 loss: 0.816
Epoch: 10 Minibatch: 401 loss: 0.805
Epoch: 10 Minibatch: 501 loss: 0.888
Epoch: 10 Minibatch: 601 loss: 0.852
Epoch: 10 Minibatch: 701 loss: 0.687
Finished Training

Now let's take 8 images from the test set:

# get one batch of images
images, labels = next(iter(testloader))  # next(iter(...)) instead of the removed .next() method
# display the images
imshow(torchvision.utils.make_grid(images))
# print the labels of the images
for j in range(8):
    print(classes[labels[j]])

The output is:

[Figure: the eight test images]

cat
ship
ship
plane
frog
frog
car
frog 

Feed these images into the model and see what the CNN recognizes them as:

outputs = net(images.to(device))
_, predicted = torch.max(outputs, 1)

# show the predictions
for j in range(8):
    print(classes[predicted[j]])

The predictions are:

cat
ship
ship
plane
deer
frog
car
bird

As you can see, a few of them are misclassified. Let's look at the network's performance on the whole test set:

correct = 0
total = 0

for data in testloader:
    images, labels = data
    images, labels = images.to(device), labels.to(device)
    outputs = net(images)
    _, predicted = torch.max(outputs.data, 1)
    total += labels.size(0)
    correct += (predicted == labels).sum().item()

print('Accuracy of the network on the 10000 test images: %d %%' % (
    100 * correct / total))

Accuracy of the network on the 10000 test images: 63 %

III. Classifying CIFAR10 with VGG16

VGG is a convolutional neural network model proposed by Simonyan and Zisserman in the paper "Very Deep Convolutional Networks for Large-Scale Image Recognition"; the name comes from the abbreviation of the authors' lab, the Visual Geometry Group at the University of Oxford. The model entered the 2014 ImageNet image classification and localization challenge with excellent results: second place in classification and first place in localization. The VGG16 architecture is shown in the figure below:

[Figure: the VGG16 network architecture]

The structure of the 16 layers is as follows (see the cfg-style summary after the list):

01:Convolution using 64 filters
02: Convolution using 64 filters + Max pooling
03: Convolution using 128 filters
04: Convolution using 128 filters + Max pooling
05: Convolution using 256 filters
06: Convolution using 256 filters
07: Convolution using 256 filters + Max pooling
08: Convolution using 512 filters
09: Convolution using 512 filters
10: Convolution using 512 filters + Max pooling
11: Convolution using 512 filters
12: Convolution using 512 filters
13: Convolution using 512 filters + Max pooling
14: Fully connected with 4096 nodes
15: Fully connected with 4096 nodes
16: Softmax
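In the cfg-list notation used by the implementation below, the full 16-layer stack would read as follows (a sketch for orientation; the model actually trained in this lesson is a simplified variant):

# 13 convolutional layers (numbers = output channels, 'M' = max pooling) + 3 FC layers
cfg_vgg16 = [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M',
             512, 512, 512, 'M', 512, 512, 512, 'M']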

1. Defining the dataloader

Note that the transform and dataloader here differ from the earlier definitions (a small sketch of the augmentations follows).
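As a toy illustration of the two augmentations (a hypothetical dummy image, not part of the lesson's code):

from PIL import Image
import torchvision.transforms as transforms

img = Image.new('RGB', (32, 32))           # dummy 32x32 RGB image
aug = transforms.Compose([
    transforms.RandomCrop(32, padding=4),  # pad to 40x40, then crop a random 32x32 window
    transforms.RandomHorizontalFlip()])    # mirror left-right with probability 0.5
print(aug(img).size)                       # (32, 32): size unchanged, content shifted/flipped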

import torch
import torchvision
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
import numpy as np
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

# train on the GPU; in Colab, set this under "Runtime" -> "Change runtime type"
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

transform_train = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))])

transform_test = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,  download=True, transform=transform_train)
testset  = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=transform_test)

trainloader = torch.utils.data.DataLoader(trainset, batch_size=128, shuffle=True, num_workers=2)
testloader = torch.utils.data.DataLoader(testset, batch_size=128, shuffle=False, num_workers=2)

classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

2. Defining the VGG network

The simplified structure used here is basically:

64 conv, maxpooling,

128 conv, maxpooling,

256 conv, 256 conv, maxpooling,

512 conv, 512 conv, maxpooling,

512 conv, 512 conv, maxpooling,

softmax

The model implementation is:

class VGG(nn.Module):
    def __init__(self):
        super(VGG, self).__init__()
        self.cfg = [64, 'M', 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M']
        self.features = self._make_layers(self.cfg)   # note: self.cfg, not the undefined name cfg
        # after the five 2x2 max-pools a 32x32 input shrinks to 1x1, leaving 512 features
        self.classifier = nn.Linear(512, 10)

    def forward(self, x):
        out = self.features(x)
        out = out.view(out.size(0), -1)
        out = self.classifier(out)
        return out

    def _make_layers(self, cfg):
        layers = []
        in_channels = 3
        for x in cfg:
            if x == 'M':
                layers += [nn.MaxPool2d(kernel_size=2, stride=2)]
            else:
                layers += [nn.Conv2d(in_channels, x, kernel_size=3, padding=1),
                           nn.BatchNorm2d(x),
                           nn.ReLU(inplace=True)]
                in_channels = x
        layers += [nn.AvgPool2d(kernel_size=1, stride=1)]
        return nn.Sequential(*layers)

Initialize the network. The classification layer can be adjusted to the task at hand: here it outputs 10 classes for CIFAR10; to reuse the network for, say, tiny-imagenet with its 200 classes, you would change the output dimension to 200 (a sketch follows the code below).

# move the network to the GPU
net = VGG().to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(net.parameters(), lr=0.001)
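As a sketch of such an adaptation (a hypothetical 200-class tiny-imagenet head, not run in this lesson):

# hypothetical: swap the 10-way classifier for a 200-way one (e.g. tiny-imagenet)
net_200 = VGG().to(device)
net_200.classifier = nn.Linear(512, 200).to(device)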

3. Training the network

The training code is exactly the same as before:

for epoch in range(10):  # train for several epochs
    for i, (inputs, labels) in enumerate(trainloader):
        inputs = inputs.to(device)
        labels = labels.to(device)
        # zero the optimizer's gradients
        optimizer.zero_grad()
        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        # print statistics
        if i % 100 == 0:   
            print('Epoch: %d Minibatch: %5d loss: %.3f' %(epoch + 1, i + 1, loss.item()))

print('Finished Training')

The output is:

Epoch: 1 Minibatch: 1 loss: 3.392
Epoch: 1 Minibatch: 101 loss: 1.496
Epoch: 1 Minibatch: 201 loss: 1.341
Epoch: 1 Minibatch: 301 loss: 1.314
Epoch: 2 Minibatch: 1 loss: 1.181
Epoch: 2 Minibatch: 101 loss: 1.242
Epoch: 2 Minibatch: 201 loss: 1.123
Epoch: 2 Minibatch: 301 loss: 1.196
Epoch: 3 Minibatch: 1 loss: 1.350
Epoch: 3 Minibatch: 101 loss: 1.237
Epoch: 3 Minibatch: 201 loss: 1.235
Epoch: 3 Minibatch: 301 loss: 1.099
Epoch: 4 Minibatch: 1 loss: 1.140
Epoch: 4 Minibatch: 101 loss: 0.997
Epoch: 4 Minibatch: 201 loss: 1.259
Epoch: 4 Minibatch: 301 loss: 1.258
Epoch: 5 Minibatch: 1 loss: 1.189
Epoch: 5 Minibatch: 101 loss: 1.042
Epoch: 5 Minibatch: 201 loss: 1.137
Epoch: 5 Minibatch: 301 loss: 1.142
Epoch: 6 Minibatch: 1 loss: 1.025
Epoch: 6 Minibatch: 101 loss: 1.031
Epoch: 6 Minibatch: 201 loss: 1.199
Epoch: 6 Minibatch: 301 loss: 1.168
Epoch: 7 Minibatch: 1 loss: 0.958
Epoch: 7 Minibatch: 101 loss: 1.106
Epoch: 7 Minibatch: 201 loss: 1.045
Epoch: 7 Minibatch: 301 loss: 1.169
Epoch: 8 Minibatch: 1 loss: 1.158
Epoch: 8 Minibatch: 101 loss: 1.066
Epoch: 8 Minibatch: 201 loss: 0.984
Epoch: 8 Minibatch: 301 loss: 1.113
Epoch: 9 Minibatch: 1 loss: 1.247
Epoch: 9 Minibatch: 101 loss: 1.102
Epoch: 9 Minibatch: 201 loss: 1.209
Epoch: 9 Minibatch: 301 loss: 1.235
Epoch: 10 Minibatch: 1 loss: 0.998
Epoch: 10 Minibatch: 101 loss: 1.159
Epoch: 10 Minibatch: 201 loss: 1.079
Epoch: 10 Minibatch: 301 loss: 1.106
Finished Training

4. Testing the accuracy

The test code is also exactly the same as before.


correct = 0
total = 0

for data in testloader:
    images, labels = data
    images, labels = images.to(device), labels.to(device)
    outputs = net(images)
    _, predicted = torch.max(outputs.data, 1)
    total += labels.size(0)
    correct += (predicted == labels).sum().item()

print('Accuracy of the network on the 10000 test images: %.2f %%' % (
    100 * correct / total))

The output is:

Accuracy of the network on the 10000 test images: 84.92 %

As you can see, even this simplified VGG network raises the accuracy significantly, from 63% to 84.92%.
