CS231n_2020 Assignment Implementation 1.3: Softmax

Series index: CS231n_2020 Assignment Implementation (series overview)

CS231n_2020 Assignment 1

CS231n is a well-known course from Stanford's computer science department on convolutional neural networks for visual recognition; its slides and assignments are updated every year. This series records the author's study and implementation of the three assignments of the 2020 offering. It is provided for reference only; discussion, criticism, and corrections are welcome.
CS231n_2020 official website

The theme of Assignment 1 is image classification (see the Assignment 1 page). This post covers the study and implementation of Assignment 1.3.

This series implements the course assignments item by item; the code is available at the GitHub link.

Preparation

As recommended by the course, all of the author's code is run on Google Colaboratory. Its basic usage is similar to Jupyter Notebook and is easy to look up online.
First read the requirements on the Assignment 1 page, then download the code package for the Colab setup and upload it to Google Drive.

Assignment 1 contains five subtasks, each with a corresponding .ipynb file in the code package, which makes it convenient to modify code, debug, and export results.

k-Nearest Neighbor (kNN)

CS231n_2020 Assignment Implementation 1.1: k-Nearest Neighbor

Support Vector Machine (SVM)

CS231n_2020 Assignment Implementation 1.2: Support Vector Machine

Softmax

Background

The course provides an online linear classifier demo that helps build intuition for what Softmax computes.
Concepts worth noting:

  • Cross-entropy loss:
    $L_i = -f_{y_i} + \log\sum_j e^{f_j}$
    where $f(x_i; W) = W x_i$ is the same score function as in the SVM. (A numerically stable way to evaluate this loss is sketched right after this list.)
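Evaluating $e^{f_j}$ directly can overflow when the scores are large. Shifting all scores by their maximum before exponentiating leaves the loss unchanged, which is the standard trick used in both implementations below:

$$
\log\sum_j e^{f_j} = \max_k f_k + \log\sum_j e^{\,f_j - \max_k f_k},
\qquad
L_i = -\bigl(f_{y_i} - \max_k f_k\bigr) + \log\sum_j e^{\,f_j - \max_k f_k}
$$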

For a more detailed treatment, see the official course notes on linear classification and the official course notes on optimization.

Completing softmax.py

The softmax_loss_naive function

Compute the softmax loss function and its gradient using explicit loops.
Hint: take care to avoid numeric instability.

import numpy as np

def softmax_loss_naive(W, X, y, reg):
    loss = 0.0
    dW = np.zeros_like(W)

    num_classes = W.shape[1]
    num_train = X.shape[0]
    for i in range(num_train):
        scores = X[i].dot(W)
        # Shift the scores by their maximum for numeric stability; the loss is unchanged.
        scores -= np.max(scores)
        loss += np.log(np.sum(np.exp(scores))) - scores[y[i]]
        # Gradient of the -f_{y_i} term.
        dW[:, y[i]] -= X[i]
        # Gradient of the log-sum-exp term: column j accumulates softmax(score_j) * x_i.
        for j in range(num_classes):
            dW[:, j] += X[i] * np.exp(scores[j]) / np.sum(np.exp(scores))

    # Average over the training set and add L2 regularization.
    loss /= num_train
    dW /= num_train
    loss += reg * np.sum(W * W)
    dW += 2 * reg * W
    return loss, dW

The overall structure closely mirrors the SVM loss computation.
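As a quick sanity check, the analytic gradient can be compared against centered finite differences. The snippet below is my own sketch (the provided notebook uses its own gradient-check utility instead), with shapes that loosely mimic CIFAR-10:

import numpy as np

np.random.seed(0)
W = np.random.randn(3073, 10) * 0.0001   # D x C weights (including the bias dimension)
X = np.random.randn(5, 3073)             # a handful of fake examples
y = np.random.randint(10, size=5)

loss, dW = softmax_loss_naive(W, X, y, reg=0.0)

# Compare a few random entries of dW against centered finite differences.
h = 1e-5
for _ in range(5):
    ix = tuple(np.random.randint(s) for s in W.shape)
    W[ix] += h
    loss_plus, _ = softmax_loss_naive(W, X, y, reg=0.0)
    W[ix] -= 2 * h
    loss_minus, _ = softmax_loss_naive(W, X, y, reg=0.0)
    W[ix] += h
    numeric = (loss_plus - loss_minus) / (2 * h)
    print('analytic: %f, numeric: %f' % (dW[ix], numeric))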

The softmax_loss_vectorized function

Rewrite the softmax loss function and its gradient using vectorized code.
Hint: take care to avoid numeric instability.

def softmax_loss_vectorized(W, X, y, reg):
    loss = 0.0
    dW = np.zeros_like(W)

    num_train = X.shape[0]
    scores = X.dot(W)
    # Shift each row by its maximum for numeric stability.
    scores -= np.max(scores, axis=1).reshape(-1, 1)
    # Loss: sum of log-sum-exp terms minus the correct-class scores.
    loss = np.sum(np.log(np.sum(np.exp(scores), axis=1))) - np.sum(scores[np.arange(num_train), y])
    loss = loss / num_train + reg * np.sum(W * W)

    # Gradient: softmax probabilities with 1 subtracted at the correct classes.
    probs = np.exp(scores) / np.sum(np.exp(scores), axis=1).reshape(-1, 1)
    probs[np.arange(num_train), y] -= 1
    dW = X.T.dot(probs) / num_train + 2 * reg * W
    return loss, dW

Again, this closely mirrors the vectorized SVM loss computation.
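The notebook then compares the two implementations on the same inputs; a minimal version of that check looks like the following (the variable names W, X_dev, y_dev and the regularization value here are assumptions about the notebook's setup):

import time
import numpy as np

tic = time.time()
loss_naive, grad_naive = softmax_loss_naive(W, X_dev, y_dev, 0.000005)
toc = time.time()
print('naive loss: %e computed in %fs' % (loss_naive, toc - tic))

tic = time.time()
loss_vectorized, grad_vectorized = softmax_loss_vectorized(W, X_dev, y_dev, 0.000005)
toc = time.time()
print('vectorized loss: %e computed in %fs' % (loss_vectorized, toc - tic))

# Both the losses and the gradients should agree to numerical precision.
print('Loss difference: %f' % np.abs(loss_naive - loss_vectorized))
print('Gradient difference: %f' % np.linalg.norm(grad_naive - grad_vectorized, ord='fro'))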

Completing the Validation

Train and validate with different learning rates and regularization strengths λ to find hyperparameters that perform well.

from cs231n.classifiers import Softmax
results = {}
best_val = -1
best_softmax = None

learning_rates = np.linspace(0.8e-7, 1.2e-7, 5)
regularization_strengths = np.linspace(2e4, 3e4, 3)

for lr in learning_rates:
    for rs in regularization_strengths:
        softmax = Softmax()
        loss_hist = softmax.train(X_train, y_train, learning_rate=lr, reg=rs,
                      num_iters=1500, verbose=False)
        y_train_pred = softmax.predict(X_train)
        train_acc = np.mean(y_train == y_train_pred)
        y_val_pred = softmax.predict(X_val)
        val_acc = np.mean(y_val == y_val_pred)

        results[(lr, rs)] = (train_acc, val_acc)
        if val_acc > best_val:
            best_val = val_acc
            best_softmax = softmax
            
for lr, reg in sorted(results):
    train_accuracy, val_accuracy = results[(lr, reg)]
    print('lr %e reg %e train accuracy: %f val accuracy: %f' % (
                lr, reg, train_accuracy, val_accuracy))
    
print('best validation accuracy achieved during cross-validation: %f' % best_val)

The validation procedure closely mirrors the one used for the SVM.
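One optional sanity check (not part of the code above) is to plot the loss history returned by train for a chosen hyperparameter setting and confirm that the loss decreases over the iterations; a minimal sketch, assuming loss_hist from the loop above:

import matplotlib.pyplot as plt

# Plot the training loss curve of the most recently trained Softmax classifier.
plt.plot(loss_hist)
plt.xlabel('Iteration number')
plt.ylabel('Loss value')
plt.show()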

Inline Questions

Inline Question 1

Why do we expect our loss to be close to -log(0.1)? Explain briefly.

Answer: The loss should be close to $-\log(0.1)$ because, with small random initial weights, the scores $f$ are nearly identical across all classes. In the softmax loss $L_i = -\log\left(\frac{e^{f_{y_i}}}{\sum_j e^{f_j}}\right)$, when all 10 class scores are exactly equal, each class receives probability 0.1 and the loss is exactly $-\log(0.1)$.
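Concretely, with $C = 10$ classes and all scores equal to a common value $f$:

$$
L_i = -\log\frac{e^{f}}{10\,e^{f}} = \log 10 \approx 2.303,
$$

which is consistent with the initial loss of about 2.38 reported in the Results section below.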

Inline Question 2

True or False
Suppose the overall training loss is defined as the sum of the per-datapoint loss over all training examples. It is possible to add a new datapoint to a training set that would leave the SVM loss unchanged, but this is not the case with the Softmax classifier loss.

Answer: True.
Explanation: Adding a new datapoint can leave the total SVM training loss unchanged, since its hinge loss is 0 whenever the correct-class score exceeds every other score by more than the margin. The softmax cross-entropy loss of any datapoint is strictly positive, so adding a new datapoint always changes the total Softmax loss. A toy example is sketched below.
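The numbers below are my own illustration, using 3 classes and the SVM margin $\Delta = 1$:

$$
\begin{aligned}
f &= (10,\ 0,\ 0), \quad \text{correct class } y_i = 1:\\
L_i^{\text{SVM}} &= \max(0,\, 0 - 10 + 1) + \max(0,\, 0 - 10 + 1) = 0,\\
L_i^{\text{Softmax}} &= -\log\frac{e^{10}}{e^{10} + e^{0} + e^{0}} = \log\bigl(1 + 2e^{-10}\bigr) \approx 9.1 \times 10^{-5} > 0.
\end{aligned}
$$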

Results

Runtime comparison of the two loss implementations

naive loss: 2.383276e+00 computed in 0.224043s
vectorized loss: 2.383276e+00 computed in 0.015238s
Loss difference: 0.000000
Gradient difference: 0.000000

The vectorized implementation runs roughly an order of magnitude faster than the loop-based one, demonstrating the efficiency of vectorized code.

Validation over the learning rate and λ

Initial parameters:

learning_rates = [1e-7, 5e-7]
regularization_strengths = [2.5e4, 5e4]
num_iters=1500

Results:

lr 1.000000e-07 reg 2.500000e+04 train accuracy: 0.333837 val accuracy: 0.348000
lr 1.000000e-07 reg 5.000000e+04 train accuracy: 0.306347 val accuracy: 0.323000
lr 5.000000e-07 reg 2.500000e+04 train accuracy: 0.315959 val accuracy: 0.331000
lr 5.000000e-07 reg 5.000000e+04 train accuracy: 0.296837 val accuracy: 0.316000
best validation accuracy achieved during cross-validation: 0.348000

The hyperparameters were then refined to gradually improve accuracy.
First-round parameters:

learning_rates = np.linspace(0.8e-7, 1.2e-7, 5)
regularization_strengths = np.linspace(2e4, 3e4, 3)
num_iters=1500

Results:

lr 8.000000e-08 reg 2.000000e+04 train accuracy: 0.333980 val accuracy: 0.344000
lr 8.000000e-08 reg 2.500000e+04 train accuracy: 0.325898 val accuracy: 0.346000
lr 8.000000e-08 reg 3.000000e+04 train accuracy: 0.317714 val accuracy: 0.326000
lr 9.000000e-08 reg 2.000000e+04 train accuracy: 0.335449 val accuracy: 0.351000
lr 9.000000e-08 reg 2.500000e+04 train accuracy: 0.327327 val accuracy: 0.347000
lr 9.000000e-08 reg 3.000000e+04 train accuracy: 0.319755 val accuracy: 0.336000
lr 1.000000e-07 reg 2.000000e+04 train accuracy: 0.335286 val accuracy: 0.338000
lr 1.000000e-07 reg 2.500000e+04 train accuracy: 0.331980 val accuracy: 0.351000
lr 1.000000e-07 reg 3.000000e+04 train accuracy: 0.319490 val accuracy: 0.337000
lr 1.100000e-07 reg 2.000000e+04 train accuracy: 0.339408 val accuracy: 0.351000
lr 1.100000e-07 reg 2.500000e+04 train accuracy: 0.329388 val accuracy: 0.351000
lr 1.100000e-07 reg 3.000000e+04 train accuracy: 0.318633 val accuracy: 0.330000
lr 1.200000e-07 reg 2.000000e+04 train accuracy: 0.340551 val accuracy: 0.358000
lr 1.200000e-07 reg 2.500000e+04 train accuracy: 0.324265 val accuracy: 0.346000
lr 1.200000e-07 reg 3.000000e+04 train accuracy: 0.318918 val accuracy: 0.328000
best validation accuracy achieved during cross-validation: 0.358000

These hyperparameter experiments satisfy the assignment's requirement of a validation accuracy above 35%.

Overall accuracy

softmax on raw pixels final test set accuracy: 0.352000

Evaluating the Softmax model that performed best on the validation set against the test set gives an accuracy of 35.2%, slightly lower than the SVM model.
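For reference, the final evaluation is a short snippet over the held-out test split (a sketch assuming X_test, y_test, and best_softmax are defined as above):

import numpy as np

# Evaluate the best validated Softmax classifier on the test set.
y_test_pred = best_softmax.predict(X_test)
test_accuracy = np.mean(y_test == y_test_pred)
print('softmax on raw pixels final test set accuracy: %f' % test_accuracy)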

Weight visualization

[Figure: visualization of the learned weights for each CIFAR-10 class]

Two-Layer Neural Network

Coming soon

Image Features

Coming soon
