Preventing Overfitting with Early Stopping
Early Stopping
Brief Introduction
When we train a deep neural network, we usually hope to obtain the best possible generalization performance. However, all standard deep network architectures, such as the MLP, are prone to overfitting: while the network's error on the training set keeps dropping, at some point its performance on the test set actually starts to get worse.
Note: in the accompanying figure, because of small fluctuations in the validation loss, the "U shape" of the validation curve is not very obvious.
How to solve overfitting
1. Reduce the dimensionality of the parameter space.
2. Reduce the effective size of each dimension.
Methods that reduce the number of parameters include greedy constructive learning, pruning, and weight sharing. Methods that reduce the effective size of each parameter dimension are mainly regularization techniques, such as weight decay and early stopping.
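To make the weight-decay idea concrete, here is a minimal sketch (plain Python, with a hypothetical scalar weight and a hand-supplied gradient, not part of the GCN example below): each update shrinks the weight toward zero by a fraction lam of its current value, on top of the ordinary gradient step.

```python
def sgd_step_with_weight_decay(w, grad, lr=0.1, lam=0.01):
    """One SGD update with L2 weight decay: w <- w - lr * (grad + lam * w)."""
    return w - lr * (grad + lam * w)

w = 1.0
for _ in range(100):
    # zero loss gradient, so only the decay term acts: w shrinks geometrically
    w = sgd_step_with_weight_decay(w, grad=0.0)
print(w)  # 0.999**100, roughly 0.9048
```

In PyTorch the same effect is obtained by passing the `weight_decay` argument to an optimizer such as `torch.optim.Adam`.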
Early stopping
Brief Introduction
During training, monitor the model's performance on a validation set; when that performance starts to degrade, stop training.
Specific steps
step1: Split the original training set into a training set and a validation set.
step2: Train only on the training set, and every period T compute the model's error on the validation set — for example, one evaluation every 15 (mini-batch) training epochs — saving the model parameters whenever they are the best seen so far.
step3: Stop training once worse validation performance has been observed P times (P can be understood as the patience, or tolerance).
step4: Use the parameters saved at the best iteration as the final parameters of the model.
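The four steps above can be sketched as a framework-agnostic loop (a minimal sketch; `train_one_epoch` and `validate` are hypothetical callables supplied by the caller, not part of the GCN example below):

```python
def train_with_early_stopping(train_one_epoch, validate, max_epoch=300,
                              eval_T=5, patience=3):
    """Return (best_params, best_val_loss) under the early-stopping rule."""
    best_val, best_params, bad_evals = float('inf'), None, 0
    for epoch in range(max_epoch):
        params = train_one_epoch()            # step 2: train on the training set
        if epoch % eval_T == 0:               # evaluate every T epochs
            val_loss = validate(params)
            if val_loss < best_val:
                best_val, best_params, bad_evals = val_loss, params, 0
            else:
                bad_evals += 1
                if bad_evals > patience:      # step 3: patience exhausted
                    break
    return best_params, best_val              # step 4: best parameters seen
```

The concrete PyTorch version of this loop is given in the Codes section below.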
Codes
The example below applies early stopping to a simple 3-layer GCN.
Environment: PyTorch 1.7.1, Python 3.6, torch-geometric 1.7.1, CUDA 10.1
import random
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv
from torch_geometric.datasets import Planetoid
import matplotlib.pyplot as plt
# Define the network: a 3-layer GCN
class GCN_NET3(torch.nn.Module):
    '''
    three-layer GCN
    (a two-layer GCN usually performs better)
    '''
    def __init__(self, num_features, hidden_size1, hidden_size2, classes):
        '''
        :param num_features: each node has a [1, D] feature vector
        :param hidden_size1: the size of the first hidden layer
        :param hidden_size2: the size of the second hidden layer
        :param classes: the number of classes
        '''
        super(GCN_NET3, self).__init__()
        self.conv1 = GCNConv(num_features, hidden_size1)
        self.relu = torch.nn.ReLU()
        self.dropout = torch.nn.Dropout(p=0.5)  # use dropout to reduce over-fitting
        self.conv2 = GCNConv(hidden_size1, hidden_size2)
        self.conv3 = GCNConv(hidden_size2, classes)

    def forward(self, Graph):
        x, edge_index = Graph.x, Graph.edge_index
        out = self.conv1(x, edge_index)
        out = self.relu(out)
        out = self.dropout(out)
        out = self.conv2(out, edge_index)
        out = self.relu(out)
        out = self.dropout(out)
        out = self.conv3(out, edge_index)
        # return raw logits: F.cross_entropy applies log-softmax internally,
        # so an explicit Softmax layer here would distort the loss
        return out
def setup_seed(seed):
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    random.seed(seed)

dataset = Planetoid(root='./', name='Cora')  # with root='./', Planetoid uses the local copy of the dataset
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')  # use CPU or GPU
model = GCN_NET3(dataset.num_node_features, 128, 64, dataset.num_classes).to(device)
data = dataset[0].to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=0.005)  # define the optimizer

# define some parameters
eval_T = 5       # evaluation period
P = 3            # patience
i = 0            # counts consecutive bad validation evaluations
max_epoch = 300
setup_seed(seed=20)    # set the random seed
temp_val_loss = 99999  # initialize the best validation loss
L = []                 # stores the training losses
L_val = []             # stores the validation losses

# training process
model.train()
for epoch in range(max_epoch):
    optimizer.zero_grad()
    out = model(data)
    loss = F.cross_entropy(out[data.train_mask], data.y[data.train_mask])
    _, val_pred = out.max(dim=1)  # reuse the forward pass instead of running it again
    loss_val = F.cross_entropy(out[data.val_mask], data.y[data.val_mask])
    # early stopping
    if (epoch % eval_T) == 0:
        if temp_val_loss > loss_val.item():
            temp_val_loss = loss_val.item()
            torch.save(model.state_dict(), "GCN_NET3.pth")  # save the current best model
            i = 0  # reset the patience counter
        else:
            i = i + 1
        if i > P:
            print("Early Stopping! Epoch : ", epoch)
            break
    L_val.append(loss_val.item())  # store a plain float so matplotlib can plot it
    val_corrent = val_pred[data.val_mask].eq(data.y[data.val_mask]).sum().item()
    val_acc = val_corrent / data.val_mask.sum().item()
    print('Epoch: {} loss : {:.4f} val_loss: {:.4f} val_acc: {:.4f}'.format(epoch, loss.item(),
                                                                            loss_val.item(), val_acc))
    L.append(loss.item())
    loss.backward()
    optimizer.step()
# test
model.load_state_dict(torch.load("GCN_NET3.pth"))  # load the best saved parameters
model.eval()
_, pred = model(data).max(dim=1)
corrent = pred[data.test_mask].eq(data.y[data.test_mask]).sum().item()
acc = corrent / data.test_mask.sum().item()
print("test accuracy is {:.4f}".format(acc))

# plot the loss curves
n = [i for i in range(len(L))]
plt.plot(n, L, label='train')
plt.plot(n, L_val, label='val')
plt.legend()  # show the labels
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.show()
Result
Output:
Early Stopping! Epoch : 28
test accuracy is 0.8030
The test accuracy improves noticeably: without early stopping it is around 76%; with it, around 78% on average, reaching up to 80.3% at best.
Disadvantages
1. If training stops too early, the cost function may still be large, which makes the resulting model harder to reason about; the result may also be a local optimum or a suboptimal solution.
2. It adds computational cost during training. In practice, this can be mitigated by running the early-stopping evaluation in parallel on a CPU or GPU separate from the main training process, or by evaluating the validation set less frequently.
3. It requires keeping a copy of the best parameters, which takes some storage space.
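Regarding disadvantage 3: instead of writing checkpoints to disk as in the example above, the best parameters can be snapshotted in memory. A minimal sketch (a plain dict stands in for `model.state_dict()`; with PyTorch you would use `copy.deepcopy(model.state_dict())`):

```python
import copy

state = {'w': 0.0}                         # stand-in for the model's parameters
best_state, best_val = None, float('inf')
for val_loss in [3.0, 2.0, 2.5]:           # hypothetical validation losses
    state['w'] += 1.0                      # stand-in for a training update
    if val_loss < best_val:
        best_val = val_loss
        best_state = copy.deepcopy(state)  # snapshot, unaffected by later updates
print(best_state)  # {'w': 2.0}: the state at the lowest validation loss
```

The deep copy matters: storing a reference to `state` would silently track later training updates instead of freezing the best parameters.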
Reference
1. Deep Learning - Ian Goodfellow
2. Implementing GCN, GraphSAGE, and GAT with PyTorch Geometric