Reference:
https://blog.csdn.net/qq_37189298/article/details/110945128
========================================
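Code:
import torch
from torch import cuda
import time

# float32 tensor with 1*1024*1024*256 elements = 1024 MB
x = torch.zeros([1, 1024, 1024, 128*2], requires_grad=True, device='cuda:0')
print("1", cuda.memory_allocated()/1024**2)  # MB held by live tensors

y = 5 * x  # non-leaf tensor, another 1024 MB
# y.retain_grad()
print("2", cuda.memory_allocated()/1024**2)

torch.mean(y).backward()  # populates x.grad (1024 MB)
print("3", cuda.memory_allocated()/1024**2)

print(cuda.memory_summary())
time.sleep(60)  # keep the process alive so its GPU usage can be inspected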
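Every tensor of this shape is exactly 1 GB, which is why all the figures below are multiples of 1024 MB. Here is a quick arithmetic check. It assumes float32 (4 bytes per element) and MB = 1024**2 bytes, matching the script's own /1024**2. `tensor_mb` is a hypothetical helper, and the expected print values assume nothing else of significant size lives on the device.

```python
import math

def tensor_mb(shape, bytes_per_elem=4):
    """Size of a dense tensor in MB (1024**2 bytes); float32 uses 4 bytes per element."""
    return math.prod(shape) * bytes_per_elem / 1024**2

size = tensor_mb([1, 1024, 1024, 128 * 2])
print(size)  # 1024.0

# Rough expected readings of the script's three memory_allocated() prints
expected = {
    "1": size,      # x
    "2": 2 * size,  # x, y
    "3": 3 * size,  # x, y, x.grad (the intermediate gradient of y has been freed)
}
print(expected)
```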
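As can be seen, PyTorch occupies 4777 MB of GPU memory in total. Of this, tensors and cache together account for 4096 MB, and 1024 MB of that is cache, which can be released manually. Modify the code as follows: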
import torch
from torch import cuda
import time

x = torch.zeros([1, 1024, 1024, 128*2], requires_grad=True, device='cuda:0')
print("1", cuda.memory_allocated()/1024**2)

y = 5 * x
# y.retain_grad()
print("2", cuda.memory_allocated()/1024**2)

torch.mean(y).backward()
print("3", cuda.memory_allocated()/1024**2)

torch.cuda.empty_cache()  # return cached but unused blocks to the driver
print(cuda.memory_summary())
time.sleep(60)
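The "cache" is the gap between what PyTorch's caching allocator has reserved and what live tensors actually occupy. The sketch below reports both figures. It assumes a PyTorch version that provides `torch.cuda.memory_reserved` (older releases named it `memory_cached`), and `report_cuda_memory` is a hypothetical helper name. Called just before and after `torch.cuda.empty_cache()`, it should show the reserved figure drop while the allocated figure stays the same.

```python
import torch

def report_cuda_memory(tag, device=0):
    """Print allocated vs. reserved GPU memory in MB; returns None if CUDA is unavailable."""
    if not torch.cuda.is_available():
        return None
    allocated = torch.cuda.memory_allocated(device) / 1024**2  # held by live tensors
    reserved = torch.cuda.memory_reserved(device) / 1024**2    # tensors + allocator cache
    print(f"{tag}: allocated={allocated:.0f} MB, reserved={reserved:.0f} MB, "
          f"cache={reserved - allocated:.0f} MB")
    return allocated, reserved
```

Usage: call `report_cuda_memory("before")`, then `torch.cuda.empty_cache()`, then `report_cuda_memory("after")`.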
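According to the reference article, 3 × 1024 MB is tensor memory and the remaining ~700 MB is other memory (for example, the CUDA context). Of the tensor memory, 1024 MB is x.grad. The peak allocation during the run is 4096 MB, as shown in the figure below:
This peak includes 1024 MB each for x.grad and for the gradient of y (y.grad).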
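The accounting above can be replayed step by step. The following is a hypothetical model, not PyTorch's allocator. It assumes every tensor and gradient involved is a dense 1024 MB float32 tensor and that the gradient of y is freed once backward finishes, because y is a non-leaf tensor without retain_grad().

```python
SIZE_MB = 1024  # one [1, 1024, 1024, 256] float32 tensor

def simulate_backward():
    """Hypothetical model of live tensor memory (MB) for y = 5 * x; mean(y).backward()."""
    live = {}
    trace = []

    def snapshot(step):
        trace.append((step, sum(live.values())))

    live["x"] = SIZE_MB
    snapshot("x created")
    live["y"] = SIZE_MB
    snapshot("y = 5 * x")
    # backward through mean(): gradient w.r.t. y, same shape as y
    live["grad_y"] = SIZE_MB
    snapshot("gradient of y")
    # backward through 5 * x: gradient w.r.t. x, stored as x.grad
    live["x.grad"] = SIZE_MB
    snapshot("x.grad")
    # y is non-leaf and retain_grad() was not called, so its gradient is freed
    del live["grad_y"]
    snapshot("after backward")
    return trace

trace = simulate_backward()
peak = max(mb for _, mb in trace)
for step, mb in trace:
    print(f"{step:15s} {mb:5d} MB")
print("peak", peak)          # 4096
print("other", 4777 - peak)  # total from the text minus the 4096 MB PyTorch holds: 681
```

The 681 MB left over matches the "~700 MB of other memory" above.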
If we also keep the gradient of the non-leaf node, i.e. retain y.grad, and run:
import torch
from torch import cuda
import time

x = torch.zeros([1, 1024, 1024, 128*2], requires_grad=True, device='cuda:0')
print("1", cuda.memory_allocated()/1024**2)

y = 5 * x
y.retain_grad()  # keep the gradient of the non-leaf tensor y after backward
print("2", cuda.memory_allocated()/1024**2)

torch.mean(y).backward()
print("3", cuda.memory_allocated()/1024**2)

torch.cuda.empty_cache()
print(cuda.memory_summary())
time.sleep(60)
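Here is a CPU-sized sketch of what `retain_grad()` changes. It uses a small [2, 3] tensor instead of the 1024 MB CUDA tensor, a shape chosen only so it runs anywhere, and `run` is a hypothetical helper. Only with `retain_grad()` does the non-leaf `y` keep its gradient after `backward()`. Because that gradient has the same shape as `y`, at full size it is another 1024 MB that stays allocated.

```python
import torch

def run(retain):
    """Hypothetical helper: y = 5 * x, backward through mean(y), optionally retaining y.grad."""
    x = torch.zeros(2, 3, requires_grad=True)  # leaf tensor
    y = 5 * x                                  # non-leaf tensor
    if retain:
        y.retain_grad()  # ask autograd to keep y.grad after backward()
    torch.mean(y).backward()
    return x, y

x1, y1 = run(retain=True)
print(y1.grad)  # d mean(y) / dy = 1/6 for each of the 6 elements
print(x1.grad)  # 5 * 1/6 for each element

x2, y2 = run(retain=False)
print(y2.grad)  # None (PyTorch also warns about reading .grad of a non-leaf tensor)
```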
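The GPU ran out of memory: with y.grad retained, overall usage was already approaching 5.9 GB. So the same code was run on a Titan instead:
Total GPU memory:
Run results: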
================================================