Command used: CUDA_VISIBLE_DEVICES=0,1,2 xxx
Problem encountered: RuntimeError: CUDA out of memory. Tried to allocate 72.00 MiB (GPU 0; 23.70 GiB total capacity; 1.40 GiB already allocated; 10.69 MiB free; 1.42 GiB reserved in total by PyTorch)
Solution: CUDA_VISIBLE_DEVICES=1,2 xxx
Takeaways:
1- The process uses the GPUs in the order listed (0, 1, 2). When GPU 0 runs out of memory, it does not automatically fall back to GPU 1 or GPU 2; it raises the error instead (see the sketch below).
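A minimal sketch of why the fix works (assuming PyTorch is installed and the variable is set before any CUDA call): with CUDA_VISIBLE_DEVICES=1,2, the process only sees physical GPUs 1 and 2, and they are renumbered as cuda:0 and cuda:1 inside the process, so existing code that allocates on cuda:0 now lands on a GPU with free memory.

import os
# Must be set before torch initializes CUDA (e.g. before the first .cuda() call).
os.environ["CUDA_VISIBLE_DEVICES"] = "1,2"

import torch

print(torch.cuda.device_count())        # 2: only the two visible GPUs are counted
x = torch.zeros(1024, device="cuda:0")  # cuda:0 here is physical GPU 1
print(torch.cuda.get_device_name(0))    # name of physical GPU 1

In practice, setting the variable on the command line (as in the solution above) is safer than setting it in code, because it is guaranteed to take effect before any library touches the GPUs.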