远程服务器 Linux 用cityscape训练DeepLabv3模型(Pytorch版)

参考
https://blog.csdn.net/qq_45389690/article/details/111591713?utm_medium=distribute.pc_relevant_download.none-task-blog-baidujs-2.nonecase&depth_1-utm_source=distribute.pc_relevant_download.none-task-blog-baidujs-2.nonecase
https://blog.csdn.net/weixin_41919571/article/details/107906066
代码
https://github.com/jfzhang95/pytorch-deeplab-xception

出现问题
ImportError: No module named pycocotools.coco
解决
https://blog.csdn.net/u011961856/article/details/77676461

https://blog.csdn.net/joejeanjean/article/details/78839318?utm_medium=distribute.pc_relevant.none-task-blog-baidujs_title-6&spm=1001.2101.3001.4242

https://blog.csdn.net/haiyonghao/article/details/80472713?utm_medium=distribute.pc_relevant.none-task-blog-baidujs_title-11&spm=1001.2101.3001.4242
一定要把PythonAPI目录下除setup.py之外的所有文件拷贝到pytorch-deeplab-xception-master文件夹下

出现问题
ImportError: No module named ‘Queue’
解决
https://blog.csdn.net/DarrenXf/article/details/82962412

出现问题
from utils.loss import SegmentationLosses
ImportError: No module named loss
参考
https://blog.csdn.net/Diliduluw/article/details/103742766
解决
在utils文件下 新建一个空白的__init__.py

出现问题
AttributeError: ‘module’ object has no attribute ‘kaiming_normal_’
参考
https://blog.csdn.net/songchunxiao1991/article/details/83104893?utm_medium=distribute.pc_relevant_t0.none-task-blog-BlogCommendFromMachineLearnPai2-1.control&depth_1-utm_source=distribute.pc_relevant_t0.none-task-blog-BlogCommendFromMachineLearnPai2-1.control

https://blog.csdn.net/Aug0st/article/details/42707709
然后 我删除了所有pyc文件 (主要是不敢更新Python27\Lib\urllib2.pyc文件 怕影响别的算法)
删除指令 find /dir -name “*.pyc” | xargs rm -rf
但并没用

然后
我又配了一个python3.5 pytorch0.4.1的环境

出现问题
RuntimeError: CUDA error: out of memory
解决
train.py中改batch-size的default=2

出现问题
ValueError: Expected more than 1 value per channel when training, got input size [1, 256, 1, 1]

参考
https://blog.csdn.net/weixin_43925119/article/details/109755329
https://www.cnblogs.com/zmbreathing/p/pyTorch_BN_error.html
https://blog.csdn.net/sinat_39307513/article/details/87917537
https://blog.csdn.net/qq_42079689/article/details/102587401?utm_medium=distribute.pc_relevant.none-task-blog-BlogCommendFromMachineLearnPai2-2.control&depth_1-utm_source=distribute.pc_relevant.none-task-blog-BlogCommendFromMachineLearnPai2-2.control
https://blog.csdn.net/jining11/article/details/111478935?utm_medium=distribute.pc_relevant.none-task-blog-baidujs_title-2&spm=1001.2101.3001.4242
https://blog.csdn.net/qq_36321330/article/details/108954588?utm_medium=distribute.pc_relevant.none-task-blog-BlogCommendFromMachineLearnPai2-2.control&depth_1-utm_source=distribute.pc_relevant.none-task-blog-BlogCommendFromMachineLearnPai2-2.control
https://blog.csdn.net/qq_34124009/article/details/109100053?utm_medium=distribute.pc_relevant.none-task-blog-BlogCommendFromMachineLearnPai2-4.control&depth_1-utm_source=distribute.pc_relevant.none-task-blog-BlogCommendFromMachineLearnPai2-4.control
https://blog.csdn.net/qq_21230831/article/details/103711545?utm_medium=distribute.pc_relevant.none-task-blog-OPENSEARCH-7.control&depth_1-utm_source=distribute.pc_relevant.none-task-blog-OPENSEARCH-7.control
https://blog.csdn.net/lrs1353281004/article/details/108262018?utm_medium=distribute.pc_relevant.none-task-blog-baidujs_title-7&spm=1001.2101.3001.4242
模型中含有nn.BatchNorm层,训练时需要batch_size大于1,来计算当前batch的running mean and std,数据数量除以batch_size后刚好余1时就会报错。

解决
改batch_size 使其除完不余1,但我改完5之后内存又不行了,报错。
于是 我去找了/usr/local/lib/python3.5/dist-packages/torch/utils/data/dataloader.py文件,改里面drop_last=True。
因为远程服务器不能直接打开这个文件,所以粘过来又拷过去的。
远程服务器 Linux 用cityscape训练DeepLabv3模型(Pytorch版)

上一篇:语义分割相关论文翻译(一):deeplabv3+


下一篇:Pytorch移植Deeplabv3+训练CityScapes数据集详细步骤