For moving files between a Docker container and the host, see the blog post on Docker, plus https://zhuanlan.zhihu.com/p/55516749
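One common way to do this is `docker cp`. The sketch below is a minimal illustration, not necessarily the method the linked posts describe; the function names are made up for this example, and the container name and paths in the usage note are placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical helpers wrapping `docker cp` (names are illustrative only).

# Copy a file or directory from the host into a container.
# $1 = host path, $2 = container name or ID, $3 = path inside the container
copy_to_container() {
  docker cp "$1" "$2:$3"
}

# Copy a file or directory from a container back to the host.
# $1 = container name or ID, $2 = path inside the container, $3 = host path
copy_from_container() {
  docker cp "$1:$2" "$3"
}
```

For example, `copy_to_container ./my_config.yaml my_container /maskrcnn-benchmark/configs/` would place a local config file inside a container named `my_container` (both names are assumptions). For large datasets, a bind mount with `-v`, as in the training command further down, avoids copying altogether.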
Install directly with Docker by modifying the Dockerfile from facebookresearch/maskrcnn-benchmark. Points to note: change CUDA to 10, and for apex pay attention to this step in the Dockerfile: pip uninstall apex; git clone https://github.com/NVIDIA/apex.git; cd apex; python setup.py install --cuda_ext --cpp_ext
For the installation, see: https://ihaoming.top/archives/623a7632.html (gcc version < 5.4)
Modify the Dockerfile following archdyn's version:
The only way to train and prevent the Runtime Error is to modify the Dockerfile and build it like:
ARG CUDA="9.0"
ARG CUDNN="7"

FROM nvidia/cuda:${CUDA}-cudnn${CUDNN}-devel-ubuntu16.04

RUN echo 'debconf debconf/frontend select Noninteractive' | debconf-set-selections

# install basics
RUN apt-get update -y \
 && apt-get install -y apt-utils git curl ca-certificates bzip2 cmake tree htop bmon iotop g++

# Install Miniconda
RUN curl -so /miniconda.sh https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh \
 && chmod +x /miniconda.sh \
 && /miniconda.sh -b -p /miniconda \
 && rm /miniconda.sh

ENV PATH=/miniconda/bin:$PATH

# Create a Python 3.6 environment
RUN /miniconda/bin/conda install -y conda-build \
 && /miniconda/bin/conda create -y --name py36 python=3.6.7 \
 && /miniconda/bin/conda clean -ya

ENV CONDA_DEFAULT_ENV=py36
ENV CONDA_PREFIX=/miniconda/envs/$CONDA_DEFAULT_ENV
ENV PATH=$CONDA_PREFIX/bin:$PATH
ENV CONDA_AUTO_UPDATE_CONDA=false

RUN conda install -y ipython
RUN pip install ninja yacs cython matplotlib

# Install PyTorch 1.0 Nightly
RUN conda install -y pytorch-nightly -c pytorch && conda clean -ya

# Install TorchVision master
RUN git clone https://github.com/pytorch/vision.git \
 && cd vision \
 && python setup.py install

# install pycocotools
RUN git clone https://github.com/cocodataset/cocoapi.git \
 && cd cocoapi/PythonAPI \
 && python setup.py build_ext install

# install PyTorch Detection
RUN git clone https://github.com/facebookresearch/maskrcnn-benchmark.git

WORKDIR /maskrcnn-benchmark
nvidia-docker build -t maskrcnn-benchmark docker/
Then after the build I have to go inside the docker container:
nvidia-docker run --rm -it maskrcnn-benchmark bash
And inside the docker container I build maskrcnn-benchmark without problems:
python setup.py build develop
I then have to commit this modified docker container so that I have a Docker Image that can always be started:
docker commit [Container ID] maskrcnn-benchmark:working
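Because the container was started with `--rm`, it is removed as soon as the shell exits, so the commit has to be issued from a second terminal while the container is still running. A minimal sketch of that step follows; the function name is made up for illustration, and it assumes the maskrcnn-benchmark container is the most recently created one on the machine.

```shell
#!/usr/bin/env bash
# Hypothetical helper: commit the most recently created container as an image.
# $1 = image tag (defaults to the tag used in these notes)
commit_latest_container() {
  local cid
  # -l selects the latest created container, -q prints only its ID.
  cid=$(docker ps -lq)
  if [ -z "$cid" ]; then
    echo "no container found" >&2
    return 1
  fi
  docker commit "$cid" "${1:-maskrcnn-benchmark:working}"
}
```

If other containers have been started in the meantime, read the ID from `docker ps` and pass it to `docker commit` by hand instead.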
After all these steps I can train without problems with:
nvidia-docker run --shm-size=8gb \
    -v /home/archdyn/Datasets/coco:/maskrcnn-benchmark/datasets/coco \
    maskrcnn-benchmark:working \
    python /maskrcnn-benchmark/tools/train_net.py \
    --config-file "/maskrcnn-benchmark/configs/e2e_mask_rcnn_R_50_FPN_1x.yaml" \
    SOLVER.IMS_PER_BATCH 1 SOLVER.BASE_LR 0.0025 \
    SOLVER.MAX_ITER 720000 SOLVER.STEPS "(480000, 640000)" \
    TEST.IMS_PER_BATCH 1
For the parameter changes made while reproducing the results, see:
https://zhuanlan.zhihu.com/p/57603975
https://zhuanlan.zhihu.com/p/67121644
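The SOLVER overrides in the training command look like an application of the linear scaling rule: when the batch shrinks by a factor k, divide the learning rate by k and multiply the iteration counts by k. The sketch below works through that arithmetic. The reference schedule (16 images per batch, learning rate 0.02, 90000 iterations, steps at 60000 and 80000) is an assumption about the default config rather than something stated in these notes, and the function name is made up.

```shell
#!/usr/bin/env bash
# Hypothetical helper: scale SOLVER settings for a smaller batch size.
# $1 = new images per batch
# ASSUMED reference schedule: 16 images/batch, lr 0.02, 90000 iters, steps (60000, 80000).
scale_solver() {
  awk -v ims="$1" -v ref_ims=16 -v ref_lr=0.02 \
      -v ref_iter=90000 -v s1=60000 -v s2=80000 'BEGIN {
    k = ref_ims / ims   # factor by which the batch shrinks
    printf "SOLVER.IMS_PER_BATCH %d SOLVER.BASE_LR %g SOLVER.MAX_ITER %d SOLVER.STEPS \"(%d, %d)\"\n",
           ims, ref_lr / k, ref_iter * k, s1 * k, s2 * k
  }'
}
```

Under these assumptions, `scale_solver 2` prints `SOLVER.IMS_PER_BATCH 2 SOLVER.BASE_LR 0.0025 SOLVER.MAX_ITER 720000 SOLVER.STEPS "(480000, 640000)"`. The command quoted in these notes uses those same learning-rate and schedule values together with IMS_PER_BATCH 1, so it does not follow the rule exactly; the linked posts are the better guide for what worked in practice.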