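Set up the Linux environment variables
First, make it possible to switch between multiple CUDA versions.
For example, I have two CUDA versions installed: CUDA 9.0 and CUDA 10.2.
Their directories are: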
#cuda9
/data/home/cuiaihao/cuda9
#cuda10.2
/data/home/cuiaihao/cuda10.2
When I use CUDA 9, my environment variables are set as follows (the CUDA 10.2 lines are commented out):
export PATH=/data/home/cuiaihao/cuda9/bin:$PATH
export LD_LIBRARY_PATH=/data/home/cuiaihao/cuda9/lib64:$LD_LIBRARY_PATH
#export PATH=/data/home/cuiaihao/cuda10.2/bin${PATH:+:${PATH}}
#export LD_LIBRARY_PATH=/data/home/cuiaihao/cuda10.2/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
When I use CUDA 10.2, they are set as follows (the CUDA 9 lines are commented out):
#export PATH=/data/home/cuiaihao/cuda9/bin:$PATH
#export LD_LIBRARY_PATH=/data/home/cuiaihao/cuda9/lib64:$LD_LIBRARY_PATH
export PATH=/data/home/cuiaihao/cuda10.2/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/data/home/cuiaihao/cuda10.2/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
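Commenting and uncommenting lines works, but a small shell function can do the switch in one command. The sketch below uses a hypothetical helper named use_cuda (not part of the CUDA toolkit); it assumes each CUDA root contains bin/ and lib64/, as in the directories above, and it also exports CUDA_HOME, which PyTorch's C++/CUDA extension builder reads when it is set.

```shell
# Hypothetical helper (not part of CUDA): make the given CUDA root the active one
# by prepending its bin/ and lib64/ directories and exporting CUDA_HOME.
# Assumes the layout <root>/bin and <root>/lib64, as in the directories above.
use_cuda() {
    local cuda_dir="$1"
    if [ ! -d "$cuda_dir" ]; then
        echo "use_cuda: directory not found: $cuda_dir" >&2
        return 1
    fi
    export CUDA_HOME="$cuda_dir"
    # Prepend so this version's nvcc and libraries are found first.
    export PATH="$cuda_dir/bin${PATH:+:${PATH}}"
    export LD_LIBRARY_PATH="$cuda_dir/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"
}

# Example (in ~/.bashrc or an interactive shell):
# use_cuda /data/home/cuiaihao/cuda10.2
```

Because the function only prepends, directories from a previously selected version remain further back in PATH. They no longer take precedence, but opening a fresh shell gives the cleanest state.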
These lines live in ~/.bashrc, which can be edited with vim:
vim ~/.bashrc
After editing, run the following so the changes take effect in the current shell:
source ~/.bashrc
Once the change is in place, check which CUDA version is active with nvcc --version:
nvcc --version
For example, after switching to CUDA 10.2, nvcc --version prints:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Wed_Oct_23_19:24:38_PDT_2019
Cuda compilation tools, release 10.2, V10.2.89
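When comparing versions in a script, it helps to extract just the release number from this output. A minimal sketch, assuming the "release X.Y," wording shown above stays the same (nvcc_release is a hypothetical name):

```shell
# Hypothetical helper: read `nvcc --version` output on stdin and print the
# release number from the "Cuda compilation tools, release X.Y, ..." line.
nvcc_release() {
    sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p'
}

# Usage:
# nvcc --version | nvcc_release
```

For the CUDA 10.2 output above this prints 10.2.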
Download Apex
Standard installation
For performance and full functionality, it is recommended to install Apex with its CUDA and C++ extensions:
$ git clone https://github.com/NVIDIA/apex
$ cd apex
$ pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
Apex also supports a Python-only build (required with PyTorch 0.4) via:
$ pip install -v --no-cache-dir ./
Problems I ran into
If the Apex installation fails, you may see an error such as:
RuntimeError: Cuda extensions are being compiled with a version of Cuda that does not match the version used to compile Pytorch binaries. Pytorch binaries were compiled with Cuda 10.2.
After running this:
pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
I get this error:
torch.__version__ = 1.6.0
/tmp/pip-req-build-l3l15eo8/setup.py:67: UserWarning: Option --pyprof not specified. Not installing PyProf dependencies!
warnings.warn("Option --pyprof not specified. Not installing PyProf dependencies!")
Compiling cuda extensions with
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176
from /data/home/cuiaihao/cuda9/bin
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/tmp/pip-req-build-l3l15eo8/setup.py", line 171, in <module>
check_cuda_torch_binary_vs_bare_metal(torch.utils.cpp_extension.CUDA_HOME)
File "/tmp/pip-req-build-l3l15eo8/setup.py", line 106, in check_cuda_torch_binary_vs_bare_metal
"https://github.com/NVIDIA/apex/pull/323#discussion_r287021798. "
RuntimeError: Cuda extensions are being compiled with a version of Cuda that does not match the version used to compile Pytorch binaries. Pytorch binaries were compiled with Cuda 10.2.
In some cases, a minor-version mismatch will not cause later errors: https://github.com/NVIDIA/apex/pull/323#discussion_r287021798. You can try commenting out this check (at your own risk).
Running setup.py install for apex ... error
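Yet at this point my shell was already using CUDA 10.2 (nvcc --version reported release 10.2), while the build picked up nvcc 9.0 from /data/home/cuiaihao/cuda9/bin.
First, determine whether the CUDA version currently used on Linux matches the cudatoolkit version that PyTorch was built with.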
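This check can be scripted. PyTorch reports the CUDA version it was built with in torch.version.cuda, and the error above comes from Apex's setup.py finding that this version differs from the nvcc it located. The sketch below compares only major.minor, which roughly mirrors that check; cuda_versions_match is a hypothetical name, and the usage lines assume python can import torch and nvcc is on PATH.

```shell
# Hypothetical helper: succeed (exit status 0) if two CUDA version strings
# agree on major.minor, e.g. "10.2" and "10.2.89"; fail otherwise.
cuda_versions_match() {
    local a b
    a=$(printf '%s\n' "$1" | cut -d. -f1,2)
    b=$(printf '%s\n' "$2" | cut -d. -f1,2)
    [ -n "$a" ] && [ "$a" = "$b" ]
}

# Usage (assumes python can import torch and nvcc is on PATH):
# torch_cuda=$(python -c "import torch; print(torch.version.cuda)")
# nvcc_cuda=$(nvcc --version | sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p')
# if cuda_versions_match "$torch_cuda" "$nvcc_cuda"; then
#     echo "OK: PyTorch CUDA $torch_cuda matches nvcc $nvcc_cuda"
# else
#     echo "Mismatch: PyTorch CUDA $torch_cuda vs nvcc $nvcc_cuda"
# fi
```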
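1. If they really are different:
Step one: adjust CUDA by downloading a CUDA version that matches.
Step two: reinstall PyTorch with the cudatoolkit version that matches your CUDA.
2. If your CUDA and cudatoolkit versions match, but the nvcc printed during installation differs from what nvcc --version reports on the command line,
you can manually set CUDA_HOME to the directory of the CUDA version you are currently using: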
$ git clone https://github.com/NVIDIA/apex
$ cd apex
$ CUDA_HOME=/data/home/cuiaihao/cuda10.2 pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
With this, the installation succeeds:
running install_egg_info
running egg_info
writing apex.egg-info/PKG-INFO
writing dependency_links to apex.egg-info/dependency_links.txt
writing top-level names to apex.egg-info/top_level.txt
reading manifest file 'apex.egg-info/SOURCES.txt'
adding license file 'LICENSE'
writing manifest file 'apex.egg-info/SOURCES.txt'
Copying apex.egg-info to /data/home/cuiaihao/.conda/envs/cascade-stereo/lib/python3.6/site-packages/apex-0.1-py3.6.egg-info
running install_scripts
writing list of installed files to '/tmp/pip-record-vp8iy4e8/install-record.txt'
Running setup.py install for apex ... done
Successfully installed apex-0.1
Nice, it works!
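Instead of typing the CUDA_HOME path by hand, it can be derived from the nvcc that comes first on PATH, i.e. the one nvcc --version reports. A sketch with a hypothetical helper cuda_home_from_path; it assumes nvcc sits at <CUDA root>/bin/nvcc and is not a symlink placed in some other directory such as /usr/bin.

```shell
# Hypothetical helper: print the CUDA root that owns the first nvcc on PATH,
# e.g. /data/home/cuiaihao/cuda10.2/bin/nvcc -> /data/home/cuiaihao/cuda10.2.
cuda_home_from_path() {
    local nvcc_path
    nvcc_path=$(command -v nvcc) || { echo "nvcc not found on PATH" >&2; return 1; }
    # Strip the trailing /bin/nvcc to get the toolkit root.
    dirname "$(dirname "$nvcc_path")"
}

# Usage, from inside the apex checkout:
# cuda_home=$(cuda_home_from_path) &&
#     CUDA_HOME="$cuda_home" pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
```

Setting CUDA_HOME this way ties the Apex build to the same toolkit your shell uses, which is exactly the mismatch the manual fix resolves.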