How to install Apex on a Linux server with multiple CUDA versions

Set up the Linux environment variables

First, make it possible to switch between multiple CUDA versions.

For example, I have two CUDA versions installed, CUDA 9.0 and CUDA 10.2.
Their installation directories are:

# CUDA 9.0
/data/home/cuiaihao/cuda9
# CUDA 10.2
/data/home/cuiaihao/cuda10.2

When I use CUDA 9.0, my environment variables are set as follows (the CUDA 10.2 lines are commented out):

export PATH=/data/home/cuiaihao/cuda9/bin:$PATH
export LD_LIBRARY_PATH=/data/home/cuiaihao/cuda9/lib64:$LD_LIBRARY_PATH

#export PATH=/data/home/cuiaihao/cuda10.2/bin${PATH:+:${PATH}}
#export LD_LIBRARY_PATH=/data/home/cuiaihao/cuda10.2/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

When I use CUDA 10.2, my environment variables are set as follows (the CUDA 9.0 lines are commented out):

#export PATH=/data/home/cuiaihao/cuda9/bin:$PATH
#export LD_LIBRARY_PATH=/data/home/cuiaihao/cuda9/lib64:$LD_LIBRARY_PATH

export PATH=/data/home/cuiaihao/cuda10.2/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/data/home/cuiaihao/cuda10.2/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

The environment variables can be edited with vim:

vim ~/.bashrc

After editing, run the following so that the new environment variables take effect:

source ~/.bashrc
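
Because the export lines above prepend to the existing PATH, running source ~/.bashrc again in the same shell leaves the directories from an earlier CUDA selection on PATH, behind the new one. The sketch below lists every nvcc reachable through PATH in lookup order. The function name find_on_path is a hypothetical helper introduced here for illustration; it is not part of CUDA or Apex.

```shell
# Hypothetical helper: print every executable file named "$1" found on PATH,
# in the order the shell searches. The body runs in a subshell so the
# changed IFS does not leak into the calling shell.
find_on_path() (
    IFS=:
    for dir in $PATH; do
        candidate="${dir:-.}/$1"
        if [ -f "$candidate" ] && [ -x "$candidate" ]; then
            echo "$candidate"
        fi
    done
)

# The first line printed is the nvcc that a plain `nvcc` command runs.
find_on_path nvcc
```

If more than one line is printed, the first entry wins in the shell. Opening a fresh terminal after editing ~/.bashrc avoids the accumulated entries.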

After the change, you can check which CUDA version is currently in use with nvcc --version:

nvcc --version

For example, after I switched to CUDA 10.2, nvcc --version printed:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Wed_Oct_23_19:24:38_PDT_2019
Cuda compilation tools, release 10.2, V10.2.89
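
As an alternative to commenting and uncommenting the export lines in ~/.bashrc, the switch can be wrapped in a small shell function. This is only a sketch under a few assumptions: use_cuda is a hypothetical name, and exporting CUDA_HOME alongside PATH and LD_LIBRARY_PATH is my addition, chosen because the Apex build reads CUDA_HOME.

```shell
# Hypothetical helper: point the current shell at one CUDA installation.
# Argument: the CUDA root directory, e.g. /data/home/cuiaihao/cuda10.2
use_cuda() {
    cuda_root="$1"
    if [ ! -d "$cuda_root" ]; then
        echo "use_cuda: no such directory: $cuda_root" >&2
        return 1
    fi
    # Assumption: also export CUDA_HOME so build scripts that read it agree with PATH.
    export CUDA_HOME="$cuda_root"
    # Prepend so this installation is found before any other CUDA on PATH.
    export PATH="$cuda_root/bin${PATH:+:${PATH}}"
    export LD_LIBRARY_PATH="$cuda_root/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"
}
```

With the function defined in ~/.bashrc, running use_cuda /data/home/cuiaihao/cuda10.2 followed by nvcc --version should report release 10.2. Earlier entries stay on PATH behind the new one, so the most recent call takes precedence.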

Download Apex

Standard installation

For performance and full functionality, installing Apex with its CUDA and C++ extensions is recommended:
 
$ git clone https://github.com/NVIDIA/apex
$ cd apex
$ pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
 
Apex also supports a Python-only build (required with PyTorch 0.4) via
 
$ pip install -v --no-cache-dir ./

The problem I ran into

If installing Apex fails with an error such as

RuntimeError: Cuda extensions are being compiled with a version of Cuda that does not match the version used to compile Pytorch binaries. Pytorch binaries were compiled with Cuda 10.2.

After running
pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
I get this error:

torch.version = 1.6.0

/tmp/pip-req-build-l3l15eo8/setup.py:67: UserWarning: Option --pyprof not specified. Not installing PyProf dependencies!
  warnings.warn("Option --pyprof not specified. Not installing PyProf dependencies!")

Compiling cuda extensions with
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176
from /data/home/cuiaihao/cuda9/bin

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/tmp/pip-req-build-l3l15eo8/setup.py", line 171, in <module>
    check_cuda_torch_binary_vs_bare_metal(torch.utils.cpp_extension.CUDA_HOME)
  File "/tmp/pip-req-build-l3l15eo8/setup.py", line 106, in check_cuda_torch_binary_vs_bare_metal
    "https://github.com/NVIDIA/apex/pull/323#discussion_r287021798.  "
RuntimeError: Cuda extensions are being compiled with a version of Cuda that does not match the version used to compile Pytorch binaries.  Pytorch binaries were compiled with Cuda 10.2.
In some cases, a minor-version mismatch will not cause later errors:  https://github.com/NVIDIA/apex/pull/323#discussion_r287021798.  You can try commenting out this check (at your own risk).
Running setup.py install for apex ... error

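However, my shell was set to use CUDA 10.2 at that point, even though the build log above shows nvcc being picked up from the cuda9 directory.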

First, confirm whether the CUDA version currently in use on Linux matches the cudatoolkit version that PyTorch was built with.
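
One way to make that comparison is sketched below. The two function names are hypothetical helpers introduced here. The sketch relies on the "release X.Y" wording seen in the nvcc --version output earlier in this post and on torch.version.cuda, which reports the CUDA version the PyTorch binaries were compiled with.

```shell
# Hypothetical helper: read `nvcc --version` output on stdin and print
# only the release number, e.g. "10.2".
nvcc_release() {
    sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Hypothetical helper: report whether the nvcc version ($1) and the
# PyTorch CUDA version ($2) are the same string.
compare_cuda_versions() {
    if [ "$1" = "$2" ]; then
        echo "match: $1"
    else
        echo "mismatch: nvcc=$1 torch=$2"
    fi
}

# Usage (needs nvcc on PATH and PyTorch installed in the active environment):
# compare_cuda_versions "$(nvcc --version | nvcc_release)" \
#     "$(python -c 'import torch; print(torch.version.cuda)')"
```

A mismatch corresponds to case 1 below. A match combined with a failing build points to case 2, where the build picks up a different nvcc than the shell does.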

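1. If they really are different, there are two options:
First: change CUDA by installing a CUDA version that matches PyTorch.
Second: reinstall PyTorch with a cudatoolkit version that matches your CUDA.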

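2. If the CUDA version and the cudatoolkit version are the same, but the nvcc reported during installation differs from what nvcc --version prints on your command line,
you can manually set CUDA_HOME to the directory of the CUDA installation you are currently using: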

$ git clone https://github.com/NVIDIA/apex
$ cd apex
$ CUDA_HOME=/data/home/cuiaihao/cuda10.2 pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./

With this, the installation succeeds:

    running install_egg_info
    running egg_info
    writing apex.egg-info/PKG-INFO
    writing dependency_links to apex.egg-info/dependency_links.txt
    writing top-level names to apex.egg-info/top_level.txt
    reading manifest file 'apex.egg-info/SOURCES.txt'
    adding license file 'LICENSE'
    writing manifest file 'apex.egg-info/SOURCES.txt'
    Copying apex.egg-info to /data/home/cuiaihao/.conda/envs/cascade-stereo/lib/python3.6/site-packages/apex-0.1-py3.6.egg-info
    running install_scripts
    writing list of installed files to '/tmp/pip-record-vp8iy4e8/install-record.txt'
    Running setup.py install for apex ... done
Successfully installed apex-0.1

Nice, it works!
