The RTX 2070 also works under Ubuntu 16.04 + CUDA 9.0. Ubuntu 18.04 may only support CUDA 10.0, which can cause odd errors when running open-source code, so Ubuntu 16.04 + CUDA 9.0 is the recommended combination. The walkthrough below nevertheless uses Ubuntu 18.04 + CUDA 10.0 as the example; the Ubuntu 16.04 + CUDA 9.0 setup is largely the same.
If CUDA 9.0 is already installed, TensorFlow-GPU can be installed directly with pip: install Anaconda, virtualenv, CUDA and cuDNN, then pip install tensorflow-gpu.
If any other CUDA version is installed, TensorFlow-GPU has to be built from source: complete steps 1, 2, 3, 4 (and optionally 5) below, then build tensorflow-gpu from source and, during configure, adjust the options to match the versions you installed in steps 1-5.
Although the CUDA download page does not list a version specifically for the RTX 20-series GPUs, CUDA 10.0 supports Ubuntu 18.04 + a GeForce RTX 2070. For later study and research, the following components need to be set up:
- Anaconda3 5.2.0
- CUDA 10.0
- cuDNN 7.4.1
- Bazel 0.17
- TensorRT 5
- Tensorflow-gpu
(The steps below describe my own setup. The process involved some mistakes and retries, so please point out anything that is wrong.)
(All versions listed above support Ubuntu 18.04 and the RTX 2070; you can also simply follow the list and install them yourself.)
(For installing the NVIDIA driver, see https://blog.csdn.net/ghw15221836342/article/details/79571559 ; in Method 1 there, replace 390 with 410 to get the driver version matching the RTX 2070.)
----------------------------------------------------------------------------------
Install Anaconda3 5.2.0 on Ubuntu 18
TensorFlow supports Python 3.4, 3.5 and 3.6, but may not yet support Python 3.7 (the latest Python release is 3.7.1, and the newest Python that Anaconda3 ships is 3.7.0). For convenience, install Anaconda3 5.2.0, whose bundled Python is 3.6.4. With Anaconda installed there is no need to install Python and its common libraries separately.
Archive of all Anaconda releases:
https://repo.anaconda.com/archive/
Download Anaconda3-5.2.0-Linux-x86_64.sh
Then, in the download directory, run:
bash Anaconda3-5.2.0-Linux-x86_64.sh
You can verify the installation with
python --version
which should print something like
Python 3.6.4 :: Anaconda, Inc.
meaning the installation succeeded.
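If python --version still shows the system Python, Anaconda's bin directory may not be on your PATH yet. A minimal sketch, assuming the default install location ~/anaconda3:
export PATH="$HOME/anaconda3/bin:$PATH"   # append to ~/.bashrc if the installer did not add it
source ~/.bashrc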
Check the pip version:
$ pip --version
pip 10.0.1 from /home/lsy/anaconda3/lib/python3.6/site-packages/pip (python 3.6)
--------------(If the above is done, the Python installation steps below can be skipped.)--------------------------------------------
Install Python 3.6 on Ubuntu 18
sudo add-apt-repository ppa:jonathonf/python-3.6
Install Python 3.7.1 on Ubuntu 18
Installation reference:
https://blog.csdn.net/jaket5219999/article/details/80894517
wget https://www.python.org/ftp/python/3.7.1/Python-3.7.1.tar.xz && \
tar -xvf Python-3.7.1.tar.xz && \
cd Python-3.7.1 && \
./configure && make && sudo make altinstall
Alternatively, download it from the official site: https://www.python.org/downloads/release/python-370/
Extract it, cd into the extracted directory, and run:
./configure && make && sudo make altinstall
If you hit the error zipimport.ZipImportError: can't decompress data; zlib not available
the fix is:
sudo apt-get install -y make build-essential libssl-dev zlib1g-dev libbz2-dev \
libreadline-dev libsqlite3-dev wget curl llvm libncurses5-dev libncursesw5-dev \
xz-utils tk-dev
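After these development packages are installed, re-run the Python build from the source directory (a sketch, assuming the Python-3.7.1 directory extracted above):
cd Python-3.7.1
./configure && make && sudo make altinstall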
Switching between Python 2 and Python 3
Reference: https://*.com/questions/43743509/how-to-make-python3-command-run-python-3-6-instead-of-3-5
# make python point to python3.6
rm /usr/bin/python
ln -s /usr/bin/python3.6 /usr/bin/python
# make python2 point to python2.7
rm /usr/bin/python2
ln -s /usr/bin/python2.7 /usr/bin/python2
# or create an alias in ~/.bash_aliases
alias python='/usr/bin/python3.6'
Install pip
sudo apt-get install python3-pip
Use python3 here; otherwise it matches the default Python 2.
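To double-check which pip and interpreter you end up with:
python3 -m pip --version   # should report a pip bound to Python 3, not the default Python 2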
--------------------------------------------------------------------------------------------------------------------------------
CUDA 10.0
Reference:
1. Download the CUDA Toolkit: Linux / x86_64 / Ubuntu / 18.04 / deb (local)
https://developer.nvidia.com/cuda-downloads
2. Install
sudo dpkg -i cuda-repo-ubuntu1804-10-0-local-10.0.130-410.48_1.0-1_amd64.deb
sudo apt-key add /var/cuda-repo-10-0-local-10.0.130-410.48/7fa2af80.pub
sudo apt-get update
sudo apt-get install cuda
3. Add environment variables
nano ~/.bashrc
Append the following line at the end, then save and exit.
export PATH=/usr/local/cuda-10.0/bin${PATH:+:${PATH}}
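Many setups also add the CUDA library directory to the loader path in the same ~/.bashrc (the TensorFlow section below adds an equivalent line to fix a libcublas import error); a minimal sketch:
export LD_LIBRARY_PATH=/usr/local/cuda-10.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}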
4. Check the driver version and the CUDA toolkit
cat /proc/driver/nvidia/version
nvcc -V
5. (Optional) Build the CUDA samples and run them.
cd /usr/local/cuda-10.0/samples
sudo make
This takes a while. When it finishes, go into the release directory and run the following to check the results.
cd /usr/local/cuda-10.0/samples/bin/x86_64/linux/release
./deviceQuery
./bandwidthTest
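If the driver and toolkit are set up correctly, both samples end with a PASS result; a quick check:
./deviceQuery | grep "Result"     # expect: Result = PASS
./bandwidthTest | grep "Result"   # expect: Result = PASS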
------------------------------------------------------------------
cuDNN v7.4.1 for CUDA 10.0
1. Download: https://developer.nvidia.com/rdp/cudnn-download
(You need to register an NVIDIA developer account before downloading: https://developer.nvidia.com/)
2. Extract the downloaded archive; the extracted cuDNN folder is named cuda.
3. Copy the cuDNN files into the CUDA installation, i.e. copy the contents of the extracted cuda folder into the CUDA directory under /usr/local:
$ sudo cp cuda/include/cudnn.h /usr/local/cuda/include
$ sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
$ sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
(Method taken from: https://blog.csdn.net/u010801439/article/details/80483036)
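To confirm the headers were copied and see which cuDNN version CUDA will pick up (for cuDNN 7.x the version macros live in cudnn.h):
$ cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2
For cuDNN 7.4.1 this should show CUDNN_MAJOR 7, CUDNN_MINOR 4, CUDNN_PATCHLEVEL 1.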
------------------------------------------------------------------------
NCCL v2.3.7
This is only needed when building TensorFlow from source; skip it if you install with pip.
Installation reference: https://blog.csdn.net/zuyuhuo6777/article/details/81450258
1. Download
https://developer.nvidia.com/nccl/nccl-download
Under Local installers (x86), choose the local installer for Ubuntu 18.04.
2. Install
Go to the download directory, install the local NCCL repository, update the APT database, and install libnccl2. If you also need to compile applications against NCCL, install the libnccl-dev package as well.
$ sudo dpkg -i nccl-repo-ubuntu1804-2.3.7-ga-cuda10.0_1-1_amd64.deb
$ sudo apt update
$ sudo apt install libnccl2 libnccl-dev
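A quick check that the NCCL packages were installed:
$ dpkg -l | grep nccl   # should list libnccl2 and libnccl-dev with a 2.3.x version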
------------------------------------------------------------------------
For convenience, download Bazel 0.17 directly.
(I initially installed 0.19, but the --config=cuda build does not support Bazel versions above 0.17. Not being sure whether using 0.19 would affect the later steps, I uninstalled 0.19 and reinstalled 0.17. To uninstall: run whereis bazel to find the bazel directories, then simply rm -rf <path>.)
Bazel 0.19.2 (the same steps apply to 0.17; just pick the matching installer)
This is only needed when building TensorFlow from source; skip it if you install with pip.
The official site offers several installation methods:
https://docs.bazel.build/versions/master/install-ubuntu.html#install-with-installer-ubuntu
The steps below use the "Installing using binary installer" method.
1. Install the prerequisite packages
$ sudo apt-get install pkg-config zip g++ zlib1g-dev unzip python
2. Download Bazel
https://github.com/bazelbuild/bazel/releases
I chose and installed bazel-0.19.2-installer-linux-x86_64.sh.
3. Run the installer
$ chmod +x bazel-<version>-installer-linux-x86_64.sh
$ ./bazel-<version>-installer-linux-x86_64.sh --user
4. Set up the environment
$ nano ~/.bashrc
Append the following line at the end, then save and exit:
export PATH="$PATH:$HOME/bin"
Run this so it takes effect:
$ source ~/.bashrc
5. Check that the installation succeeded
$ bazel version
--------------------------------------------
TensorRT 5.0.2.6
This is only needed when building TensorFlow from source; skip it if you install with pip. Even for a source build it is optional, depending on your needs. If you do install it, remember to pick the matching options during configure when compiling from source.
for Ubuntu 1804 and CUDA 10.0
1. Download
https://developer.nvidia.com/nvidia-tensorrt-5x-download
I chose the Debian and RPM install package:
TensorRT 5.0.2.6 GA for Ubuntu 1804 and CUDA 10.0 DEB local repo packages
2. Install, following the official documentation:
https://docs.nvidia.com/deeplearning/sdk/tensorrt-install-guide/index.html#downloading
$ sudo dpkg -i nv-tensorrt-repo-ubuntu1804-cuda10.0-trt5.0.2.6-ga-20181009_1-1_amd64.deb
$ sudo apt-key add /var/nv-tensorrt-repo-cuda10.0-trt5.0.2.6-ga-20181009/7fa2af80.pub
$ sudo apt-get update
$ sudo apt-get install tensorrt
Since the Anaconda3 Python installed earlier is 3.6, just write python below; there is no need to change it to python3.
$ sudo apt-get install python-libnvinfer-dev
After installation it prints:
Setting up python-libnvinfer-dev (5.0.2-1+cuda10.0) ...
If you plan to use TensorRT through TensorFlow:
$ sudo apt-get install uff-converter-tf
After installation it prints:
Setting up graphsurgeon-tf (5.0.2-1+cuda10.0) ...
Setting up uff-converter-tf (5.0.2-1+cuda10.0) ...
3. Verify the installation:
$ dpkg -l | grep TensorRT
ii  graphsurgeon-tf        5.0.2-1+cuda10.0   amd64  GraphSurgeon for TensorRT package
ii  libnvinfer-dev         5.0.2-1+cuda10.0   amd64  TensorRT development libraries and headers
ii  libnvinfer-samples     5.0.2-1+cuda10.0   all    TensorRT samples and documentation
ii  libnvinfer5            5.0.2-1+cuda10.0   amd64  TensorRT runtime libraries
ii  python-libnvinfer      5.0.2-1+cuda10.0   amd64  Python bindings for TensorRT
ii  python-libnvinfer-dev  5.0.2-1+cuda10.0   amd64  Python development package for TensorRT
ii  tensorrt               5.0.2.6-1+cuda10.0 amd64  Meta package of TensorRT
ii  uff-converter-tf       5.0.2-1+cuda10.0   amd64  UFF converter for TensorRT package
--------------------------------------------------------
TensorFlow
Two installation approaches are recommended: 1. inside Docker; 2. inside a virtualenv. The virtualenv approach is generally the more commonly used one.
(1) In Docker:
1. Install Docker:
https://www.digitalocean.com/community/tutorials/how-to-install-and-use-docker-on-ubuntu-18-04
2. Install nvidia-docker:
https://github.com/NVIDIA/nvidia-docker
3. Downloads TensorFlow release images to your machine:
$ docker pull tensorflow/tensorflow:latest-devel-gpu
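Once the image is pulled, a minimal sketch of starting a GPU-enabled container with the nvidia runtime (using the same image tag as the pull command above):
$ docker run --runtime=nvidia -it tensorflow/tensorflow:latest-devel-gpu bash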
(2) In a virtualenv:
sudo apt update
sudo apt install python-dev python-pip
sudo pip install -U virtualenv # system-wide install
virtualenv --system-site-packages -p python3 ./venv
source ./venv/bin/activate
(venv) $ pip install --upgrade pip
(venv) $ pip list
Continue installing TensorFlow inside (venv).
(1) Install with pip: if CUDA 9.0 is installed you can install directly with pip; otherwise you have to build from source, see (2).
pip install tensorflow-gpu==1.12
ImportError: libcublas.so.9.0: cannot open shared object file: No such file or directory
Solution: add the following to .bashrc
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64/
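After the pip install succeeds, a quick way to confirm that TensorFlow 1.12 actually sees the GPU from inside the venv:
(venv) $ python -c "import tensorflow as tf; print(tf.test.is_gpu_available())"
This should print True on a working RTX 2070 setup.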
(2) Otherwise: build from source
Note that during ./configure the default CUDA version is 9.0; change it to 10.0.
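A minimal sketch of the usual source-build flow (assuming the r1.12 branch; answer the ./configure prompts with CUDA 10.0, cuDNN 7.4.1, NCCL 2.3.7, TensorRT 5 and compute capability 7.5 as installed above):
git clone https://github.com/tensorflow/tensorflow.git
cd tensorflow
git checkout r1.12
./configure
bazel build --config=opt --config=cuda //tensorflow/tools/pip_package:build_pip_package
./bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
pip install /tmp/tensorflow_pkg/tensorflow-*.whl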
After the installation is finished you can exit the venv:
(venv) $ deactivate # don't exit until you're done using TensorFlow
------------------------------------------------------------------------------
Test whether tensorflow-gpu runs correctly inside Docker:
$ sudo docker run --runtime=nvidia -it --rm tensorflow/tensorflow:latest-gpu \
> python -c "import tensorflow as tf; tf.enable_eager_execution(); print(tf.reduce_sum(tf.random_normal([1000, 1000])))"
[sudo] password for lsy:
Unable to find image 'tensorflow/tensorflow:latest-gpu' locally
latest-gpu: Pulling from tensorflow/tensorflow
18d680d61657: Already exists
0addb6fece63: Already exists
78e58219b215: Already exists
eb6959a66df2: Already exists
e3eb30fe4844: Already exists
852c9b7a4425: Already exists
0a298bf31111: Already exists
4b34ad03a386: Pull complete
ea4e8d636cf7: Pull complete
e641906af026: Pull complete
af41a77e326c: Pull complete
56234dc44f16: Pull complete
33999852f515: Pull complete
11679b84da5e: Pull complete
231eb8ba046b: Pull complete
7d894676fbc1: Pull complete
Digest: sha256:847690afb29977920dbdbcf64a8669a2aaa0a202844fe80ea5cb524ede9f0a0b
Status: Downloaded newer image for tensorflow/tensorflow:latest-gpu
Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
Found device 0 with properties:
name: GeForce RTX 2070 major: 7 minor: 5 memoryClockRate(GHz): 1.725
totalMemory: 7.76GiB
Adding visible gpu devices: 0
Device interconnect StreamExecutor with strength 1 edge matrix:
0:   N
Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0) -> physical GPU (device: 0, name: GeForce RTX 2070, compute capability: 7.5)
tf.Tensor(-568.0144, shape=(), dtype=float32)
---------------------------------------------------------