1. Download the required software.
2. Install the NVIDIA driver.
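There are generally two methods. 1) The first uses "Software & Updates": go to System Settings -> Software & Updates -> Additional Drivers -> select the latest driver -> Apply Changes.
A problem you may run into: some time after clicking Apply Changes the driver still had not been installed, and clicking it again made the window crash. This cost me a whole evening. The cause turned out to be broken dependencies; after running the following command in a terminal: sudo apt-get install -f and then installing again, the problem was solved.
2) The second method is to download the installer package and install it from the command line. It is more involved, so I did not try it; other tutorials online say the X Window System has to be shut down before installing this way.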
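Because the installation from "Software & Updates" can fail silently, it helps to confirm afterwards that the driver is actually loaded. The following is a minimal sketch, not part of the original steps: nvidia_module_loaded is a hypothetical helper name, and it assumes the driver's kernel module name starts with "nvidia" in /proc/modules (Ubuntu packages may use a versioned name).

```shell
#!/bin/sh
# nvidia_module_loaded [FILE]: hypothetical helper, not an NVIDIA tool.
# FILE defaults to /proc/modules, where the module name is the first column.
# Assumption: the driver's module name begins with "nvidia".
nvidia_module_loaded() {
    awk '$1 ~ /^nvidia/ { found = 1 } END { exit found ? 0 : 1 }' "${1:-/proc/modules}"
}

if nvidia_module_loaded 2>/dev/null; then
    echo "NVIDIA kernel module is loaded"
else
    echo "NVIDIA kernel module not found; try: sudo apt-get install -f, then reinstall the driver"
fi
```

A reboot is usually needed after installing the driver before the module shows up.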
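3. Install CUDA 7.5
(1) In a terminal, cd to the directory that contains the downloaded installer and run sh cuda_7.5.18_linux.run --override
Once it starts, press Space to page through the license text, then type accept. Answer N to the one prompt that asks whether to install the driver, and Y to everything else.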
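(2) Install cuDNN (this is used for GPU acceleration)
Extract the downloaded archive and run the following commands in a terminal. They assume the archive was extracted to ~/cuda and that you start in ~/cuda/include, where cudnn.h is located: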
sudo cp cudnn.h /usr/local/cuda/include/
cd ~/cuda/lib64
sudo cp lib* /usr/local/cuda/lib64/
cd /usr/local/cuda/lib64/
sudo rm -rf libcudnn.so libcudnn.so.4
sudo ln -s libcudnn.so.4.0.7 libcudnn.so.4
sudo ln -s libcudnn.so.4 libcudnn.so
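The rm and ln -s commands above rebuild a symlink chain, so that libcudnn.so points to libcudnn.so.4, which in turn points to the real file libcudnn.so.4.0.7. As a minimal sketch, the same chain can be reproduced in a scratch directory to see what the result should look like. The directory demo_dir and the empty stand-in file are assumptions for illustration only; nothing under /usr/local/cuda is touched.

```shell
#!/bin/sh
# Rebuild the cuDNN symlink chain in a scratch directory (illustration only).
demo_dir=$(mktemp -d)

# Empty stand-in for the real library file shipped in the cuDNN archive.
: > "$demo_dir/libcudnn.so.4.0.7"

# Same two links as in the steps above; targets are relative names,
# so they resolve inside the directory that holds the links.
ln -s libcudnn.so.4.0.7 "$demo_dir/libcudnn.so.4"
ln -s libcudnn.so.4 "$demo_dir/libcudnn.so"

# Each link should be listed with an arrow to its target.
ls -l "$demo_dir"
```

Running ls -l /usr/local/cuda/lib64/libcudnn* after the real steps should show the same two arrows.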
Then set the environment variables:
sudo gedit /etc/profile
Append the following at the end of the file: export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
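After saving, create a linker configuration file: sudo vim /etc/ld.so.conf.d/cuda.conf
Press i to enter insert mode and add the line /usr/local/cuda/lib64
Then press Esc and type :wq to save and quit.
Next, run in the terminal: sudo ldconfig
so that the new linker configuration takes effect.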
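To check that the environment variables were picked up, something like the following sketch can be used. path_has_entry is a hypothetical helper name; it only tests whether a directory is one of the colon-separated entries of a list. Note that changes to /etc/profile only apply to new login sessions, or after running source /etc/profile in the current shell.

```shell
#!/bin/sh
# path_has_entry LIST DIR: hypothetical helper. Succeeds if DIR is exactly
# one of the colon-separated entries in LIST.
path_has_entry() {
    case ":$1:" in
        *":$2:"*) return 0 ;;
        *) return 1 ;;
    esac
}

if path_has_entry "$PATH" /usr/local/cuda/bin; then
    echo "PATH contains /usr/local/cuda/bin"
else
    echo "PATH is missing /usr/local/cuda/bin"
fi

if path_has_entry "${LD_LIBRARY_PATH:-}" /usr/local/cuda/lib64; then
    echo "LD_LIBRARY_PATH contains /usr/local/cuda/lib64"
else
    echo "LD_LIBRARY_PATH is missing /usr/local/cuda/lib64"
fi
```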
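4. Build the CUDA samples to test the installation
(1) Before building, install all the required dependency packages; this also prepares for building Caffe later:
sudo apt-get install libprotobuf-dev libleveldb-dev libsnappy-dev libopencv-dev libhdf5-serial-dev protobuf-compiler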
sudo apt-get install --no-install-recommends libboost-all-dev
sudo apt-get install libatlas-base-dev
sudo apt-get install libgflags-dev libgoogle-glog-dev liblmdb-dev
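(2) Change the gcc version check. (At first I did not change it. make itself reported no errors, but the tests failed once the build had finished, so it is best to make this change. If you get the error "unsupported GNU version! gcc versions later than 4.9 are not supported!", the change is mandatory.) The reason is that this CUDA release does not support gcc 5.0 or later.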
cd /usr/local/cuda-7.5/include
sudo cp host_config.h host_config.h.bak
sudo gedit host_config.h
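Press Ctrl+F and search for "4.9". There should be only one match; just above it is the line #if __GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ > 9)
Change both 4s to 5, save, and exit. Then continue with cd /home/gomee/NVIDIA_CUDA-7.5_Samples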
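The same edit can also be made without opening an editor. The sketch below applies it with sed to a scratch file that only contains the version guard quoted above, so it is safe to try. The variable scratch is a stand-in; on a real system the target would be /usr/local/cuda-7.5/include/host_config.h, edited with sudo, and I have not tested this against the full header.

```shell
#!/bin/sh
# Scratch copy containing only the version guard from host_config.h.
scratch=$(mktemp)
cat > "$scratch" <<'EOF'
#if __GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ > 9)
#error -- unsupported GNU version! gcc versions later than 4.9 are not supported!
#endif
EOF

# On the guard line only (the one mentioning __GNUC_MINOR__ > 9), change
# both major-version 4s to 5. -i.bak keeps the original as "$scratch.bak".
sed -i.bak \
    -e '/__GNUC_MINOR__ > 9/s/__GNUC__ > 4/__GNUC__ > 5/' \
    -e '/__GNUC_MINOR__ > 9/s/__GNUC__ == 4/__GNUC__ == 5/' \
    "$scratch"

# Show the edited guard line.
grep '__GNUC_MINOR__' "$scratch"
```

Restricting both substitutions to the guard line keeps the #error text, which also mentions 4.9, unchanged.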
(3) Now build the samples
In the terminal, run make all -j4 (-j4 sets the number of parallel build jobs; a common choice is the number of CPU cores in your machine).
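Instead of hard-coding 4, the job count can be derived from the machine. A minimal sketch, assuming nproc from GNU coreutils is available (it falls back to 1 otherwise); jobs_count is just an illustrative variable name, and the script only prints the command rather than running the build.

```shell
#!/bin/sh
# Use the number of available CPU cores as the make job count.
# Falls back to a single job if nproc is not installed.
jobs_count=$(nproc 2>/dev/null || echo 1)

# Print the command to run from the samples directory.
echo "make all -j${jobs_count}"
```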
The build then runs for about 4 to 5 minutes. When it finishes: cd /home/gomee/NVIDIA_CUDA-7.5_Samples/bin/x86_64/linux/release
./deviceQuery
If output like the following appears:
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "GeForce GT 650M"
CUDA Driver Version / Runtime Version 8.0 / 7.5
CUDA Capability Major/Minor version number: 3.0
Total amount of global memory: 1999 MBytes (2096300032 bytes)
( 2) Multiprocessors, (192) CUDA Cores/MP: 384 CUDA Cores
GPU Max Clock rate: 885 MHz (0.88 GHz)
Memory Clock rate: 2000 Mhz
Memory Bus Width: 128-bit
L2 Cache Size: 262144 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 1 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device PCI Domain ID / Bus ID / location ID: 0 / 1 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 8.0, CUDA Runtime Version = 7.5, NumDevs = 1, Device0 = GeForce GT 650M
Result = PASS
then CUDA has been installed successfully.
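The deviceQuery output is long, and the line that matters is the final Result = PASS. As a minimal sketch, the check can be scripted; device_query_passed is a hypothetical helper name, and the sample text piped into it below is a stand-in for the real program output.

```shell
#!/bin/sh
# device_query_passed: hypothetical helper. Reads deviceQuery output on
# stdin and succeeds only if it contains "Result = PASS".
device_query_passed() {
    grep -q 'Result = PASS'
}

# Demonstration with stand-in text instead of the real program output.
if printf 'Detected 1 CUDA Capable device(s)\nResult = PASS\n' | device_query_passed; then
    echo "CUDA installation verified"
else
    echo "deviceQuery did not report PASS"
fi
```

On a real system the usage would be ./deviceQuery | device_query_passed, run from the release directory used above.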