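I've been working through the AI course from fastai lately, and my computer happens to have a discrete NVIDIA GPU. A while back I had switched to the integrated graphics because the NVIDIA card drew too much power and ran hot. Time to prove your worth, NVIDIA. Come on out, little one. I cast the summoning spell, nvidia-settings, and was dumbfounded: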
ERROR: NVIDIA driver is not loaded
ERROR: Unable to load info from any available system
(nvidia-settings:317): GLib-GObject-CRITICAL **: 06:42:43.821: g_object_unref: assertion 'G_IS_OBJECT (object)' failed
** Message: 06:42:43.855: PRIME: No offloading required. Abort
** Message: 06:42:43.855: PRIME: is it supported? no
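The first error line says the kernel module is not loaded, which is not the same thing as the driver package being missing. A minimal sketch of a check for that, assuming a Linux system that lists loaded modules in /proc/modules. The function name is_module_loaded and its optional second argument are hypothetical helpers made up for this illustration:

```shell
# is_module_loaded NAME [FILE]
# Succeeds if kernel module NAME appears in FILE (default: /proc/modules).
# Hypothetical helper; the FILE argument exists only so the check can be
# tried against a saved copy of the module list.
is_module_loaded() (
    # The kernel reports module names with underscores (nvidia_drm),
    # so normalise dashes first.
    name=$(printf '%s' "$1" | tr '-' '_')
    file=${2:-/proc/modules}
    awk -v m="$name" '$1 == m { found = 1 } END { exit !found }' "$file"
)

# Example usage:
#   is_module_loaded nvidia && echo "nvidia is loaded" || echo "nvidia is NOT loaded"
```

If the module turns out not to be loaded even though the packages are installed, something is probably stopping it from loading, and reinstalling alone may not help.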
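Running nvidia-smi reported the same error. The driver was gone. Had I uninstalled the driver outright the last time I switched graphics cards? Since when was I ruthless enough to pull it up by the roots? I really couldn't remember, so I decided to reinstall it first: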
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt-get update
sudo apt-get install nvidia-driver-460 # change 460 to the driver version that suits your GPU
sudo apt-get install mesa-common-dev
sudo apt-get install freeglut3-dev
The installation went smoothly, and the install log showed nothing wrong either:
tianlang@tianlang:spark$ sudo apt-get install nvidia-driver-460
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following additional packages will be installed:
libnvidia-cfg1-460 libnvidia-compute-460
libnvidia-compute-460:i386 libnvidia-decode-460
libnvidia-decode-460:i386 libnvidia-encode-460
libnvidia-encode-460:i386 libnvidia-extra-460
libnvidia-fbc1-460 libnvidia-fbc1-460:i386 libnvidia-gl-460
libnvidia-gl-460:i386 libnvidia-ifr1-460
libnvidia-ifr1-460:i386 nvidia-compute-utils-460
nvidia-dkms-460 nvidia-kernel-common-460
nvidia-kernel-source-460 nvidia-utils-460
xserver-xorg-video-nvidia-460
The following packages will be upgraded:
libnvidia-cfg1-460 libnvidia-compute-460
libnvidia-compute-460:i386 libnvidia-decode-460
libnvidia-decode-460:i386 libnvidia-encode-460
libnvidia-encode-460:i386 libnvidia-extra-460
libnvidia-fbc1-460 libnvidia-fbc1-460:i386 libnvidia-gl-460
libnvidia-gl-460:i386 libnvidia-ifr1-460
libnvidia-ifr1-460:i386 nvidia-compute-utils-460
nvidia-dkms-460 nvidia-driver-460 nvidia-kernel-common-460
nvidia-kernel-source-460 nvidia-utils-460
xserver-xorg-video-nvidia-460
21 upgraded, 0 newly installed, 0 to remove and 4 not upgraded.
Need to get 175 MB of archives.
After this operation, 156 kB of additional disk space will be used.
Do you want to continue? [Y/n] Y
Get:1 http://ppa.launchpad.net/graphics-drivers/ppa/ubuntu bionic/main amd64 nvidia-driver-460 amd64 460.67-0ubuntu0~0.18.04.1 [433 kB]
...
Fetched 175 MB in 11min 55s (245 kB/s)
(Reading database ... 296611 files and directories currently installed.)
Preparing to unpack .../00-nvidia-driver-460_460.67-0ubuntu0~0.18.04.1_amd64.deb ...
Unpacking nvidia-driver-460 (460.67-0ubuntu0~0.18.04.1) over (460.56-0ubuntu0.18.04.1) ...
Preparing to unpack .../01-libnvidia-gl-460_460.67-0ubuntu0~0.18.04.1_amd64.deb ...
...
Removing all DKMS Modules
Done.
Unpacking nvidia-dkms-460 (460.67-0ubuntu0~0.18.04.1) over (460.56-0ubuntu0.18.04.1) ...
Preparing to unpack .../04-nvidia-kernel-source-460_460.67-0ubuntu0~0.18.04.1_amd64.deb ...
Unpacking nvidia-kernel-source-460 (460.67-0ubuntu0~0.18.04.1) over (460.56-0ubuntu0.18.04.1) ...
Preparing to unpack .../05-nvidia-kernel-common-460_460.67-0ubuntu0~0.18.04.1_amd64.deb ...
Unpacking nvidia-kernel-common-460 (460.67-0ubuntu0~0.18.04.1) over (460.56-0ubuntu0.18.04.1) ...
Preparing to unpack .../06-libnvidia-decode-460_460.67-0ubuntu0~0.18.04.1_i386.deb ...
De-configuring libnvidia-decode-460:amd64 (460.56-0ubuntu0.18.04.1) ...
Unpacking libnvidia-decode-460:i386 (460.67-0ubuntu0~0.18.04.1) over (460.56-0ubuntu0.18.04.1) ...
Preparing to unpack .../07-libnvidia-decode-460_460.67-0ubuntu0~0.18.04.1_amd64.deb ...
Unpacking libnvidia-decode-460:amd64 (460.67-0ubuntu0~0.18.04.1) over (460.56-0ubuntu0.18.04.1) ...
Preparing to unpack .../08-libnvidia-compute-460_460.67-0ubuntu0~0.18.04.1_amd64.deb ...
De-configuring libnvidia-compute-460:i386 (460.56-0ubuntu0.18.04.1) ...
Unpacking libnvidia-compute-460:amd64 (460.67-0ubuntu0~0.18.04.1) over (460.56-0ubuntu0.18.04.1) ...
Preparing to unpack .../09-libnvidia-compute-460_460.67-0ubuntu0~0.18.04.1_i386.deb ...
Unpacking libnvidia-compute-460:i386 (460.67-0ubuntu0~0.18.04.1) over (460.56-0ubuntu0.18.04.1) ...
Preparing to unpack .../10-libnvidia-extra-460_460.67-0ubuntu0~0.18.04.1_amd64.deb ...
Unpacking libnvidia-extra-460:amd64 (460.67-0ubuntu0~0.18.04.1) over (460.56-0ubuntu0.18.04.1) ...
Preparing to unpack .../11-nvidia-compute-utils-460_460.67-0ubuntu0~0.18.04.1_amd64.deb ...
Unpacking nvidia-compute-utils-460 (460.67-0ubuntu0~0.18.04.1) over (460.56-0ubuntu0.18.04.1) ...
Preparing to unpack .../12-libnvidia-encode-460_460.67-0ubuntu0~0.18.04.1_amd64.deb ...
De-configuring libnvidia-encode-460:i386 (460.56-0ubuntu0.18.04.1) ...
Unpacking libnvidia-encode-460:amd64 (460.67-0ubuntu0~0.18.04.1) over (460.56-0ubuntu0.18.04.1) ...
Preparing to unpack .../13-libnvidia-encode-460_460.67-0ubuntu0~0.18.04.1_i386.deb ...
Unpacking libnvidia-encode-460:i386 (460.67-0ubuntu0~0.18.04.1) over (460.56-0ubuntu0.18.04.1) ...
Preparing to unpack .../14-nvidia-utils-460_460.67-0ubuntu0~0.18.04.1_amd64.deb ...
Unpacking nvidia-utils-460 (460.67-0ubuntu0~0.18.04.1) over (460.56-0ubuntu0.18.04.1) ...
Preparing to unpack .../15-xserver-xorg-video-nvidia-460_460.67-0ubuntu0~0.18.04.1_amd64.deb ...
Unpacking xserver-xorg-video-nvidia-460 (460.67-0ubuntu0~0.18.04.1) over (460.56-0ubuntu0.18.04.1) ...
Preparing to unpack .../16-libnvidia-ifr1-460_460.67-0ubuntu0~0.18.04.1_amd64.deb ...
De-configuring libnvidia-ifr1-460:i386 (460.56-0ubuntu0.18.04.1) ...
Unpacking libnvidia-ifr1-460:amd64 (460.67-0ubuntu0~0.18.04.1) over (460.56-0ubuntu0.18.04.1) ...
Preparing to unpack .../17-libnvidia-ifr1-460_460.67-0ubuntu0~0.18.04.1_i386.deb ...
Unpacking libnvidia-ifr1-460:i386 (460.67-0ubuntu0~0.18.04.1) over (460.56-0ubuntu0.18.04.1) ...
Preparing to unpack .../18-libnvidia-fbc1-460_460.67-0ubuntu0~0.18.04.1_amd64.deb ...
De-configuring libnvidia-fbc1-460:i386 (460.56-0ubuntu0.18.04.1) ...
Unpacking libnvidia-fbc1-460:amd64 (460.67-0ubuntu0~0.18.04.1) over (460.56-0ubuntu0.18.04.1) ...
Preparing to unpack .../19-libnvidia-fbc1-460_460.67-0ubuntu0~0.18.04.1_i386.deb ...
Unpacking libnvidia-fbc1-460:i386 (460.67-0ubuntu0~0.18.04.1) over (460.56-0ubuntu0.18.04.1) ...
Preparing to unpack .../20-libnvidia-cfg1-460_460.67-0ubuntu0~0.18.04.1_amd64.deb ...
Unpacking libnvidia-cfg1-460:amd64 (460.67-0ubuntu0~0.18.04.1) over (460.56-0ubuntu0.18.04.1) ...
Setting up libnvidia-extra-460:amd64 (460.67-0ubuntu0~0.18.04.1) ...
Setting up libnvidia-fbc1-460:i386 (460.67-0ubuntu0~0.18.04.1) ...
Setting up libnvidia-fbc1-460:amd64 (460.67-0ubuntu0~0.18.04.1) ...
Setting up libnvidia-gl-460:i386 (460.67-0ubuntu0~0.18.04.1) ...
Setting up libnvidia-gl-460:amd64 (460.67-0ubuntu0~0.18.04.1) ...
Setting up libnvidia-ifr1-460:amd64 (460.67-0ubuntu0~0.18.04.1) ...
Setting up libnvidia-ifr1-460:i386 (460.67-0ubuntu0~0.18.04.1) ...
Setting up libnvidia-compute-460:amd64 (460.67-0ubuntu0~0.18.04.1) ...
Setting up libnvidia-compute-460:i386 (460.67-0ubuntu0~0.18.04.1) ...
Setting up nvidia-kernel-source-460 (460.67-0ubuntu0~0.18.04.1) ...
Setting up nvidia-utils-460 (460.67-0ubuntu0~0.18.04.1) ...
Setting up nvidia-kernel-common-460 (460.67-0ubuntu0~0.18.04.1) ...
update-initramfs: deferring update (trigger activated)
Setting up libnvidia-cfg1-460:amd64 (460.67-0ubuntu0~0.18.04.1) ...
Setting up libnvidia-decode-460:amd64 (460.67-0ubuntu0~0.18.04.1) ...
Setting up libnvidia-decode-460:i386 (460.67-0ubuntu0~0.18.04.1) ...
Setting up nvidia-compute-utils-460 (460.67-0ubuntu0~0.18.04.1) ...
Setting up libnvidia-encode-460:amd64 (460.67-0ubuntu0~0.18.04.1) ...
Setting up libnvidia-encode-460:i386 (460.67-0ubuntu0~0.18.04.1) ...
Setting up xserver-xorg-video-nvidia-460 (460.67-0ubuntu0~0.18.04.1) ...
Setting up nvidia-dkms-460 (460.67-0ubuntu0~0.18.04.1) ...
update-initramfs: deferring update (trigger activated)
INFO:Enable nvidia
DEBUG:Parsing /usr/share/ubuntu-drivers-common/quirks/dell_latitude
DEBUG:Parsing /usr/share/ubuntu-drivers-common/quirks/lenovo_thinkpad
DEBUG:Parsing /usr/share/ubuntu-drivers-common/quirks/put_your_quirks_here
Loading new nvidia-460.67 DKMS files...
Building for 4.15.0-141-generic
Building for architecture x86_64
Building initial module for 4.15.0-141-generic
Secure Boot not enabled on this system.
Done.
nvidia:
Running module version sanity check.
- Original module
- This kernel never originally had a module by this name
- Installation
- Installing to /lib/modules/4.15.0-141-generic/extra/
nvidia-modeset.ko:
Running module version sanity check.
Good news! Module version 460.67 for nvidia-modeset.ko
exactly matches what is already found in kernel 4.15.0-141-generic.
DKMS will not replace this module.
You may override by specifying --force.
nvidia-drm.ko:
Running module version sanity check.
- Original module
- This kernel never originally had a module by this name
- Installation
- Installing to /lib/modules/4.15.0-141-generic/extra/
nvidia-uvm.ko:
Running module version sanity check.
Good news! Module version for nvidia-uvm.ko
exactly matches what is already found in kernel 4.15.0-141-generic.
DKMS will not replace this module.
You may override by specifying --force.
depmod...
DKMS: install completed.
...
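To be safe I rebooted the machine and summoned nvidia again: same old recipe, same familiar taste. In other words, exactly the same errors as before.
That was getting strange, so I went to ask the GPU manager what was going on: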
spark$ sudo gpu-manager
last_boot_file: /var/lib/ubuntu-drivers-common/last_gfx_boot
new_boot_file: /var/lib/ubuntu-drivers-common/last_gfx_boot
can't access /run/u-d-c-nvidia-was-loaded file
can't access /opt/amdgpu-pro/bin/amdgpu-pro-px
Looking for nvidia modules in /lib/modules/4.15.0-141-generic/updates/dkms
Error: can't open /lib/modules/4.15.0-141-generic/updates/dkms
Looking for amdgpu modules in /lib/modules/4.15.0-141-generic/updates/dkms
Error: can't open /lib/modules/4.15.0-141-generic/updates/dkms
Is nvidia loaded? no
Was nvidia unloaded? no
Is nvidia blacklisted? yes
Is intel loaded? yes
Is radeon loaded? no
Is radeon blacklisted? no
Is amdgpu loaded? no
Is amdgpu blacklisted? no
Is amdgpu versioned? no
Is amdgpu pro stack? no
Is nouveau loaded? no
Is nouveau blacklisted? yes
Is nvidia kernel module available? no
Is amdgpu kernel module available? no
Vendor/Device Id: 8086:191b
BusID "PCI:0@0:2:0"
Is boot vga? yes
Vendor/Device Id: 10de:139a
BusID "PCI:1@0:0:0"
can't open /sys/bus/pci/devices/0000:01:00.0/boot_vga
Is boot vga? no
Error: can't access /sys/bus/pci/devices/0000:01:00.0/driver
The device is not bound to any driver.
can't open /sys/bus/pci/devices/0000:01:00.0/boot_vga
can't access /etc/u-d-c-nvidia-runtimepm-override file
can't open /sys/module/nvidia/version
Warning: cannot check the NVIDIA driver major version
Support for runtimepm not detected.
You can override this check at your own risk by creating the /etc/u-d-c-nvidia-runtimepm-override file.
Is nvidia runtime pm supported for "0x139a"? no
Checking power status in /proc/driver/nvidia/gpus/0000:01:00.0/power
Error while opening /proc/driver/nvidia/gpus/0000:01:00.0/power
Is nvidia runtime pm enabled for "0x139a"? no
Skipping "/dev/dri/card0", driven by "i915"
Skipping "/dev/dri/card0", driven by "i915"
Skipping "/dev/dri/card0", driven by "i915"
Found "/dev/dri/card0", driven by "i915"
output 0:
card0-eDP-1
Number of connected outputs for /dev/dri/card0: 1
Does it require offloading? no
last cards number = 2
Has amd? no
Has intel? yes
Has nvidia? yes
How many cards? 2
Has the system changed? No
Intel IGP detected
Desktop system detected
or laptop with open drivers
Nothing to do
Out of everything the GPU manager reported, one line caught my eye as possibly useful:
Is nvidia blacklisted? yes
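The report is long, and the useful line is easy to miss. A small sketch that pulls out just the drivers the report marks as blacklisted, assuming the "Is X blacklisted? yes" line format shown in the output above. The function name blacklisted_drivers is a hypothetical helper, not part of gpu-manager:

```shell
# blacklisted_drivers: read a gpu-manager report on stdin and print the name
# of every driver the report marks as blacklisted, one per line.
# Hypothetical helper; it relies only on the line format seen in the report.
blacklisted_drivers() {
    sed -n 's/^Is \(.*\) blacklisted? yes$/\1/p'
}

# Example usage (gpu-manager needs root; 2>&1 is there because this sketch
# does not assume which stream the report is written to):
#   sudo gpu-manager 2>&1 | blacklisted_drivers
```

Fed the report above, this would list nvidia and nouveau. The nouveau entry is normal when the proprietary driver is installed; the nvidia entry is the suspicious one.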
Blacklisted? Blacklisting is modprobe's job, so let's go check modprobe:
$ ls /lib/modprobe.d/
aliases.conf
blacklist_linux_4.15.0-137-generic.conf
blacklist_linux_4.15.0-141-generic.conf
blacklist-nvidia.conf
fbdev-blacklist.conf
nvidia-graphics-drivers.conf
systemd.conf
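See the blacklist-nvidia.conf file? Caught red-handed: it really was modprobe's doing. Did it really go that smoothly? No. The first place I checked was the /etc/modprobe.d directory; I found nothing suspicious there and let modprobe off the hook. Only after a long, fruitless search did I find its other den, the /lib/modprobe.d directory. Since when did this fellow keep so many burrows?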
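Searching every modprobe configuration directory in one go would have avoided that detour. A minimal sketch, assuming the default directory list below, which is based on the locations documented in modprobe.d(5) and may differ on your distribution. The function name find_blacklist is a hypothetical helper:

```shell
# find_blacklist MODULE [DIR...]
# Print every *.conf file in the given directories that contains a
# "blacklist MODULE" line. With no directories given, fall back to an
# assumed list of the usual modprobe.d locations.
# Hypothetical helper; it matches the module name literally, so it does not
# treat dashes and underscores as interchangeable.
find_blacklist() (
    module=$1
    shift
    [ "$#" -gt 0 ] || set -- /etc/modprobe.d /run/modprobe.d /usr/local/lib/modprobe.d /lib/modprobe.d
    for dir in "$@"; do
        [ -d "$dir" ] || continue   # skip locations that do not exist here
        grep -lE "^[[:space:]]*blacklist[[:space:]]+${module}[[:space:]]*$" "$dir"/*.conf 2>/dev/null
    done
    true                            # "no match" is not treated as an error
)

# Example usage:
#   find_blacklist nvidia
```

On the machine described here, this would be expected to print /lib/modprobe.d/blacklist-nvidia.conf.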
Having gone to all that trouble to find the config file that blacklists the NVIDIA GPU, I had to drag it out for public display:
cat /lib/modprobe.d/blacklist-nvidia.conf
# Do not modify
# This file was generated by nvidia-prime
blacklist nvidia
blacklist nvidia-drm
blacklist nvidia-modeset
alias nvidia off
alias nvidia-drm off
alias nvidia-modeset off
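Judging from the comment, this file was generated by nvidia-prime. At least it signs its work and owns up to it.
Delete it:
sudo rm /lib/modprobe.d/blacklist-nvidia.conf
Note that you only need to delete blacklist-nvidia.conf. Don't delete nvidia-graphics-drivers.conf as well, even though both names contain nvidia.
To be safe, also run: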
sudo update-initramfs -u
Reboot.
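Deleting a file marked "Do not modify" is hard to undo, so a more cautious variant is to move it aside instead. A minimal sketch, assuming a backup directory of your choosing that lies outside the modprobe.d directories. The function name disable_modprobe_conf and the backup path in the usage comment are hypothetical. Since nvidia-prime generated the file, its own prime-select command (sudo prime-select nvidia) may be the cleaner way to switch back, though that route was not tried in this write-up:

```shell
# disable_modprobe_conf FILE BACKUP_DIR
# Move FILE out of its modprobe.d directory into BACKUP_DIR rather than
# deleting it, so it can be restored later.
# Hypothetical helper; fails without side effects if FILE does not exist.
disable_modprobe_conf() (
    file=$1
    backup_dir=$2
    [ -f "$file" ] || { echo "no such file: $file" >&2; exit 1; }
    mkdir -p "$backup_dir" && mv -- "$file" "$backup_dir/"
)

# Example usage (as root), followed by the same initramfs rebuild and reboot:
#   disable_modprobe_conf /lib/modprobe.d/blacklist-nvidia.conf /root/modprobe-backup
#   update-initramfs -u
```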
This time I could finally summon NVIDIA, which has shot to fame over the past few years on the back of machine learning:
tianlang@tianlang:spark$ nvidia-smi
Sat Mar 27 07:27:19 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.67 Driver Version: 460.67 CUDA Version: 11.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 GeForce GTX 950M Off | 00000000:01:00.0 Off | N/A |
| N/A 49C P0 N/A / N/A | 0MiB / 2004MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+