NVIDIA GPU: NVIDIA driver is not loaded

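I recently started working through the AI course from fastai, and my computer happens to have a discrete NVIDIA GPU. I had switched to the integrated graphics a while ago because the NVIDIA card drew too much power and ran hot. Time to prove your worth, NVIDIA; come out, little darling. I cast the summoning spell, nvidia-settings, and was dumbfounded: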

ERROR: NVIDIA driver is not loaded

ERROR: Unable to load info from any available system

(nvidia-settings:317): GLib-GObject-CRITICAL **: 06:42:43.821: g_object_unref: assertion 'G_IS_OBJECT (object)' failed

** Message: 06:42:43.855: PRIME: No offloading required. Abort

** Message: 06:42:43.855: PRIME: is it supported? no
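The first error line says the kernel module is not loaded, which is not the same as the packages being gone. A minimal sketch for checking the module directly; the helper name has_module is made up for illustration, and it simply reads a /proc/modules-style listing from stdin:

```shell
#!/bin/sh
# has_module NAME: succeeds if NAME is the first field of any line in the
# /proc/modules-style listing read from stdin (hypothetical helper).
has_module() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit found ? 0 : 1 }'
}

# Typical use on a live system:
#   has_module nvidia < /proc/modules && echo "nvidia module is loaded"
#   dpkg -l | grep -i nvidia-driver   # are the driver packages still installed?
```

If the packages are listed but the module is not loaded, the driver is probably installed and something is preventing it from loading.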

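Running nvidia-smi reports the same error. The driver is gone. Did I uninstall the driver outright the last time I switched graphics cards? When did I get ruthless enough to pull it up by the roots? I really can't remember, so let's just reinstall it first: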

sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt-get update
sudo apt-get install nvidia-driver-460  # adjust the version to the driver that suits your GPU
sudo apt-get install mesa-common-dev
sudo apt-get install freeglut3-dev
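Which driver version to install depends on the card. One way to look it up on Ubuntu is the ubuntu-drivers tool. The sketch below picks the package marked as recommended; the helper name recommended_driver is hypothetical, and the "driver : PACKAGE - ... recommended" line layout is an assumption about that tool's output, so check it against what your system prints:

```shell
#!/bin/sh
# recommended_driver: read `ubuntu-drivers devices` output on stdin and print
# the first package whose "driver" line is flagged "recommended".
# Assumes lines shaped like:
#   driver   : nvidia-driver-460 - third-party free recommended
recommended_driver() {
    awk '$1 == "driver" && /recommended/ { print $3; exit }'
}

# Typical use:
#   ubuntu-drivers devices | recommended_driver
```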

The installation went smoothly, and I found nothing wrong in the installation log either:

tianlang@tianlang:spark$ sudo apt-get install nvidia-driver-460

Reading package lists... Done

Building dependency tree

Reading state information... Done

The following additional packages will be installed:

libnvidia-cfg1-460 libnvidia-compute-460

libnvidia-compute-460:i386 libnvidia-decode-460

libnvidia-decode-460:i386 libnvidia-encode-460

libnvidia-encode-460:i386 libnvidia-extra-460

libnvidia-fbc1-460 libnvidia-fbc1-460:i386 libnvidia-gl-460

libnvidia-gl-460:i386 libnvidia-ifr1-460

libnvidia-ifr1-460:i386 nvidia-compute-utils-460

nvidia-dkms-460 nvidia-kernel-common-460

nvidia-kernel-source-460 nvidia-utils-460

xserver-xorg-video-nvidia-460

The following packages will be upgraded:

libnvidia-cfg1-460 libnvidia-compute-460

libnvidia-compute-460:i386 libnvidia-decode-460

libnvidia-decode-460:i386 libnvidia-encode-460

libnvidia-encode-460:i386 libnvidia-extra-460

libnvidia-fbc1-460 libnvidia-fbc1-460:i386 libnvidia-gl-460

libnvidia-gl-460:i386 libnvidia-ifr1-460

libnvidia-ifr1-460:i386 nvidia-compute-utils-460

nvidia-dkms-460 nvidia-driver-460 nvidia-kernel-common-460

nvidia-kernel-source-460 nvidia-utils-460

xserver-xorg-video-nvidia-460

21 upgraded, 0 newly installed, 0 to remove and 4 not upgraded.

Need to get 175 MB of archives.

After this operation, 156 kB of additional disk space will be used.

Do you want to continue? [Y/n] Y

获取:1 http://ppa.launchpad.net/graphics-drivers/ppa/ubuntu bionic/main amd64 nvidia-driver-460 amd64 460.67-0ubuntu0~0.18.04.1 [433 kB]

...

Fetched 175 MB in 11min 55s (245 kB/s)

(Reading database ... 296611 files and directories currently installed.)

Preparing to unpack .../00-nvidia-driver-460_460.67-0ubuntu0~0.18.04.1_amd64.deb ...

Unpacking nvidia-driver-460 (460.67-0ubuntu0~0.18.04.1) over (460.56-0ubuntu0.18.04.1) ...

Preparing to unpack .../01-libnvidia-gl-460_460.67-0ubuntu0~0.18.04.1_amd64.deb ...

...

Removing all DKMS Modules

Done.

Unpacking nvidia-dkms-460 (460.67-0ubuntu0~0.18.04.1) over (460.56-0ubuntu0.18.04.1) ...

Preparing to unpack .../04-nvidia-kernel-source-460_460.67-0ubuntu0~0.18.04.1_amd64.deb ...

Unpacking nvidia-kernel-source-460 (460.67-0ubuntu0~0.18.04.1) over (460.56-0ubuntu0.18.04.1) ...

Preparing to unpack .../05-nvidia-kernel-common-460_460.67-0ubuntu0~0.18.04.1_amd64.deb ...

Unpacking nvidia-kernel-common-460 (460.67-0ubuntu0~0.18.04.1) over (460.56-0ubuntu0.18.04.1) ...

Preparing to unpack .../06-libnvidia-decode-460_460.67-0ubuntu0~0.18.04.1_i386.deb ...

De-configuring libnvidia-decode-460:amd64 (460.56-0ubuntu0.18.04.1) ...

Unpacking libnvidia-decode-460:i386 (460.67-0ubuntu0~0.18.04.1) over (460.56-0ubuntu0.18.04.1) ...

Preparing to unpack .../07-libnvidia-decode-460_460.67-0ubuntu0~0.18.04.1_amd64.deb ...

Unpacking libnvidia-decode-460:amd64 (460.67-0ubuntu0~0.18.04.1) over (460.56-0ubuntu0.18.04.1) ...

Preparing to unpack .../08-libnvidia-compute-460_460.67-0ubuntu0~0.18.04.1_amd64.deb ...

De-configuring libnvidia-compute-460:i386 (460.56-0ubuntu0.18.04.1) ...

Unpacking libnvidia-compute-460:amd64 (460.67-0ubuntu0~0.18.04.1) over (460.56-0ubuntu0.18.04.1) ...

Preparing to unpack .../09-libnvidia-compute-460_460.67-0ubuntu0~0.18.04.1_i386.deb ...

Unpacking libnvidia-compute-460:i386 (460.67-0ubuntu0~0.18.04.1) over (460.56-0ubuntu0.18.04.1) ...

Preparing to unpack .../10-libnvidia-extra-460_460.67-0ubuntu0~0.18.04.1_amd64.deb ...

Unpacking libnvidia-extra-460:amd64 (460.67-0ubuntu0~0.18.04.1) over (460.56-0ubuntu0.18.04.1) ...

Preparing to unpack .../11-nvidia-compute-utils-460_460.67-0ubuntu0~0.18.04.1_amd64.deb ...

Unpacking nvidia-compute-utils-460 (460.67-0ubuntu0~0.18.04.1) over (460.56-0ubuntu0.18.04.1) ...

Preparing to unpack .../12-libnvidia-encode-460_460.67-0ubuntu0~0.18.04.1_amd64.deb ...

De-configuring libnvidia-encode-460:i386 (460.56-0ubuntu0.18.04.1) ...

Unpacking libnvidia-encode-460:amd64 (460.67-0ubuntu0~0.18.04.1) over (460.56-0ubuntu0.18.04.1) ...

Preparing to unpack .../13-libnvidia-encode-460_460.67-0ubuntu0~0.18.04.1_i386.deb ...

Unpacking libnvidia-encode-460:i386 (460.67-0ubuntu0~0.18.04.1) over (460.56-0ubuntu0.18.04.1) ...

Preparing to unpack .../14-nvidia-utils-460_460.67-0ubuntu0~0.18.04.1_amd64.deb ...

Unpacking nvidia-utils-460 (460.67-0ubuntu0~0.18.04.1) over (460.56-0ubuntu0.18.04.1) ...

Preparing to unpack .../15-xserver-xorg-video-nvidia-460_460.67-0ubuntu0~0.18.04.1_amd64.deb ...

Unpacking xserver-xorg-video-nvidia-460 (460.67-0ubuntu0~0.18.04.1) over (460.56-0ubuntu0.18.04.1) ...

Preparing to unpack .../16-libnvidia-ifr1-460_460.67-0ubuntu0~0.18.04.1_amd64.deb ...

De-configuring libnvidia-ifr1-460:i386 (460.56-0ubuntu0.18.04.1) ...

Unpacking libnvidia-ifr1-460:amd64 (460.67-0ubuntu0~0.18.04.1) over (460.56-0ubuntu0.18.04.1) ...

Preparing to unpack .../17-libnvidia-ifr1-460_460.67-0ubuntu0~0.18.04.1_i386.deb ...

Unpacking libnvidia-ifr1-460:i386 (460.67-0ubuntu0~0.18.04.1) over (460.56-0ubuntu0.18.04.1) ...

Preparing to unpack .../18-libnvidia-fbc1-460_460.67-0ubuntu0~0.18.04.1_amd64.deb ...

De-configuring libnvidia-fbc1-460:i386 (460.56-0ubuntu0.18.04.1) ...

Unpacking libnvidia-fbc1-460:amd64 (460.67-0ubuntu0~0.18.04.1) over (460.56-0ubuntu0.18.04.1) ...

Preparing to unpack .../19-libnvidia-fbc1-460_460.67-0ubuntu0~0.18.04.1_i386.deb ...

Unpacking libnvidia-fbc1-460:i386 (460.67-0ubuntu0~0.18.04.1) over (460.56-0ubuntu0.18.04.1) ...

Preparing to unpack .../20-libnvidia-cfg1-460_460.67-0ubuntu0~0.18.04.1_amd64.deb ...

Unpacking libnvidia-cfg1-460:amd64 (460.67-0ubuntu0~0.18.04.1) over (460.56-0ubuntu0.18.04.1) ...

Setting up libnvidia-extra-460:amd64 (460.67-0ubuntu0~0.18.04.1) ...

Setting up libnvidia-fbc1-460:i386 (460.67-0ubuntu0~0.18.04.1) ...

Setting up libnvidia-fbc1-460:amd64 (460.67-0ubuntu0~0.18.04.1) ...

Setting up libnvidia-gl-460:i386 (460.67-0ubuntu0~0.18.04.1) ...

Setting up libnvidia-gl-460:amd64 (460.67-0ubuntu0~0.18.04.1) ...

Setting up libnvidia-ifr1-460:amd64 (460.67-0ubuntu0~0.18.04.1) ...

Setting up libnvidia-ifr1-460:i386 (460.67-0ubuntu0~0.18.04.1) ...

Setting up libnvidia-compute-460:amd64 (460.67-0ubuntu0~0.18.04.1) ...

Setting up libnvidia-compute-460:i386 (460.67-0ubuntu0~0.18.04.1) ...

Setting up nvidia-kernel-source-460 (460.67-0ubuntu0~0.18.04.1) ...

Setting up nvidia-utils-460 (460.67-0ubuntu0~0.18.04.1) ...

Setting up nvidia-kernel-common-460 (460.67-0ubuntu0~0.18.04.1) ...

update-initramfs: deferring update (trigger activated)

Setting up libnvidia-cfg1-460:amd64 (460.67-0ubuntu0~0.18.04.1) ...

Setting up libnvidia-decode-460:amd64 (460.67-0ubuntu0~0.18.04.1) ...

Setting up libnvidia-decode-460:i386 (460.67-0ubuntu0~0.18.04.1) ...

Setting up nvidia-compute-utils-460 (460.67-0ubuntu0~0.18.04.1) ...

Setting up libnvidia-encode-460:amd64 (460.67-0ubuntu0~0.18.04.1) ...

Setting up libnvidia-encode-460:i386 (460.67-0ubuntu0~0.18.04.1) ...

Setting up xserver-xorg-video-nvidia-460 (460.67-0ubuntu0~0.18.04.1) ...

Setting up nvidia-dkms-460 (460.67-0ubuntu0~0.18.04.1) ...

update-initramfs: deferring update (trigger activated)

INFO:Enable nvidia

DEBUG:Parsing /usr/share/ubuntu-drivers-common/quirks/dell_latitude

DEBUG:Parsing /usr/share/ubuntu-drivers-common/quirks/lenovo_thinkpad

DEBUG:Parsing /usr/share/ubuntu-drivers-common/quirks/put_your_quirks_here

Loading new nvidia-460.67 DKMS files...

Building for 4.15.0-141-generic

Building for architecture x86_64

Building initial module for 4.15.0-141-generic

Secure Boot not enabled on this system.

Done.

nvidia:

Running module version sanity check.

- Original module

- This kernel never originally had a module by this name

- Installation

- Installing to /lib/modules/4.15.0-141-generic/extra/

nvidia-modeset.ko:

Running module version sanity check.

Good news! Module version 460.67 for nvidia-modeset.ko

exactly matches what is already found in kernel 4.15.0-141-generic.

DKMS will not replace this module.

You may override by specifying --force.

nvidia-drm.ko:

Running module version sanity check.

- Original module

- This kernel never originally had a module by this name

- Installation

- Installing to /lib/modules/4.15.0-141-generic/extra/

nvidia-uvm.ko:

Running module version sanity check.

Good news! Module version for nvidia-uvm.ko

exactly matches what is already found in kernel 4.15.0-141-generic.

DKMS will not replace this module.

You may override by specifying --force.

depmod...

DKMS: install completed.

...

To be safe, I rebooted the computer and summoned NVIDIA again: same old recipe, same old taste. The error had not changed.
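Since the log shows DKMS building the module, one thing worth ruling out at this point is a module built for a different kernel than the one that is running. A rough sketch, assuming `dkms status` prints lines like "nvidia, 460.67, 4.15.0-141-generic, x86_64: installed" (the exact layout differs between DKMS versions, and the helper name dkms_installed_for is made up):

```shell
#!/bin/sh
# dkms_installed_for KERNEL: read `dkms status` output on stdin and succeed
# if an nvidia entry for KERNEL is reported as installed.
dkms_installed_for() {
    grep '^nvidia' | grep -F -- "$1" | grep -q ': installed'
}

# Typical use:
#   dkms status | dkms_installed_for "$(uname -r)" && echo "module built for this kernel"
```

If this check passes and the module still does not load, the build is probably fine and the cause lies elsewhere.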

This is getting a bit weird. Let's ask the GPU manager what is going on:

spark$ sudo gpu-manager

last_boot_file: /var/lib/ubuntu-drivers-common/last_gfx_boot

new_boot_file: /var/lib/ubuntu-drivers-common/last_gfx_boot

can't access /run/u-d-c-nvidia-was-loaded file

can't access /opt/amdgpu-pro/bin/amdgpu-pro-px

Looking for nvidia modules in /lib/modules/4.15.0-141-generic/updates/dkms

Error: can't open /lib/modules/4.15.0-141-generic/updates/dkms

Looking for amdgpu modules in /lib/modules/4.15.0-141-generic/updates/dkms

Error: can't open /lib/modules/4.15.0-141-generic/updates/dkms

Is nvidia loaded? no

Was nvidia unloaded? no

Is nvidia blacklisted? yes

Is intel loaded? yes

Is radeon loaded? no

Is radeon blacklisted? no

Is amdgpu loaded? no

Is amdgpu blacklisted? no

Is amdgpu versioned? no

Is amdgpu pro stack? no

Is nouveau loaded? no

Is nouveau blacklisted? yes

Is nvidia kernel module available? no

Is amdgpu kernel module available? no

Vendor/Device Id: 8086:191b

BusID "PCI:0@0:2:0"

Is boot vga? yes

Vendor/Device Id: 10de:139a

BusID "PCI:1@0:0:0"

can't open /sys/bus/pci/devices/0000:01:00.0/boot_vga

Is boot vga? no

Error: can't access /sys/bus/pci/devices/0000:01:00.0/driver

The device is not bound to any driver.

can't open /sys/bus/pci/devices/0000:01:00.0/boot_vga

can't access /etc/u-d-c-nvidia-runtimepm-override file

can't open /sys/module/nvidia/version

Warning: cannot check the NVIDIA driver major version

Support for runtimepm not detected.

You can override this check at your own risk by creating the /etc/u-d-c-nvidia-runtimepm-override file.

Is nvidia runtime pm supported for "0x139a"? no

Checking power status in /proc/driver/nvidia/gpus/0000:01:00.0/power

Error while opening /proc/driver/nvidia/gpus/0000:01:00.0/power

Is nvidia runtime pm enabled for "0x139a"? no

Skipping "/dev/dri/card0", driven by "i915"

Skipping "/dev/dri/card0", driven by "i915"

Skipping "/dev/dri/card0", driven by "i915"

Found "/dev/dri/card0", driven by "i915"

output 0:

card0-eDP-1

Number of connected outputs for /dev/dri/card0: 1

Does it require offloading? no

last cards number = 2

Has amd? no

Has intel? yes

Has nvidia? yes

How many cards? 2

Has the system changed? No

Intel IGP detected

Desktop system detected

or laptop with open drivers

Nothing to do

The GPU manager filed a long report, but only one line caught my eye as possibly useful:

Is nvidia blacklisted? yes
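Rather than scanning the whole report by eye, the line can be tested for directly. A small sketch that relies only on the report wording shown above; the helper name nvidia_blacklisted is hypothetical:

```shell
#!/bin/sh
# nvidia_blacklisted: read a gpu-manager report on stdin and succeed if it
# contains the exact line "Is nvidia blacklisted? yes".
nvidia_blacklisted() {
    grep -qx 'Is nvidia blacklisted? yes'
}

# Typical use:
#   sudo gpu-manager | nvidia_blacklisted && echo "nvidia is blacklisted"
```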

Blacklisted? Blacklisting is modprobe's job, so let's go check modprobe:

$ ls /lib/modprobe.d/

aliases.conf

blacklist_linux_4.15.0-137-generic.conf

blacklist_linux_4.15.0-141-generic.conf

blacklist-nvidia.conf

fbdev-blacklist.conf

nvidia-graphics-drivers.conf

systemd.conf

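See the blacklist-nvidia.conf file? Caught red-handed: it really was modprobe's doing. Did it go that smoothly? In reality, I first checked the /etc/modprobe.d directory, found nothing suspicious there, and let modprobe off the hook. Only after a long, fruitless search did I find its other lair, the /lib/modprobe.d directory. Since when does this rascal keep more than one burrow?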
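The detour could have been avoided by searching every modprobe configuration directory in one go. A minimal sketch; the helper name find_blacklists is made up, and the directory list in the usage comment is my assumption about where modprobe looks on this system:

```shell
#!/bin/sh
# find_blacklists MODULE DIR...: print every "blacklist MODULE" line found
# under the given directories, prefixed with file name and line number.
find_blacklists() {
    mod="$1"
    shift
    grep -rnE "^[[:space:]]*blacklist[[:space:]]+${mod}([[:space:]]|\$)" "$@" 2>/dev/null
}

# Typical use:
#   find_blacklists nvidia /etc/modprobe.d /run/modprobe.d /lib/modprobe.d
# The merged configuration can also be inspected with:
#   modprobe --showconfig | grep -i nvidia
```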

After all that effort to find the config file that blacklists the NVIDIA GPU, it deserves to be put on public display:

cat /lib/modprobe.d/blacklist-nvidia.conf

# Do not modify

# This file was generated by nvidia-prime

blacklist nvidia

blacklist nvidia-drm

blacklist nvidia-modeset

alias nvidia off

alias nvidia-drm off

alias nvidia-modeset off

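Judging from the comment, this file was generated by nvidia-prime. It even signs its work; at least it owns up to what it did.

Delete it:

sudo rm /lib/modprobe.d/blacklist-nvidia.conf

Note that deleting the blacklist-nvidia.conf file alone is enough. Do not delete nvidia-graphics-drivers.conf as well, even though both names contain nvidia.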
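A more cautious variant is to rename the file instead of deleting it. modprobe only reads files ending in .conf from these directories, so a renamed copy is ignored but can still be restored. A sketch with a made-up helper name, disable_conf; it has to run as root for files under /lib/modprobe.d:

```shell
#!/bin/sh
# disable_conf FILE: neutralise a modprobe config file by renaming it to
# FILE.disabled. Fails if FILE does not exist.
disable_conf() {
    [ -f "$1" ] || return 1
    mv -- "$1" "$1.disabled"
}

# Typical use (as root):
#   disable_conf /lib/modprobe.d/blacklist-nvidia.conf
```

As far as I can tell, the file comes back whenever nvidia-prime switches to the Intel profile again, and `sudo prime-select nvidia` is the intended way to switch back; I did not try that route here.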

To be safe, also run:

sudo update-initramfs -u
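The reason for rebuilding is that the initramfs may carry its own copy of the modprobe configuration. This is my understanding of initramfs-tools, not something I verified on this machine. A sketch for checking whether the rebuilt image still contains the blacklist file, with a hypothetical helper name:

```shell
#!/bin/sh
# initramfs_has_blacklist: read an initramfs file listing on stdin and succeed
# if it still contains a modprobe.d/blacklist-nvidia.conf entry.
initramfs_has_blacklist() {
    grep -q 'modprobe\.d/blacklist-nvidia\.conf$'
}

# Typical use:
#   lsinitramfs "/boot/initrd.img-$(uname -r)" | initramfs_has_blacklist \
#       && echo "blacklist is still inside the initramfs"
```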

Reboot.

This time I finally managed to summon NVIDIA, which has risen to fame along with machine learning over the past few years:

tianlang@tianlang:spark$ nvidia-smi

Sat Mar 27 07:27:19 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.67       Driver Version: 460.67       CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GTX 950M    Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   49C    P0    N/A /  N/A |      0MiB /  2004MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
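For scripts, nvidia-smi can print the same facts in CSV form instead of the table. A short sketch that condenses one CSV line into a summary; the helper name gpu_summary is made up, and it assumes the three fields are requested in the order shown in the usage comment:

```shell
#!/bin/sh
# gpu_summary: read "name, driver_version, memory.total" CSV lines on stdin
# and print one compact summary line per GPU.
gpu_summary() {
    awk -F', ' '{ printf "%s (driver %s, %s)\n", $1, $2, $3 }'
}

# Typical use:
#   nvidia-smi --query-gpu=name,driver_version,memory.total \
#       --format=csv,noheader | gpu_summary
```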

