NVIDIA/Troubleshooting


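Corrupted screen: the "six small screens" problem

For some users with a GeForce GT 100M, the display is corrupted after X starts: it shows six small screens, each limited to a resolution of 640x480. The same problem has recently been reported with the Quadro 2000 and high-resolution displays.

To solve this problem, enable the validation mode NoTotalSizeCheck in the Device section: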

Section "Device"
 ...
 Option "ModeValidation" "NoTotalSizeCheck"
 ...
EndSection

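'/dev/nvidia0' input/output error

The factual accuracy of this article or section is disputed.

Reason: Verify that the BIOS related suggestions work and are not coincidentally set while troubleshooting. (Discuss in Talk:NVIDIA/故障排除#'/dev/nvidia0' Input/Output error... suggested fixes)

This error can have many causes. The most common solution suggested for it is to check group/file permissions, but in almost every case that is not the problem. The NVIDIA documentation does not explain in detail how to correct it, but a few approaches have worked for some people. The problem can be an IRQ conflict with another device, or bad routing by either the kernel or the BIOS.

The first thing to try is to remove other video devices, such as video capture cards, and see whether the problem goes away. If there are too many video processors in the same system, the kernel may be unable to start them because of memory allocation problems with the video controller. In particular, on systems with little video memory this can occur even with only one video processor. In that case, find out the amount of your system's video memory (e.g. with the command lspci -v) and pass an allocation parameter to the kernel. For example, for a 32-bit kernel you could set: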

vmalloc=384M

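If running a 64-bit kernel, a driver defect can cause the NVIDIA module to fail to initialize when the IOMMU is on. Turning it off in the BIOS has been confirmed to work for some users. [1]User:Clickthem#nvidia module

Another thing to try is to change the BIOS IRQ routing from Operating system controlled to BIOS controlled, or the other way around. The first can be set with a kernel parameter (kernel parameters are case-sensitive, so it must be written in lowercase):

pci=biosirq

The noacpi kernel parameter has also been suggested as a solution, but since it disables ACPI completely it should be used with caution. Some hardware is easily damaged by overheating.

Note: Kernel parameters can be passed either on the kernel command line or through the boot loader configuration file. See the wiki page of your boot loader for details.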
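
To confirm after a reboot that a parameter actually reached the kernel, the command line of the running kernel can be read from /proc/cmdline. The following is a minimal sketch; has_kernel_param is a hypothetical helper name, not part of any package, and vmalloc=384M is simply the example value from above.

```shell
# has_kernel_param CMDLINE PARAM
# Succeeds if PARAM appears as a whole, space-separated word in CMDLINE.
# (Hypothetical helper, for illustration only.)
has_kernel_param() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage: check the running kernel's command line.
# has_kernel_param "$(cat /proc/cmdline)" "vmalloc=384M" && echo "vmalloc=384M is set"
```

Matching whole words avoids false positives when one parameter is a prefix of another.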

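Crashing in general

  • Try disabling RenderAccel in xorg.conf.
  • If Xorg outputs an error about "conflicting memory type" or "failed to allocate primary buffer: out of memory", or crashes with a "Signal 11" error while using the nvidia-96xx drivers, add nopat to your kernel parameters.
  • If the NVIDIA compiler complains that the current GCC version differs from the one used to compile the kernel, add the following to /etc/profile:
export IGNORE_CC_MISMATCH=1
  • If full-screen applications freeze or crash, try enabling the Display Compositing and Direct fullscreen rendering options in your desktop environment's settings.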

Bad performance after driver upgrade

If the FPS has dropped compared with an older driver, check whether direct rendering is enabled (glxinfo is provided by the mesa-utils package):

$ glxinfo | grep direct

If the command prints:

direct rendering: No

you may need to downgrade the driver and reboot.
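
For scripted checks, the same test can be wrapped in a small function. This is a sketch that assumes glxinfo prints a line beginning with "direct rendering: Yes" when direct rendering is active; direct_rendering_enabled is a hypothetical name chosen for illustration.

```shell
# Reads glxinfo output on stdin and succeeds only if it reports
# that direct rendering is enabled. (Hypothetical helper.)
direct_rendering_enabled() {
    grep -q '^direct rendering: Yes'
}

# Usage:
# glxinfo | direct_rendering_enabled || echo "direct rendering is off" >&2
```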

Avoid screen tearing

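Note: This has been reported to reduce the performance of some OpenGL applications and may cause problems in WebGL. It also drastically increases the time the driver needs to clock down after load (NVIDIA Support Thread).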

Tearing can be avoided by forcing a full composition pipeline, regardless of the compositor you are using. To test whether this option works, run:

$ nvidia-settings --assign CurrentMetaMode="nvidia-auto-select +0+0 { ForceFullCompositionPipeline = On }"

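Alternatively, click the Advanced button under the X Server Display Configuration menu option, select either Force Composition Pipeline or Force Full Composition Pipeline, and click Apply.

To make the setting persistent, it must be added to the "Screen" section of the Xorg configuration file. When making this change, TripleBuffering should be enabled and AllowIndirectGLXProtocol should be disabled in the driver configuration as well. See the example configuration below: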

/etc/X11/xorg.conf.d/20-nvidia.conf
Section "Device"
        Identifier "Device0"
        Driver     "nvidia"
        VendorName "NVIDIA Corporation"
        BoardName  "GeForce GTX 1050 Ti"
EndSection

Section "Screen"
    Identifier     "Screen0"
    Device         "Device0"
    Monitor        "Monitor0"
    Option         "ForceFullCompositionPipeline" "on"
    Option         "AllowIndirectGLXProtocol" "off"
    Option         "TripleBuffer" "on"
EndSection

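If you do not have an Xorg configuration file, you can create one for your current hardware with nvidia-xconfig (see NVIDIA#Automatic configuration) and move it from /etc/X11/xorg.conf to the preferred location /etc/X11/xorg.conf.d/20-nvidia.conf.

Note: Many of the configuration options in a 20-nvidia.conf file generated by nvidia-xconfig are set automatically by the driver and are not needed. To enable the composition pipeline, only the "Screen" section, with its Identifier and Option lines, is required; the other sections can be removed from the file.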
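
The "Screen" section can also be extracted from the generated file instead of deleting the other sections by hand. This is a minimal sketch, assuming the sections are delimited by Section "Screen" and EndSection lines as in the example above; extract_screen_section is a hypothetical helper name.

```shell
# Prints only the Section "Screen" ... EndSection block(s) of an
# Xorg configuration read from stdin. (Hypothetical helper.)
extract_screen_section() {
    awk '/^[ \t]*Section[ \t]+"Screen"/ { keep = 1 }   # start of a Screen section
         keep { print }                                # print while inside it
         /^[ \t]*EndSection/ { keep = 0 }              # any EndSection closes it'
}

# Usage: review the result before saving it as 20-nvidia.conf.
# extract_screen_section < /etc/X11/xorg.conf
```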

Multi-monitor

For a multi-monitor setup, you need to specify ForceCompositionPipeline=On for each display. For example:

$ nvidia-settings --assign CurrentMetaMode="DP-2: nvidia-auto-select +0+0 {ForceCompositionPipeline=On}, DP-4: nvidia-auto-select +3840+0 {ForceCompositionPipeline=On}"

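Without doing this, the nvidia-settings command will disable the other displays.

The following command can be used to obtain the current screen names and offsets: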

$ nvidia-settings --query CurrentMetaMode

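The example above is for two 3840x2160 monitors connected to DP-2 and DP-4. You will need to read the correct CurrentMetaMode by exporting xorg.conf and append ForceCompositionPipeline to each of your displays. Setting ForceCompositionPipeline only affects the targeted display.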
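
With more than two displays, the MetaMode string becomes tedious to type by hand. The sketch below assembles it from output names and offsets; build_metamode and its OUTPUT:OFFSET argument format are hypothetical conventions made up for this example, and every display is assumed to use nvidia-auto-select as in the command above.

```shell
# build_metamode OUTPUT:OFFSET...
# Prints a CurrentMetaMode value that enables ForceCompositionPipeline
# on every listed display. (Hypothetical helper; runs in a subshell so
# its variables do not leak into the caller.)
build_metamode() (
    result=""
    for spec in "$@"; do
        output=${spec%%:*}   # text before the first colon, e.g. DP-2
        offset=${spec#*:}    # text after the first colon, e.g. +0+0
        result="${result:+$result, }${output}: nvidia-auto-select ${offset} {ForceCompositionPipeline=On}"
    done
    printf '%s\n' "$result"
)

# Usage:
# nvidia-settings --assign CurrentMetaMode="$(build_metamode DP-2:+0+0 DP-4:+3840+0)"
```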

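Tip: Multi-monitor setups that use different monitor models may have slightly different refresh rates. If vsync is enabled by the driver, it syncs to only one of these refresh rates, which can cause screen tearing on the monitors that are not synced. Choose the display device you mainly use as the sync target, since the others will not sync properly. This can be configured in ~/.nvidia-settings-rc, e.g. with 0/XVideoSyncToDisplayID=, or by installing nvidia-settings and using the graphical configuration options.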

Modprobe Error: "Could not insert 'nvidia': No such device" on linux >=4.8

When trying to use the discrete card on linux 4.8, you may encounter errors like the following:

$ modprobe nvidia -vv
modprobe: INFO: custom logging function 0x409c10 registered
modprobe: INFO: Failed to insert module '/lib/modules/4.8.6-1-ARCH/extramodules/nvidia.ko.gz': No such device
modprobe: ERROR: could not insert 'nvidia': No such device
modprobe: INFO: context 0x24481e0 released
insmod /lib/modules/4.8.6-1-ARCH/extramodules/nvidia.ko.gz 
# dmesg
...
NVRM: The NVIDIA GPU 0000:01:00.0 (PCI ID: 10de:139b)
NVRM: installed in this system is not supported by the 370.28
NVRM: NVIDIA Linux driver release.  Please see 'Appendix
NVRM: A - Supported NVIDIA GPU Products' in this release's
NVRM: README, available on the Linux driver download page
NVRM: at www.nvidia.com.
...

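This problem is caused by bad commits pertaining to PCIe power management in the Linux kernel (as documented in this NVIDIA DevTalk thread).

The workaround is to add pcie_port_pm=off to your kernel parameters. Note that this disables PCIe power management for all devices.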

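Screen corruption after resuming from suspend or hibernation

See NVIDIA/Tips and tricks#Preserve video memory after suspend.

When using the GDM display manager, the corruption-after-suspend bug has been fixed since driver version 515.43.04 [2].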

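Intermittent CPU spikes with 400 series cards

If you experience intermittent CPU spikes with a 400 series card, they may be caused by PowerMizer constantly changing the GPU's clock frequency. You can switch PowerMizer's setting from Adaptive to Performance by adding the following to the Device section of your Xorg configuration: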

 Option "RegistryDwords" "PowerMizerEnable=0x1; PerfLevelSrc=0x3322; PowerMizerDefaultAC=0x1"

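Laptops: X hangs on login and logout

If, while using the legacy NVIDIA drivers, Xorg hangs on login and logout (typically with the screen split into two black and white/gray parts), but logging in is still possible via Ctrl+Alt+Backspace (or whatever other "kill X" key binding is configured), try adding this to /etc/modprobe.d/modprobe.conf: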

options nvidia NVreg_Mobile=1

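Some users have reported that the following configuration also works, but in testing it can also cause a significant performance drop: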

options nvidia NVreg_DeviceFileUID=0 NVreg_DeviceFileGID=33 NVreg_DeviceFileMode=0660 NVreg_SoftEDIDs=0 NVreg_Mobile=1

Note that the value of the NVreg_Mobile parameter depends on the laptop vendor:

  • 1 - Dell laptops
  • 2 - non-Compal Toshiba laptops
  • 3 - other laptops
  • 4 - Compal Toshiba laptops
  • 5 - Gateway laptops

See NVIDIA Driver's README: Appendix K for more information.

Screen(s) found, but none have a usable configuration

Sometimes NVIDIA and X have trouble finding the active screen. If your graphics card has multiple outputs try plugging your monitor into the other ones. On a laptop it may be because your graphics card has VGA/TV out. Xorg.0.log will provide more info.

Another thing to try is to add a deliberately invalid "ConnectedMonitor" option to Section "Device", which forces Xorg to throw an error that shows how to correct it. See the driver documentation for more about the ConnectedMonitor setting.

After restarting X, check Xorg.0.log for the valid CRT-x, DFP-x and TV-x values.

nvidia-xconfig --query-gpu-info could be helpful.

Blackscreen at X startup / Machine poweroff at X shutdown

If you have installed an update of NVIDIA and your screen stays black after launching Xorg, or if shutting down Xorg causes a machine poweroff, try the below workarounds:

  • Prepend "xrandr --auto" to your xinitrc
  • You can also try to add the nvidia module directly to your mkinitcpio.conf.
# modprobe nvidia

Backlight is not turning off in some occasions

By default, DPMS should turn off the backlight after the configured timeouts or when running xset. However, probably due to a bug in the proprietary NVIDIA drivers, the result is a blank screen with no power saving whatsoever. To work around it until the bug has been fixed, you can use vbetool as root.

Install the vbetool package.

Turn off the screen on demand; pressing any key then turns the backlight on again:

vbetool dpms off && read -n1; vbetool dpms on

Alternatively, xrandr is able to disable and re-enable monitor outputs without requiring root.

xrandr --output DP-1 --off; read -n1; xrandr --output DP-1 --auto

Driver 415: HardDPMS

Some parts of this article need expansion.

Reason: Add references for the "user reports". (Discuss in Talk:NVIDIA/故障排除)

Proprietary driver 415 includes a new feature called HardDPMS. This is reported by some users to solve the issues with suspending monitors connected over DisplayPort. It is reported to become the default in a future driver version, but for now, the HardDPMS option can be set in the Device or Screen sections. For example:

/etc/X11/xorg.conf.d/20-nvidia.conf
Section "Device"
    ...
    Option         "HardDPMS" "true"    
    ...
EndSection

Section "Screen"
    ...
    Option         "HardDPMS" "true"
    ...
EndSection

HardDPMS will trigger on screensaver settings like BlankTime. The following ServerFlags will set your monitor(s) to suspend after 10 minutes of inactivity:

/etc/X11/xorg.conf.d/20-nvidia.conf
Section "ServerFlags"
    Option     "BlankTime" "10"
EndSection

Xorg fails to load or Red Screen of Death

If you get a red screen and use GRUB, disable the GRUB framebuffer by editing /etc/default/grub and uncommenting GRUB_TERMINAL_OUTPUT=console. For more information see GRUB/Tips and tricks#Disable framebuffer.

Black screen on systems with integrated GPU

If you have a system with an integrated GPU (e.g. Intel HD 4000, VIA VX820 Chrome 9 or AMD Cezanne) and have installed the nvidia package, you may experience a black screen on boot, when changing virtual terminal, or when exiting an X session. This may be caused by a conflict between the graphics modules. This is solved by blacklisting the relevant GPU modules. Create the file /etc/modprobe.d/blacklist.conf and prevent the relevant modules from loading on boot:

/etc/modprobe.d/blacklist.conf
install i915 /usr/bin/false
install intel_agp /usr/bin/false
install viafb /usr/bin/false
install radeon /usr/bin/false
install amdgpu /usr/bin/false

No audio over HDMI

Sometimes NVIDIA HDMI audio devices are not shown when you do

$ aplay -l

On some new machines, the audio chip on the NVIDIA GPU is disabled at boot. Read more on NVIDIA's website and a forum post.

You need to reload the NVIDIA device with audio enabled. In order to do that make sure that your GPU is on (in case of laptops/Bumblebee) and that you are not running X on it, because it is going to reset:

# setpci -s 01:00.0 0x488.l=0x2000000:0x2000000
# rmmod nvidia-drm nvidia-modeset nvidia
# echo 1 > /sys/bus/pci/devices/0000:01:00.0/remove
# echo 1 > /sys/bus/pci/devices/0000:00:01.0/rescan
# modprobe nvidia-drm
# xinit -- -retro

If you are running your TTY on NVIDIA, put the lines in a script so you do not end up with no screen.

X fails with "no screens found" when using Multiple GPUs

In situations where you might have multiple GPUs on a system and X fails to start with:

[ 76.633] (EE) No devices detected.
[ 76.633] Fatal server error:
[ 76.633] no screens found

then you need to add your discrete card's BusID to your X configuration. This can happen on systems with an Intel CPU and an integrated GPU or if you have more than one NVIDIA card connected. Find your BusID:

# lspci | grep -E "VGA|3D controller"
00:02.0 VGA compatible controller: Intel Corporation Xeon E3-1200 v2/3rd Gen Core processor Graphics Controller (rev 09)
01:00.0 VGA compatible controller: NVIDIA Corporation GK107 [GeForce GTX 650] (rev a1)
08:00.0 3D controller: NVIDIA Corporation GM108GLM [Quadro K620M / Quadro M500M] (rev a2)

Then add it to the card's Device section in your X configuration. For example:

/etc/X11/xorg.conf.d/10-nvidia.conf
Section "Device"
    Identifier     "Device0"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
    BusID          "PCI:1:0:0"
EndSection
Note: BusID formatting is important!

In the example above, 01:00.0 is stripped down and written as 1:0:0, but some conversions are more complicated. lspci prints addresses in hexadecimal, whereas the BusID values in configuration files are decimal. Any component greater than 9 therefore has to be converted to decimal.

For example, 5e:00.0 from lspci becomes PCI:94:0:0.
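
The conversion can be left to printf, which accepts hexadecimal input when it is prefixed with 0x. A minimal sketch, assuming the address has the BUS:DEVICE.FUNCTION form shown in the lspci output above (no PCI domain prefix); lspci_to_busid is a hypothetical helper name.

```shell
# lspci_to_busid BUS:DEVICE.FUNCTION
# Converts a hexadecimal lspci address into the decimal form expected
# by the Xorg BusID option. (Hypothetical helper; runs in a subshell
# so its variables stay local.)
lspci_to_busid() (
    bus=${1%%:*}        # e.g. 5e
    rest=${1#*:}        # e.g. 00.0
    device=${rest%%.*}  # e.g. 00
    func=${rest#*.}     # e.g. 0
    # The 0x prefix makes printf parse each field as hexadecimal.
    printf 'PCI:%d:%d:%d\n' "0x$bus" "0x$device" "0x$func"
)

# Usage:
# lspci_to_busid 5e:00.0    # prints PCI:94:0:0
```

Prefixing every field with 0x also avoids values such as 08 being rejected as invalid octal numbers.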

Xorg fails during boot, but otherwise starts fine

On very fast booting systems, systemd may attempt to start the display manager before the NVIDIA driver has fully initialized. You will see a message like the following in your logs only when Xorg runs during boot.

/var/log/Xorg.0.log
[     1.807] (EE) NVIDIA(0): Failed to initialize the NVIDIA kernel module. Please see the
[     1.807] (EE) NVIDIA(0):     system's kernel log for additional error messages and
[     1.808] (EE) NVIDIA(0):     consult the NVIDIA README for details.
[     1.808] (EE) NVIDIA(0):  *** Aborting ***

In this case you will need to establish an ordering dependency from the display manager to the DRI device. First create device units for DRI devices by creating a new udev rules file.

/etc/udev/rules.d/99-systemd-dri-devices.rules
ACTION=="add", KERNEL=="card*", SUBSYSTEM=="drm", TAG+="systemd"

Then create dependencies from the display manager to the device(s).

/etc/systemd/system/display-manager.service.d/10-wait-for-dri-devices.conf
[Unit]
Wants=dev-dri-card0.device
After=dev-dri-card0.device

If you have additional cards needed for the desktop, list them in Wants and After, separated by spaces.
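
For several cards, the drop-in can be generated rather than typed. The sketch below prints the [Unit] section for any number of DRI card names; dri_dropin is a hypothetical helper, and the device unit names are assumed to follow the dev-dri-cardN.device pattern used above.

```shell
# dri_dropin CARD...
# Prints a [Unit] drop-in that orders the display manager after the
# given DRI devices, e.g. card0 card1. (Hypothetical helper; runs in
# a subshell so its variables stay local.)
dri_dropin() (
    units=""
    for card in "$@"; do
        units="${units:+$units }dev-dri-${card}.device"
    done
    printf '[Unit]\nWants=%s\nAfter=%s\n' "$units" "$units"
)

# Usage: review the output, then save it as
# /etc/systemd/system/display-manager.service.d/10-wait-for-dri-devices.conf
# dri_dropin card0 card1
```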

xrandr BadMatch

If you are trying to configure a WQHD monitor such as DELL U2515H using xrandr and xrandr --addmode gives you the error X Error of failed request: BadMatch, it might be because the proprietary NVIDIA driver clips the pixel clock maximum frequency of HDMI output to 225 MHz or lower. To set the monitor to maximum resolution you have to install nouveau drivers. You can force nouveau to use a specific pixel clock frequency by setting nouveau.hdmimhz=297 (or 330) in your Kernel parameters.

Alternatively, it may be that your monitor's EDID is incorrect. See #Override EDID.

Another reason could be that by default current NVIDIA drivers will only allow modes explicitly reported by EDID, but sometimes refresh rates and/or resolutions are desired which are not reported by the monitor (although the EDID information is correct; it is just that current NVIDIA drivers are too restrictive).

If this happens, you may want to add an option to xorg.conf to allow non-EDID modes:

Section "Device"
    Identifier     "Device0"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
...
    Option         "ModeValidation" "AllowNonEdidModes"
...
EndSection

This can be set per output. See the NVIDIA driver README (Appendix B. X Config Options) for more information.

Override EDID

See Kernel mode setting#Forcing modes and EDID, Xrandr#Troubleshooting and Qnix QX2710#Fixing X11 with Nvidia.

Overclocking with nvidia-settings GUI not working

The language, grammar or style of this article or section needs improvement. See Help:Style.

Reason: Duplication, vague "not working" (Discuss in Talk:NVIDIA/故障排除)

A workaround is to use the nvidia-settings CLI to query and set certain variables after enabling overclocking (as explained in NVIDIA/Tips and tricks#Enabling overclocking; see nvidia-settings(1) for more information).

Example to query all variables:

 nvidia-settings -q all

Example to set PowerMizerMode to prefer performance mode:

 nvidia-settings -a [gpu:0]/GPUPowerMizerMode=1

Example to set fan speed to fixed 21%:

nvidia-settings -a [gpu:0]/GPUFanControlState=1 -a [fan:0]/GPUTargetFanSpeed=21

Example to set multiple variables at once (overclock GPU by 50MHz, overclock video memory by 50MHz, increase GPU voltage by 100mV):

 nvidia-settings -a GPUGraphicsClockOffsetAllPerformanceLevels=50 -a GPUMemoryTransferRateOffsetAllPerformanceLevels=50 -a GPUOverVoltageOffset=100

Overclocking not working with Unknown Error

If you are running Xorg as a non-root user and trying to overclock your NVIDIA GPU, you will get an error similar to this one:

$ nvidia-settings -a "[gpu:0]/GPUGraphicsClockOffset[3]=10"
ERROR: Error assigning value 10 to attribute 'GPUGraphicsClockOffset' (trinity-zero:1[gpu:0]) as specified in assignment
        '[gpu:0]/GPUGraphicsClockOffset[3]=10' (Unknown Error).

To avoid this issue, Xorg has to be run as the root user. See Xorg#Rootless Xorg for details.

System will not boot after driver was installed

If after installing the NVIDIA driver your system becomes stuck before reaching the display manager, try to disable kernel mode setting.

X fails with "Failing initialization of X screen"

If /var/log/Xorg.0.log shows that the X server fails to initialize the screen:

(EE) NVIDIA(G0): GPU screens are not yet supported by the NVIDIA driver
(EE) NVIDIA(G0): Failing initialization of X screen

and nvidia-smi reports No running processes found,

first reinstall the latest nvidia-utils. Then copy /usr/share/X11/xorg.conf.d/10-nvidia-drm-outputclass.conf to /etc/X11/xorg.conf.d/10-nvidia-drm-outputclass.conf, edit the copy to add the line Option "PrimaryGPU" "yes", and restart the computer. This should fix the problem.

System does not return from suspend

What you see in the log:

kernel: nvidia-modeset: ERROR: GPU:0: Failed detecting connected display devices
kernel: nvidia-modeset: ERROR: GPU:0: Failed detecting connected display devices
kernel: nvidia-modeset: WARNING: GPU:0: Failure processing EDID for display device DELL U2412M (DP-0).
kernel: nvidia-modeset: WARNING: GPU:0: Unable to read EDID for display device DELL U2412M (DP-0)
kernel: nvidia-modeset: ERROR: GPU:0: Failure reading maximum pixel clock value for display device DELL U2412M (DP-0).

A possible solution based on [3]:

Run this command to get the version string:

# strings /sys/firmware/acpi/tables/DSDT | grep -i 'windows ' | sort | tail -1

Add the acpi_osi=! "acpi_osi=version" kernel parameters to your boot loader configuration, replacing version with the string printed by the command above.
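
The two steps can be combined so that the parameters are printed ready to paste. This is a sketch only: acpi_osi_params is a hypothetical helper, it applies the same grep, sort and tail selection as the command above, and it assumes that the selected string is the one you want to report to the firmware.

```shell
# Reads the output of `strings /sys/firmware/acpi/tables/DSDT` on stdin
# and prints the kernel parameters in the form described above.
# Fails if no "Windows ..." string is found. (Hypothetical helper;
# runs in a subshell so its variables stay local.)
acpi_osi_params() (
    version=$(grep -i 'windows ' | sort | tail -n 1)
    [ -n "$version" ] || exit 1
    printf 'acpi_osi=! "acpi_osi=%s"\n' "$version"
)

# Usage (as root):
# strings /sys/firmware/acpi/tables/DSDT | acpi_osi_params
```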

Vulkan error on applications start

The factual accuracy of this article or section is disputed.

Reason: Need confirmation by other users (Discuss in Talk:NVIDIA/故障排除)


If you get this error when starting an application that requires Vulkan acceleration:

Vulkan call failed: -4

try to delete the ~/.nv or ~/.cache/nvidia directory.

Extreme lag on Xorg

The factual accuracy of this article or section is disputed.

Reason: According to an NVIDIA developer this issue is not specific to GNOME and the rest of the comments on the issue do not mention multi-monitor setups. (Discuss in Talk:NVIDIA/故障排除)


A common issue with Mutter is that animations, video playback and gaming cause extreme desktop lag on Xorg.

See NVIDIA/Tips and tricks#Preserve video memory after suspend.

This should resolve this issue, however if it did not, you are most likely out of luck. One way you can remedy this issue is by adding these options:

/etc/environment
CLUTTER_DEFAULT_FPS=YOUR_MAIN_DISPLAY_REFRESHRATE
__GL_SYNC_DISPLAY_DEVICE=YOUR_MAIN_DISPLAY_OUTPUT_NAME

In addition, turn Sync to VBlank and Allow Flipping off within NVIDIA Settings, and configure NVIDIA Settings to launch on startup with the flag --load-config-only. The desktop will still be somewhat laggy, in particular on a second (or third) monitor, but it should be much better.