NVIDIA/Troubleshooting

From the Arch Linux Chinese Wiki

Corrupted screen (the "six small screens" problem)

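For some users with a GeForce GT 100M, the screen becomes corrupted after X starts: it is split into six small screens with the resolution limited to 640x480. The same problem has recently been seen with a Quadro 2000 and high-resolution displays.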

To solve this problem, enable the NoTotalSizeCheck mode validation option in the Device section:

Section "Device"
 ...
 Option "ModeValidation" "NoTotalSizeCheck"
 ...
EndSection

'/dev/nvidia0' input/output error

The factual accuracy of this article or section is disputed.

Reason: Verify that the BIOS related suggestions work and are not coincidentally set while troubleshooting. (Discuss in Talk:NVIDIA/故障排除#'/dev/nvidia0' Input/Output error... suggested fixes)

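There are many possible causes for this error. The most common solution given for it is to check group and file permissions, but in almost every case that is not the problem. The NVIDIA documentation does not explain in detail how to correct it, but a few approaches have worked for some people. The cause can be an IRQ conflict with another device, bad routing by the kernel or the BIOS, and so on.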

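The first thing to try is removing other video devices, such as video capture cards, to see whether the problem goes away. If there are too many video processors in the same system, the kernel may be unable to start them because of memory allocation problems with the video controllers. This can happen even with a single video processor, especially on systems with little video memory. In that case, find out how much video memory the system has (for example with lspci -v) and pass allocation parameters to the kernel. For example, for a 32-bit kernel you can set: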

vmalloc=384M

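If you are running a 64-bit kernel, a driver defect can cause the NVIDIA module to fail to initialize when the IOMMU is enabled. Turning the IOMMU off in the BIOS has been confirmed to work for some users. [1] User:Clickthem#nvidia module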

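Another thing to try is changing the BIOS IRQ routing from "Operating system controlled" to "BIOS controlled", or the other way around. Using the BIOS routing table can also be requested with a kernel parameter:

pci=biosirq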

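The noacpi kernel parameter has also been suggested as a solution, but since it disables ACPI completely, it should be used with caution. Some hardware is easily damaged by overheating.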

Note: Kernel parameters can be passed via the kernel command line or the boot loader configuration file. See the wiki page of the boot loader you use for details.

Crashing in general

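  • Try disabling RenderAccel in xorg.conf.
  • If Xorg outputs an error about "conflicting memory type" or "failed to allocate primary buffer: out of memory", or crashes with a "Signal 11" while using the nvidia-96xx drivers, add nopat to the kernel parameters.
  • If the NVIDIA compiler complains that the current GCC version differs from the one used to compile the kernel, add the following to /etc/profile: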
export IGNORE_CC_MISMATCH=1
  • If full-screen applications freeze or crash, try enabling the Display Compositing or Direct fullscreen rendering option in your desktop environment's settings.

Bad performance after a driver upgrade

If FPS is lower with the new driver than with the old one, check whether direct rendering is enabled (the glxinfo program is included in the mesa-utils package):

$ glxinfo | grep direct

If the command prints:

direct rendering: No

then you may need to downgrade to the previous driver version and reboot.
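To run the same check from a script, a small helper can look for that line. This is a minimal sketch: the function name direct_rendering_enabled is hypothetical, and it assumes glxinfo prints the line in the "direct rendering: Yes/No" form shown above.

```shell
# Hypothetical helper: reads glxinfo output on stdin and succeeds (exit 0)
# only if it contains "direct rendering: Yes"; fails for "No" or a missing line.
direct_rendering_enabled() {
    grep -qiE '^direct rendering:[[:space:]]*yes'
}
```

For example: glxinfo | direct_rendering_enabled || echo "direct rendering is disabled"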

Avoiding screen tearing

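Note: This has been reported to reduce the performance of some OpenGL applications and may cause issues in WebGL. It also drastically increases the time the driver needs to clock down after load (NVIDIA Support Thread).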

Regardless of which compositor you use, tearing can be avoided by forcing a full composition pipeline. To test whether this option works, run:

$ nvidia-settings --assign CurrentMetaMode="nvidia-auto-select +0+0 { ForceFullCompositionPipeline = On }"

Alternatively, click the Advanced button in the X Server Display Configuration menu, select Force Composition Pipeline or Force Full Composition Pipeline, and click Apply.

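To make this setting persistent, it must be added to the "Screen" section of the Xorg configuration file. When making this change, TripleBuffer should also be enabled and AllowIndirectGLXProtocol disabled in the driver configuration. See the following example configuration: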

/etc/X11/xorg.conf.d/20-nvidia.conf
Section "Device"
        Identifier "Device0"
        Driver     "nvidia"
        VendorName "NVIDIA Corporation"
        BoardName  "GeForce GTX 1050 Ti"
EndSection

Section "Screen"
    Identifier     "Screen0"
    Device         "Device0"
    Monitor        "Monitor0"
    Option         "ForceFullCompositionPipeline" "on"
    Option         "AllowIndirectGLXProtocol" "off"
    Option         "TripleBuffer" "on"
EndSection

If you do not have an Xorg configuration file, you can use nvidia-xconfig (see NVIDIA#Automatic configuration) to create one for your current hardware, then move it from /etc/X11/xorg.conf to the preferred location /etc/X11/xorg.conf.d/20-nvidia.conf.

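Note: Many of the options in the 20-nvidia.conf generated by nvidia-xconfig are set automatically by the driver and are not actually needed. Only the "Screen" section, with its Identifier and Option settings, is needed to enable the composition pipeline; the other sections can be removed from the file.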

Multi-monitor

For multi-monitor setups, you need to specify ForceCompositionPipeline=On for each display. For example:

$ nvidia-settings --assign CurrentMetaMode="DP-2: nvidia-auto-select +0+0 {ForceCompositionPipeline=On}, DP-4: nvidia-auto-select +3840+0 {ForceCompositionPipeline=On}"

Without this, the nvidia-settings command will disable the other displays.

The following command can be used to get the current screen names and offsets:

$ nvidia-settings --query CurrentMetaMode

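The DP-2/DP-4 example command above is for two 3840x2160 displays connected to DP-2 and DP-4. You need to get the correct CurrentMetaMode for your own setup (for example by exporting your xorg.conf) and append ForceCompositionPipeline to each display. Setting ForceCompositionPipeline only affects the targeted display.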
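Appending the option to every display can be sketched as a shell function. It assumes the metamode is a comma-separated list of "OUTPUT: MODE +X+Y" entries without existing {...} attribute blocks; real nvidia-settings --query output may contain such blocks, which this sketch does not handle. The function name add_composition_pipeline is hypothetical.

```shell
# Hypothetical helper: $1 is a metamode such as
#   "DP-2: nvidia-auto-select +0+0, DP-4: nvidia-auto-select +3840+0"
# Prints it with {ForceCompositionPipeline=On} appended to each display.
add_composition_pipeline() {
    # Insert the option before every ", " separator, then once at the end.
    printf '%s\n' "$1" |
        sed -e 's/ *, */ {ForceCompositionPipeline=On}, /g' \
            -e 's/ *$/ {ForceCompositionPipeline=On}/'
}
```

With it, the example above becomes: nvidia-settings --assign CurrentMetaMode="$(add_composition_pipeline 'DP-2: nvidia-auto-select +0+0, DP-4: nvidia-auto-select +3840+0')"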

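Tip: In multi-monitor setups, different monitor models may have slightly different refresh rates. If vsync is enabled by the driver, it syncs to only one of these refresh rates, which can cause screen tearing on the monitors that are not synced correctly. Choose your primarily used monitor as the display device to sync to, since the others will not sync correctly. This can be configured in ~/.nvidia-settings-rc, for example with 0/XVideoSyncToDisplayID=, or by installing nvidia-settings and using its graphical configuration options.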

Modprobe Error: "Could not insert 'nvidia': No such device" on linux >=4.8

When trying to use the discrete GPU on Linux 4.8 or later, you may get an error like this:

$ modprobe nvidia -vv
modprobe: INFO: custom logging function 0x409c10 registered
modprobe: INFO: Failed to insert module '/lib/modules/4.8.6-1-ARCH/extramodules/nvidia.ko.gz': No such device
modprobe: ERROR: could not insert 'nvidia': No such device
modprobe: INFO: context 0x24481e0 released
insmod /lib/modules/4.8.6-1-ARCH/extramodules/nvidia.ko.gz 
# dmesg
...
NVRM: The NVIDIA GPU 0000:01:00.0 (PCI ID: 10de:139b)
NVRM: installed in this system is not supported by the 370.28
NVRM: NVIDIA Linux driver release.  Please see 'Appendix
NVRM: A - Supported NVIDIA GPU Products' in this release's
NVRM: README, available on the Linux driver download page
NVRM: at www.nvidia.com.
...

This problem is caused by a faulty commit in the Linux kernel concerning PCIe power management (as described in an NVIDIA DevTalk thread).

The workaround is to add pcie_port_pm=off to the kernel parameters. Note that this disables PCIe power management for all devices.

Corrupted screen after suspend or hibernate

See NVIDIA/Tips and tricks#Preserve video memory after suspend.

When using the GDM display manager, the screen corruption bug after suspend has been fixed since driver version 515.43.04 [2].

Intermittent CPU spikes with 400 series cards

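If you experience intermittent CPU spikes with a 400 series card, they may be caused by PowerMizer constantly changing the GPU clock frequency. You can switch PowerMizer from adaptive to performance mode by adding the following to the Device section of your Xorg configuration: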

 Option "RegistryDwords" "PowerMizerEnable=0x1; PerfLevelSrc=0x3322; PowerMizerDefaultAC=0x1"

Laptops: X hangs on login/logout

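If Xorg hangs on login and logout while using the legacy NVIDIA drivers (often with the display split into black and white/gray halves), but you can still log in after pressing Ctrl+Alt+Backspace (or whichever key combination is bound to kill X), try adding the following to /etc/modprobe.d/modprobe.conf: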

options nvidia NVreg_Mobile=1

Some users report that the following configuration also works, but in testing it could cause a significant drop in performance:

options nvidia NVreg_DeviceFileUID=0 NVreg_DeviceFileGID=33 NVreg_DeviceFileMode=0660 NVreg_SoftEDIDs=0 NVreg_Mobile=1

Note that the value of the NVreg_Mobile parameter depends on the laptop manufacturer:

  • 1 - Dell laptops
  • 2 - Non-Compal Toshiba laptops
  • 3 - Other laptops
  • 4 - Compal Toshiba laptops
  • 5 - Gateway laptops

See the NVIDIA Driver's README: Appendix K for more information.
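The vendor table above can be turned into the matching modprobe line with a small lookup. This is a sketch only: the vendor keywords (dell, toshiba, toshiba-compal, gateway) and the function name are made up for illustration, while the numeric values come from the list above.

```shell
# Hypothetical helper: print the modprobe options line for a laptop vendor.
# Vendor keywords are illustrative; values follow the NVreg_Mobile list.
nvreg_mobile_options() {
    case "$1" in
        dell)           value=1 ;;
        toshiba)        value=2 ;;  # non-Compal Toshiba
        toshiba-compal) value=4 ;;  # Compal Toshiba
        gateway)        value=5 ;;
        *)              value=3 ;;  # any other laptop
    esac
    printf 'options nvidia NVreg_Mobile=%s\n' "$value"
}
```

Its output can then be appended, as root, to /etc/modprobe.d/modprobe.conf.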

Screen(s) found, but none have a usable configuration

Sometimes NVIDIA and X have trouble finding the active screen. If your graphics card has multiple outputs, try plugging your monitor into the other ones. On a laptop, this may happen because your graphics card has a VGA/TV out. Xorg.0.log will provide more information.

Another thing to try is adding an invalid "ConnectedMonitor" Option to Section "Device" to force Xorg to throw an error that shows you how to correct it. See the NVIDIA driver README for more about the ConnectedMonitor setting.

After re-running X, check Xorg.0.log for valid CRT-x, DFP-x and TV-x values.

nvidia-xconfig --query-gpu-info could also be helpful.

Blackscreen at X startup / Machine poweroff at X shutdown

If you have installed an NVIDIA update and your screen stays black after launching Xorg, or if shutting down Xorg causes a machine poweroff, try the workarounds below:

  • Prepend "xrandr --auto" to your xinitrc.
  • You can also try adding the nvidia module directly to your mkinitcpio.conf.
# modprobe nvidia

Backlight is not turning off in some occasions

By default, DPMS should turn off the backlight when the configured timeouts expire or when you run xset. However, probably due to a bug in the proprietary NVIDIA drivers, the result is a blank screen with no power saving whatsoever. Until the bug has been fixed, you can work around it by using vbetool as root.

Install the vbetool package.

Turn off your screen on demand; pressing any key turns the backlight back on:

vbetool dpms off && read -n1; vbetool dpms on

Alternatively, xrandr can disable and re-enable monitor outputs without requiring root:

xrandr --output DP-1 --off; read -n1; xrandr --output DP-1 --auto

Driver 415: HardDPMS

This article or section needs expansion.

Reason: Add references for the "user reports". (Discuss in Talk:NVIDIA/故障排除)

Proprietary driver 415 includes a new feature called HardDPMS. Some users report that it solves issues with suspending monitors connected over DisplayPort. It is reported to become the default in a future driver version, but for now the HardDPMS option can be set in the Device or Screen sections. For example:

/etc/X11/xorg.conf.d/20-nvidia.conf
Section "Device"
    ...
    Option         "HardDPMS" "true"    
    ...
EndSection

Section "Screen"
    ...
    Option         "HardDPMS" "true"
    ...
EndSection

HardDPMS will trigger on screensaver settings like BlankTime. The following ServerFlags will set your monitor(s) to suspend after 10 minutes of inactivity:

/etc/X11/xorg.conf.d/20-nvidia.conf
Section "ServerFlags"
    Option     "BlankTime" "10"
EndSection

Xorg fails to load or Red Screen of Death

If you get a red screen and use GRUB, disable the GRUB framebuffer by editing /etc/default/grub and uncommenting GRUB_TERMINAL_OUTPUT=console. For more information, see GRUB/Tips and tricks#Disable framebuffer.

Black screen on systems with integrated GPU

If you have a system with an integrated GPU (e.g. Intel HD 4000, VIA VX820 Chrome 9 or AMD Cezanne) and have installed the nvidia package, you may experience a black screen on boot, when changing virtual terminal, or when exiting an X session. This may be caused by a conflict between the graphics modules, and can be solved by blacklisting the relevant GPU modules. Create the file /etc/modprobe.d/blacklist.conf to prevent the relevant modules from loading on boot:

/etc/modprobe.d/blacklist.conf
install i915 /usr/bin/false
install intel_agp /usr/bin/false
install viafb /usr/bin/false
install radeon /usr/bin/false
install amdgpu /usr/bin/false

No audio over HDMI

Sometimes NVIDIA HDMI audio devices are not shown when you run:

$ aplay -l

On some new machines, the audio chip on the NVIDIA GPU is disabled at boot. Read more on NVIDIA's website and a forum post.

You need to reload the NVIDIA device with audio enabled. To do that, make sure that your GPU is on (in the case of laptops/Bumblebee) and that you are not running X on it, because it is going to be reset:

# setpci -s 01:00.0 0x488.l=0x2000000:0x2000000
# rmmod nvidia-drm nvidia-modeset nvidia
# echo 1 > /sys/bus/pci/devices/0000:01:00.0/remove
# echo 1 > /sys/bus/pci/devices/0000:00:01.0/rescan
# modprobe nvidia-drm
# xinit -- -retro

If your TTY runs on the NVIDIA GPU, put these commands in a script so that you do not end up without a screen.
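A minimal sketch of such a script follows. GPU and BRIDGE default to the addresses used in the steps above (0000:01:00.0 for the GPU, 0000:00:01.0 for the device whose rescan is triggered); these are assumptions, so check lspci on your own system. The DRY_RUN variable and the run and reenable_hdmi_audio functions are hypothetical additions; with DRY_RUN=1 the commands are only printed.

```shell
#!/bin/sh
# Sketch of the HDMI-audio steps above. GPU/BRIDGE are assumed example
# addresses; override them in the environment to match your lspci output.
GPU=${GPU:-0000:01:00.0}
BRIDGE=${BRIDGE:-0000:00:01.0}

# Print the command when DRY_RUN is set, otherwise execute it.
run() {
    if [ -n "$DRY_RUN" ]; then
        printf '%s\n' "$*"
    else
        sh -c "$*"
    fi
}

# Enable the audio function, unload the driver, remove and rescan the
# device, then reload the driver. Stops at the first failing step.
reenable_hdmi_audio() {
    run "setpci -s $GPU 0x488.l=0x2000000:0x2000000" &&
    run "rmmod nvidia-drm nvidia-modeset nvidia" &&
    run "echo 1 > /sys/bus/pci/devices/$GPU/remove" &&
    run "echo 1 > /sys/bus/pci/devices/$BRIDGE/rescan" &&
    run "modprobe nvidia-drm"
}
```

Source the file as root (for example . ./hdmi-audio.sh), run reenable_hdmi_audio, and then start X again as in the last step above.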

X fails with "no screens found" when using Multiple GPUs

In situations where you have multiple GPUs in a system and X fails to start with:

[ 76.633] (EE) No devices detected.
[ 76.633] Fatal server error:
[ 76.633] no screens found

then you need to add your discrete card's BusID to your X configuration. This can happen on systems with an Intel CPU and an integrated GPU or if you have more than one NVIDIA card connected. Find your BusID:

# lspci | grep -E "VGA|3D controller"
00:02.0 VGA compatible controller: Intel Corporation Xeon E3-1200 v2/3rd Gen Core processor Graphics Controller (rev 09)
01:00.0 VGA compatible controller: NVIDIA Corporation GK107 [GeForce GTX 650] (rev a1)
08:00.0 3D controller: NVIDIA Corporation GM108GLM [Quadro K620M / Quadro M500M] (rev a2)

Then fix it by adding the BusID to the card's Device section in your X configuration. For example:

/etc/X11/xorg.conf.d/10-nvidia.conf
Section "Device"
    Identifier     "Device0"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
    BusID          "PCI:1:0:0"
EndSection
Note: BusID formatting is important!

In the example above, 01:00.0 is written as 1:0:0, but some conversions are more complicated. lspci output is in hexadecimal, whereas BusIDs in configuration files are in decimal. This means that whenever a value is greater than 9, you need to convert it to decimal.

For example, 5e:00.0 from lspci becomes PCI:94:0:0.
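The hexadecimal-to-decimal conversion can be done with a short shell function. This is a sketch; the function name lspci_to_busid is hypothetical, and it assumes PCI domain 0: a leading "domain:" prefix, as printed by lspci -D, is simply dropped.

```shell
# Hypothetical helper: convert an lspci slot (hex "bus:device.function")
# into an Xorg BusID (decimal "PCI:bus:device:function").
lspci_to_busid() {
    slot=$1
    case $slot in
        *:*:*) slot=${slot#*:} ;;   # drop a leading PCI domain (assumed 0)
    esac
    bus=${slot%%:*}        # e.g. "5e"
    devfn=${slot#*:}       # e.g. "00.0"
    dev=${devfn%%.*}       # e.g. "00"
    fn=${devfn#*.}         # e.g. "0"
    # printf accepts C-style hex constants such as 0x5e for %d
    printf 'PCI:%d:%d:%d\n' "0x$bus" "0x$dev" "0x$fn"
}
```

For example, lspci_to_busid 5e:00.0 prints PCI:94:0:0, and lspci_to_busid 01:00.0 prints PCI:1:0:0.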

Xorg fails during boot, but otherwise starts fine

On very fast booting systems, systemd may attempt to start the display manager before the NVIDIA driver has fully initialized. You will see a message like the following in your logs only when Xorg runs during boot.

/var/log/Xorg.0.log
[     1.807] (EE) NVIDIA(0): Failed to initialize the NVIDIA kernel module. Please see the
[     1.807] (EE) NVIDIA(0):     system's kernel log for additional error messages and
[     1.808] (EE) NVIDIA(0):     consult the NVIDIA README for details.
[     1.808] (EE) NVIDIA(0):  *** Aborting ***

In this case you will need to establish an ordering dependency from the display manager to the DRI device. First create device units for DRI devices by creating a new udev rules file.

/etc/udev/rules.d/99-systemd-dri-devices.rules
ACTION=="add", KERNEL=="card*", SUBSYSTEM=="drm", TAG+="systemd"

Then create dependencies from the display manager to the device(s).

/etc/systemd/system/display-manager.service.d/10-wait-for-dri-devices.conf
[Unit]
Wants=dev-dri-card0.device
After=dev-dri-card0.device

If you have additional cards needed for the desktop, list them in Wants and After, separated by spaces.
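For several cards, the drop-in contents can be generated rather than typed. This is a minimal sketch; the function name dri_wait_dropin is hypothetical, and it assumes the cards appear as /dev/dri/cardN, whose systemd device units are named dev-dri-cardN.device.

```shell
# Hypothetical helper: print a [Unit] drop-in that orders the display
# manager after the given DRI cards. Arguments are card numbers, e.g. 0 1.
dri_wait_dropin() {
    units=""
    for n in "$@"; do
        units="${units:+$units }dev-dri-card$n.device"
    done
    printf '[Unit]\nWants=%s\nAfter=%s\n' "$units" "$units"
}
```

For example, running dri_wait_dropin 0 1 as root and redirecting its output to /etc/systemd/system/display-manager.service.d/10-wait-for-dri-devices.conf covers card0 and card1.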

xrandr BadMatch

If you are trying to configure a WQHD monitor such as the DELL U2515H using xrandr and xrandr --addmode gives you the error X Error of failed request: BadMatch, it might be because the proprietary NVIDIA driver clips the maximum pixel clock frequency of the HDMI output to 225 MHz or lower. To set the monitor to its maximum resolution, you have to install the nouveau driver. You can force nouveau to use a specific pixel clock frequency by setting nouveau.hdmimhz=297 (or 330) in your kernel parameters.

Alternatively, it may be that your monitor's EDID is incorrect. See #Override EDID.

Another reason could be that current NVIDIA drivers, by default, only allow modes explicitly reported by the EDID. Sometimes, however, refresh rates and/or resolutions are wanted that the monitor does not report, even though its EDID information is correct; the driver is simply too restrictive.

If this happens, you may want to add an option to xorg.conf to allow non-EDID modes:

Section "Device"
    Identifier     "Device0"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
...
    Option         "ModeValidation" "AllowNonEdidModes"
...
EndSection

This can be set per output. See the NVIDIA driver README (Appendix B. X Config Options) for more information.

Override EDID

See Kernel mode setting#Forcing modes and EDID, Xrandr#Troubleshooting and Qnix QX2710#Fixing X11 with Nvidia.

Overclocking with nvidia-settings GUI not working

The language, grammar, or style of this article or section needs improvement. See Help:Style.

Reason: Duplication, vague "not working" (Discuss in Talk:NVIDIA/故障排除)

A workaround is to use the nvidia-settings CLI to query and set certain variables after enabling overclocking (as explained in NVIDIA/Tips and tricks#Enabling overclocking; see nvidia-settings(1) for more information).

Example to query all variables:

 nvidia-settings -q all

Example to set PowerMizerMode to prefer performance mode:

 nvidia-settings -a [gpu:0]/GPUPowerMizerMode=1

Example to set fan speed to fixed 21%:

nvidia-settings -a [gpu:0]/GPUFanControlState=1 -a [fan:0]/GPUTargetFanSpeed=21

Example to set multiple variables at once (overclock GPU by 50MHz, overclock video memory by 50MHz, increase GPU voltage by 100mV):

 nvidia-settings -a GPUGraphicsClockOffsetAllPerformanceLevels=50 -a GPUMemoryTransferRateOffsetAllPerformanceLevels=50 -a GPUOverVoltageOffset=100

Overclocking not working with Unknown Error

If you are running Xorg as a non-root user and trying to overclock your NVIDIA GPU, you will get an error similar to this one:

$ nvidia-settings -a "[gpu:0]/GPUGraphicsClockOffset[3]=10"
ERROR: Error assigning value 10 to attribute 'GPUGraphicsClockOffset' (trinity-zero:1[gpu:0]) as specified in assignment
        '[gpu:0]/GPUGraphicsClockOffset[3]=10' (Unknown Error).

To avoid this issue, Xorg has to be run as the root user. See Xorg#Rootless Xorg for details.

System will not boot after driver was installed

If after installing the NVIDIA driver your system becomes stuck before reaching the display manager, try to disable kernel mode setting.

X fails with "Failing initialization of X screen"

If /var/log/Xorg.0.log shows that the X server fails to initialize the screen:

(EE) NVIDIA(G0): GPU screens are not yet supported by the NVIDIA driver
(EE) NVIDIA(G0): Failing initialization of X screen

and nvidia-smi reports No running processes found,

first reinstall the latest nvidia-utils, then copy /usr/share/X11/xorg.conf.d/10-nvidia-drm-outputclass.conf to /etc/X11/xorg.conf.d/10-nvidia-drm-outputclass.conf, edit /etc/X11/xorg.conf.d/10-nvidia-drm-outputclass.conf and add the line Option "PrimaryGPU" "yes". Then restart the computer.

System does not return from suspend

The log shows errors like these:

kernel: nvidia-modeset: ERROR: GPU:0: Failed detecting connected display devices
kernel: nvidia-modeset: ERROR: GPU:0: Failed detecting connected display devices
kernel: nvidia-modeset: WARNING: GPU:0: Failure processing EDID for display device DELL U2412M (DP-0).
kernel: nvidia-modeset: WARNING: GPU:0: Unable to read EDID for display device DELL U2412M (DP-0)
kernel: nvidia-modeset: ERROR: GPU:0: Failure reading maximum pixel clock value for display device DELL U2412M (DP-0).

A possible solution based on [3]:

Run this command to get the version string:

# strings /sys/firmware/acpi/tables/DSDT | grep -i 'windows ' | sort | tail -1

Add the kernel parameters acpi_osi=! "acpi_osi=version" to your boot loader configuration, replacing version with the string printed by the command above.
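The two steps can be combined into one helper that builds the parameter string from the strings output. This is a sketch: the function name acpi_osi_params is hypothetical, and it mirrors the grep/sort/tail selection used above, so it picks the lexicographically last "Windows ..." string.

```shell
# Hypothetical helper: stdin is the output of
#   strings /sys/firmware/acpi/tables/DSDT
# Prints the kernel parameters, or fails if no Windows version string is found.
acpi_osi_params() {
    version=$(grep -i 'windows ' | sort | tail -n 1)
    [ -n "$version" ] || return 1
    printf 'acpi_osi=! "acpi_osi=%s"\n' "$version"
}
```

Usage, as root: strings /sys/firmware/acpi/tables/DSDT | acpi_osi_params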

Vulkan error on applications start

The factual accuracy of this article or section is disputed.

Reason: Need confirmation by other users (Discuss in Talk:NVIDIA/故障排除)


When running an application that requires Vulkan acceleration, if you get this error:

Vulkan call failed: -4

try deleting the ~/.nv or ~/.cache/nvidia directory.

Extreme lag on Xorg

The factual accuracy of this article or section is disputed.

Reason: According to an NVIDIA developer this issue is not specific to GNOME and the rest of the comments on the issue do not mention multi-monitor setups. (Discuss in Talk:NVIDIA/故障排除)


A common issue with Mutter is that animations, video playback and gaming cause extreme desktop lag on Xorg.

See NVIDIA/Tips and tricks#Preserve video memory after suspend.

This should resolve the issue. If it does not, a complete fix is unlikely, but you can reduce the lag by adding these options:

/etc/environment
CLUTTER_DEFAULT_FPS=YOUR_MAIN_DISPLAY_REFRESHRATE
__GL_SYNC_DISPLAY_DEVICE=YOUR_MAIN_DISPLAY_OUTPUT_NAME

Then turn off Sync to VBlank and Allow Flipping in NVIDIA Settings, and configure NVIDIA Settings to launch on startup with the flag --load-config-only. The desktop will still lag, in particular on a second (or third) monitor, but it should be much better.
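The two placeholder values in /etc/environment can be read from xrandr. This is a sketch under stated assumptions: the function name primary_display_env is hypothetical, the "main display" is taken to be the output xrandr marks as primary, and its active refresh rate (the one marked with *) is rounded to the nearest integer.

```shell
# Hypothetical helper: stdin is the output of `xrandr --query`.
# Prints CLUTTER_DEFAULT_FPS and __GL_SYNC_DISPLAY_DEVICE for the primary output.
primary_display_env() {
    awk '
        # Header of the primary output, e.g. "DP-0 connected primary 2560x1440+0+0 ..."
        $2 == "connected" && $3 == "primary" { name = $1; inprimary = 1; next }
        # Any other unindented line ends the primary output block
        /^[^ ]/ { inprimary = 0; next }
        # Mode lines: the active rate carries a "*" (and maybe "+")
        inprimary {
            for (i = 2; i <= NF; i++)
                if ($i ~ /\*/) { rate = $i; gsub(/[*+]/, "", rate) }
        }
        END {
            if (name == "" || rate == "") exit 1
            printf "CLUTTER_DEFAULT_FPS=%d\n__GL_SYNC_DISPLAY_DEVICE=%s\n", rate + 0.5, name
        }'
}
```

Review the two printed lines before adding them to /etc/environment, since rounding a rate such as 59.95 up to 60 is a choice made by this sketch.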