Improving performance
This article describes how to diagnose system performance problems and the concrete steps to improve performance, for example by reducing unnecessary resource consumption. For gaming-specific optimizations, see Gaming#Improving_performance.
The basics
Know your system
The best way to tune a system is to find the bottleneck, i.e. the subsystem that slows everything down. Looking at the system details can help identify the problem.
- If the system is slow when several large applications are running at the same time (such as LibreOffice and Firefox), check whether the amount of RAM is sufficient. Use the following command and check the "available" column:
$ free -h
- If boot time is slow and applications take a long time to load only at first launch, then a slow storage drive is the likely cause. Drive speed can be measured with the hdparm command:
# hdparm -t /dev/sdX
- If applications that use direct rendering are slow (i.e. those which use the GPU, such as video players, games or even the window manager), then improving GPU performance should help. The first step is to check whether direct rendering is actually enabled, using the glxinfo command from the mesa-demos package; if it is enabled, the output reports direct rendering: Yes:
$ glxinfo | grep "direct rendering"
direct rendering: Yes
Benchmarking
To quantify the results of an optimization, use benchmarking tools.
Storage devices
Multiple hardware paths
An internal hardware path is the way the storage device is connected to the motherboard: for example TCP/IP through a NIC, or directly via PCIe/PCI, FireWire, a RAID card, USB, etc. Spreading storage devices across these connection points maximizes the capability of the motherboard; for example, connecting six drives over USB would be slower than connecting three over USB and three over FireWire. The reason is that each entry point into the motherboard is like a pipe, and there is a set limit to how much can go through that pipe at any one time. Fortunately, a motherboard usually has several such pipes.
As a further example, if the machine has two USB ports on the front and four on the back, plugging two drives in the front and two in the back should be faster than plugging one in the front and three in the back. This is because the front ports are likely a separate root hub from the ones on the back, meaning more data can be sent at once.
Use the following commands to determine the various hardware paths on your machine:
USB device tree:
$ lsusb -tv
PCI device tree:
$ lspci -tv
Partitioning
Make sure that your partitions are properly aligned.
Multiple drives
If you have multiple disks available, setting them up as a software RAID can bring speed improvements; a sketch is shown below.
Creating swap space on a separate disk can also help considerably, especially if the machine swaps frequently.
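As an illustrative sketch only (the device names are placeholders and RAID0 offers no redundancy), a two-disk striped array could be created with mdadm and then formatted:
# mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/sdX /dev/sdY
# mkfs.ext4 /dev/md0
See RAID for proper setup instructions.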
Choosing and tuning your filesystem
Choosing the best filesystem for a specific system is very important because each has its own strengths. The File systems article provides a short summary of the most popular ones; you can also find relevant articles in Category:File systems.
Mount options
The noatime mount option is known to improve filesystem performance (a minimal fstab sketch is shown after the list below). Other mount options are filesystem specific, so see the relevant article for the filesystem in use:
- Ext3
- Ext4#Improving performance
- JFS Filesystem#Optimizations
- XFS#Performance
- Btrfs#Defragmentation, Btrfs#Compression, and btrfs(5)
- ZFS#Tuning
- NTFS#Improving performance
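For example, a minimal /etc/fstab entry using noatime might look like the following (the UUID, mount point and filesystem type are placeholders for illustration):
/etc/fstab
UUID=0a3407de-014b-458b-b5c1-848e92a327a3  /data  ext4  defaults,noatime  0  2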
Tuning kernel parameters
Several key tunables affect the performance of block devices, see sysctl#Virtual memory for more information.
Input/output schedulers
Background information
The input/output (I/O) scheduler is the kernel component that decides in which order block I/O operations are submitted to storage devices. It is useful to recall some specifications of the two main drive types here, because the goal of the I/O scheduler is to optimize the way requests are handled on each of them:
- A hard disk drive (HDD) has spinning disks and a head that physically moves to the required sector. Because of this, random access latency is quite high, typically between 3 and 12 ms, while sequential access provides much higher throughput. A typical HDD handles about 200 I/O operations per second (IOPS).
- A solid state drive (SSD) has no moving parts; random access is as fast as sequential access, typically under 0.1 ms, and it can handle multiple concurrent requests. A typical SSD exceeds 10,000 IOPS, which is more than is needed in common workload situations.
When many processes issue I/O requests to different parts of the storage, thousands of IOPS can be generated, while a typical HDD can only handle about 200 of them per second. This is where the I/O scheduler plays its optimization role.
The scheduling algorithms
One way to improve throughput is to linearize access physically: order the waiting requests by their logical address and group the closest ones together. For example, suppose three programs issue requests 1, 2 and 3, where request 1 targets the middle of the disk, request 2 the beginning and request 3 the end. To maximize throughput, it is preferable to complete all three within a single rotation of the platter, i.e. to serve them in the order 2, 1, 3. Because this resembles an elevator in real life, this approach is known as the elevator scheduler.
The elevator scheduler has a problem though: it is not optimal for a process performing sequential access. A typical pattern is that a process reads a block of data and only then requests the next one; the time between completing one request and issuing the next is long compared with the time needed to read data that is already under the head (an operation itself takes only a few microseconds, while the gap between two consecutive requests is often tens of microseconds; when this algorithm was introduced, 5400 rpm HDDs were still mainstream, and one rotation of such a drive takes 11.11 ms). This example exposes the fact that the elevator scheduler does not know the process is about to read the next nearby block. The anticipatory I/O scheduler solves this problem: before serving another request, it pauses for a few milliseconds (without stopping the platter), waiting for another operation physically close to the previous one.
Both of these schedulers try to improve total throughput, but they neglect the other side of I/O scheduling: request latency. For example, suppose most processes issue requests near the beginning of the disk while a few processes issue requests at the other end, which requires a full rotation to reach (reversing is not realistic: suddenly stopping and reversing a fast-spinning platter would place extreme stress on it). In this case the anticipatory scheduler keeps waiting one "few milliseconds" after another for the stream of requests near the beginning, while the requests at the far end have long been waiting and may even be blocked from doing anything else because their I/O never completes. This situation is called starvation. To improve the fairness of I/O, the deadline algorithm was developed. Like the elevator, it has a queue ordered by address, but when a request sits in this queue for too long it is moved to a queue called "expired", ordered by how long each request has been expired. The scheduler checks this queue first and serves requests from it before returning to the elevator queue. This approach trades some overall throughput for lower latency, which is precisely the problem I/O scheduling has to solve: the balance between throughput and latency.
The Completely Fair Queuing (CFQ) algorithm tries to balance throughput and latency from a different angle: it divides time into timeslices and allocates timeslices and numbers of allowed requests according to process priority. It supports cgroups, which makes it possible to reserve timeslices and requests for a specific set of processes. This is particularly useful for cloud storage: some paying customers expect to always get a certain number of IOPS. It also idles at the end of synchronous I/O, waiting for other nearby operations, thereby taking over the role of the anticipatory scheduler and bringing some enhancements. Both the anticipatory scheduler and the elevator scheduler have since been replaced in the Linux kernel by the more advanced algorithms below.
The Budget Fair Queuing (BFQ) scheduler is based on the CFQ code with some added features and improvements. Instead of granting the disk for a fixed timeslice, it assigns a "budget" to each process using heuristics. It is relatively complex compared with other algorithms and may be better suited to rotational drives and slow SSDs whose throughput is low to begin with; its per-operation overhead is higher, and on a slow CPU this scheduler may even reduce device responsiveness. Its goal on personal systems is to optimize the experience of interactive tasks, so that the storage device stays nearly as responsive as when it is idle. Its default configuration is tuned to deliver the lowest latency rather than the maximum throughput.
Kyber is a recent scheduler inspired by active queue management techniques used for network routing; it uses "tokens" as a mechanism for limiting requests. There are two kinds of tokens: a queuing token, used to prevent requests from starving, and a dispatch token, used to limit operations of a given priority on a given device. Finally, a target read latency is defined, and the scheduler tunes itself to reach this latency goal. The implementation of the algorithm is relatively simple and it is considered suitable for fast devices such as SSDs.
Kernel's I/O schedulers
While some of the early scheduling algorithms have since been removed, the official Linux kernel supports a number of I/O schedulers which can be split into the following two categories:
- The multi-queue schedulers are available by default in the kernel. The Multi-Queue Block I/O Queuing Mechanism (blk-mq) maps I/O queries to multiple queues; the tasks are distributed across threads and therefore across CPU cores. The following schedulers are available within this framework:
- None, where no queuing algorithm is applied.
- mq-deadline, the adaptation of the deadline scheduler to multi-threading.
- Kyber
- BFQ
- The single-queue schedulers are legacy schedulers from older kernel versions:
- Note: single-queue schedulers were removed from the kernel as of Linux 5.0.
Changing the I/O scheduler
To list the available schedulers for a device and the active scheduler (in brackets):
$ cat /sys/block/sda/queue/scheduler
mq-deadline kyber [bfq] none
To list the available schedulers for all devices:
$ grep "" /sys/block/*/queue/scheduler
/sys/block/pktcdvd0/queue/scheduler:none
/sys/block/sda/queue/scheduler:mq-deadline kyber [bfq] none
/sys/block/sr0/queue/scheduler:[mq-deadline] kyber bfq none
To change the active I/O scheduler of device sda to bfq, use:
# echo bfq > /sys/block/sda/queue/scheduler
The change of I/O scheduler can be done automatically and persisted regardless of the drive type. For example, the udev rule below sets the scheduler to none for NVMe drives, mq-deadline for SSDs/eMMC, and bfq for rotational drives:
/etc/udev/rules.d/60-ioschedulers.rules
# set scheduler for NVMe
ACTION=="add|change", KERNEL=="nvme[0-9]n[0-9]", ATTR{queue/scheduler}="none"
# set scheduler for SSD and eMMC
ACTION=="add|change", KERNEL=="sd[a-z]|mmcblk[0-9]*", ATTR{queue/rotational}=="0", ATTR{queue/scheduler}="mq-deadline"
# set scheduler for rotating disks
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="1", ATTR{queue/scheduler}="bfq"
Then reboot or force udev to load the new rules.
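For example, the new rules can be loaded and applied without rebooting using the standard udevadm commands:
# udevadm control --reload
# udevadm trigger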
Tuning the I/O scheduler
Each of the kernel's I/O schedulers has its own tunables, such as the latency time, the expiry time or the FIFO parameters. They are helpful in adjusting the algorithm to a particular combination of device and workload. This is typically done to achieve a higher throughput or a lower latency for a given utilization. The tunables and their descriptions can be found within the kernel documentation.
To list the available tunables for a device, in the example below sdb which is using deadline, use:
$ ls /sys/block/sdb/queue/iosched
fifo_batch front_merges read_expire write_expire writes_starved
To improve deadline's throughput at the cost of latency, one can increase fifo_batch with the command:
# echo 32 > /sys/block/sdb/queue/iosched/fifo_batch
Power management configuration
When dealing with traditional rotational disks (HDDs), you may want to lower or disable power saving features completely and check that write caching is enabled.
See Hdparm#Power management configuration and Hdparm#Write cache.
Once done, configure a udev rule so that the changes are applied at boot; an illustrative hdparm invocation follows.
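As a sketch only (the device name and values are examples; see hdparm(8) for the meaning of each flag), power saving can be reduced and write caching enabled with:
# hdparm -B 254 -S 0 -W 1 /dev/sda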
Reduce disk reads/writes
Avoiding unnecessary access to slow storage drives is good for performance and also increases the lifetime of the devices, although on modern hardware the difference in life expectancy is usually negligible.
Show disk writes
The iotop package can sort processes by disk writes, and show how much and how frequently programs are writing to the disk. See iotop(8) for details.
Relocate files to tmpfs
Relocate files, such as your browser profile, to a tmpfs file system, for improvements in application response as all the files are now stored in RAM:
- Refer to Profile-sync-daemon for syncing browser profiles. Certain browsers might need special attention, see e.g. Firefox on RAM.
- Refer to Anything-sync-daemon for syncing any specified folder.
- Refer to Makepkg#Improving compile times for improving compile times by building packages in tmpfs.
File systems
Refer to the corresponding file system page in case there are performance improvement instructions, e.g. Ext4#Improving performance and XFS#Performance.
Swap
See Swap#Performance.
Writeback interval and buffer size
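The writeback interval and dirty buffer sizes are controlled by the vm.dirty_* parameters described in sysctl#Virtual memory. As an illustrative sketch only (the values are examples, not recommendations), they can be set persistently in a sysctl drop-in file:
/etc/sysctl.d/10-writeback.conf
vm.dirty_background_ratio = 5
vm.dirty_ratio = 10
vm.dirty_writeback_centisecs = 1500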
Storage I/O scheduling with ionice
Many tasks such as backups do not rely on a short storage I/O delay or high storage I/O bandwidth to fulfil their task; they can be classified as background tasks. On the other hand, quick I/O is necessary for good UI responsiveness on the desktop. Therefore it is beneficial to reduce the amount of storage bandwidth available to background tasks while other tasks are in need of storage I/O. This can be achieved by making use of an I/O scheduler that supports I/O priorities, such as BFQ, which allows setting different priorities for processes.
The I/O priority of a background process can be reduced to the "Idle" level by starting it with
# ionice -c 3 command
See a short introduction to ionice and ionice(1) for more information.
CPU
Overclocking
Overclocking improves the computational performance of the CPU by increasing its peak clock frequency. The ability to overclock depends on the combination of CPU model and motherboard model, and it is most frequently done through the BIOS. Overclocking also has disadvantages and risks; it is neither recommended nor discouraged here.
Many Intel chips will not correctly report their actual clock frequency to acpi_cpufreq and most other utilities. This results in excessive messages in dmesg, which can be avoided by unloading and blacklisting the acpi_cpufreq kernel module (a sketch of such a blacklist file is shown below). To read the real clock speed, use i7z from the i7z package. For checking that an overclocked CPU works correctly, it is recommended to perform stress testing.
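A minimal sketch of blacklisting the module (the file name is arbitrary):
/etc/modprobe.d/blacklist-acpi_cpufreq.conf
blacklist acpi_cpufreq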
Frequency scaling
See CPU frequency scaling.
Tweak default scheduler (CFS) for responsiveness
The default CPU scheduler in the mainline Linux kernel is CFS.
The upstream default settings are tweaked for high throughput, which makes desktop applications unresponsive under heavy CPU loads.
The cfs-zen-tweaks (AUR) package contains a script that sets up CFS to use the same settings as the linux-zen kernel. To run the script on startup, enable/start set-cfs-tweaks.service.
Alternative CPU schedulers
- MuQSS — Multiple Queue Skiplist Scheduler. Available with the -ck patch set developed by Con Kolivas.
- PDS — Priority and Deadline based Skiplist multiple queue scheduler focused on desktop responsiveness.
- BMQ — The BMQ "BitMap Queue" scheduler was created based on existing PDS development experience and inspired by the scheduler found in Zircon by Google, the kernel on which their Fuchsia OS initiative runs. Available with a set of patches from CachyOS.
- Project C — Cross-project for refactoring BMQ into Project C, with a re-creation of PDS based on the Project C code base; it is thus a merge of the two projects, with PDS subsequently updated as part of Project C. Recommended as the more recent development.
- CacULE — The CacULE CPU scheduler is a CFS patchset based on an interactivity score mechanism. The interactivity score is inspired by the ULE scheduler (the FreeBSD scheduler). The goal of this patchset is to enhance system responsiveness/latency.
- TT — The goal of the Task Type (TT) scheduler is to detect task types based on their behaviour and to control scheduling accordingly.
- BORE — The BORE scheduler focuses on sacrificing some fairness for lower latency when scheduling interactive tasks. It is built on top of CFS, adjusting only the vruntime handling, so the overall changes are quite small compared to other unofficial CPU schedulers.
Real-time kernel
Some applications such as running a TV tuner card at full HD resolution (1080p) may benefit from using a realtime kernel.
Adjusting priorities of processes
See also nice(1) and renice(1).
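For example (the command, PID and nice values are placeholders), a job can be started with the lowest priority, or the priority of a running process lowered:
$ nice -n 19 make
# renice -n 10 -p 1234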
Ananicy
Ananicy is a daemon, available in the ananicy-git (AUR) package, for automatically adjusting the nice levels of executables. The nice level represents the priority of the executable when allocating CPU resources.
cgroups
See cgroups.
Cpulimit
Cpulimit is a program to limit the CPU usage percentage of a specific process. After installing cpulimit, you may limit the CPU usage of a process' PID using a scale of 0 to 100 times the number of CPU cores the computer has. For example, with eight CPU cores the percentage range is 0 to 800. Usage:
$ cpulimit -l 50 -p 5081
irqbalance
The purpose of irqbalance is to distribute hardware interrupts across processors on a multiprocessor system in order to increase performance. It can be controlled by the provided irqbalance.service.
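For example, to run it now and on every boot:
# systemctl enable --now irqbalance.service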
Turn off CPU exploit mitigations
Turning off CPU exploit mitigations may improve performance. Use the kernel parameter below to disable them all:
mitigations=off
The explanations of all the switches it toggles are given at kernel.org. You can use spectre-meltdown-checker (AUR) or lscpu(1) from util-linux for a vulnerability check.
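For example, the mitigation status currently reported by the kernel can be inspected with:
$ lscpu | grep -i vulnerability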
Graphics
Xorg configuration
Graphics performance depends on the settings in /etc/X11/xorg.conf; see the NVIDIA, ATI and Intel articles. Improper settings may stop Xorg from working, so caution is advised.
Mesa configuration
The performance of the Mesa drivers can be configured via drirc. GUI configuration tools are available:
- adriconf (Advanced DRI Configurator) — GUI tool to configure Mesa drivers by setting options and writing them to the standard drirc file.
- DRIconf — Configuration applet for the Direct Rendering Infrastructure. It allows customizing performance and visual quality settings of OpenGL drivers on a per-driver, per-screen and/or per-application level.
Hardware video acceleration
Hardware video acceleration makes it possible for the video card to decode/encode video.
Overclocking
As with the CPU, overclocking can directly improve performance, but it is generally recommended against. Overclocking tools in the AUR include rovclock (AUR) for ATI cards, rocm-smi-lib (AUR) for recent AMD cards and nvclock (AUR) for old NVIDIA cards up to GeForce 9; recent NVIDIA cards are covered by nvidia-utils.
See AMDGPU#Overclocking or NVIDIA/Tips and tricks#Enabling overclocking.
Enabling PCI Resizable BAR
- On some systems, enabling PCI Resizable BAR can cause a significant performance decrease. Test your system after enabling it to make sure performance actually improves.
- The Compatibility Support Module (CSM) must be disabled before enabling it.
The PCI specification allows larger base address registers (BARs) to expose PCI device memory to the PCI controller, which can improve performance of graphics cards. Allowing access to the card's entire video memory improves performance and also allows the graphics driver to apply further optimizations. AMD markets the combination of resizable BAR, above-4G decoding and a set of driver optimizations as AMD Smart Access Memory, initially available on motherboards with the AMD 500 chipset series and later extended to AMD 400 series and Intel 300 series motherboards via UEFI updates. Not all motherboards offer the setting, and on some boards it causes boot problems.
If the BAR has a size of 256M, the feature is not enabled or not supported:
# dmesg | grep BAR=
[drm] Detected VRAM RAM=8176M, BAR=256M
To enable it, enable "Above 4G Decode" or ">4GB MMIO" in the motherboard settings. Afterwards, verify that the BAR is now larger:
# dmesg | grep BAR=
[drm] Detected VRAM RAM=8176M, BAR=8192M
RAM, swap and OOM handling
Clock frequency and timings
RAM can run at different clock frequencies and timings, which can be configured in the BIOS. Memory performance depends on both values. Selecting the highest preset presented by the BIOS usually improves the performance over the default setting. Note that increasing the frequency to values not supported by both the motherboard and the RAM vendor is overclocking, and similar risks and disadvantages apply, see #Overclocking.
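For example, the memory speed and timings currently reported by the firmware can be inspected with dmidecode (assuming the dmidecode package is installed):
# dmidecode --type memory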
Root on RAM overlay
If running off a slow writing medium (USB, spinning HDDs) and storage requirements are low, the root may be run on a RAM overlay on top of a read-only root (on disk). This can vastly improve performance at the cost of a limited writable space to root. See liveroot (AUR).
zram or zswap
The zram kernel module (previously called compcache) provides a compressed block device in RAM. If you use it as a swap device, the RAM can hold much more data, at the cost of extra CPU usage. Still, it is much faster than swapping to a storage drive. If a system often falls back to swap, this can improve responsiveness. Using zram also reduces disk reads and writes, which helps extend the lifespan of an SSD when swap would otherwise be placed on it.
zswap offers similar benefits (and similar costs). The difference is that zswap compresses pages into a RAM cache in front of an existing swap device, while zram provides a standalone compressed swap device in RAM. See zswap for a comparison of the two.
Since it is enabled by default, disable zswap when you use zram to avoid it acting as a swap cache in front of zram. Having both enabled also results in incorrect zramctl(8) statistics as zram remains mostly unused; this is because zswap intercepts and compresses memory pages being swapped out before they can reach zram.
Example: to set up one zram device with the lz4 compression algorithm, 32 GiB in size and a higher-than-normal priority (for the current session only):
# modprobe zram
# echo lz4 > /sys/block/zram0/comp_algorithm
# echo 32G > /sys/block/zram0/disksize
# mkswap --label zram0 /dev/zram0
# swapon --priority 100 /dev/zram0
To disable it again, either reboot or run:
# swapoff /dev/zram0
# rmmod zram
A detailed explanation of all steps, options and potential problems is provided in the official documentation of the zram module.
The zram-generator package provides a systemd-zram-setup@.service unit for automatically initializing zram devices; the unit does not need to be enabled/started. The package documentation provides the information needed to use it: since "the generator will be invoked by systemd early at boot", using it only requires creating a configuration file and rebooting. A simple configuration example is provided at /usr/share/doc/zram-generator/zram-generator.conf.example, and a minimal sketch is shown below. The result can be checked by inspecting the swap status or the status of systemd-zram-setup@zramN.service, where /dev/zramN is the device set in the configuration file.
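A minimal sketch of such a configuration (the size and algorithm are illustrative; see the packaged example file for the full syntax):
/etc/systemd/zram-generator.conf
[zram0]
zram-size = ram / 2
compression-algorithm = zstd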
The zramswap (AUR) package provides an automated script for setting up swap with a higher priority and a default size of 20% of the system's RAM. To do this automatically on every boot, enable zramswap.service.
Additionally, zramd (AUR) sets up zram automatically, using the zstd algorithm by default. Its configuration file is located at /etc/default/zramd, and zramd.service needs to be enabled.
Swap on zram using a udev rule
The example below describes how to set up swap on zram automatically at boot with a single udev rule. No extra package should be needed to make this work.
First, enable the module:
/etc/modules-load.d/zram.conf
zram
Configure the number of /dev/zram nodes you need.
/etc/modprobe.d/zram.conf
options zram num_devices=2
Create the udev rule as shown in the example.
/etc/udev/rules.d/99-zram.rules
KERNEL=="zram0", ATTR{disksize}="512M" RUN="/usr/bin/mkswap /dev/zram0", TAG+="systemd" KERNEL=="zram1", ATTR{disksize}="512M" RUN="/usr/bin/mkswap /dev/zram1", TAG+="systemd"
Add /dev/zram to your fstab.
/etc/fstab
/dev/zram0 none swap defaults 0 0
/dev/zram1 none swap defaults 0 0
Using video RAM
In the rare case that a system has very little RAM but an excess of video RAM, the latter can be used as swap. See Swap on video RAM.
Improving system responsiveness under low-memory conditions
On a traditional GNU/Linux system, especially on graphical workstations, when allocated memory is overcommitted the overall responsiveness may degrade to a nearly unusable state before either triggering the in-kernel OOM-killer or having a sufficient amount of memory freed (which is unlikely to happen quickly when the system is unresponsive, as you can hardly close any memory-hungry applications which may continue to allocate more memory). The behaviour also depends on the specific setup and conditions; returning to a normal responsive state may take from a few seconds to more than half an hour, which can be painful to wait for in a serious scenario such as a conference presentation.
Check whether /proc/sys/vm/oom_kill_allocating_task is 0 and consider changing it. [2] While the behaviour of the kernel as well as of userspace under low-memory conditions may improve in the future, as discussed on kernel and Fedora mailing lists, users have more feasible and effective options than hard-resetting the system or tuning the vm.overcommit_* sysctl parameters:
- Manually trigger the kernel OOM-killer with the Magic SysRq key, namely Alt+SysRq+f.
- Use a userspace OOM daemon to tackle these situations automatically (or interactively).
Sometimes a user may prefer an OOM daemon to SysRq because the kernel OOM-killer cannot be told which processes to terminate (or to spare). Some OOM daemons are listed below:
- systemd-oomd — Provided by systemd as systemd-oomd.service, which uses cgroups-v2 and pressure stall information (PSI) to monitor and take action on processes before an OOM occurs in kernel space.
- earlyoom — Simple userspace OOM-killer implementation written in C.
- oomd — OOM-killer implementation based on PSI, requires Linux kernel version 4.20+. Configuration is in JSON and is quite complex. Confirmed to work in Facebook's production environment.
- nohang — Sophisticated OOM handler written in Python, with optional PSI support, more configurable than earlyoom.
- low-memory-monitor — GNOME developers' effort that aims to provide better communication to userspace applications to indicate the low-memory state; besides that, it can be configured to trigger the kernel OOM-killer. Based on PSI, requires Linux 5.2+.
- uresourced — A small daemon that enables cgroup based resource protection for the active graphical user session.
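For instance, to use the systemd-provided daemon from the list above:
# systemctl enable --now systemd-oomd.service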
Network
- Kernel networking: see Sysctl#Improving performance
- NIC: see Network configuration#Set device MTU and queue length
- DNS: consider using a caching DNS resolver, see Domain name resolution#DNS servers
- Samba: see Samba#Improve throughput
Watchdogs
According to Wikipedia:Watchdog timer:
- A watchdog timer [...] is an electronic timer that is used to detect and recover from computer malfunctions. During normal operation, the computer regularly resets the watchdog timer [...]. If, [...], the computer fails to reset the watchdog, the timer will elapse and generate a timeout signal [...] used to initiate corrective [...] actions [...] typically include placing the computer system in a safe state and restoring normal system operation.
Many users need this feature due to their system's mission-critical role (e.g. servers), or because of the lack of a power reset option (e.g. embedded devices). Thus, this feature is required for good operation in some situations. Normal desktop and laptop users, on the other hand, do not need this feature and can disable it.
To disable watchdog timers (both software and hardware), append nowatchdog to your boot parameters.
To check the new configuration do:
# cat /proc/sys/kernel/watchdog
or use:
# wdctl
Disabling watchdog timers will speed up your boot and shutdown, because one less module is loaded. Additionally, it increases performance and lowers power consumption.
See [3], [4], [5], and [6] for more information.