LWN on Linux Kernel
-
LWN ☛ A possible path for cancelable BPF programs
The Linux kernel supports attaching BPF programs to many operations. This is generally safe because the BPF verifier ensures that BPF programs can't misuse kernel resources, run indefinitely, or otherwise escape their boundaries. There is continuing tension, however, between trying to expand the capabilities of BPF programs and ensuring that the verifier can handle every edge case. On February 14, Juntong Deng shared a proof-of-concept patch set that adds some run-time checks to BPF to make it possible in the future to interrupt a running BPF program.
When initially conceived, BPF had strict limits on the number of instructions a program could contain, and did not permit loops, limiting how long a program could run. This is important because the kernel will call BPF hooks during many time-sensitive operations, so a misbehaving BPF program that managed to run for too long could potentially cause kernel hangs or other problems. Over time, those limits have been gradually expanded for two reasons: the verifier has become more capable — handling loops, more complicated functions, etc. — and developers have discovered that complicated, long-running BPF programs are quite useful, prompting them to ask for the limits to be loosened.
-
LWN ☛ Slabs, sheaves, and barns
The kernel's slab allocator is responsible for the allocation of small (usually sub-page) chunks of memory. For many workloads, the speed of object allocation and freeing is one of the key factors in overall performance, so it is not surprising that a lot of effort has gone into optimizing the slab allocator over time. Now that the kernel is down to a single slab allocator, the memory-management developers have free rein to add complexity to it; the latest move in that direction is the per-CPU sheaves patch set from slab maintainer Vlastimil Babka.
Many kernel developers interact with the slab allocator using functions like kmalloc(), which can allocate objects of any (reasonable) size. There is a lower level to the slab allocator, though, that deals with fixed-size objects; it is used heavily by subsystems that frequently allocate and free objects of the same size. The curious can see all of the special-purpose slabs in their system by looking at /proc/slabinfo. There are many core-kernel operations that involve allocating objects from these slabs and returning them, so the slab allocator has gained a number of features, including NUMA awareness and bulk operations, to accelerate allocation and freeing.
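For readers who want to look at those special-purpose slabs programmatically, here is a minimal sketch of a /proc/slabinfo reader. It assumes the version 2.1 text layout, in which each data line begins with the cache name followed by the active object count, total object count, and object size. The names `SlabCache`, `parse_slabinfo`, and `SAMPLE` are made up for this illustration, and the sample numbers are invented. Reading the real file usually requires root, so the sketch falls back to the sample text when it cannot open it.

```python
from typing import NamedTuple


class SlabCache(NamedTuple):
    """First four columns of one /proc/slabinfo line (hypothetical helper type)."""
    name: str
    active_objs: int
    num_objs: int
    objsize: int


def parse_slabinfo(text: str) -> list[SlabCache]:
    """Parse slabinfo-style text, skipping the version line and the '#' header."""
    caches = []
    for line in text.splitlines():
        if not line.strip() or line.startswith(("slabinfo - version", "#")):
            continue
        fields = line.split()
        if len(fields) < 4:
            continue  # not a data line in the layout assumed here
        name, active, total, size = fields[:4]
        caches.append(SlabCache(name, int(active), int(total), int(size)))
    return caches


# Invented sample in the assumed layout; the values are illustrative only.
SAMPLE = """\
slabinfo - version: 2.1
# name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
kmalloc-64 1200 1280 64 64 1 : tunables 0 0 0 : slabdata 20 20 0
dentry 5000 5040 192 21 1 : tunables 0 0 0 : slabdata 240 240 0
"""

if __name__ == "__main__":
    try:
        with open("/proc/slabinfo") as f:
            text = f.read()
    except OSError:
        text = SAMPLE  # no permission, or not running on Linux
    # Show the five caches whose objects occupy the most memory.
    by_footprint = sorted(parse_slabinfo(text),
                          key=lambda c: c.num_objs * c.objsize, reverse=True)
    for cache in by_footprint[:5]:
        print(f"{cache.name:24} {cache.num_objs:8} objs x {cache.objsize} bytes")
```

Sorting by object count times object size gives a rough view of which fixed-size caches dominate; it ignores per-slab overhead, so treat it as an approximation.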
-
LWN ☛ Support for atomic block writes in 6.13
Atomic block writes, which have been discussed here a few times in the past, are block operations that either complete fully or do not occur at all, ensuring data consistency and preventing partial (or "torn") writes. This means the disk will, at all times, contain either the complete new data from the atomic write operation or the complete old data from a previous write. It will never have a mix of both the old and the new data, even if a power failure occurs during an ongoing atomic write operation. Atomic writes have been of interest to many Linux users, particularly database developers, as this feature can provide significant performance improvements.
The Linux 6.13 merge window included a pull request from VFS maintainer Christian Brauner titled "vfs untorn writes", which added the initial atomic-write capability to the kernel. In this article, we will briefly cover what these atomic writes are, why they are important in the database world, and what is currently supported in the 6.13 kernel.
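The "complete old data or complete new data" guarantee described above can be illustrated with a toy model. This is only a sketch of the semantics, not the kernel interface: it performs no I/O, and the function names `torn_write`, `atomic_write`, and `possible_states` are hypothetical. It assumes a power failure can interrupt a non-atomic write after any number of bytes has reached the disk.

```python
def torn_write(old: bytes, new: bytes, bytes_persisted: int) -> bytes:
    """Model a non-atomic write cut off after bytes_persisted bytes hit the disk."""
    if len(old) != len(new):
        raise ValueError("old and new data must have the same length")
    if not 0 <= bytes_persisted <= len(new):
        raise ValueError("bytes_persisted is out of range")
    # The prefix is new data, the remainder is whatever was there before.
    return new[:bytes_persisted] + old[bytes_persisted:]


def atomic_write(old: bytes, new: bytes, completed: bool) -> bytes:
    """Model an atomic write: it either happened in full or not at all."""
    return new if completed else old


def possible_states(old: bytes, new: bytes, atomic: bool) -> set[bytes]:
    """Every on-disk state that could be observed after a crash, in this model."""
    if atomic:
        return {atomic_write(old, new, done) for done in (False, True)}
    return {torn_write(old, new, k) for k in range(len(new) + 1)}


if __name__ == "__main__":
    old, new = b"AAAA", b"BBBB"
    print(sorted(possible_states(old, new, atomic=False)))
    print(sorted(possible_states(old, new, atomic=True)))
```

In this model the non-atomic case admits mixed states such as `b"BBAA"`, while the atomic case admits only the two clean ones. That difference is why software that must tolerate torn writes needs extra protection against them, and why atomic writes are attractive to database developers.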
-
LWN ☛ Filesystem support for block sizes larger than the page size
The maximum filesystem block size that the Linux kernel can support has always been limited by the host page size, even if the filesystem itself could handle larger block sizes. The large-block-size (LBS) patches that were merged for the 6.12 kernel removed this limitation in XFS, thereby decoupling the page size from the filesystem block size. XFS is the first filesystem to gain this support, with other filesystems likely to add LBS support in the future. In addition, the LBS patches have been used to get the initial atomic-write support into XFS.