Kernel Articles From LWN
-
A generic ring buffer for the kernel
The kernel's user-space ABI does not lack for ring buffers; they have been defined for subsystems including BPF, io_uring, perf, and tracing. Naturally, each of those ring buffers is unique, with no common interface between them. The natural response to this ABI proliferation is, of course, to add yet another ring buffer as the generic option; that is the intent of this patch series from Kent Overstreet adding a new set of system calls for ring buffers.
A ring buffer is simply a circular buffer, maintained in memory, that is shared between user space and the kernel. One side of a data stream writes data into the buffer, while the other consumes it; as long as the buffer neither overflows nor underflows, data can be transferred with no system calls at all. The addition of a ring buffer can, thus, enable highly efficient data transfer in situations where the data rates are relatively high. Overstreet thinks that other kernel subsystems could benefit from ring-buffer interfaces, and would like to make it possible to add those interfaces without reinventing the wheel.
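The mechanism described above can be sketched in a few lines of C. This is only an illustration of the general single-producer, single-consumer technique, not the interface from Overstreet's patch series; the names `struct ring`, `ring_init()`, `ring_write()`, and `ring_read()`, along with the eight-byte `RING_SIZE`, are hypothetical. Each side advances only its own index, so the only coordination needed is an acquire/release pair on the shared indexes.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 8u    /* hypothetical capacity; must be a power of two */

static_assert((RING_SIZE & (RING_SIZE - 1)) == 0,
              "RING_SIZE must be a power of two");

struct ring {
    _Atomic uint32_t head;      /* next byte to write; advanced only by the producer */
    _Atomic uint32_t tail;      /* next byte to read; advanced only by the consumer */
    uint8_t data[RING_SIZE];
};

static void ring_init(struct ring *r)
{
    atomic_init(&r->head, 0);
    atomic_init(&r->tail, 0);
}

/* Producer side: copy in up to len bytes; returns how many were accepted. */
static size_t ring_write(struct ring *r, const uint8_t *src, size_t len)
{
    uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    size_t space = RING_SIZE - (head - tail);   /* free bytes remaining */

    if (len > space)
        len = space;            /* buffer would overflow: accept a short write */
    for (size_t i = 0; i < len; i++)
        r->data[(head + i) & (RING_SIZE - 1)] = src[i];

    /* Publish the new data only after it has been copied into place. */
    atomic_store_explicit(&r->head, head + (uint32_t)len, memory_order_release);
    return len;
}

/* Consumer side: copy out up to len bytes; returns how many were available. */
static size_t ring_read(struct ring *r, uint8_t *dst, size_t len)
{
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    uint32_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    size_t avail = head - tail;                 /* bytes waiting to be read */

    if (len > avail)
        len = avail;            /* buffer would underflow: return a short read */
    for (size_t i = 0; i < len; i++)
        dst[i] = r->data[(tail + i) & (RING_SIZE - 1)];

    /* Release the consumed space back to the producer. */
    atomic_store_explicit(&r->tail, tail + (uint32_t)len, memory_order_release);
    return len;
}
```

In a real user-space ABI the structure would live in memory mapped into both the kernel and the process, and a system call would be needed only when one side has to wait for the other: the reader on an empty buffer, or the writer on a full one.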
-
P4TC hits a brick wall
The kernel supports a number of traffic-control mechanisms in its networking subsystem; tc-flower is perhaps one of the most commonly used. The P4TC subsystem proposed by Jamal Hadi Salim fits into that subsystem, adding the ability to use the P4 language for the description of networking policies. It is entirely implemented in software, and runs within the kernel.
That software implementation was one of the first aspects of this work to attract attention. While one can do complicated network processing and routing in the kernel, any implementation that is intended to keep up with current network speeds needs quite a bit of hardware support. There are vendors selling networking hardware that is programmable with P4 now, and P4TC intends to support that hardware, but that capability was not present in the patch set (or any subsequent version). Jiri Pirko asked about that omission in response to the initial posting; Salim answered that the traffic-control subsystem requires a software implementation for any functionality that can be offloaded to hardware (thus ensuring that the functionality is universally available), and that the hardware-offload interfaces were still under consideration. Daniel Borkmann complained that nobody would use the software implementation, an assertion that Salim disagreed with.
-
Modernizing BPF for the next 10 years
BPF was first generalized beyond packet filtering more than a decade ago. In that time, it has changed a lot, becoming much more capable. Alexei Starovoitov kicked off the second day of the BPF track at the 2024 Linux Storage, Filesystem, Memory Management, and BPF Summit by leading a session discussing which changes to BPF are going to come in the next ten years as it continues evolving. He proposed several ideas, including expanding the number of registers available to BPF programs, dynamic deadlock detection, and relaxing some existing limits of the verifier.
Starovoitov started with a recap of the last ten years of BPF development. BPF's initial use case was for networking, hence the name "Berkeley Packet Filter". In 2015, its use expanded into a new generation of tools, including tracing. Everything in BPF has evolved for a reason, he said. Once support for BPF existed in the kernel, user-space tools like katran and Cilium started popping up to take advantage of it.
-
Securing BPF programs before and after verification
BPF is in a unique position in terms of security. It runs in a privileged context, within the kernel, and can have access to many sensitive details of the kernel's operation. At the same time, unlike kernel modules, BPF programs aren't signed. Additionally, the mechanisms behind BPF present challenges to implementing signing or other security features. Three nearly back-to-back sessions at the 2024 Linux Storage, Filesystem, Memory Management, and BPF Summit addressed some of the potential security problems.
-
Dropping the page cache for filesystems
VFS maintainer Christian Brauner led a discussion about the possibility of selectively dropping the contents of the page cache for a filesystem in a session at the 2024 Linux Storage, Filesystem, Memory Management, and BPF Summit. As he described in his topic proposal, the use case that started him down this path comes from GNOME, which wants to be able to safely suspend access to an encrypted home directory. Kernel developers know that reads from a suspended encrypted filesystem will still succeed if the data to be read remains in the page cache, but that behavior comes as a surprise to others.