New LWN Articles About Linux Kernel
-
The FUSE BPF filesystem
The Filesystem in Userspace (FUSE) framework can be used to create a "stacked" filesystem, where the FUSE piece adds specialized functionality (e.g. reporting different file metadata) atop an underlying kernel filesystem. The performance of such filesystems leaves a lot to be desired, however, so the FUSE BPF filesystem has been proposed to try to improve the performance to be close to that of the underlying native filesystem. It came up in the context of a session on FUSE passthrough earlier in the 2023 Linux Storage, Filesystem, Memory-Management and BPF Summit, but the details of FUSE BPF were more fully described by Daniel Rosenberg in a combined filesystem and BPF session on the final day of the summit.
Rosenberg said that he wanted to introduce the filesystem, describe its current status, and discuss some of the open questions with regard to future plans for it. The goal is for a stacked FUSE filesystem to come as close to the native filesystem's performance as the FUSE BPF developers can get. In addition, they want to keep "all of the nice ease-of-use of FUSE", with its "defined entry points"; the idea is to keep the interface "similar to what you would see from the FUSE daemon".
-
Large folios for anonymous memory
The transition to folios has transformed the memory-management subsystem in a number of ways, but has also resulted in a lot of code churn that has not been welcomed by all developers. As this work proceeds, though, some of the benefits from it are beginning to become clear. One example may well be in the handling of anonymous memory, as can be seen in a pair of patch sets from Ryan Roberts.
The initial Linux kernel release used 4KB pages on systems whose total memory size was measured in megabytes, and a rather small number of megabytes at that. Since then, installed-memory sizes have grown by a few orders of magnitude, but the 4KB page size remains mostly unchanged. So the kernel has to manage far more pages than it once did; that leads to more memory used for tracking, longer lists to scan, and more page faults to handle. In many ways, a 4KB page size is far too small for contemporary systems.
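To put rough numbers on that growth, the page count for a given memory size can be worked out with a few lines of C. This is only an illustrative sketch: the helper name pages_needed() and the memory sizes mentioned afterward (4MB, 16GB) and the 64KB unit are assumptions chosen for the arithmetic, not figures from the article.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: how many pages of page_size bytes are needed to
 * cover mem_bytes of memory, rounding up for any partial page.
 */
uint64_t pages_needed(uint64_t mem_bytes, uint64_t page_size)
{
    assert(page_size > 0);
    return mem_bytes / page_size + (mem_bytes % page_size != 0);
}
```

Under those assumptions, pages_needed(4ULL << 20, 4096) gives 1,024 pages for a 4MB machine, while pages_needed(16ULL << 30, 4096) gives 4,194,304 pages for a 16GB one. Managing the same 16GB in 64KB units would cut that to 262,144.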
-
A pair of workqueue improvements
Over the years, the kernel has developed a number of deferred-execution mechanisms to take care of work that cannot be done immediately. For many (or most) needs, the workqueue subsystem is the tool that developers reach for first. Workqueues took their current form over a dozen years ago, but that does not mean that there are not improvements to be made. Two sets of patches from Tejun Heo show the pressures being felt by the workqueue subsystem and the solutions that are being tried — with varying degrees of success.
In normal usage, each subsystem creates its own workqueue (with alloc_workqueue()) to hold work items. When kernel code needs to defer a task, it can fill in a work_struct structure with the address of a function to call and some data to pass to that call. That structure can be passed, along with the target workqueue, to a function like queue_work(), and the workqueue mechanism will call the function at some future time. The call is made in process context, meaning that work items can block if need be. There is, of course, a long list of variants to queue_work(), and a number of ways in which workqueues themselves can be created, but the core functionality (calling a function in process context at a later time) remains the same.
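The queue-now, run-later pattern described above can be modeled in a few lines of user-space C. This is a toy sketch and not kernel code: every toy_* name and bump_counter() are hypothetical stand-ins, and the real workqueue subsystem runs items from kernel worker threads rather than from an explicit drain call made by the submitter.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a work item: a function to call plus its data. */
struct toy_work {
    void (*func)(struct toy_work *work);
    void *data;
    struct toy_work *next;
    int pending;                /* nonzero while sitting on a queue */
};

/* Toy stand-in for a workqueue: a simple FIFO of work items. */
struct toy_workqueue {
    struct toy_work *head;
    struct toy_work *tail;
};

void toy_init_work(struct toy_work *work,
                   void (*func)(struct toy_work *), void *data)
{
    assert(func != NULL);
    work->func = func;
    work->data = data;
    work->next = NULL;
    work->pending = 0;
}

/* Queue an item; returns 0 if it was already pending, 1 otherwise. */
int toy_queue_work(struct toy_workqueue *wq, struct toy_work *work)
{
    if (work->pending)
        return 0;
    work->pending = 1;
    work->next = NULL;
    if (wq->tail)
        wq->tail->next = work;
    else
        wq->head = work;
    wq->tail = work;
    return 1;
}

/* Stand-in for the worker: run everything queued, in FIFO order. */
int toy_run_pending(struct toy_workqueue *wq)
{
    int ran = 0;

    while (wq->head) {
        struct toy_work *work = wq->head;

        wq->head = work->next;
        if (!wq->head)
            wq->tail = NULL;
        work->next = NULL;
        work->pending = 0;      /* cleared first, so func may requeue */
        work->func(work);
        ran++;
    }
    return ran;
}

/* Example work function: increment the int that data points to. */
void bump_counter(struct toy_work *work)
{
    int *counter = work->data;

    (*counter)++;
}
```

A caller would initialize an item with toy_init_work(&w, bump_counter, &counter) and queue it with toy_queue_work(&wq, &w). The counter stays unchanged until toy_run_pending(&wq) runs, which is the deferral that the workqueue mechanism provides.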
-
The rest of the 6.5 merge window
Linus Torvalds released 6.5-rc1 and closed the merge window for this development cycle on July 9. By that point, 11,730 non-merge changesets had been pulled into the mainline for 6.5; over 7,700 of those were pulled after the first-half merge-window summary was written. The second half of the merge window saw a lot of code coming into the mainline and a long list of significant changes.
-
BPF iterators for filesystems
In the first of two combined BPF and filesystem sessions at the 2023 Linux Storage, Filesystem, Memory-Management and BPF Summit, Hou Tao introduced his BPF iterators for filesystem information. Iterators for BPF are a relatively recent addition to the BPF landscape; they help BPF programs step through kernel data structures in a loop-like manner, but without running afoul of the BPF verifier, which is notoriously hard to convince about loops.
In his remote presentation, Tao began with a quick overview of BPF iterators. They allow users to write a special type of BPF program that can step through kernel data structures in ways that would normally be handled with loops; instead, the BPF program contains callbacks that are made from the kernel in response to user-space reads of pinned BPF files. The callback is made for each new kernel object encountered in the data structure; the code in the callback can then present information from the object to user space in whatever format the developer wants.
-
Testing for storage and filesystems
The kdevops kernel-testing framework has come up at several earlier summits, including in two separate sessions at last year's event. Testing kernel filesystems and the block layer, not to mention lots of other kernel subsystems, has become increasingly important over time. So it was no surprise that Luis Chamberlain led a combined storage and filesystem session at the 2023 Linux Storage, Filesystem, Memory-Management and BPF Summit to talk more about testing, the resources needed for it, and what can be done to improve it. It was the final session for this year's summit, so this article completes our coverage.