Kernel Articles From LWN
-
Kernel security reporting for distributions
The call for topics for the Linux Kernel Maintainers Summit went out on August 15; one proposed topic has generated some interesting discussion about security-bug reporting for the kernel. A recent patch to the kernel's documentation about how to report security bugs recommends avoiding posting to the linux-distros mailing list because its goals and rules do not mesh well with kernel security practices. That led Jiri Kosina to suggest a discussion on security reporting, especially with regard to Linux distributions.
The linux-distros mailing list is a closed list for reporting security bugs that affect Linux systems; as might be guessed, the participants are representatives of various distributions. It imposes a fairly strict maximum embargo period: a bug must be publicly disclosed within 14 days of being reported. It also requires the reporter to post the full details to the oss-security mailing list once the embargo has run its course. These policies have clashed with kernel bug reporting along the way. Examples include a 2018 embargo that went awry, an even longer embargo botch in 2021, and a 2022 discussion of the problems.
-
An ioctl() call to detect memory writes
It is the kernel's business to know when a process's memory has been written to; among other things, this knowledge is needed to determine which pages can be immediately reclaimed or to properly write dirty pages to backing store. Sometimes, though, user space also needs access to this information in a reliable and fast manner. This patch series from Muhammad Usama Anjum adds a new ioctl() call for this purpose; using it requires repurposing an existing system call in an unusual way, though.
The driving purpose for this feature, it seems, is to enable an efficient emulation of the Windows GetWriteWatch() system call, which is evidently useful for game developers who want to defend against certain kinds of cheating. A game player who is able to access (and modify) a game's memory can enhance that game's functionality in ways that are not welcomed by the developers — or by other players. Using GetWriteWatch(), the game is able to detect when crucial data structures have been modified by an external actor, put up the modern equivalent of a "Tilt" indicator, and bring the gaming session to a halt.
Linux actually provides this functionality now by way of the pagemap file in /proc. The current dirty state of a range of pages can be read from this file, and writing the associated clear_refs file will reset the dirty state (useful, for example, after the game itself has written to the memory of interest). Accessing this file from user space is slow, though, which runs counter to the needs of most games. The new ioctl() call is meant to implement this feature more efficiently. The Checkpoint/Restore In Userspace (CRIU) project would also be able to make use of a more efficient mechanism to detect writes; in this case, the purpose is to identify pages that have been modified after the checkpoint process has begun.
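To make the existing /proc interface concrete, here is a minimal sketch in C of how a process might query and reset the write-tracking state for one of its own pages. It assumes the kernel's soft-dirty mechanism as described in the kernel documentation (bit 55 of each 64-bit pagemap entry, cleared by writing "4" to clear_refs), which requires a kernel built with soft-dirty support. The helper names (pagemap_offset(), page_was_written(), and so on) are hypothetical and chosen only for illustration; this is not code from the patch series.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumption: bit 55 of a pagemap entry is the soft-dirty flag. */
#define PM_SOFT_DIRTY_BIT 55

/* Byte offset in the pagemap file of the 64-bit entry describing addr. */
off_t pagemap_offset(uintptr_t addr, long page_size)
{
    assert(page_size > 0);
    return (off_t)(addr / (uintptr_t)page_size) * (off_t)sizeof(uint64_t);
}

/* Extract the soft-dirty flag from a raw pagemap entry. */
int pagemap_entry_soft_dirty(uint64_t entry)
{
    return (int)((entry >> PM_SOFT_DIRTY_BIT) & 1);
}

/*
 * Returns 1 if the page containing addr has been written since the last
 * reset, 0 if it has not, and -1 if the state could not be read.
 */
int page_was_written(const void *addr)
{
    long page_size = sysconf(_SC_PAGESIZE);
    uint64_t entry;
    ssize_t n;
    int fd;

    if (page_size <= 0)
        return -1;
    fd = open("/proc/self/pagemap", O_RDONLY);
    if (fd < 0)
        return -1;
    n = pread(fd, &entry, sizeof(entry),
              pagemap_offset((uintptr_t)addr, page_size));
    close(fd);
    if (n != (ssize_t)sizeof(entry))
        return -1;
    return pagemap_entry_soft_dirty(entry);
}

/* Reset the soft-dirty state for every page of the calling process. */
int reset_write_tracking(void)
{
    ssize_t n;
    int fd = open("/proc/self/clear_refs", O_WRONLY);

    if (fd < 0)
        return -1;
    n = write(fd, "4", 1);
    close(fd);
    return n == 1 ? 0 : -1;
}
```

A caller would typically invoke reset_write_tracking() once its own legitimate writes are done, then poll page_was_written() on the pages of interest. Each query costs several system calls and reports on a single page, which illustrates why this path is considered too slow for frequent checks over large ranges.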
-
Following up on file-position locking
LWN recently covered a discussion on file-position locking that demonstrated the hazards that can result from unexpected concurrency. It turns out that this discussion had not yet fully run its course. Since that article was written, additional changes intended to address a performance regression evolved into a core virtual filesystem (VFS) layer API change to carry out some much-delayed housecleaning.
At the end of the previous article, a change had been merged into the mainline to unconditionally take the file-position lock (which ensures that only one thread is manipulating the current file read/write position at any given time). The article noted that the performance impact of this change had not been measured. That changed on August 3, when Mateusz Guzik reported that there was indeed a performance cost: a 5% regression on a test he had run. VFS layer maintainer Christian Brauner initially discounted the report, but also said that the problem, if it truly existed, could be mitigated by only taking the position lock for directories (and not for regular files). The original locking problem had only affected directory reads, so the fix is only needed there as well.
-
A new futex API
The Linux fast user-space mutex ("futex") subsystem debuted with the 2.6.0 kernel; it provides a mechanism that can be used to implement user-space locking. Since futexes avoid calling into the kernel whenever possible, they can indeed be fast, especially in the uncontended case. The API used to access futexes has never been seen as one of Linux's strongest points, though, so there has long been a desire to improve it. This patch series from Peter Zijlstra shows what the future of futexes may look like.
A futex is a 32-bit value stored in user-space memory that is, presumably, shared between at least two threads or processes. When used as a lock, a futex can be acquired with a single compare-and-swap instruction, without kernel involvement. The kernel comes into the picture, though, in the contended case, where a thread must block until a futex becomes available. Waiting for a futex and waking threads that are waiting are some of the features provided by the futex() system call.
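The split between the fast path and the kernel-assisted slow path can be sketched with the existing futex() system call. The example below is a minimal illustration, not the new API from the patch series: it follows the well-known three-state lock design (0 = unlocked, 1 = locked, 2 = locked with possible waiters) from Ulrich Drepper's "Futexes Are Tricky". The type sketch_lock and the function names are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical lock type: 0 = free, 1 = held, 2 = held with waiters. */
typedef struct {
    _Atomic uint32_t word;
} sketch_lock;

/* Thin wrapper: glibc provides no futex() function, so use syscall(). */
long futex_call(_Atomic uint32_t *uaddr, int op, uint32_t val)
{
    return syscall(SYS_futex, uaddr, op, val, NULL, NULL, 0);
}

/* Try to take the lock with one compare-and-swap; no kernel involvement. */
int sketch_lock_try(sketch_lock *l)
{
    uint32_t expected = 0;
    return atomic_compare_exchange_strong(&l->word, &expected, 1);
}

void sketch_lock_acquire(sketch_lock *l)
{
    uint32_t c = 0;

    /* Uncontended fast path: 0 -> 1 entirely in user space. */
    if (atomic_compare_exchange_strong(&l->word, &c, 1))
        return;

    /* Contended: mark the lock as having waiters, then sleep in the kernel. */
    if (c != 2)
        c = atomic_exchange(&l->word, 2);
    while (c != 0) {
        /* Blocks only if the word still contains 2. */
        futex_call(&l->word, FUTEX_WAIT, 2);
        c = atomic_exchange(&l->word, 2);
    }
}

void sketch_lock_release(sketch_lock *l)
{
    uint32_t prev = atomic_fetch_sub(&l->word, 1);

    assert(prev != 0); /* releasing a lock that is not held is a bug */
    if (prev != 1) {
        /* There may be waiters: fully release, then wake one of them. */
        atomic_store(&l->word, 0);
        futex_call(&l->word, FUTEX_WAKE, 1);
    }
}
```

In the uncontended case, sketch_lock_acquire() and sketch_lock_release() each complete with a single atomic operation and never enter the kernel; FUTEX_WAIT and FUTEX_WAKE are reached only when another thread has been seen holding or waiting for the lock.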