LWN Reports on Linux Kernel
-
Famfs: a filesystem interface to shareable memory
At the 2024 Linux Storage, Filesystem, Memory Management, and BPF Summit, John Groves led a session on famfs, a filesystem he has developed that uses the kernel's direct-access (DAX) mechanism to access memory that is shareable between hosts. The discussion centered on whether a different approach should be taken and, in particular, whether FUSE should be used instead of implementing famfs as an in-kernel filesystem. As noted in the thread about his proposal for an LSFMM+BPF session, and in the mailing-list discussions on the first and second versions of his patch set, there is some skepticism that a new in-kernel filesystem is warranted for the use case.
-
The rest of the 6.11 merge window
The release of 6.11-rc1 marked the end of the 6.11 merge window on July 28. By that time, 12,102 non-merge changesets had been pulled into the mainline repository; about 8,000 of those came in after the first-half summary was written. Quite a few significant changes were to be found in those changesets; there is also one big change that did not make it.
-
What became of getrandom() in the vDSO
In the previous episode of the vgetrandom() story, Jason Donenfeld had put together a version of the getrandom() system call that ran in user space, significantly improving performance for applications that need a lot of random data while retaining all of the guarantees provided by the system call. At that time, it seemed that a consensus had built around the implementation and that it was headed toward the mainline in that form. A few milliseconds after that article was posted, though, a Linus-Torvalds-shaped obstacle appeared in its path. That obstacle has been overcome and this work has now been merged for the 6.11 kernel, but its form has changed somewhat.
Torvalds initially rejected the idea of a vDSO implementation entirely, saying that there was no clear use case for it. At most, he said, the kernel should export a generation counter to inform user-space random-number generators that they should reseed themselves; anything beyond that was more than the kernel needed to provide. After a fair amount of back-and-forth with Donenfeld, who made the point that he did not want to expose the internal functioning of the kernel's random-number generator to user space, Torvalds reluctantly agreed to take another look.
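For readers unfamiliar with the interface at the center of this discussion, the following is a minimal sketch of a caller using the getrandom() wrapper that glibc exposes in <sys/random.h>. The helper name fill_random() is hypothetical and does not come from the patch set. Whether any given call is served from the vDSO or by a real system call depends on the kernel and C library in use; the sketch assumes only that the caller-visible interface is the ordinary one.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/random.h>
#include <sys/types.h>

/*
 * Hypothetical helper: fill buf with len random bytes.
 * Returns 0 on success and -1 on error (errno is set by getrandom()).
 */
int fill_random(void *buf, size_t len)
{
    unsigned char *p = buf;

    while (len > 0) {
        /* Flags of 0 request bytes from the kernel's default pool. */
        ssize_t n = getrandom(p, len, 0);

        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return -1;
        }
        /* getrandom() may return fewer bytes than requested. */
        p += n;
        len -= (size_t)n;
    }
    return 0;
}
```

An application that needs many small chunks of random data would call a helper like this in a hot path, which is the situation where avoiding a kernel entry on each call matters most.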
-
May the FOLL_FORCE not be with you
One of the simplest hardening concepts to understand is that memory should never be both writable and executable; otherwise, an attacker can use it to load and run arbitrary code. That rule is generally followed in Linux systems, but there is a glaring loophole that is exploitable from user space to inject code into a running process. Attackers have duly exploited it. A new effort to close the hole ran into trouble early in the merge window, but a solution may yet be found in time for the 6.11 kernel release.
The special file /proc/PID/mem provides read and write access to the virtual address space of the process identified by PID. It is used primarily by debuggers, but it has a place in other applications (certain types of user-space hardening, for example) as well. Writing to this file will overwrite the process's memory at the current file offset. Interestingly, the kernel function that implements writing to this file — mem_rw() — uses the FOLL_FORCE flag when accessing the target memory. That flag causes the write to succeed, regardless of whether the normal memory protections at the target address would allow writing. As a result, /proc/PID/mem can be used to overwrite executable memory.
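The behavior described above can be observed from user space with a small experiment. The sketch below maps a private page, removes write permission with mprotect(), and then tries to overwrite it through /proc/self/mem, using the page's virtual address as the file offset. The function name force_write_demo() and its return convention are assumptions made for illustration. The outcome depends on the running kernel and its configuration: on a kernel that restricts FOLL_FORCE for this file, or in a sandbox that limits access to /proc, the write may simply fail. The cast of the address to off_t assumes a 64-bit off_t.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical demo: try to overwrite a read-only page of our own
 * address space through /proc/self/mem.
 * Returns 1 if the write landed, 0 if the kernel refused it,
 * and -1 if the experiment could not be set up.
 */
int force_write_demo(void)
{
    static const char before[] = "read-only";
    static const char after[]  = "clobbered";
    long page = sysconf(_SC_PAGESIZE);

    if (page <= 0)
        return -1;

    /* Map a private page writable, fill it, then drop write permission. */
    char *p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    memcpy(p, before, sizeof(before));
    if (mprotect(p, (size_t)page, PROT_READ) != 0) {
        munmap(p, (size_t)page);
        return -1;
    }

    int fd = open("/proc/self/mem", O_RDWR);
    if (fd < 0) {
        munmap(p, (size_t)page);
        return -1;
    }

    /* The file offset is the virtual address to write to. */
    ssize_t n = pwrite(fd, after, sizeof(after), (off_t)(uintptr_t)p);
    close(fd);

    /* Success means the read-only page now holds the new bytes. */
    int landed = (n == (ssize_t)sizeof(after) &&
                  memcmp(p, after, sizeof(after)) == 0);

    munmap(p, (size_t)page);
    return landed ? 1 : 0;
}
```

A direct store to the page after the mprotect() call would raise SIGSEGV; a return value of 1 here indicates that the write through /proc/self/mem bypassed that protection.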