LWN Articles, Mostly Linux Kernel
-
Wrapping up 2022 [LWN.net]
Yet another year is coming to a close; that can only mean that the time has come to indulge in a longstanding LWN tradition: looking back at the predictions we made in January and giving them the mocking that they richly deserve. Read on to see how those predictions went, what was missed, and a look back at the year in general.
-
6.2 Merge window, part 1 [LWN.net]
Once upon a time, Linus Torvalds would try to set a pace of about 1,000 changesets pulled into the mainline each day during the early part of the merge window. For 6.2, though, the situation is different; no less than 9,278 non-merge changesets were pulled during the first two days. Needless to say, these commits affect the kernel in numerous ways, even though there are fewer fundamental changes than were seen in 6.1.
-
Enabling non-executable memfds [LWN.net]
The memfd interface is a bit of a strange and Linux-specific beast; it was initially created to support the secure passing of data between cooperating processes on a single system. It has since gained other roles, but it may still come as a surprise to some to learn that memory regions created for memfds, unlike almost any other data area, have the execute permission bit set. That can facilitate attacks; this patch set from Jeff Xu proposes an addition to the memfd API to close that hole.
A memfd is created with a call to memfd_create(), which will return a file descriptor referring to the region. That descriptor can be treated as an ordinary file, in that it can be written to or read from; it can also be mapped into a process's address space. Normally the first step will be to call ftruncate() to set the size of the region; after that it can be populated with data and passed to another process. One interesting characteristic of memfds is that they can be "sealed" with a call to fcntl(), an operation that disallows any further changes to the stored data. Sealing allows a recipient to know that the contents of a memfd will not change in unexpected ways in the middle of an operation.
-
The intersection of shadow stacks and CRIU [LWN.net]
Shadow stacks are one of the methods employed to enforce control-flow integrity and thwart attackers; they are a mechanism for fine-grained, backward-edge protection. Most of the time, applications are not even aware that shadow stacks are in use. As is so often the case, though, life gets more complicated when the Checkpoint/Restore in Userspace (CRIU) mechanism is in use. Not breaking CRIU turns out to be one of the big challenges facing developers working to get user-space shadow-stack support into the kernel.
The idea behind shadow stacks is simple: in addition to the normal program stack (which holds return addresses, local variables, and more) there is a special memory area, called the "shadow stack", that stores only return addresses. Whenever a CALL instruction is executed, the return address is pushed onto both the normal and the shadow stacks. When, later, a function ends with a RET instruction, the return address that's popped from the normal stack is compared to that on the shadow stack. If they match, the execution continues; if they don't, a violation of control-flow integrity has just been detected.
Recent x86 processor models implement shadow stacks in hardware, meaning that no instrumentation is required for a program to get the protection that shadow stacks provide and that the cost of using a shadow stack is negligible. Once the feature is enabled, the CPU takes care of pushing and popping the return address on the shadow stack and comparing the return addresses. If the return addresses do not match, the CPU generates a control protection exception. To support shadow stacks, the x86 architecture has been extended with a model-specific register (MSR) that controls the use of the shadow stack and its features. There are also shadow-stack pointer MSRs (one for each possible privilege level) and a set of instructions for manipulating shadow-stack contents.
The discussion about how kernel should support shadow stacks for user space started a long time ago, but it has still not concluded. One of the difficulties in enabling this feature is the possibility that some applications will be broken by shadow stacks because they use non-standard ways to change their control flow. The list of problematic applications includes GDB, various JIT engines, and, of course, CRIU.