Kernel: BPF, PTE, and x86
-
Long-lived kernel pointers in BPF [LWN.net]
The BPF subsystem allows programmers to write programs that can run safely in kernel space. All memory accesses and function calls in BPF programs are statically checked for safety using the in-kernel verifier, which analyzes programs in their entirety before allowing them to be loaded. While this allows the kernel to safely run BPF programs, it heavily restricts what those programs are able to do. Among these constraints is a rule that programs cannot store pointers into BPF maps for use (such as dereferencing them or passing them to the kernel in kfunc and BPF helper invocations) at a later time. A patch set by Kumar Kartikeya Dwivedi adds this capability to BPF.
-
Sharing page tables with msharefs [LWN.net]
A page-table entry (PTE) is relatively small, requiring just eight bytes to refer to a 4096-byte page on most systems. It thus does not seem like a worrisome level of overhead, and little effort has been made over the kernel's history to reduce page-table memory consumption. Those eight bytes can hurt, though, if they are replicated across a sufficiently large set of processes. The msharefs patch set from Khalid Aziz is a revised attempt to address that problem, but it is proving to be a hard sell in the memory-management community.
One of the defining characteristics of a process on Linux (or most other operating systems) is a distinct address space. As a result, the page tables that manage the state of that address space are private to each process (though threads within a process will share page tables). So if two processes have mappings to the same page in physical memory, each will have an independent page-table entry for that page. The overhead for PTEs, thus, increases linearly with the number of processes mapping each page.
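To make the linear growth concrete, here is a rough back-of-the-envelope sketch. It uses the eight-byte PTE and 4096-byte page from the text. The workload (1,000 processes each mapping the same 100 GiB region) and the function name are hypothetical, chosen only for illustration. Only leaf PTEs are counted; upper page-table levels are ignored.

```python
PTE_SIZE = 8       # bytes per page-table entry (figure from the text)
PAGE_SIZE = 4096   # bytes per page (figure from the text)
GiB = 1024 ** 3


def pte_overhead_bytes(mapped_bytes: int, num_processes: int) -> int:
    """Leaf-PTE memory used when every process maps the whole region."""
    pages = -(-mapped_bytes // PAGE_SIZE)  # ceiling division
    return pages * PTE_SIZE * num_processes


# Hypothetical workload: a 100 GiB region shared by 1 and by 1,000 processes.
print(pte_overhead_bytes(100 * GiB, 1) / GiB)     # 0.1953125
print(pte_overhead_bytes(100 * GiB, 1000) / GiB)  # 195.3125
```

Under these assumptions, one process spends about 200 MiB on PTEs, which is negligible next to 100 GiB of data. With 1,000 processes, the PTEs alone take roughly 195 GiB, more than the memory they describe.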
-
The BPF panic function [LWN.net]
One of the key selling points of the BPF subsystem is that loading a BPF program is safe: the BPF verifier ensures that the program cannot hurt the kernel before allowing the load to occur. That guarantee is perhaps losing some of its force as more capabilities are made available to BPF programs but, even so, it may be a bit surprising to see this proposal from Artem Savkov adding a BPF helper that is explicitly designed to crash the system. If this patch set is merged in something resembling its current form, it will be the harbinger of a new era where BPF programs are, in some situations at least, allowed to be overtly destructive.
As Savkov notes, one of the major use cases for BPF is kernel debugging, a task which is also often helped by the existence of a well-timed crash dump. By making the kernel's panic() function available to BPF programs, Savkov is trying to combine the two by allowing a BPF program to cause a crash — and create a crash dump — when it detects the conditions that indicate a problem that a developer is looking for. Savkov is seemingly not the only one wanting this capability; Jiri Olsa noted that he has gotten a request for this feature as well.
Making panic() available to BPF has some obvious hazards, so one would expect that there would be some guard rails put into place. In this case, the first step is a new flag, BPF_F_DESTRUCTIVE, that must be provided when a program that will invoke destructive operations (such as a panic() call) is loaded. If this flag is not present, the BPF verifier will reject the loading of a program that contains calls to any destructive helper functions, of which panic() is the only one (so far).
Even then, the panic() helper function is only available to tracing programs. It makes little sense, after all, for an infrared decoder to be able to panic the system, though this restriction will prevent a complete implementation in BPF for remote controls featuring a "panic" button. Then, there is a new sysctl knob (kernel.destructive_bpf_enabled) that must be set to a non-zero value; otherwise the panic() call will not be allowed. Even when the sysctl knob has been set, the process on whose behalf the BPF program is running must have the CAP_SYS_BOOT capability.
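The layered guard rails described above can be summarized as a chain of checks. The sketch below is a toy model in Python, not kernel code. The class, field, and function names are hypothetical, and the order and stage (load time versus call time) of the checks are assumptions; only the four conditions themselves come from the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PanicRequest:
    """Hypothetical summary of the state the guard rails look at."""
    loaded_with_destructive_flag: bool  # BPF_F_DESTRUCTIVE given at load time
    program_type: str                   # illustrative label such as "tracing"
    destructive_bpf_enabled: int        # value of kernel.destructive_bpf_enabled
    has_cap_sys_boot: bool              # process holds CAP_SYS_BOOT


def first_failed_check(req: PanicRequest) -> Optional[str]:
    """Return the first guard rail that blocks panic(), or None if all pass."""
    if not req.loaded_with_destructive_flag:
        return "BPF_F_DESTRUCTIVE not set"
    if req.program_type != "tracing":
        return "not a tracing program"
    if req.destructive_bpf_enabled == 0:
        return "kernel.destructive_bpf_enabled is 0"
    if not req.has_cap_sys_boot:
        return "missing CAP_SYS_BOOT"
    return None


ok = PanicRequest(True, "tracing", 1, True)
print(first_failed_check(ok))      # None: every check passes

no_cap = PanicRequest(True, "tracing", 1, False)
print(first_failed_check(no_cap))  # missing CAP_SYS_BOOT
```

The point of the model is that all four conditions must hold at once; failing any single one is enough to keep panic() out of reach.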
-
Intel Sapphire Rapids Xeon CPUs To Feature Increased Power Saving On Linux Thanks To New Firmware
Intel has spent the last five months preparing idle-driver support for its next-generation Xeon CPUs, codenamed "Sapphire Rapids". Michael Larabel of Phoronix notes that the upcoming Xeon Scalable CPUs still have one limitation in their power-state handling: the core C-states C1 and C1E ("C1 Enhanced") are mutually exclusive, so the two cannot be enabled at the same time. A firmware update now published by Intel appears to fix that.
-
AMD Adds Last-Minute RDNA 3 GPU Driver Support Core For Linux 5.20
AMD is still adding last-minute code ahead of the Linux 5.20 merge window, which opens next week, reports Michael Larabel of Phoronix. Typically, the cutoff for feature work queued in DRM-Next for a given kernel cycle falls near the "-rc6" point of the cycle. That deadline has not stopped AMD from trying to finish driver support for its upcoming RDNA 3 graphics architecture in time for the Linux 5.20 kernel.