Linux Kernel: C, BPF, and More
-
LWN ☛ Maximal min() and max()
Like many projects written in C, the kernel makes extensive use of the C preprocessor; indeed, the kernel's use is rather more extensive than most. The preprocessor famously has a number of sharp edges associated with it. One might not normally think of increased compilation time as one of them, though. It turns out that some changes to a couple of conceptually simple preprocessor macros — min() and max() — led to some truly pathological, but hidden, behavior where those macros were used.
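The excerpt does not show the macros involved, so the following is only a minimal sketch of the general mechanism, not the kernel's actual min()/max() definitions. It uses a hypothetical, simplified `NAIVE_MAX()` macro in which each argument appears twice, and measures how the preprocessed text grows as invocations are nested. The names `NAIVE_MAX`, `STR`, `XSTR`, and `naive_max_expansion_len()` are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified macro: NOT the kernel's max() definition.
 * Each argument appears twice in the replacement text. */
#define NAIVE_MAX(a, b) ((a) > (b) ? (a) : (b))

/* Two-step stringification, so the argument is macro-expanded
 * before it is turned into a string literal. */
#define STR(x) #x
#define XSTR(x) STR(x)

/* Length, in characters, of the expanded text at each nesting depth. */
enum {
    LEN_DEPTH1 = sizeof(XSTR(NAIVE_MAX(a, b))) - 1,
    LEN_DEPTH2 = sizeof(XSTR(NAIVE_MAX(a, NAIVE_MAX(b, c)))) - 1,
    LEN_DEPTH3 = sizeof(XSTR(NAIVE_MAX(a, NAIVE_MAX(b, NAIVE_MAX(c, d))))) - 1
};

/* Each extra level embeds two copies of the level below it, so the
 * expanded text more than doubles; the compiler checks that here. */
static_assert(LEN_DEPTH2 > 2 * LEN_DEPTH1, "depth 2 more than doubles depth 1");
static_assert(LEN_DEPTH3 > 2 * LEN_DEPTH2, "depth 3 more than doubles depth 2");

/* Return the expansion length for a nesting depth of 1 to 3, else 0. */
size_t naive_max_expansion_len(int depth)
{
    switch (depth) {
    case 1: return LEN_DEPTH1;
    case 2: return LEN_DEPTH2;
    case 3: return LEN_DEPTH3;
    default: return 0;
    }
}
```

With a macro that mentions its arguments many more times than this toy one does, the same doubling effect would be correspondingly steeper, which is one plausible way for a short line of source to become costly to compile.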
-
LWN ☛ CRIB: checkpoint/restore in BPF
The desire for the ability to checkpoint a process — to record its state in a form that can be restarted at a future time — on Linux is almost as old as Linux itself. See, for example, this announcement of a checkpoint project that appeared in LWN in 1998. While working solutions exist, they can be somewhat fragile and difficult to use; it is not surprising that some people are interested in finding a better alternative. A current effort goes by the name CRIB, for Checkpoint/Restore in (naturally) BPF. It is far from clear that CRIB will replace the existing solutions, but it is an interesting look at a different way of solving the problem.
A checkpoint/restore solution must overcome two challenges, neither of which is easy. On the checkpoint side, it is necessary to obtain a complete description of a process (or set of processes), with no important details overlooked; that requires collecting a lot of information that the kernel was not designed to export. On the restore side, that information must be used to recreate the checkpointed process(es), possibly on a different system, in such a way that those processes cannot tell the difference, once again using interfaces that were not designed for this purpose.
-
LWN ☛ Handling filesystem interruptibility
David Howells wanted to discuss changing the way filesystem code handles the ability to interrupt or kill operations, in order to fix some longstanding problems with network (and other) filesystems, in a session at the 2024 Linux Storage, Filesystem, Memory Management, and BPF Summit. As noted in his session proposal, some filesystems may expect not to be interruptible, but call code that can take locks and mutexes that are interruptible (or killable), which effectively changes the state of the task incorrectly. He would like to find a solution for that problem.
The interruptibility here refers to signal handling. An interruptible process will respond to any signals that are not masked or ignored. Killable is a variant of interruptible that will only respond to fatal signals.
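The distinction can be sketched as a small userspace model. This is a toy illustration of the rules described above, not kernel code; the `sleep_mode` enum, the `pending_signal` structure, and `signal_wakes_sleeper()` are hypothetical names introduced here. In the kernel itself, the same distinction shows up in lock primitives such as `mutex_lock_interruptible()` and `mutex_lock_killable()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of how a sleeping task may react to signals.
 * These names are hypothetical, not kernel identifiers. */
enum sleep_mode {
    SLEEP_UNINTERRUPTIBLE, /* no signal ends the wait */
    SLEEP_KILLABLE,        /* only fatal signals end the wait */
    SLEEP_INTERRUPTIBLE    /* any deliverable signal ends the wait */
};

struct pending_signal {
    bool masked_or_ignored; /* masked or ignored by the task */
    bool fatal;             /* would terminate the task */
};

/* Return true if the signal would end a wait in the given mode. */
bool signal_wakes_sleeper(enum sleep_mode mode,
                          const struct pending_signal *sig)
{
    assert(sig != NULL);

    /* Masked or ignored signals never interrupt the wait. */
    if (sig->masked_or_ignored)
        return false;

    switch (mode) {
    case SLEEP_INTERRUPTIBLE:
        return true;
    case SLEEP_KILLABLE:
        return sig->fatal;
    case SLEEP_UNINTERRUPTIBLE:
    default:
        return false;
    }
}
```

In this model, a non-fatal signal ends an interruptible wait but not a killable one, while a fatal signal ends both and leaves an uninterruptible wait untouched.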
-
LWN ☛ Tracing the source of filesystem errors
There are lots of places in the kernel where an EINVAL can be returned to user space, but it is often unclear what the actual underlying problem is because the errno error codes are too generic. That is the problem that Miklos Szeredi wanted to discuss in a filesystem session that he led remotely at the 2024 Linux Storage, Filesystem, Memory Management, and BPF Summit. He would like to help those who are trying to debug problems trace where in the kernel a particular error code is being generated.
Filesystem mounting is an example of where this problem can occur, Szeredi said; there are lots of places where EINVAL is returned, so it does not really tell anyone anything. If he is debugging a kernel filesystem and receives an error, he wants to know where in the code that occurred. The strace tool is useful for debugging, so ideally whatever is done to help show where errors are coming from would integrate with it.
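A minimal sketch of the problem, assuming a made-up `toy_mount()` function and `toy_mount_opts` structure (neither exists in the kernel): three unrelated validation checks all report failure with the same errno value, so the caller cannot tell which check rejected the request.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical mount options; not a real kernel structure. */
struct toy_mount_opts {
    const char *source;      /* device or path to mount */
    unsigned int block_size; /* must be a nonzero power of two */
    unsigned long flags;     /* bitmask of TOY_KNOWN_FLAGS */
};

#define TOY_KNOWN_FLAGS 0x3UL

/* Validate the options; return 0 on success or -EINVAL on any failure. */
int toy_mount(const struct toy_mount_opts *opts)
{
    assert(opts != NULL);

    /* Site 1: missing or empty source string. */
    if (opts->source == NULL || opts->source[0] == '\0')
        return -EINVAL;

    /* Site 2: block size is zero or not a power of two. */
    if (opts->block_size == 0 ||
        (opts->block_size & (opts->block_size - 1)) != 0)
        return -EINVAL;

    /* Site 3: a flag bit outside the known set. */
    if (opts->flags & ~TOY_KNOWN_FLAGS)
        return -EINVAL;

    return 0;
}
```

A caller that passes an empty source, a block size of 3000, or an unknown flag gets the identical -EINVAL result in each case, which is the ambiguity Szeredi described.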