LWN Articles on Kernel
-
LWN ☛ Fending off unwanted file descriptors
One of the more obscure features provided by Unix-domain sockets is the ability to pass a file descriptor from one process to another. This feature is often used to provide access to a specific file or network connection to a process running in a relatively unprivileged context. But what if the recipient doesn't want a new file descriptor? A feature added for the 6.16 release makes it possible to refuse that offer.
Normally, a Unix-domain connection is established between two processes to allow the exchange of data. There is, however, a special option (SCM_RIGHTS, documented in unix(7)) to the sendmsg() system call that accepts a file descriptor as input. That descriptor will be duplicated and installed into the receiving process, giving the recipient access to the file as if it had opened it directly. SCM_RIGHTS messages can be used to give a process access to files that would otherwise be unavailable to it. They are also useful for network-service dispatchers, which can hand off incoming connections to worker processes.
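As a rough illustration of the descriptor passing described above, here is a minimal Python sketch. It assumes Python 3.9 or later on a Unix system, where socket.send_fds() and socket.recv_fds() wrap sendmsg() and recvmsg() with an SCM_RIGHTS control message. The function name pass_descriptor_demo is hypothetical, and both ends of the socket pair live in one process purely to keep the example short; it does not show the new 6.16 mechanism for refusing descriptors.

```python
import os
import socket


def pass_descriptor_demo() -> bytes:
    """Send the read end of a pipe over a Unix-domain socket and read through the copy."""
    # A connected pair of Unix-domain sockets stands in for two processes.
    sender, receiver = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    read_end, write_end = os.pipe()
    try:
        # send_fds() attaches the descriptor as SCM_RIGHTS ancillary data.
        socket.send_fds(sender, [b"fd"], [read_end])

        # recv_fds() returns (data, fds, msg_flags, address); the kernel has
        # installed a duplicate of the descriptor on the receiving side.
        _data, fds, _flags, _addr = socket.recv_fds(receiver, 1024, 1)
        received_fd = fds[0]

        # Data written to the pipe is readable through the received duplicate.
        os.write(write_end, b"hello")
        payload = os.read(received_fd, 5)
        os.close(received_fd)
        return payload
    finally:
        sender.close()
        receiver.close()
        os.close(read_end)
        os.close(write_end)


if __name__ == "__main__":
    print(pass_descriptor_demo())
```

In a real dispatcher, the sender and receiver would be separate processes, and the descriptor would more likely be an accepted network connection than a pipe.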
-
LWN ☛ Slowing the flow of core-dump-related CVEs
The 6.16 kernel will include a number of changes to how the kernel handles the processing of core dumps for crashed processes. Christian Brauner explained his reasons for doing this work as: "Because I'm a clown and also I had it with all the CVEs because we provide a **** API for userspace". The handling of core dumps has indeed been a constant source of vulnerabilities; with luck, the 6.16 work will result in rather fewer of them in the future.
[...]
A core dump is an image of a process's data areas — everything except the executable text; it can be used to investigate the cause of a crash by examining a process's state at the time things went wrong. Once upon a time, Unix systems would routinely place a core dump into a file called core in the current working directory when a program crashed. The main effects of this practice were to inspire system administrators worldwide to remove core files daily via cron jobs, and to make it hazardous to use the name core for anything you wanted to keep. Linux systems can still create core files, but are usually configured not to.
An alternative that is used on some systems is to have the kernel launch a process to read the core dump from a crashing process and, presumably, do something useful with it. This behavior is configured by writing an appropriate string to the core_pattern sysctl knob. A number of distributors use this mechanism to set up core-dump handlers that phone home to report crashes so that the guilty programs can, hopefully, be fixed.
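To make the core_pattern configuration more concrete, the following Python sketch classifies a pattern string. It relies on the behavior documented in core(5): a pattern whose first character is "|" causes the kernel to pipe the dump to the named program, while anything else is treated as a file-name template. The function names and the sample handler path are illustrative assumptions, not taken from the article.

```python
from pathlib import Path

# Location of the sysctl knob in procfs on Linux.
CORE_PATTERN_PATH = Path("/proc/sys/kernel/core_pattern")


def describe_core_pattern(pattern: str) -> str:
    """Say whether a core_pattern value names a handler program or a file template."""
    pattern = pattern.strip()
    if not pattern:
        return "empty pattern"
    if pattern.startswith("|"):
        # Everything after the pipe is a command line; the first word is the program.
        words = pattern[1:].split()
        handler = words[0] if words else "(none)"
        return f"pipe to handler: {handler}"
    return f"write to file: {pattern}"


def describe_system_core_pattern() -> str:
    """Read the running kernel's setting (Linux only) and describe it."""
    return describe_core_pattern(CORE_PATTERN_PATH.read_text())


if __name__ == "__main__":
    # Illustrative values only; a given distribution's setting may differ.
    print(describe_core_pattern("core"))
    print(describe_core_pattern("|/usr/lib/systemd/systemd-coredump %P %u %g"))
```

Running describe_system_core_pattern() on a Linux machine shows which of the two modes that system is using.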
-
LWN ☛ The second half of the 6.16 merge window
The 6.16 merge window closed on June 8, as expected, containing 12,899 non-merge commits. This is slightly more than the 6.15 merge window, but well in line with expectations. 7,353 of those were merged after the summary of the first half of the merge window was written. More detailed statistics can be found in the LWN kernel source database.
-
LWN ☛ An end to uniprocessor configurations
The Linux kernel famously scales from the smallest of systems to massive servers with thousands of CPUs. It was not always that way, though; the initial version of the kernel could only manage a single processor. That limitation was lifted, obviously, but single-processor machines have always been treated specially in the scheduler. That longstanding situation may soon come to an end, though, if this patch series from Ingo Molnar makes it upstream.
Initially, Linus Torvalds's goal with Linux was simply to get something working; he did not have much time to spare for hardware that he did not personally have. And he had no multiprocessor machine back then — almost nobody did. So, not only did the initial version of the kernel go out with no SMP support, the kernel lacked that support for some years. The 1.0 and 1.2 releases of the kernel, which came out in 1994 and 1995, respectively, only supported uniprocessor machines.
-
LWN ☛ Zero-copy for FUSE
In a combined storage and filesystem session at the 2025 Linux Storage, Filesystem, Memory Management, and BPF Summit (LSFMM+BPF), Keith Busch led a discussion about zero-copy operations for the Filesystem in Userspace (FUSE) subsystem. The session was proposed by his colleague, David Wei, who could not make it to the summit, so Busch filled in, though he noted that "I do not really know FUSE so well". The idea is to eliminate data copies in the data path to and from the FUSE server in user space.
Busch began with some background on io_uring. When an application using io_uring needs to do read and write operations on its buffers, the kernel encapsulates those buffers twice, first into an iov_iter (of type ITER_UBUF) and from that into a bio_vec, which describes the parts of a block-I/O request. It does that for every such operation; "if you are using the same buffer, that's kind of costly and unnecessary". So io_uring added a way for applications to register a buffer; the kernel will create an iov_iter with the ITER_BVEC type just once when a buffer is registered. Then the application can use the io_uring "fixed" read/write operations, which will use what the kernel created rather than recreating it on each call.
-
LWN ☛ Improving iov_iter
The iov_iter interface is used to describe and iterate through buffers in the kernel. David Howells led a combined storage and filesystem session at the 2025 Linux Storage, Filesystem, Memory Management, and BPF Summit (LSFMM+BPF) to discuss ways to improve iov_iter. His topic proposal listed a few different ideas including replacing some iov_iter types and possibly allowing mixed types in chains of iov_iter entries; he would like to make the interface itself and the uses of iov_iter in the kernel better.
Howells began with an overview. An iov_iter is a stateful description of a buffer, which can be used for I/O; it stores a position within the buffer that can be moved around. There is a set of operations that is part of the API, which includes copying data into or out of the buffer, getting a list of the pages that are part of the buffer, and getting its length. There are multiple types of iov_iter. The initial ones were for user-space buffers, with ITER_IOVEC for the arguments to readv() and writev() and ITER_UBUF for a special case where the number of iovec entries (iovcnt) is one.
There are also three iov_iter types for describing page fragments: ITER_BVEC, which is a list of page, offset, and length; ITER_FOLIOQ, which describes folios and is used by filesystems; and ITER_XARRAY, which is deprecated and describes pages that are stored in an XArray. The problem with ITER_XARRAY is that it requires taking the read-copy-update (RCU) read lock inside iteration operations, which means there are places where it cannot be used, he said. An ITER_KVEC is a list of virtual kernel address ranges, such as regions allocated with kmalloc(). Finally, the ITER_DISCARD type is used to simply discard the next N bytes without doing any copying, for example on a socket.
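The notion of a stateful buffer description can be sketched with a toy model in Python. This is emphatically not the kernel's iov_iter API: the class ToyIter and its methods are hypothetical, the segments are plain bytearrays rather than user-space, page, or kernel address ranges, and only the ideas from the overview above are modeled, namely a movable position, a remaining length, and copying data into a multi-segment buffer.

```python
from dataclasses import dataclass


@dataclass
class ToyIter:
    """Toy stand-in for a segmented buffer with a current position."""

    segments: list[bytearray]
    position: int = 0  # bytes already consumed, counted across all segments

    def count(self) -> int:
        """Bytes remaining between the current position and the end."""
        return sum(len(seg) for seg in self.segments) - self.position

    def advance(self, nbytes: int) -> None:
        """Move the position forward without copying, clamped to the end."""
        self.position += min(nbytes, self.count())

    def copy_in(self, data: bytes) -> int:
        """Copy data into the buffer at the current position; return bytes copied."""
        copied = 0
        offset = self.position
        for seg in self.segments:
            if offset >= len(seg):
                # The position lies beyond this segment; skip it entirely.
                offset -= len(seg)
                continue
            if copied == len(data):
                break
            chunk = min(len(seg) - offset, len(data) - copied)
            seg[offset:offset + chunk] = data[copied:copied + chunk]
            copied += chunk
            offset = 0  # later segments are filled from their start
        self.position += copied
        return copied


if __name__ == "__main__":
    it = ToyIter([bytearray(4), bytearray(4)])
    it.copy_in(b"hello!")   # spans both segments
    print(it.segments, it.count())
```

A short copy, where fewer bytes are copied than were requested because the buffer ran out, is reported through the return value in this model.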