Language Selection

English French German Italian Portuguese Spanish

Kernel: LWN's Latest and IO_uring Patches

Filed under
Linux
  • Resource limits in user namespaces

    User namespaces provide a number of interesting challenges for the kernel. They give a user the illusion of owning the system, but must still operate within the restrictions that apply outside of the namespace. Resource limits represent one type of restriction that, it seems, is proving too restrictive for some users. This patch set from Alexey Gladkov attempts to address the problem by way of a not-entirely-obvious approach.

    Consider the following use case, as stated in the patch series. Some user wants to run a service that is known not to fork within a container. As a way of constraining that service, the user sets the resource limit for the number of processes to one, explicitly preventing the process from forking. That limit is global, though, so if this user tries to run two containers with that service, the second one will exceed the limit and fail to start. As a result, our user becomes depressed and considers a career change to goat farming.

    Clearly, what is needed is a way to make at least some resource limits apply on per-container basis; then each container could run its service with the process limit set to one and everybody will be happy (except perhaps the goats).

  • Fast commits for ext4

    The Linux 5.10 release included a change that is expected to significantly increase the performance of the ext4 filesystem; it goes by the name "fast commits" and introduces a new, lighter-weight journaling method. Let us look into how the feature works, who can benefit from it, and when its use may be appropriate.

    Ext4 is a journaling filesystem, designed to ensure that filesystem structures appear consistent on disk at all times. A single filesystem operation (from the user's point of view) may require multiple changes in the filesystem, which will only be coherent after all of those changes are present on the disk. If a power failure or a system crash happens in the middle of those operations, corruption of the data and filesystem structure (including unrelated files) is possible. Journaling prevents corruption by maintaining a log of transactions in a separate journal on disk. In case of a power failure, the recovery procedure can replay the journal and restore the filesystem to a consistent state.

    The ext4 journal includes the metadata changes associated with an operation, but not necessarily the related data changes. Mount options can be used to select one of three journaling modes, as described in the ext4 kernel documentation. data=ordered, the default, causes ext4 to write all data before committing the associated metadata to the journal. It does not put the data itself into the journal. The data=journal option, instead, causes all data to be written to the journal before it is put into the main filesystem; as a side effect, it disables delayed allocation and direct-I/O support. Finally, data=writeback relaxes the constraints, allowing data to be written to the filesystem after the metadata has been committed to the journal.

    Another important ext4 feature is delayed allocation, where the filesystem defers the allocation of blocks on disk for data written by applications until that data is actually written to disk. The idea is to wait until the application finishes its operations on the file, then allocate the actual number of data blocks needed on the disk at once. This optimization limits unneeded operations related to short-lived, small files, batches large writes, and helps ensure that data space is allocated contiguously. On the other hand, the writing of data to disk might be delayed (with the default settings) by a minute or so. In the default data=ordered mode, where the journal entry is written only after flushing all pending data, delayed allocation might thus delay the writing of the journal. To assure data is actually written to disk, applications use the fsync() or fdatasync() system calls, causing the data (and the journal) to be written immediately.

  • MAINTAINERS truth and fiction

    Since the release of the 5.5 kernel in January 2020, there have been almost 87,000 patches from just short of 4,600 developers merged into the mainline repository. Reviewing all of those patches would be a tall order for even the most prolific of kernel developers, so decisions on patch acceptance are delegated to a long list of subsystem maintainers, each of whom takes partial or full responsibility for a specific portion of the kernel. These maintainers are documented in a file called, surprisingly, MAINTAINERS. But the MAINTAINERS file, too, must be maintained; how well does it reflect reality?

    The MAINTAINERS file doesn't exist just to give credit to maintainers; developers make use of it to know where to send patches. The get_maintainer.pl script automates this process by looking at the files modified by a patch and generating a list of email addresses to send it to. Given that misinformation in this file can send patches astray, one would expect it to be kept up-to-date. Recently, your editor received a suggestion from Jakub Kicinski that there may be insights to be gleaned from comparing MAINTAINERS entries against activity in the real world. A bit of Python bashing later, a new analysis script was born.

  • Experimental Patches Allow For New Ioctls To Be Built Over IO_uring

    IO_uring continues to be one of the most exciting technical innovations in the Linux kernel in recent years not only for more performant I/O but also opening up other doors for new Linux innovations. IO_uring has continued adding features since being mainlined in 2019 and now the newest proposed feature is the ability to build new ioctls / kernel interfaces atop IO_uring.

    The idea of supporting kernel ioctls over IO_uring has been brought up in the past and today lead IO_uring developer Jens Axboe sent out his initial patches. These initial patches are considered experimental and sent out as "request for comments" - they provide the infrastructure to provide a file private command type with IO_uring handling the passing of the arbitrary data.

More in Tux Machines

digiKam 7.7.0 is released

After three months of active maintenance and another bug triage, the digiKam team is proud to present version 7.7.0 of its open source digital photo manager. See below the list of most important features coming with this release. Read more

Dilution and Misuse of the "Linux" Brand

Samsung, Red Hat to Work on Linux Drivers for Future Tech

The metaverse is expected to uproot system design as we know it, and Samsung is one of many hardware vendors re-imagining data center infrastructure in preparation for a parallel 3D world. Samsung is working on new memory technologies that provide faster bandwidth inside hardware for data to travel between CPUs, storage and other computing resources. The company also announced it was partnering with Red Hat to ensure these technologies have Linux compatibility. Read more

today's howtos

  • How to install go1.19beta on Ubuntu 22.04 – NextGenTips

    In this tutorial, we are going to explore how to install go on Ubuntu 22.04 Golang is an open-source programming language that is easy to learn and use. It is built-in concurrency and has a robust standard library. It is reliable, builds fast, and efficient software that scales fast. Its concurrency mechanisms make it easy to write programs that get the most out of multicore and networked machines, while its novel-type systems enable flexible and modular program constructions. Go compiles quickly to machine code and has the convenience of garbage collection and the power of run-time reflection. In this guide, we are going to learn how to install golang 1.19beta on Ubuntu 22.04. Go 1.19beta1 is not yet released. There is so much work in progress with all the documentation.

  • molecule test: failed to connect to bus in systemd container - openQA bites

    Ansible Molecule is a project to help you test your ansible roles. I’m using molecule for automatically testing the ansible roles of geekoops.

  • How To Install MongoDB on AlmaLinux 9 - idroot

    In this tutorial, we will show you how to install MongoDB on AlmaLinux 9. For those of you who didn’t know, MongoDB is a high-performance, highly scalable document-oriented NoSQL database. Unlike in SQL databases where data is stored in rows and columns inside tables, in MongoDB, data is structured in JSON-like format inside records which are referred to as documents. The open-source attribute of MongoDB as a database software makes it an ideal candidate for almost any database-related project. This article assumes you have at least basic knowledge of Linux, know how to use the shell, and most importantly, you host your site on your own VPS. The installation is quite simple and assumes you are running in the root account, if not you may need to add ‘sudo‘ to the commands to get root privileges. I will show you the step-by-step installation of the MongoDB NoSQL database on AlmaLinux 9. You can follow the same instructions for CentOS and Rocky Linux.

  • An introduction (and how-to) to Plugin Loader for the Steam Deck. - Invidious
  • Self-host a Ghost Blog With Traefik

    Ghost is a very popular open-source content management system. Started as an alternative to WordPress and it went on to become an alternative to Substack by focusing on membership and newsletter. The creators of Ghost offer managed Pro hosting but it may not fit everyone's budget. Alternatively, you can self-host it on your own cloud servers. On Linux handbook, we already have a guide on deploying Ghost with Docker in a reverse proxy setup. Instead of Ngnix reverse proxy, you can also use another software called Traefik with Docker. It is a popular open-source cloud-native application proxy, API Gateway, Edge-router, and more. I use Traefik to secure my websites using an SSL certificate obtained from Let's Encrypt. Once deployed, Traefik can automatically manage your certificates and their renewals. In this tutorial, I'll share the necessary steps for deploying a Ghost blog with Docker and Traefik.