
Kernel: LWN Articles, PKRAM and More

  • Clarifying memory management with page folios

    Memory management generally works at the level of pages, which typically contain 4,096 bytes but may be larger. The kernel, though, has extended the concept of pages to include compound pages, which are groups of contiguous single pages. That, in turn, has made the definition of what a "page" is a bit fuzzy. Matthew Wilcox has been working since last year on a concept called "page folios" which is meant to bring the picture back into focus; whether the memory-management community will accept it remains unclear, though.
    At the lowest level, pages are a concept implemented by the hardware; the tracking of memory and whether it is present in RAM or not is done at page granularity. Any given CPU architecture may offer a limited selection of page sizes, but one "base" page size must be chosen, and the most common choice remains 4,096 bytes — the same as it was when the first Linux kernels were released 30 years ago.

    The kernel, though, often has reason to work with memory in larger chunks. One example is the management of "huge pages" which, once again, are implemented by the hardware. The x86 architecture, for example, can work with 2MB huge pages, and there are performance advantages to using them where they are applicable. The kernel will also allocate groups of pages in other sizes, though, typically for DMA buffers or other uses where a set of physically contiguous pages is needed. This sort of grouping of pages is known as a "compound page" in the kernel.
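
    To put rough numbers on those sizes: the kernel's page allocator hands out groups of 2^order physically contiguous base pages. With 4,096-byte base pages, a 2MB x86 huge page is therefore an order-9 group (2MB / 4KB = 512 = 2^9 pages). The sketch below assumes that 4,096-byte base page size, and its helpers (pages_in_order(), bytes_in_order(), order_for_bytes()) are hypothetical names written for illustration, not kernel functions.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 4,096-byte base pages, the most common choice. */
#define BASE_PAGE_SIZE ((size_t)4096)

/* Hypothetical helper: number of base pages in a group of the given
 * order, i.e. 2^order pages. */
static size_t pages_in_order(unsigned int order)
{
    assert(order < 8 * sizeof(size_t)); /* avoid an out-of-range shift */
    return (size_t)1 << order;
}

/* Hypothetical helper: size in bytes of a group of that order. */
static size_t bytes_in_order(unsigned int order)
{
    return pages_in_order(order) * BASE_PAGE_SIZE;
}

/* Hypothetical helper: smallest order whose group can hold `bytes` bytes
 * (meant for modest sizes such as DMA buffers). */
static unsigned int order_for_bytes(size_t bytes)
{
    unsigned int order = 0;
    while (bytes_in_order(order) < bytes)
        order++;
    return order;
}
```

    For example, order_for_bytes(3 * 4096) returns 2: a 12KB request has to be rounded up to four contiguous base pages, because groups only come in power-of-two sizes.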

  • Lockless patterns: more read-modify-write operations

    Last week's installment in this series on lockless patterns took a first look at the compare-and-swap (CAS) operation. CAS is a powerful tool that can be used to implement a number of lockless primitives. The next step is to look at other atomic read-modify-write operations that can be implemented on top of compare-and-swap.
    CAS-based primitives usually operate on int values. The Linux kernel uses atomic_t, a struct type that wraps int so that loads and stores are marked explicitly. For example, it is not possible to write x++; if x is an atomic_t. Instead one must write atomic_inc(&x);. All operations on atomic_t start with "atomic_".
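
    The idea of building read-modify-write operations on top of compare-and-swap can be sketched in userspace C. Since the kernel's atomic_t API is not available outside the kernel, this sketch assumes C11 <stdatomic.h> instead, and cas_fetch_add(), cas_inc() and cas_fetch_max() are hypothetical names, not kernel functions. Each is a retry loop: read the current value, compute the new one, and attempt a CAS that succeeds only if nobody changed the value in between.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>

/* Hypothetical helper (not a kernel API): atomically adds `delta` to *v
 * using only compare-and-swap; returns the value seen just before the
 * update (fetch-and-add semantics). */
static int cas_fetch_add(atomic_int *v, int delta)
{
    int old = atomic_load(v);
    do {
        /* Guard against signed overflow, which is undefined in C. */
        assert(delta >= 0 ? old <= INT_MAX - delta : old >= INT_MIN - delta);
        /* On failure, the CAS writes the current value of *v into `old`,
         * so the next iteration recomputes from fresh data. The "weak"
         * form may also fail spuriously, which the loop absorbs. */
    } while (!atomic_compare_exchange_weak(v, &old, old + delta));
    return old;
}

/* Hypothetical helper: increment, similar in spirit to atomic_inc(&x). */
static void cas_inc(atomic_int *v)
{
    (void)cas_fetch_add(v, 1);
}

/* Hypothetical helper: atomically raises *v to `candidate` if that is
 * larger; returns the previous value. */
static int cas_fetch_max(atomic_int *v, int candidate)
{
    int old = atomic_load(v);
    /* Stop as soon as *v is already >= candidate: nothing to write. */
    while (old < candidate &&
           !atomic_compare_exchange_weak(v, &old, candidate))
        ;
    return old;
}
```

    The maximum operation shows why the pattern is useful: C11 provides atomic_fetch_add() directly but has no atomic maximum, so a CAS loop is the usual way to build one.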

  • Patching until the COWs come home (part 1)

    The kernel's memory-management subsystem is built upon many concepts, one of which is called "copy on write", or "COW". The idea behind COW is conceptually simple, but its details are tricky and its past is troublesome. Any change to its implementation can have unexpected consequences and cause subtle breakage for existing workloads. So it is somewhat surprising that last year we saw two major changes to the kernel's COW code; less surprising is the fact that, both times, these changes had unexpected consequences and broke things. Some of the resulting problems are still not fixed today, almost ten months after the first change, while the original reason for the changes (a security vulnerability) is also not fully fixed. Read on for a description of COW, the vulnerability, and the initial fix; the concluding article in the series will describe the complications that arose thereafter.
    Copy on write is a standard mechanism for sharing a single instance of an object between processes in a situation where each process has the illusion of an independent, private copy of that object. Examples include memory pages shared between processes or data extents shared between files. To see how COW is used in the memory-management subsystem, consider what happens when a process calls fork(): the pages in that process's private memory areas should no longer be shared between the parent and child. But, instead of creating new copies of those pages for the child process during the fork() call, the kernel will simply map the parent's pages in the child's page tables. Importantly, the page-table entries in both parent and child are set as read-only (write-protected).

    If either process attempts to write to one of these pages, a page fault will occur, and the kernel's page-fault handler will create a new copy of the page, replacing the page-table entry (PTE) in the faulting process with a PTE that references the new page, but which allows the write to proceed. This action is often referred to as "breaking COW". If the other process then tries to write to that same page, another page fault will occur, as that process's PTE is still marked read-only. But now the page-fault handler will recognize that the page is no longer shared, so the PTE can just be made writable and the process can resume.

    The benefits of this scheme are lower memory consumption and a reduction of CPU time spent copying pages during fork() calls. Often the price of copying is never paid for many of the pages because the child might call exit() or exec() before either the parent or the child writes to those pages.
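
    The fork-then-write sequence described above can be modeled in a few lines of C. This is a toy sketch, not kernel code: struct model_page, struct model_pte and the model_* functions are hypothetical, and the "is this page still shared?" decision is reduced to a plain reference count. In the real kernel that decision is considerably more subtle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096

/* Hypothetical stand-in for a physical page that PTEs may share. */
struct model_page {
    int refcount;                        /* number of PTEs mapping this page */
    unsigned char data[MODEL_PAGE_SIZE];
};

/* Hypothetical stand-in for a page-table entry. */
struct model_pte {
    struct model_page *page;
    bool writable;
};

/* At fork(): map the parent's page in the child instead of copying it,
 * and write-protect the entry in both processes. */
static void model_fork_share(struct model_pte *parent, struct model_pte *child)
{
    child->page = parent->page;
    child->page->refcount++;
    parent->writable = false;
    child->writable = false;
}

/* Write-fault handler. If the page is still shared, "break COW": copy it
 * and point only the faulting PTE at the copy. Otherwise the page is no
 * longer shared, so the existing PTE is simply made writable. */
static int model_write_fault(struct model_pte *pte)
{
    struct model_page *old = pte->page;

    assert(old->refcount > 0);
    if (old->refcount > 1) {
        struct model_page *copy = malloc(sizeof(*copy));
        if (copy == NULL)
            return -1;
        memcpy(copy->data, old->data, MODEL_PAGE_SIZE);
        copy->refcount = 1;
        old->refcount--;                 /* the other mapping keeps the original */
        pte->page = copy;
    }
    pte->writable = true;
    return 0;
}

/* A store through a PTE: take the write fault first if it is read-only. */
static int model_store(struct model_pte *pte, size_t offset, unsigned char value)
{
    assert(offset < MODEL_PAGE_SIZE);
    if (!pte->writable && model_write_fault(pte) != 0)
        return -1;
    pte->page->data[offset] = value;
    return 0;
}
```

    Walking through it: after model_fork_share(), the first model_store() through either entry copies the page; a later store through the other entry finds a reference count of 1 and just makes its PTE writable, with no second copy.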

  • Stupid RCU Tricks: So rcutorture is Still Not Aggressive Enough For You? - Paul E. McKenney's Journal — LiveJournal

    A full rcutorture run will do about 20 kernel builds, which can take some tens of minutes or, on slower systems, well over an hour. This can be extremely annoying when you simply want to re-run the last test in order to obtain better failure statistics or to get more test time on a recent bug fix.

    The traditional rcutorture way of avoiding rebuilds is to optionally edit the qemu-cmd files for each scenario to be re-run, then manually invoke sh on each resulting file. The editing step allows you to avoid overwriting the previous run's console output, but may be omitted if you don't care about that console output or if you have already saved it off somewhere. This works, but is painstaking and error-prone.


    Although the approach of using ssh works reasonably well on a few tens of systems, if someone wanted to run rcutorture on thousands of systems, something else would likely be required. On the other hand, there are not that many sites where one would reasonably devote anywhere near that many systems to rcutorture. There might be downloading improvements at some point, most likely in the form of allowing a user-provided script to invoke some site-specific, optimized multi-system download utility. Both and might someday need a way to specify that only a subset of a prior run's scenarios be re-run, for example, to chase down a bug that occurred in only a few of those scenarios.

    And as mentioned earlier, perhaps a future version of will gracefully handle remote systems with varying numbers of CPUs or running actual tests on the system running the script.

  • "PKRAM" Revived For Preserving Memory Pages Across Kexec'ing Kernels - Phoronix

    Patches proposed back in 2013 introduced "PRAM", persistent over-kexec memory storage that allows memory pages to be saved across kernel reboots done via kexec, including when switching to a new kernel. Nearly one year ago, Oracle took up the effort again and sent out PKRAM, its "preserved-over-Kexec" RAM, and now a second iteration of PKRAM has finally been published.

    PKRAM remains focused on providing the ability to save memory pages of the currently running kernel so that those pages can be restored after a kexec into a new kernel.

More in Tux Machines

digiKam 7.7.0 is released

After three months of active maintenance and another bug triage, the digiKam team is proud to present version 7.7.0 of its open source digital photo manager. See below the list of most important features coming with this release. Read more

Dilution and Misuse of the "Linux" Brand

Samsung, Red Hat to Work on Linux Drivers for Future Tech

The metaverse is expected to uproot system design as we know it, and Samsung is one of many hardware vendors re-imagining data center infrastructure in preparation for a parallel 3D world. Samsung is working on new memory technologies that provide faster bandwidth inside hardware for data to travel between CPUs, storage and other computing resources. The company also announced it was partnering with Red Hat to ensure these technologies have Linux compatibility. Read more

today's howtos

  • How to install go1.19beta on Ubuntu 22.04 – NextGenTips

    In this tutorial, we are going to explore how to install Go on Ubuntu 22.04. Golang is an open-source programming language that is easy to learn and use. It has built-in concurrency and a robust standard library, and it makes it easy to build reliable, fast, and efficient software that scales. Its concurrency mechanisms make it easy to write programs that get the most out of multicore and networked machines, while its novel type system enables flexible and modular program construction. Go compiles quickly to machine code, yet has the convenience of garbage collection and the power of run-time reflection. In this guide, we are going to learn how to install Go 1.19beta on Ubuntu 22.04. Go 1.19beta1 is not yet released. Much of the documentation is still a work in progress.

  • molecule test: failed to connect to bus in systemd container - openQA bites

    Ansible Molecule is a project to help you test your ansible roles. I’m using molecule for automatically testing the ansible roles of geekoops.

  • How To Install MongoDB on AlmaLinux 9 - idroot

    In this tutorial, we will show you how to install MongoDB on AlmaLinux 9. For those of you who didn’t know, MongoDB is a high-performance, highly scalable, document-oriented NoSQL database. Unlike SQL databases, where data is stored in rows and columns inside tables, MongoDB structures data in a JSON-like format inside records, which are referred to as documents. The open-source nature of MongoDB makes it an ideal candidate for almost any database-related project. This article assumes you have at least basic knowledge of Linux, know how to use the shell, and, most importantly, host your site on your own VPS. The installation is quite simple and assumes you are running in the root account; if not, you may need to add ‘sudo’ to the commands to get root privileges. I will show you the step-by-step installation of the MongoDB NoSQL database on AlmaLinux 9. You can follow the same instructions for CentOS and Rocky Linux.

  • An introduction (and how-to) to Plugin Loader for the Steam Deck. - Invidious
  • Self-host a Ghost Blog With Traefik

    Ghost is a very popular open-source content management system. It started as an alternative to WordPress and went on to become an alternative to Substack by focusing on memberships and newsletters. The creators of Ghost offer managed Pro hosting, but it may not fit everyone's budget. Alternatively, you can self-host it on your own cloud servers. On Linux Handbook, we already have a guide on deploying Ghost with Docker in a reverse proxy setup. Instead of an Nginx reverse proxy, you can also use another piece of software called Traefik with Docker. It is a popular open-source cloud-native application proxy, API gateway, edge router, and more. I use Traefik to secure my websites with SSL certificates obtained from Let's Encrypt. Once deployed, Traefik can automatically manage your certificates and their renewals. In this tutorial, I'll share the necessary steps for deploying a Ghost blog with Docker and Traefik.