LWN on Kernel: KVM, blksnap, and Netdev 0x16 conference
-
Scaling the KVM community
The scalability of Linus Torvalds was a recurring theme during Linux's early years; these days maintainer struggles are a recognized problem within open-source communities in general. It is thus not surprising that Sean Christopherson gave a talk at Open Source Summit Europe (and KVM Forum) with the title "Scaling KVM and its community". The talk mostly focused on KVM for the x86 architecture—the largest and most mature KVM architecture—which Christopherson co-maintains. But it was not a technical talk: most of the content can be applied to other KVM architectures, or even other Linux subsystems, so that they can avoid making the same kinds of mistakes.
-
Block-device snapshots with blksnap
As a general rule, one need not have worked in the technology industry for long before the value of good data backups becomes clear. Creating a backup that is truly good, though, can be a challenge if the filesystem in question is actively being changed while the backup process runs. Over the years, various ways of addressing this problem have been developed, ranging from simply shutting down the system while backups run to a variety of snapshotting mechanisms. The kernel may be about to get another approach to snapshots should the blksnap patch set from Sergei Shtepa find its way into the mainline.
The blksnap patches are rigorously undocumented, so much of what follows comes from reverse-engineering the code. Blksnap performs snapshotting at the block-device level, meaning that it is entirely transparent to any filesystems that may be stored on the devices in question. It is able to create snapshots of a set of multiple block devices, so it should be suitable for RAID arrays and such. The targeted use case appears to be automated backup systems; the snapshots that blksnap creates are described as "non-persistent" and are meant to be discarded once a real backup has been made.
Since blksnap works at the block level, it must be given space to store snapshots that is separate from the devices being snapshotted. Specifically, there are ioctl() operations to assign ranges of sectors on a separate device for the storage of "difference blocks" and to change those assignments over time. There is a notification mechanism whereby a user-space process can be told when a given difference area is running low on space so that it can assign more blocks to that area.
The algorithm used by blksnap is simple enough: once a snapshot has been created for a set of block devices (using another ioctl() operation), blksnap will intercept every block-write operation to those devices. If a given block is being written to for the first time after the snapshot was taken, the previous contents of that block will be copied to the difference area, and a note will be made that the block has been changed since the snapshot was created. Once that is done, the write operation can continue normally. The block devices thus always reflect the most recent writes, while the difference area contains the older data needed to recreate the state of those devices at the time the snapshot was created.
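The copy-on-first-write scheme described above can be illustrated with a small in-memory model. This is only a sketch of the idea, not blksnap's code or its interface: the class `SnapshotDevice` and all of its method names are hypothetical, blocks are modeled as Python byte strings, and the difference area is a plain dictionary rather than sector ranges on a separate device.

```python
class SnapshotDevice:
    """Toy model of block-level copy-on-write snapshotting (hypothetical names)."""

    def __init__(self, blocks):
        self.blocks = list(blocks)  # live device contents, one entry per block
        self.diff_area = None       # block index -> contents at snapshot time

    def take_snapshot(self):
        # Start with an empty difference area; nothing is copied up front.
        self.diff_area = {}

    def write(self, index, data):
        # On the first write to a block after the snapshot, preserve the old
        # contents in the difference area; later writes to it skip the copy.
        if self.diff_area is not None and index not in self.diff_area:
            self.diff_area[index] = self.blocks[index]
        self.blocks[index] = data  # the write then proceeds normally

    def read_snapshot(self, index):
        # Changed blocks come from the difference area, unchanged blocks
        # from the live device.
        if self.diff_area is None:
            raise RuntimeError("no snapshot is active")
        return self.diff_area.get(index, self.blocks[index])

    def drop_snapshot(self):
        # Snapshots are non-persistent: discard the saved blocks.
        self.diff_area = None


dev = SnapshotDevice([b"A", b"B", b"C"])
dev.take_snapshot()
dev.write(1, b"X")  # first write to block 1: old contents b"B" are saved
dev.write(1, b"Y")  # second write: no further copy is needed
live_view = list(dev.blocks)
snapshot_view = [dev.read_snapshot(i) for i in range(3)]
```

In this model, the live device ends up holding the newest data while the snapshot view still shows the contents from the moment the snapshot was taken, which is the property a backup tool would rely on.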
-
Networking and high-frequency trading
The high-frequency-trading (HFT) industry is rather tight-lipped about what it does and how it does it, but PJ Waskiewicz of Jump Trading came to the Netdev 0x16 conference to try to demystify some of that, especially with respect to its use of networking. He wanted to contrast the needs of HFT with those of traditional networking as it is used outside of the HFT space. He also had some thoughts on what the Linux kernel could do to help address those needs so that HFT companies could move away from some of the custom code that is currently being developed and maintained by multiple firms in the industry.