Programming Leftovers
-
James G ☛ Advent of Patterns Wrap-up
I have occasionally written about patterns in software and design this year. When considering ideas for a potential Advent series, I felt patterns would be a good challenge. I started to take notes on some patterns I had observed in software and design: define once, reference everywhere; autosuggestions; pre-rendering data; "time since article published" disclosure boxes; and more.
-
Amos Wenger ☛ Catching up with async Rust
In December 2023, a minor miracle happened: async fn in traits shipped.
-
LWN ☛ Facing the Git commit-ID collision catastrophe
Commits in the Git source-code management system are identified by the SHA-1 hash of their contents — though the specific hash may change someday. The full hash is a 160-bit quantity, normally written as a 40-character hexadecimal string. While those strings are convenient for computers to work with, humans find them to be a bit unwieldy, so it is common to abbreviate the hash values to shorter strings. Geert Uytterhoeven recently proposed increasing the length of those abbreviated hashes as used in the kernel community, but the problem he was working to solve may not be as urgent as it seems.
A hash, of course, is not the same as the data it was calculated from; whenever hashes are used to represent data, there is always the possibility of a collision — when two distinct sets of data generate the same hash value. A 160-bit hash space is large enough that the risk of accidental collisions is essentially zero; the risk of intentional (malicious) collisions is higher, but is still not something that most people worry about — for now. The hash space is large enough that even a relatively small portion of the hash value is still enough to uniquely identify a value. In a small Git repository, a 24-bit (six-digit) hash may suffice; as a repository grows, the number of digits required to unambiguously identify a commit will grow. In all cases, though, the shorter commit IDs are much easier for humans to deal with, and are almost universally used.
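To get a rough feel for why the required abbreviation length grows with repository size, the birthday approximation can be sketched in a few lines of Python. This is only an illustration, not the analysis from the article: it assumes hash prefixes are uniformly distributed, and the function names `collision_probability` and `digits_needed` are hypothetical helpers invented for this sketch.

```python
import math


def collision_probability(n_objects: int, hex_digits: int) -> float:
    """Approximate chance that at least two of n_objects share the same
    abbreviated hash of hex_digits hexadecimal characters.

    Uses the birthday approximation p ~= 1 - exp(-n(n-1) / (2 * space)),
    assuming prefixes are uniformly distributed (an assumption of this sketch).
    """
    space = 16 ** hex_digits              # number of distinct prefixes
    pairs = n_objects * (n_objects - 1) / 2
    return -math.expm1(-pairs / space)    # 1 - exp(-x), stable for small x


def digits_needed(n_objects: int, max_probability: float) -> int:
    """Smallest abbreviation length (1..40 hex digits) whose approximate
    collision probability does not exceed max_probability."""
    for digits in range(1, 41):
        if collision_probability(n_objects, digits) <= max_probability:
            return digits
    return 40  # full SHA-1 length


if __name__ == "__main__":
    # Object counts here are arbitrary examples, not figures from the article.
    for n in (1_000, 100_000, 10_000_000):
        print(n, digits_needed(n, 0.01))
```

The approximation only says how likely it is that some pair of objects collides; whether a specific abbreviation stays unambiguous also depends on how the repository grows after the abbreviation was written down.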
-
LWN ☛ Emacs code completion can cause compromise
Emacs has had a few bugs related to accidentally permitting the execution of untrusted code. Unfortunately, it seems as though another bug of that sort has appeared — and may be harder to patch, because the problem comes from the way Emacs handles expansion of Lisp macros in code being analyzed. The vulnerability is only practically exploitable in a non-default configuration, so not every Emacs user has something to worry about. The Emacs developers are reportedly working on a fix, but have not yet shared details about it. In the meantime, every Emacs version since at least 26.1 (released in May 2018) through the current development version is vulnerable.
Eshel Yaron publicly disclosed the problem on November 27, although they reported it to the Emacs maintainers in August. The problem has two parts: expanding a macro in Emacs Lisp (Elisp) can run arbitrary code (including invoking a shell to run arbitrary commands), and common operations such as code-completion or jump-to-definition in Elisp files can require macro expansion. Since those operations are quite useful for reading and understanding code, many Emacs users have them enabled.
-
LWN ☛ Using Guile for Emacs
Emacs is, famously, an editor—perhaps far more—that is extensible using its own variant of the Lisp programming language, Emacs Lisp (or Elisp). This year's edition of EmacsConf, which is an annual "gathering" that has been held online for the past five years, had two separate talks on using a different variant of Lisp, Guile, for Emacs. Both projects would preserve Elisp compatibility, which is a must, but they would use Guile differently. The first talk we will cover was given by Robin Templeton, who described the relaunch of the Guile-Emacs project, which would replace the Elisp in Emacs with a compiler using Guile. A subsequent article will look at the other talk, which is about an Emacs clone written using Guile.
LWN looked at Guile-Emacs way back in 2014, when Templeton had completed the last of several Google Summer of Code (GSoC) internships working on it. Around that time, Templeton had a fully functional prototype, but they moved on to other things until recently reviving the project.
-
Python
-
Niels Provos ☛ Building a Generative AI Search Engine with PlanAI
PlanAI is an open-source Python framework that simplifies building complex AI workflows. In this tutorial, we’ll implement a generative AI search engine similar to Perplexity using PlanAI’s task-based architecture and integrations.
This tutorial is aimed at developers with a basic understanding of Python and general familiarity with AI concepts. We’ll be building a search engine that can answer complex questions by synthesizing information from multiple web sources. It’s “Perplexity-style” in that it provides a concise, AI-generated answer along with cited sources, much like the search engine Perplexity.ai. PlanAI makes building this type of application much easier by handling the complexities of task dependencies, data flow, caching, and integrating with various Large Language Models (LLMs). It even allows for human-in-the-loop input when automated methods fail, making it robust for real-world scenarios.
-
Shell/Bash/Zsh/Ksh
-
TecMint ☛ How to Use sed for Dynamic Number Replacement in Linux
This article will guide you through the basics of sed, explain how to use it for dynamic number replacement, and provide practical examples for beginners.
-
TecMint ☛ How to Use awk to Perform Arithmetic Operations in Loops
This article will guide you through using awk for arithmetic operations in loops, using simple examples to make the concepts clear.
-
Standards/Consortia
-
LWN ☛ Providing precise time over the network
Handling time in a networked environment is never easy. The Network Time Protocol (NTP) has been used to synchronize clocks across the internet for almost 40 years — but, as computers and networks get faster, the degree of synchronization it offers is not sufficient for some use cases. The Precision Time Protocol (PTP) attempts to provide more precise time synchronization, at the expense of requiring dedicated kernel and hardware support. The Linux kernel has supported PTP since 2011, but the protocol has recently seen increasing use in data centers. As PTP becomes more widespread, it may be useful to have an idea how it compares to NTP.
PTP has several different possible configurations (called profiles), but it generally works in the same way in all cases: the computers participating in the protocol automatically determine which of them has the most stable clock, that computer begins sending out time information, and the other clocks on the network measure the network delay between themselves and that computer so they can compensate for it. The different profiles tweak the details of these parts of the protocol in order to perform well on different kinds of networks, including in data centers, telecom infrastructure, industrial and automotive networks, and performance venues.
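The delay compensation described above can be illustrated with the arithmetic of a two-way timestamp exchange. The sketch below is not taken from the article: it assumes the common delay request-response exchange with four timestamps, and it assumes the network path is symmetric. The function name `ptp_offset_and_delay` and the sample values are hypothetical.

```python
def ptp_offset_and_delay(t1: int, t2: int, t3: int, t4: int) -> tuple[float, float]:
    """Estimate clock offset and one-way path delay from four timestamps
    (all in the same unit, e.g. nanoseconds).

    t1: time source sends its sync message   (source clock)
    t2: follower receives that message       (follower clock)
    t3: follower sends a delay request       (follower clock)
    t4: time source receives the request     (source clock)

    Assumes the delay is the same in both directions (sketch assumption).
    """
    mean_path_delay = ((t2 - t1) + (t4 - t3)) / 2
    offset_from_source = (t2 - t1) - mean_path_delay
    return offset_from_source, mean_path_delay


if __name__ == "__main__":
    # Made-up example: follower clock runs 5 units ahead, one-way delay is 3.
    offset, delay = ptp_offset_and_delay(t1=100, t2=108, t3=120, t4=118)
    print(offset, delay)
```

Once the follower knows the offset, it can steer its own clock toward the source; any asymmetry between the two directions shows up directly as error in the estimate, which is one reason hardware timestamping support matters.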
-