Programming Leftovers
-
Vlad-Stefan Harbuz ☛ LLMs Are Accelerating the Open Source Sustainability Crisis
A major cause is that LLMs are blocking Tailwind’s creators and users from forming a relationship with each other.
Here’s how. There are at least two ways for users to learn how to use Tailwind: [...]
-
Vikash Patel ☛ Don't Take Out the Garbage - A Go GC Deep Dive
In the world of high-throughput backend services, we often obsess over the usual suspects of performance: database indexing, network latency, and algorithmic complexity. But recently, while debugging our core gateway service (backend-gw), I encountered a bottleneck that defied standard logic.
The service was CPU-bound, yet active heap usage was surprisingly low (~200MB). P99 latency was spiking at random intervals, but database queries were returning in milliseconds.
The culprit was not the business logic. It was memory management. I was effectively running a Denial-of-Service attack on my own runtime.
This post is a detailed breakdown of the Go Garbage Collector (GC). I’ll explain how the GC actually works, dissect its specific phases (including the dreaded “Stop The World”), and show you how to use the Go Trace tool to identify when your application is losing the battle against allocation churn.
-
Modus Create LLC ☛ The quest for grammar combinators: introducing the Pup library
Parser combinators are one of the prides of the Haskell community. They’re a craft that we continue to polish to this day. Yet, there’s something unsatisfactory about parser combinators.
See, when I write a parser, I frequently write a pretty-printer as well, and the pretty-printer is almost the same as the parser. This makes maintenance harder, if only because parser and pretty-printer have to be kept in sync. Besides, it simply feels like unnecessary duplication.
This blog post is the story of the latest developments in the quest for more general grammar combinators—or as Mathieu Boespflug and I have been calling them, format descriptors—and how it led me to publish a new library, called Pup. For further reading, you can also check the paper that Mathieu and I wrote about it for Olivier Danvy’s festschrift, held at the ICFP/SPLASH conference in Singapore last October.
-
Leon Mika ☛ Too Much HTML
Being a backend developer, it’s sometimes nice to be given the option to do something different. Right now I need to make changes to a Vue frontend project to support some work I’m doing in the backend. And while working on the HTML template of a new component, a funny feeling came to me: “wait, is this too much HTML? Should I be abstracting this out into another component?”
-
Julia Evans ☛ A data model for Git (and other docs updates)
Hello! This past fall, I decided to take some time to work on Git’s documentation. I’ve been thinking about working on open source docs for a long time – usually if I think the documentation for something could be improved, I’ll write a blog post or a zine or something. But this time I wondered: could I instead make a few improvements to the official documentation?
So Marie and I made a few changes to the Git documentation!
-
Felix ☛ HTML parsers in Portland
If you ask an AI coding agent to translate program P into language Q, the agent might do something like: "P seems to be fizzbuzz. I'll implement fizzbuzz in Q." So you might get a Q implementation that's entirely different from the P implementation. This might be ok? It can be a problem if: [...]
-
Quarkslab ☛ Clang Hardening Cheat Sheet - Ten Years Later
Ten years ago, we published on this blog a Clang Hardening Cheat Sheet. The original post walked through essential hardening techniques available at the time, such as FORTIFY_SOURCE checks, ASLR via position-independent code, stack protection (canaries and safe stack), Control Flow Integrity (CFI), GOT protection with RELRO/now, but also options to activate warnings about string formatting that could lead to potential attacks.
Since that article was published in early 2016, both the threat landscape and the Clang toolchain have evolved significantly.
To celebrate the 10th anniversary of the initial article, here is a new cheat sheet with some new hardening flags to improve security.
-
Volodymyr Gubarkov ☛ How I program in AWK
Indeed, the language is really minimalistic, but it has just enough to fulfill certain kinds of projects.
-
Harmen Stoppels ☛ I/O is no longer the bottleneck?
Recently Ben Hoyt published a blog post claiming that contrary to popular belief, I/O is not the bottleneck in typical programming interview problems such as counting word frequencies from a stream. Sequential read speed has come a long way, while CPU speed has stagnated.
Sequential reads are indeed incredibly fast. Using the same method as linked in Ben Hoyt's post, I'm getting 1.6 GB/s sequential reads on a cold cache, and 12.8 GB/s on a warm cache (best of five).
But it should be possible to count word frequencies at a speed of 1.6 GB/s even on a single thread, right?
-
Harmen Stoppels ☛ I/O is no longer the bottleneck? (part 2)
My quest to count words faster than NVMe sequential read speed has come to a close. In my previous blog post I ended up with a rather unconvincing 1.45 GB/s throughput using AVX2 instructions, even though NVMe sequential read speed on a warm cache was 12.8 GB/s. It was unconvincing not only because it was below NVMe throughput, but also because it wasn't even doing the job of counting word frequencies; it's only counting total words.
I'm happy someone showed me a related project: fastlwc. Of course someone has already implemented a truly fast word count. I tried to understand their ideas and learned about two great tricks. Then I implemented them myself, but in a simpler way, getting equivalent performance.
The result is a very fast word count for AVX2 in just a handful of lines of C.
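For readers unfamiliar with the problem, here is a minimal scalar sketch of what "counting total words" means, written in C++ rather than the C the post uses. It is not the author's AVX2 code and not fastlwc's method. The names `is_space` and `count_words` are hypothetical, and the whitespace definition (space plus the bytes from tab to carriage return) is an assumption.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: treat space and the bytes '\t'..'\r' as whitespace
// (an assumed definition, matching isspace() in the "C" locale).
constexpr bool is_space(unsigned char c) {
    return c == ' ' || (c >= '\t' && c <= '\r');
}

// Counts words as transitions from whitespace to non-whitespace.
// This is a plain byte-at-a-time baseline, not a SIMD implementation.
std::size_t count_words(std::string_view text) {
    std::size_t words = 0;
    bool prev_space = true;  // the start of input behaves like whitespace
    for (unsigned char c : text) {
        const bool space = is_space(c);
        words += (prev_space && !space);  // a new word starts here
        prev_space = space;
    }
    return words;
}
```

A vectorized version would have to compute the same transitions many bytes at a time; this loop only pins down the result it needs to reproduce.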
-
Aatango ☛ We can clear a C++ std::string with std::exchange()
std::exchange() was introduced to the standard library with C++14 to replace the argument with a new value and return the previously held content. The trick shown above, to clear a value by "self-exchange" is not the intended usage of that utility function.
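The excerpt does not reproduce the trick itself, so the following is only a sketch of the documented behaviour of `std::exchange()`: it stores a new value and returns the old one. The helper name `take_and_clear` is hypothetical, and passing `{}` as the new value is one assumed way of emptying a string, not necessarily the article's "self-exchange".

```cpp
#include <string>
#include <utility>

// Hypothetical helper: returns the text held in `buffer` and leaves
// `buffer` holding a default-constructed (empty) string.
std::string take_and_clear(std::string& buffer) {
    // The second template parameter of std::exchange defaults to the
    // first, so `{}` is deduced as an empty std::string here.
    return std::exchange(buffer, {});
}
```

If the old contents are not needed, `buffer.clear()` states the intent more directly.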
-
Rnb37 ☛ Software craftsmanship is dead
Somewhere along the way, in the midst of the agilification of software, or the software engineer salary gold rush, we forgot about craftsmanship.
I have been in big tech, startups, consultancies, and even government. These are all different environments with one key similarity: code quality is low, especially as of late.
-
Worst of Breed ☛ worstofbreed.net
Welcome to the premier destination for Resume-Driven Development, Over-Engineering, and Resume-Padding. Why build simple solutions when you can build a distributed monolith managed by 4 different committees?
-
Lea Verou ☛ Web dependencies are broken. Can we fix them?
However, bundling is not technically a necessary step of dependency management. Importing files through URLs is natively supported in every browser, via ESM imports. HTTP/2 makes importing multiple small files far more reasonable than it used to be — at least from a connection overhead perspective. You can totally get by without bundlers in a project that doesn’t use any libraries.
-
Dirk Eddelbuettel ☛ RcppCCTZ 0.2.14 on CRAN: New Upstream, Small Edits
A new release 0.2.14 of RcppCCTZ is now on CRAN, in Debian and built for r2u.
RcppCCTZ uses Rcpp to bring CCTZ to R. CCTZ is a C++ library for translating between absolute and civil times using the rules of a time zone. In fact, it is two libraries. One for dealing with civil time: human-readable dates and times, and one for converting between absolute and civil times via time zones. And while CCTZ is made by Google(rs), it is not an official Surveillance Giant Google product. The RcppCCTZ page has a few usage examples and details. This package was the first CRAN package to use CCTZ; by now several other packages (four the last time we counted) include its sources too. Not ideal, but beyond our control.
-
R / R-Script
-
Rlang ☛ rtopy: an R to Python bridge — novelties
The novelties mainly concern the RBridge class and the call_r function. The RBridge class is more about persistency, while the call_r function is more about ease of use.
-
Rlang ☛ Retrieval-Augmented Generation: Setting up a Knowledge Store in R
Happy New Year from the team at Jumping Rivers!
As we move through the midpoint of the 2020s, it’s a good time to reflect on the changes that we have seen so far in this decade.
-
Rlang ☛ R Studio or Positron? Time To Switch?
I remember the day that I started to use R programming. I had a basic interface to write and execute the code. After that experience, R Studio emerged as a powerful IDE for R programming for me.
-
Rlang ☛ Directional markers in R/leaflet
So you have used the excellent exiftool to extract all of the GPS-related information from a directory of photos in JPG format and write to a CSV file:
exiftool '-*GPS*' -ext jpg -csv . > outfile.csv
You’ve used R/leaflet to plot coordinates (latitude and longitude) before, but what about that tag named GPSImgDirection? It would be nice to have some kind of marker which indicates the direction in which you were facing when the photo was taken.
-
Java/Golang
-
Thibaut Rousseau ☛ The most popular Go dependency is…
Luckily for me, I came up with a second idea: the Go modules ecosystem relies on a centralized public proxy, so surely they expose some information on these modules. And they in fact do so! The proxy APIs are documented on proxy.golang.org: [...]
-