Programming Leftovers
-
Thorsten Ball ☛ Exploring the c4... compiler?
This week I found myself digging through the code of c4, an implementation of C “in four functions”, by Robert Swierczek.
I remember coming across c4 when it was released ten years ago. It got me excited: hey, C in four functions, that means it’s easy to understand, right?
-
Simon Willison ☛ Exploring Hacker News by mapping and analyzing 40 million posts and comments for fun
A real tour de force of data engineering. Wilson Lin fetched 40 million posts and comments from the Hacker News API (using Node.js with a custom multi-process worker pool) and then ran them all through the BGE-M3 embedding model using RunPod, which let him fire up ~150 GPU instances to get the whole run done in a few hours, using a custom RocksDB and Rust queue he built to save on Amazon SQS costs.
-
Rlang ☛ How to Check if a Column Contains a String in R
Whether you’re doing some data cleaning or exploring your dataset, checking if a column contains a specific string can be a crucial task. Today, I’ll show you how to do this using both str_detect() from the stringr package and base R methods. We’ll also tackle finding partial strings and counting occurrences. Let’s dive right in!
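
The same three checks (does each value contain a string, partial matching, counting matches) can be sketched outside R as well. The block below is a minimal Python analogue, not the article's R code: `contains` is a hypothetical helper that mimics the shape of `str_detect()` by returning one boolean per element, and the `fruits` list is made-up sample data standing in for a data frame column.

```python
import re
from typing import List


def contains(column: List[str], pattern: str) -> List[bool]:
    """Return one boolean per value: True if the regex pattern matches anywhere in it."""
    return [re.search(pattern, value) is not None for value in column]


# Made-up sample column, used only for illustration.
fruits = ["apple", "banana", "pineapple", "cherry"]

# Partial matches count: "pineapple" contains "apple".
flags = contains(fruits, "apple")

# Summing the booleans gives the number of values that matched.
match_count = sum(flags)

if __name__ == "__main__":
    print(flags)
    print(match_count)
```

Because the pattern is treated as a regular expression, anchoring it (for example `"^apple$"`) turns the partial match into an exact one.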
-
Wilson Lin ☛ Exploring Hacker News by mapping and analyzing 40 million posts and comments for fun
The above is a map of all Hacker News posts since its founding, laid out semantically, i.e. where there should be some relationship between positions and distances. I've been building it and some other interesting stuff over the past few weeks, to play around with text embeddings. Given that HN has a lot of interesting, curated content and exposes all its content programmatically, I thought it'd be a fun place to start.
-
[Old] GitLab ☛ Use SSH keys to communicate with GitLab
Git is a distributed version control system, which means you can work locally, then share or push your changes to a server. In this case, the server you push to is GitLab.
GitLab uses the SSH protocol to securely communicate with Git. When you use SSH keys to authenticate to the GitLab remote server, you don’t need to supply your username and password each time.
-
Reason ☛ How I Learned About The Copyright Act's Statute of Limitations
Instead, the Court only addressed the remedial question. The Court found that damages are not limited to the three-year period before the lawsuit is filed. Rather, damages can stretch back to the initial infringement. Kagan concluded, "There is no time limit on monetary recovery. So a copyright owner possessing a timely claim for infringement is entitled to damages, no matter when the infringement occurred." To use my case as an example, if a blog post is published in 2013 with a copyrighted photograph, a timely claim could be brought in 2023, and damages would be awarded for a full decade of infringement. (I still have no idea how to calculate damages for a blog post viewed about two dozen times over the course of a decade, but I digress.) In short, plaintiffs can avail themselves of the discovery rule, but still seek damages dating back far longer than three years.
-
[Repeat] Rlang ☛ Treemaps In R
A treemap consists of a set of rectangles which represent different categories in your data and whose size is defined by a numeric value associated with the respective category. For example, a treemap could illustrate the continents on Earth, sized according to their population. For a deeper analysis, treemaps can include nested rectangles, that is, categories within categories. In our example, within each continent rectangle, new rectangles could represent countries and their populations.
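
To make the "size is defined by a numeric value" idea concrete, here is a minimal Python sketch of the simplest treemap layout, often called slice layout: a bounding rectangle is cut into side-by-side strips whose widths (and therefore areas) are proportional to each category's value. This is not the article's R code; `slice_layout` is a hypothetical function name and the category values in the test data are placeholders, not real figures.

```python
from typing import Dict, List, Tuple

# (label, x, y, width, height)
Rect = Tuple[str, float, float, float, float]


def slice_layout(
    values: Dict[str, float], x: float, y: float, width: float, height: float
) -> List[Rect]:
    """Split a rectangle into vertical strips with areas proportional to the values."""
    total = sum(values.values())
    if total <= 0:
        raise ValueError("values must sum to a positive number")

    rects: List[Rect] = []
    offset = x
    for label, value in values.items():
        strip_width = width * value / total  # share of the total width
        rects.append((label, offset, y, strip_width, height))
        offset += strip_width
    return rects


if __name__ == "__main__":
    # Placeholder categories and values, for illustration only.
    for rect in slice_layout({"A": 3, "B": 1}, 0, 0, 100, 50):
        print(rect)
```

Nested categories could be handled by calling the same function again inside each returned rectangle; production treemap libraries typically use a squarified layout instead, which keeps the rectangles closer to square.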
-
Python
-
Simon Willison ☛ uv pip install --exclude-newer example
A neat new feature of the uv pip install command is the --exclude-newer option, which can be used to avoid installing any package versions released after the specified date.
-
Shell/Bash/Zsh/Ksh
-
[Old] Benjamin Esham ☛ A Dream of Zsh
Last night I had the strange sensation that as I was dreaming, my conscious mind was also turning over a programming idea, one that still kind of made sense when I woke up: what if I were to write a static site generator… in zsh?
-
Standards/Consortia
-
[Old] Benjamin Esham ☛ Kibibytes are silly and we should all use them
And even if you don’t like these units, consider this: if we’re diligent enough for long enough about using them, then at some point we may be able to go back to kilobytes—and to have everyone agree on what that means.
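
The ambiguity the post is talking about is easy to see in numbers. The Python sketch below formats the same byte count with binary (IEC) prefixes, where 1 KiB is 1024 bytes, and with decimal (SI) prefixes, where 1 kB is 1000 bytes. The function name `format_size` and the two-decimal output format are illustrative choices, not something from the post.

```python
# Unit ladders: IEC prefixes step by 1024, SI prefixes step by 1000.
IEC_UNITS = ["B", "KiB", "MiB", "GiB", "TiB"]
SI_UNITS = ["B", "kB", "MB", "GB", "TB"]


def format_size(num_bytes: int, binary: bool = True) -> str:
    """Format a byte count using binary (IEC) or decimal (SI) prefixes."""
    base = 1024 if binary else 1000
    units = IEC_UNITS if binary else SI_UNITS

    value = float(num_bytes)
    index = 0
    # Divide down until the value fits the current unit or we run out of units.
    while value >= base and index < len(units) - 1:
        value /= base
        index += 1
    return f"{value:.2f} {units[index]}"


if __name__ == "__main__":
    print(format_size(65536, binary=True))   # binary prefixes
    print(format_size(65536, binary=False))  # decimal prefixes
```

For 65,536 bytes the two conventions give 64.00 KiB and 65.54 kB, and the gap widens with each larger prefix.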
-