Programming Leftovers
-
Scoop News Group ☛ Can Zero Trust survive the AI era?
Zero Trust, a set of security principles with roots in older cybersecurity concepts like “least privilege access,” essentially argues that defenders should treat everything on their network as a potentially compromised asset. Thus, everything requires constant verification of identity, access, and authorization to protect against hackers, data breaches and insider threats.
-
Logikal Solutions ☛ Not All Movement is Forward
As more and more open-source projects are developed by for-profit companies run by MBAs, we get the dual mortal sins of Agile and movement.
-
Andrew Nesbitt ☛ The Fragmented World of Dependency Policy
I’ve been thinking about adding policy features to git-pkgs/actions, the GitHub Actions that check licenses, scan for vulnerabilities, and generate SBOMs during CI. The license action currently takes a comma-separated list of SPDX identifiers and the vulnerability action takes a severity string, which is fine for simple cases but obviously not enough once you need to ignore specific CVEs with expiry dates, ban particular packages regardless of license, allow exceptions for vetted transitive dependencies, or set different rules for different repositories.
I went looking for a format to adopt rather than invent. I’ve also been investigating what it would take to add dependency intelligence features to Forgejo, the forge that Codeberg and a growing number of self-hosted instances run, and if Forgejo gets a dependency graph it will need a policy layer with the same questions about licenses and vulnerabilities and banned packages. Building two tools against the same policy format was the goal, but that required finding one worth using.
I found about forty tools that make automated policy decisions about dependencies, and every single one has its own format.
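To make those requirements concrete, here is a minimal Python sketch of the kind of rules described above: an SPDX allow-list, packages banned regardless of license, and CVE ignores that lapse on an expiry date. The `Policy` and `Dependency` classes, their field names, the `evaluate` function, and the placeholder package and CVE identifiers are all hypothetical illustrations; this is not the format of git-pkgs/actions, Forgejo, or any existing tool.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Policy:
    """Hypothetical policy shape, invented for illustration only."""
    allowed_licenses: set = field(default_factory=set)   # SPDX identifiers
    banned_packages: set = field(default_factory=set)    # banned regardless of license
    ignored_cves: dict = field(default_factory=dict)     # CVE id -> expiry date


@dataclass
class Dependency:
    name: str
    license: str
    cves: list = field(default_factory=list)


def evaluate(dep, policy, today):
    """Return a list of human-readable violations for one dependency."""
    violations = []
    if dep.name in policy.banned_packages:
        violations.append(f"{dep.name}: package is banned")
    if dep.license not in policy.allowed_licenses:
        violations.append(f"{dep.name}: license {dep.license} not allowed")
    for cve in dep.cves:
        expiry = policy.ignored_cves.get(cve)
        # An ignore only applies until its expiry date has passed.
        if expiry is None or today > expiry:
            violations.append(f"{dep.name}: {cve} is not ignored")
    return violations


# Placeholder identifiers, not real packages or advisories.
policy = Policy(
    allowed_licenses={"MIT", "Apache-2.0"},
    banned_packages={"banned-example"},
    ignored_cves={"CVE-0000-0001": date(2030, 1, 1)},
)
dep = Dependency("example-lib", "MIT", ["CVE-0000-0001"])
print(evaluate(dep, policy, date(2029, 6, 1)))  # ignore still active
print(evaluate(dep, policy, date(2030, 6, 1)))  # ignore has expired
```

Even this toy version has to decide how expiry, bans, and license checks interact, which hints at why each tool ends up with its own format.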
-
Rlang ☛ Reproducible Analytical Pipelines
It is becoming more common for this kind of data processing to be handled by a Reproducible Analytical Pipeline (RAP). A RAP is a largely automated process written in code. One aim of using RAPs here is to reduce the amount of manual and ad hoc input into the data processing, so that the same input data always generates the same downstream products and so that the process works successfully and predictably when given new data. By placing the processing decisions in code, RAPs make data processing more easily auditable and more transparent.
-
[Old] Haskell For All ☛ Data is Code
However, you can also go to the exact opposite extreme: "Data is Code"! You can make everything into code and implement data structures in terms of code.
You might wonder what that even means: how can you write any code if you don't have any primitive data structures to operate on? Fascinatingly, Alonzo Church discovered a long time ago that if you have the ability to define functions you have a complete programming language. "Church encoding" is the technique named after his insight that you could transform data structures into functions.
This post is partly a Church encoding tutorial and partly an announcement for my newly released annah compiler which implements the Church encoding of data types. Many of the examples in this post are valid annah code that you can play with. Also, to be totally pedantic annah implements Boehm-Berarducci encoding which you can think of as the typed version of Church encoding.
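As a rough illustration of the idea, the following Python sketch encodes booleans and natural numbers purely as functions. It is not annah code and it is the untyped Church encoding rather than the typed Boehm-Berarducci variant mentioned above; the helper names (`to_bool`, `to_int`, and so on) are chosen here for illustration.

```python
# Church booleans: a boolean is a function that picks one of two arguments.
true = lambda t: lambda f: t
false = lambda t: lambda f: f
and_ = lambda p: lambda q: p(q)(false)
not_ = lambda p: p(false)(true)

# Church numerals: the number n means "apply s to z, n times".
zero = lambda s: lambda z: z
succ = lambda n: lambda s: lambda z: s(n(s)(z))
add = lambda m: lambda n: lambda s: lambda z: m(s)(n(s)(z))


def to_bool(b):
    """Convert a Church boolean back to a native bool for inspection."""
    return b(True)(False)


def to_int(n):
    """Convert a Church numeral back to a native int for inspection."""
    return n(lambda x: x + 1)(0)


two = succ(succ(zero))
three = succ(two)

print(to_bool(and_(true)(not_(false))))  # True
print(to_int(add(two)(three)))           # 5
```

The native `bool` and `int` appear only in the conversion helpers; the encoded values themselves are nothing but functions.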
-
Perl / Raku
-
R / R-Script
-
Rlang ☛ Major Update for revss Package for R – v3.1.0
This is a big one! The revss package for R, which provides robust estimation for small samples, received a major, breaking update. The entire calculation engine was rewritten, new functionality added, and massive Monte Carlo analyses were run to calculate bias reduction factors. It should be at version 3.1.0, even though CRAN is showing 3.0.0. What happened and why? Where to even start?
-
Rlang ☛ Getting to the bottom of TMLE: targeting in action
In the previous post, I worked my way through some key elements of TMLE theory as I try to understand how it all works. At its essence, TMLE is focused on getting the efficient influence function (EIF) to behave properly. When that happens, the estimator of the target parameter behaves as if it were based on a random sample from the true data-generating distribution.
-
-
Python
-
Jussi Pakkanen ☛ Simple sort implementations vs production quality ones
One of the most optimized algorithms in any standard library is sorting. It is used everywhere so it must be fast. Thousands upon thousands of developer hours have been sunk into inventing new algorithms and making sort implementations faster. Pystd has a different design philosophy where fast compilation times and readability of the implementation have higher priority than absolute performance. Perf still very much matters; it has to be fast, but not at the cost of 10x compilation time.
This leads to the natural question of how much slower such an implementation would be compared to a production quality one. Could it even be faster? (Spoilers: no) The only way to find out is to run performance benchmarks on actual code.
To keep things simple there is only one test set: sorting 10'000'000 consecutive 64-bit integers that have been shuffled into a random order, which is the same for all algorithms. This is not an exhaustive test by any means but you have to start somewhere. All tests used GCC 15.2 with -O2 optimization. Pystd code was not thoroughly hand optimized; I only fixed (some of the) obvious hotspots.
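The shape of such a benchmark can be sketched in Python: shuffle consecutive integers with a fixed seed so every algorithm sees the same input, then time a deliberately simple sort against the built-in one. This is only an analogy to the setup described above, not the Pystd code or its C++ harness; `merge_sort`, `make_input`, `time_sort`, and the smaller input size are assumptions made for this sketch, and the timings it prints say nothing about the C++ results.

```python
import random
import time


def merge_sort(items):
    """Straightforward top-down merge sort; readable rather than fast."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left, right = merge_sort(items[:mid]), merge_sort(items[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def make_input(n, seed=0):
    """Consecutive integers shuffled with a fixed seed, so runs are repeatable."""
    data = list(range(n))
    random.Random(seed).shuffle(data)
    return data


def time_sort(sort_fn, data):
    """Return the sorted result and the elapsed wall-clock seconds."""
    start = time.perf_counter()
    result = sort_fn(data)
    return result, time.perf_counter() - start


N = 100_000  # far smaller than the 10'000'000 in the article, to keep the run short
data = make_input(N)
for name, fn in [("simple merge sort", merge_sort), ("built-in sorted", sorted)]:
    result, seconds = time_sort(fn, data)
    assert result == list(range(N))  # both must produce the same sorted output
    print(f"{name}: {seconds:.3f} s")
```

Checking the output of each run against the known sorted sequence guards against benchmarking a sort that is fast only because it is wrong.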
-
University of Toronto ☛ One problem with (Python) docstrings is that they're local
When I wrote about documenting my Django forms, I said that I knew I didn't want to put my documentation in docstrings, because I'd written some in the past and then not read it this time around. One of the reasons for that is that Python docstrings have to be attached to functions, or more generally, Python docstrings have to be scattered through your code. The corollary to this is that to find relevant docstrings you have to read through your code and then remember which bits of it are relevant to what you're wondering about.
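A small sketch of what "local" means in practice: each docstring hangs off one function, so an overview only exists if something walks the code and gathers them. The `AccountForms` class and `collect_docstrings` helper below are hypothetical stand-ins, not code from the Django application being discussed.

```python
import inspect


class AccountForms:
    """Hypothetical stand-in for a module full of form code."""

    def clean_login(self, value):
        """Logins are lowercased before lookup."""
        return value.lower()

    def clean_email(self, value):
        """Email addresses must contain an @ sign."""
        if "@" not in value:
            raise ValueError("not an email address")
        return value

    def save(self):
        return None  # no docstring, so nothing to collect here


def collect_docstrings(obj):
    """Map each documented function found on obj to its cleaned docstring."""
    docs = {}
    for name, member in inspect.getmembers(obj, inspect.isfunction):
        if member.__doc__:
            docs[name] = inspect.cleandoc(member.__doc__)
    return docs


for name, doc in collect_docstrings(AccountForms).items():
    print(f"{name}: {doc}")
```

The collected listing is flat: it shows what each function says about itself, but not which of those notes matter for the question you came with.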
-
Alcides Fonseca ☛ Z3 Python in the Browser in 10 minutes
But there was an issue: aeon is written in Python and relies on the z3 bindings, which contain C++ code. We can run Python code in the browser with Pyodide, but native libraries are not directly supported (at least not this one, which relies on multi-threading).
-
Eric Matthes ☛ How many digits are there in pi?
Many of us who marvel at the wonder of mathematical constants just enjoyed another pi day. This year's celebration had me thinking about how we decide how many digits of pi to use whenever we're working with that number.
Without checking, and without reading ahead, what's your answer to this question: [...]
-
-
Java/Golang
-
Hanno Embregts ☛ Java 26 Is Here, And With It a Solid Foundation for the Future
Java 26 is here! Six months ago, we welcomed Java 25 into our hearts, which means it’s time for another fresh helping of Java features. This time, the set of features is a bit smaller compared to some of the previous releases, which can only mean one thing: the focus for this release was to provide a solid foundation for something big to be released soon™️! My hope is that the first JEPs out of Project Valhalla will be announced later this year. That hope is fueled by some of Java 26’s changes as they feel like appropriate preparation steps for the first Valhalla features (this is especially true for JEPs 500 and 529).
Regardless of any future plans, this post focuses on everything that has been added in this release, giving you a brief introduction to each of the features. Where applicable the differences with Java 25 are highlighted and a few typical use cases are provided, so that you’ll be more than ready to start using these features after reading this.
-