Programming Leftovers
Some notes (to myself) about formatting text in jq
These days I'm having to deal with a steadily increasing number of commands that either output only JSON or have JSON as their best output option. I want to reformat some of that JSON into a more useful or more readable text-based format. The obvious tool for this is jq, at least for simple reformatting (I think there are some things that are too tangled for jq). However, every time I need to do this, I have to look up again how to format text in jq. Jq has a very big manual and a lot of features, so here are some notes to my future self about this.
Kernel SHAP
Our last posts were on SHAP, one of the major ways to shed light on black-box machine learning models. SHAP values decompose predictions in a fair way into additive contributions from each feature. Decomposing many predictions and then analyzing the SHAP values gives a relatively quick and informative picture of the fitted model at hand.
In their 2017 paper on SHAP, Scott Lundberg and Su-In Lee presented Kernel SHAP, an algorithm to calculate SHAP values for any model with numeric predictions. Compared to Monte-Carlo sampling (e.g., as implemented in the R package “fastshap”), Kernel SHAP is much more efficient.
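To make the idea more concrete, here is a minimal Python sketch of Kernel SHAP in its exact form. It is not the authors' implementation and does not reproduce the API of "fastshap" or any other package. For a model with only a few features, it enumerates every coalition of features and weights each one with the Shapley kernel π(z) = (M - 1) / (C(M, |z|) · |z| · (M - |z|)). It then solves a weighted least-squares problem constrained so that the contributions add up to f(x) minus the average background prediction. The helper names `value` and `kernel_shap_exact` are hypothetical. The value function assumes "interventional" averaging: features outside a coalition are filled in from a background sample, independently of the features inside it. Practical implementations sample coalitions rather than enumerating all 2^M of them.

```python
import itertools
import math

import numpy as np


def value(f, x, background, subset):
    """Average prediction when the features in `subset` are fixed to x and
    all other features keep their values from the background rows."""
    data = background.copy()
    idx = list(subset)
    if idx:
        data[:, idx] = x[idx]
    return f(data).mean()


def kernel_shap_exact(f, x, background):
    """Kernel SHAP with every coalition enumerated (small M only, M >= 2).

    Returns (phi0, phi): phi0 is the average background prediction and
    phi holds one additive contribution per feature.
    """
    m = x.shape[0]
    if m < 2:
        raise ValueError("this sketch needs at least two features")
    v_empty = value(f, x, background, ())
    v_full = value(f, x, background, tuple(range(m)))
    rows, targets, weights = [], [], []
    # The empty and full coalitions get infinite kernel weight, so they are
    # handled as exact constraints instead of regression rows.
    for size in range(1, m):
        w = (m - 1) / (math.comb(m, size) * size * (m - size))
        for subset in itertools.combinations(range(m), size):
            z = np.zeros(m)
            z[list(subset)] = 1.0
            rows.append(z)
            targets.append(value(f, x, background, subset) - v_empty)
            weights.append(w)
    Z, y, w = np.array(rows), np.array(targets), np.array(weights)
    # Enforce sum(phi) == v_full - v_empty by eliminating the last coefficient:
    # phi_last = total - sum(phi_rest), so y - Z_last * total = A @ phi_rest.
    total = v_full - v_empty
    A = Z[:, :-1] - Z[:, [-1]]
    b = y - Z[:, -1] * total
    sw = np.sqrt(w)  # weighted least squares via row scaling
    phi_rest, *_ = np.linalg.lstsq(A * sw[:, None], b * sw, rcond=None)
    phi = np.append(phi_rest, total - phi_rest.sum())
    return v_empty, phi
```

As a sanity check, consider a linear model f(x) = βᵀx + c. The sketch returns φ_j = β_j (x_j - mean of background column j), the familiar closed form under feature independence. For any model, φ0 + Σφ_j equals f(x), which is the additivity property described above.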
Bring Your Own Binary Packages with RSPM
Installing R packages from source can be a slow process. This is compounded by the challenge of making sure you have all the right system libraries and compilers installed. CRAN eases the burden on most desktop R users by providing pre-built binary packages for both Windows and macOS, but Linux users (or anyone using a Linux-based environment like Docker) are still expected to build from source.
Highlights from rstudio::conf(2022)
July 25–28, 2022 saw thousands of people attend rstudio::conf(2022), both in person in Washington, D.C., and virtually from all over the world, including a few of us from Jumping Rivers. Here’s a recap of the big news, and a few of our personal highlights from the conference!
Dirk Eddelbuettel: RcppArmadillo used by 1001 CRAN Packages
It is with a mix of pride and joy, but also some genuine astonishment and amazement, that we can share that the counter of reverse dependencies at CRAN for our RcppArmadillo package for R just crossed 1000 packages [1]:
Conrad actually posted this a few weeks ago; by my count, we were still a few packages shy at that point. In any event, crossing this marker this summer (either then or now) after more than a dozen years of working on the package is a really nice moment. Google Scholar counts nearly 500 citations for our CSDA paper (also this vignette), and that ratio of nearly one citation for every two packages using it is certainly impressive. We have had the pleasure of working with so many other researchers and scientists using RcppArmadillo. Its combination of performance (C++, after all, and heavily tuned) and ease of use (inspired by ‘another popular flavour for matrix computing’ that is, however, mostly interpreted) makes for a powerful package, and we are delighted to see it used so widely.
Top 7 Python Developer Tools
Believe it or not, Python is today considered one of the most powerful programming languages, and its adoption keeps spreading. We have witnessed the number of Python developers grow over the past couple of years at a whopping rate of 27% YoY (year on year). Last year Python marked 30 years of success, and its momentum suggests it will continue to disrupt the market in the next few years.
p6steve: TRC Slides
Symbolism | Playing Perl 6␛b6xA Raku
On IRC, deoac wanted to know how to print the name of a variable. This question is ambiguous. Getting the name of the container is easy.