Programming Leftovers
-
A dbplyr-based Address Matching Package
Matching address records from one table to another is a common and often repeated task. This is easy when address strings match exactly, but much harder when they do not. An overarching issue is that an address string may be spelt (or misspelt) in multiple ways across multiple records. Despite this, we may want to know which records are likely to be the same address in another table, even though these addresses do not share the exact same spelling.
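To make the inexact-matching problem concrete, here is a minimal Python sketch. It is not the package's method (the package is R and dbplyr-based); the `normalise` and `best_match` helpers and the 0.8 threshold are hypothetical names and values chosen for illustration, and the similarity score comes from the standard library's `difflib.SequenceMatcher`.

```python
from difflib import SequenceMatcher


def normalise(address: str) -> str:
    """Lowercase the address and replace punctuation with single spaces.

    A deliberately small, hypothetical normaliser; real address cleaning
    usually needs far more rules (abbreviations, postcodes, and so on).
    """
    cleaned = "".join(ch if ch.isalnum() else " " for ch in address.lower())
    return " ".join(cleaned.split())


def best_match(address, candidates, threshold=0.8):
    """Return (candidate, score) for the most similar candidate.

    Returns None when there are no candidates or when the best score
    falls below the (assumed) threshold.
    """
    target = normalise(address)
    scored = [
        (cand, SequenceMatcher(None, target, normalise(cand)).ratio())
        for cand in candidates
    ]
    if not scored:
        return None
    best = max(scored, key=lambda pair: pair[1])
    return best if best[1] >= threshold else None
```

Comparing every record against every candidate is quadratic, so a sketch like this only suits small tables; larger jobs typically narrow the candidates first, for example by postcode.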
-
MLOps with vetiver in Python and R: Answering your questions - RStudio
As a follow-up to last month’s MLOps with vetiver in Python and R webinar, we’d like to highlight and answer some of the great audience questions asked during the session. You can also check out the demo and slides on the webinar’s website.
-
Elizabeth Mattijsen: Don't fear the grepper! (2)
This blog post is a follow-up to Don't fear the grepper! (1), which you should read first if you haven't already.
[...]
I was in fact not telling the entire truth. The grep subroutine / method will take just about anything as the argument to filter on (not just a piece of code), in a process called "smart-matching".
Smart-matching basically is a form of comparison of two objects that somehow decides whether there is a match or not. The most visible form of that is the ~~ infix operator, but that is basically just syntactic sugar for an underlying mechanism.
-
Tree search in Haskell
To use this, you provide two callback functions. $is_good checks whether the current item has the properties we were searching for. $children_of takes an item and returns its children in the tree.
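The two-callback interface described above can be sketched in Python as a generator. This is an illustration, not the author's code: the name `search` and the depth-first order are assumptions, while `is_good` and `children_of` mirror the callbacks named in the excerpt.

```python
def search(root, is_good, children_of):
    """Lazily yield each item reachable from root that satisfies is_good.

    The traversal is depth-first and visits children left to right.
    Because this is a generator, the caller can stop after the first hit.
    """
    stack = [root]
    while stack:
        item = stack.pop()
        if is_good(item):
            yield item
        # Push children in reverse so the leftmost child is popped first.
        stack.extend(reversed(list(children_of(item))))


def binary_children(s):
    """Hypothetical example tree: bit strings of length at most 3."""
    return [s + "0", s + "1"] if len(s) < 3 else []


def is_palindrome_of_length_3(s):
    return len(s) == 3 and s == s[::-1]


palindromes = list(search("", is_palindrome_of_length_3, binary_children))
```

Since results are produced on demand, `next(search(...))` returns the first match without exploring the rest of the tree, which is the kind of laziness the excerpt goes on to discuss.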
[...]
I felt a little bit silly, because I wrote a book about lazy functional programming and yet somehow, it’s not the glue I reach for first when I need glue.
-
the sticky mark-bit algorithm -- wingolog
A funny post today; I gave an internal presentation at work recently describing the so-called "sticky mark bit" algorithm. I figured I might as well post it here, as a gift to you from your local garbage human.
Before diving in though, we start with some broad context about automatic memory management. The term mostly means "garbage collection" these days, but really it describes a component of a system that provides fresh memory for new objects and automatically reclaims memory for objects that won't be needed in the program's future. This stands in contrast to manual memory management, which relies on the programmer to free their objects.
Of course, automatic memory management ensures some valuable system-wide properties, like lack of use-after-free vulnerabilities. But also by enlarging the scope of the memory management system to include full object lifetimes, we gain some potential speed benefits, for example eliminating any cost for free, in the case of e.g. a semi-space collector.
[...]
Going a bit deeper, here we have some basic implementations of mark and sweep. Marking starts with the roots: edges from outside the automatically-managed heap indicating a set of initial live objects. You might get these by maintaining a stack of objects that are currently in use. Then it traces references from these roots to other objects, until there are no more references to trace. It will visit each live object exactly once, and so is O(n) in the number of live objects.
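The marking phase described above might look like the following Python sketch. The `Obj` class, its `marked` and `refs` fields, and the explicit worklist are assumptions made for illustration; they are not taken from the presentation's own code.

```python
class Obj:
    """A hypothetical heap object with a mark bit and outgoing references."""

    def __init__(self, name):
        self.name = name
        self.marked = False
        self.refs = []


def mark(roots):
    """Set the mark bit on every object reachable from roots.

    Returns the number of objects newly marked. Each live object is
    marked exactly once, so the work is proportional to the live set
    (plus the references traced between live objects).
    """
    worklist = list(roots)
    newly_marked = 0
    while worklist:
        obj = worklist.pop()
        if obj.marked:
            continue  # already visited, possibly via another path or a cycle
        obj.marked = True
        newly_marked += 1
        worklist.extend(obj.refs)
    return newly_marked
```

Checking the mark bit before tracing is what keeps cycles from looping forever and shared objects from being traced twice.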
Sweeping requires the ability to iterate the heap. With the precondition here that collect is only ever called with an empty freelist, it will clear the mark bit from each live object it sees, and otherwise add newly-freed objects to the global freelist. Sweep is O(n) in total heap size, but some optimizations can amortize this cost.
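A sweep pass along these lines can be sketched in Python as follows. The list-based `heap` and `freelist` and the `Obj` class are hypothetical stand-ins; a real collector would walk raw memory rather than a Python list.

```python
class Obj:
    """A hypothetical heap cell carrying only a name and a mark bit."""

    def __init__(self, name):
        self.name = name
        self.marked = False


def sweep(heap, freelist):
    """Walk the whole heap, clearing marks on live cells and freeing the rest.

    Assumes the precondition from the text: the freelist is empty when
    collection runs. The cost is proportional to total heap size, not
    to the number of live objects.
    """
    if freelist:
        raise ValueError("sweep expects an empty freelist")
    for obj in heap:
        if obj.marked:
            obj.marked = False    # live: reset the bit for the next cycle
        else:
            freelist.append(obj)  # unmarked: the cell can be reused
    return freelist
```

After a sweep, every cell in the heap has a clear mark bit, so the next marking phase starts from a clean state.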