Programming Leftovers
-
Day 3: Santa and the Rakupod Wranglers - Raku Advent Calendar
Santa’s world was increasingly going high-tech, and his IT department was polishing off its new process that could take the millions of letters received from boys and girls around the world, scan them into digital form with state-of-the-art optical character recognition hardware, and produce outputs that could greatly streamline the Santa Corporation’s production for Christmas delivery.
One problem had initially stymied them, but consultants from the Raku community came to their aid. (As you may recall, IT had become primarily a Raku shop because of the power of the language for all their programming needs ranging from shop management to long-range planning.) The problem was converting the digital output from the OCR hardware to final PDF products for the factories and toy makers. The growing influence of Github and its Github-flavored Markdown format had resulted in IT’s post-OCR software converting the text into that format.
That was fine for initial production-planning use, but for archival purposes it lacked the textual hints needed to create beautiful digital documents for permanent storage. The Raku consultants suggested converting the Markdown to Rakupod, which has as much expressive, typesetting potential as Donald Knuth’s TeX and its descendants (e.g., Leslie Lamport’s LaTeX, ConTeXt, and XeTeX). Unlike those formats, Rakupod is much easier to scan visually, and, although the current Raku tooling is in the early stages of development, the existing Rakupod can be modified to take advantage of improved tools as they arrive, so the Rakupod-to-PDF output can be improved retroactively.
-
This Week in PSC (089) | Perl Steering Council [blogs.perl.org]
Back to the full three of us. Not much needed looking at this week.
-
Using Rust at a startup: A cautionary tale | by Matt Welsh | Nov, 2022 | Medium
Rust is awesome, for certain things. But think twice before picking it up for a startup that needs to move fast.
-
WebAssembly: Go vs Rust vs AssemblyScript :: Ecostack — a developer blog
Imagine you are working on the next big thing that runs in the browser, and it requires some heavy-duty code, which needs to run fast and efficiently. You remember that your friend Jack told you about WebAssembly (Wasm), which supposedly runs faster than JavaScript (JS), so you decide to check it out.
-
gbuild: LibreOffice build system - part 1 - LibreOffice Development Blog
LibreOffice uses a build system named gbuild, which works on top of GNU Make. Migration from the old build system to gbuild started in the OpenOffice days, but it took a while and a lot of effort, and finished around LibreOffice 4.1.
This LibreOffice build system uses GNU Make, Perl and Python, so you need to have these prerequisites in order to be able to build LibreOffice.
-
.NET open source is 'heavily under-funded' says AWS [Ed: Microsoft media operative Tim Anderson is back to promoting Microsoft stuff at The Register]
Amazon web arm investing in Microsoft's platform to help customers escape Windows
-
Falsehoods programmers believe about undefined behavior
Undefined behavior (UB) is a tricky concept in programming languages and compilers. Over the many years I've been an industry mentor for MIT's 6.172 Performance Engineering course (an excellent class that I highly recommend: it's very thorough and hands-on, at the expense of also requiring a lot of work at a very fast pace; when I took it as an undergrad, that was a great tradeoff, but YMMV), I've heard many misconceptions about what the compiler guarantees in the presence of UB. This is unfortunate but not surprising!
For a primer on undefined behavior and why we can't just "define all the behaviors," I highly recommend Chandler Carruth's talk "Garbage In, Garbage Out: Arguing about Undefined Behavior with Nasal Demons."
You might also be familiar with my Compiler Adventures blog series on how compiler optimizations work. An upcoming episode is about implementing optimizations that take advantage of undefined behavior like dividing by zero, where we'll see UB "from the other side."
-
Cache invalidation really is one of the hardest problems in computer science - Surfing Complexity
My colleagues recently wrote a great post on the Netflix tech blog about a tough performance issue they wrestled with. They ultimately diagnosed the problem as false sharing, which is a performance problem that involves caching.
I’m going to take that post and write a simplified version of part of it here, as an exercise to help me understand what happened. After all, the best way to understand something is to try to explain it to someone else.
But note that the topic I’m writing about here is outside of my personal area of expertise, so caveat lector!
-
Recognizing patterns in memory // TimDbg
Something I find frustrating is how hard it is to teach debugging skills. I think the biggest reason is because there are many things that can only be learned through experience. This is true for anything that requires pattern recognition. Our brains are great at recognizing patterns, but it often takes a large amount of practice to be able to identify useful patterns in data.
I can’t instantly give you pattern recognition skills with a short blog post, but I can tell you about some of the patterns that I look for so you can start to train your brain to see these as well. Recognizing patterns in memory can be useful as it can give you a hint for things like memory corruption, which are often some of the hardest errors to debug from a postmortem analysis. Getting a rough idea of what type of data is overwriting other data in a process can tell you where to look next for the source of memory corruption, and it can help narrow down where an issue might be, because the bug is usually near the code that wrote that data.
-
Coping strategies for the serial project hoarder
I gave a talk at DjangoCon US 2022 in San Diego last month about productivity on personal projects, titled “Massively increase your productivity on personal projects with comprehensive documentation and automated tests”.
The alternative title for the talk was Coping strategies for the serial project hoarder.
I’m maintaining a lot of different projects at the moment. Somewhat unintuitively, the way I’m handling this is by scaling down techniques that I’ve seen working for large engineering teams spread out across multiple continents.
The key trick is to ensure that every project has comprehensive documentation and automated tests. This scales my productivity horizontally, by freeing me up from needing to remember all of the details of all of the different projects I’m working on at the same time.
-
Why writing by hand is still the best way to retain information
Picture this: it’s a work day at an enterprise payments processing company, and there is a critical data engineering task that needs to be completed. In this case, I’m the data engineer who needs to finish the task, but I am missing information necessary for my data model to be finished. I heard the information in a meeting. It was discussed in the daily standup. I have some vague typed notes, but I can’t recall the technical details I need to finish my work. No one is available to answer my question. It’s then that it hits me: I should have written down notes by hand during the meeting.
Writing notes by hand would have given me several tangible resources for finding the critical missing information: a stronger memory of the meeting itself, a sense of where the gaps in the discussion were, and the notes themselves, which would trigger stronger recall of the events just by being reviewed on paper. Detailed typed notes would have been helpful, but they would not aid my recall and retention of the information in the same way that handwritten notes would.
-
I/O is no longer the bottleneck
When interviewing programmers, I often ask them to code a simple program to count word frequencies in a text file. It’s a good problem that tests a bunch of skills, and with some follow-up questions, allows you to go surprisingly deep.
One of the follow-up questions I ask is, “What’s the performance bottleneck in your program?” Most people say something like “reading from the input file”.
In fact, I was inspired to write this article after responding to someone on Gopher Slack, who said, “I also note there’s a lot of extra work happening here in splitting the entire line, etc, it’s just that typically this is all so much faster than I/O that we don’t care.”
I’m not picking on him … before I analyzed the performance of the count-words problem, I thought the same. It’s what we’ve all been taught, right? “I/O is slow.”
Not anymore! Disk I/O may have been slow 10 or 20 years ago, but in 2022, reading a file sequentially from disk is very fast.
-
Git Notes: Git's Coolest, Most Unloved Feature - Tyler Cipriani
The short of it is: they’re cool for appending notes from automated systems (like ticket or build systems), but not really for having interactive conversations with other developers (at least not yet).