Programming Leftovers
-
Memory Safety in a Systems Programming Language Part 3
The first entry in this series shows how to use the new DIP1000 rules to have slices and pointers refer to the stack, all while being memory safe. The second entry in this series teaches about the ref storage class and how DIP1000 works with aggregate types (classes, structs, and unions).
So far the series has deliberately avoided templates and auto functions. This kept the first two posts simpler, as they did not have to deal with function attribute inference, which I have referred to as “attribute auto inference” in earlier posts. However, both auto functions and templates are very common in D code, so a series on DIP1000 can’t be complete without explaining how those features work with the language changes. Function attribute inference is our most important tool for avoiding so-called “attribute soup”, where a function is decorated with so many attributes that readability arguably suffers.
We will also dig deeper into unsafe code. The previous two posts in this series focused on the scope attribute, but this post is more focused on attributes and memory safety in general. Since DIP1000 is ultimately about memory safety, we can’t get around discussing those topics.
-
Lollipop chart
According to modern recommendations in data viz, lollipop charts are generally a better alternative to bar charts, as they reduce the visual distortion caused by the length of the bars, making it easier to compare the values. So, in the next versions of the 'modEvA' and 'fuzzySim' packages, functions that produce bar plots will instead (by default) produce lollipop charts, using the new 'lollipop' function which will be included in 'modEvA'. I know 'ggplot2' produces great lollipop charts already, but I like to keep my package dependencies to a minimum, or else they become much harder to maintain… So here's the new function: [...]
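The function itself is elided above, but the idea behind a lollipop chart is easy to see in any plotting library. Here is a rough, hypothetical sketch in Python with matplotlib (not modEvA's R implementation): each heavy bar is replaced by a thin stem topped with a marker.

```python
# Hypothetical lollipop-chart sketch in matplotlib (not modEvA's R code):
# each value is drawn as a thin stem topped with a marker instead of a bar.
import matplotlib.pyplot as plt

labels = ["A", "B", "C", "D"]        # made-up categories
values = [0.62, 0.48, 0.81, 0.55]    # made-up values
x = range(len(values))

fig, ax = plt.subplots()
ax.stem(x, values)                   # the "lollipops": stem + marker
ax.set_xticks(list(x))
ax.set_xticklabels(labels)
ax.set_ylabel("value")
plt.show()
```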
-
Combining R and Python with {reticulate} and Quarto
The R versus Python debate has been going on for as long as both languages have existed. I’m not one to take sides – I think you need to use the best tool for the job. Sometimes R will be better. Sometimes Python will be better. But what happens if you need both languages in the same workflow? Do you need to choose? No, is the simple answer. You can use both. This blog post will show you how you can combine R and Python code in the same analysis using {reticulate} and output the results using Quarto.
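As a minimal sketch of the idea (the document contents here are made up), a Quarto file can hold both kinds of chunks; with {reticulate} loaded, an R chunk can read Python objects through the `py` bridge:

````
---
title: "R and Python together"
format: html
---

```{python}
# Python chunk: compute something on the Python side
squares = [x ** 2 for x in range(5)]
```

```{r}
# R chunk: reticulate exposes Python objects via `py`
library(reticulate)
sum(py$squares)
```
````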
-
Transcoding Unicode with AVX-512: AMD Zen 4 vs. Intel Ice Lake
Most systems today rely on Unicode strings. However, we have two popular Unicode formats: UTF-8 and UTF-16. We often need to convert from one format to the other. For example, you might have a database formatted with UTF-16, but you need to produce JSON documents using UTF-8. This conversion is often called ‘transcoding’.
In the last few years, we wrote a specialized library that processes Unicode strings, with a focus on performance: the simdutf library. The library is used by JavaScript runtimes (Node.js and Bun).
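simdutf itself is a C++ library, but the operation it accelerates is easy to state in a few lines. A plain Python illustration of transcoding (not the simdutf API) looks like this:

```python
# Plain transcoding illustration (not the simdutf API, which is C++ and SIMD-accelerated):
# decode UTF-16 bytes into code points, then re-encode them as UTF-8.
utf16_bytes = "résumé ☃".encode("utf-16-le")  # stand-in for data from a UTF-16 database
text = utf16_bytes.decode("utf-16-le")        # UTF-16 -> code points
utf8_bytes = text.encode("utf-8")             # code points -> UTF-8
print(utf8_bytes)
```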
-
Started developing automatic language translation
The trick in the script is that it forces certain strings not to be translated. In the above example, those are "EasyOS", "${VER}", and "http://from.here.com/subdir".
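The post doesn't show the mechanism, but a common way to force strings to survive machine translation untouched (a hypothetical sketch, not the EasyOS script) is to swap them for opaque tokens before translating and restore them afterwards:

```python
# Hypothetical "protect, translate, restore" sketch (not the EasyOS script).
PROTECTED = ["EasyOS", "${VER}", "http://from.here.com/subdir"]

def translate_protected(text, translate):
    """translate is any callable that machine-translates a string."""
    tokens = {f"@@{i}@@": s for i, s in enumerate(PROTECTED)}
    for token, s in tokens.items():
        text = text.replace(s, token)   # hide protected strings from the translator
    text = translate(text)
    for token, s in tokens.items():
        text = text.replace(token, s)   # put the originals back
    return text

# demo with a do-nothing "translator"
print(translate_protected("EasyOS version ${VER}", lambda t: t))
```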
-
An introduction to DocArray, an open source AI library
DocArray is a library for nested, unstructured, multimodal data in transit, including text, image, audio, video, 3D mesh, and so on. It allows deep-learning engineers to efficiently process, embed, search, store, recommend, and transfer multi-modal data with a Pythonic API. Since November 2022, DocArray has been open source and hosted by the Linux Foundation AI & Data initiative, so that there’s a neutral home for building and supporting an open AI and data community. This is the start of a new day for DocArray.
In the ten months since DocArray’s first release, its developers at Jina AI have seen more and more adoption and contributions from the open source community. Today, DocArray powers hundreds of multimodal AI applications.
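As a minimal sketch of the Pythonic API the post describes (assuming the docarray 1.x releases from that period; details may differ in later versions):

```python
# Minimal DocArray sketch, assuming the 1.x API from the period of the post.
from docarray import Document, DocumentArray

docs = DocumentArray(
    [
        Document(text="a text document"),
        Document(uri="https://example.com/cat.png"),  # made-up image URI
    ]
)
print(docs.texts)  # the text attribute of every Document in the array
docs.summary()     # prints an overview of the array's contents
```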