Programming Leftovers
-
bnosac :: open analytical helpers - audio transcription with whisper from R
Last week, OpenAI released version 2 of an updated neural net called Whisper that approaches human-level robustness and accuracy on speech recognition. You can now call a C/C++ inference engine directly from R, which allows you to transcribe .wav audio files.
-
Fast base16 encoding - Daniel Lemire’s blog
Given binary data, we often need to encode it as ASCII text. Email and much of the web effectively work in this manner.
A popular format for this purpose is base64. With Muła, we showed that we could achieve excellent speed using vector instructions on commodity processors (2018, 2020). However, base64 is a bit tricky.
A much simpler format is just base16. That is, you transcribe each byte into two bytes representing its value in hexadecimal notation. Thus the byte value 1 becomes the two bytes ‘01’. The byte value 255 becomes ‘FF’, and so forth. In other words, you use one byte (or one character) per ‘nibble’; a byte is made of two nibbles, the most-significant 4 bits and the least-significant 4 bits.
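The nibble-by-nibble scheme described above can be sketched in plain C. This is a minimal scalar illustration, not the vectorized code from the linked post; the function name `base16_encode` and the table name `HEX_DIGITS` are made up for this example, and the caller is assumed to supply an output buffer of at least 2 * len + 1 characters.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Lookup table: nibble value 0..15 -> uppercase hexadecimal digit. */
static const char HEX_DIGITS[] = "0123456789ABCDEF";

/*
 * Hypothetical helper (not from the linked post): encode `len` bytes
 * from `src` as base16 text into `dst`.
 * `dst` must have room for 2 * len + 1 chars; the extra one holds the
 * NUL terminator. Returns the number of characters written, excluding
 * the terminator.
 */
size_t base16_encode(const uint8_t *src, size_t len, char *dst)
{
    assert(src != NULL || len == 0);
    assert(dst != NULL);

    for (size_t i = 0; i < len; i++) {
        dst[2 * i]     = HEX_DIGITS[src[i] >> 4];   /* most-significant nibble */
        dst[2 * i + 1] = HEX_DIGITS[src[i] & 0x0F]; /* least-significant nibble */
    }
    dst[2 * len] = '\0';
    return 2 * len;
}
```

Encoding the two bytes 1 and 255 with this function writes the string "01FF", matching the examples in the excerpt. A table lookup per nibble keeps the loop branch-free, which is also the general shape that vectorized versions tend to build on.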
-
Always use feenableexcept() when doing floating point math
This is a refreshed & expanded copy of a very old page I hosted outside of this blog. I recently ran into “silent NaNs” again, and thought it might be a good idea to republish this advice here.
-
Summary of ROS-Industrial Conference 2022 | ROS-Industrial
The 10th edition of the ROS-Industrial Conference took place on December 15-16, 2022 in Stuttgart, Germany and remotely. During the conference, 55 participants present in Stuttgart and an online audience of more than 200 people attended 17 talks in six sessions. The goal of the conference was to show and discuss what is currently possible in the ROS 2 ecosystem when it comes to industrial applications.
-
ROSCon 2022 Rewind
This October I was fortunate enough to attend ROSCon with fellow colleagues Jerry Tower and Michael Ripperger in beautiful Kyoto, Japan. By luck, it just so happened that the month-long trip I booked to Japan one year ago lined up with Japan's borders opening and the conference's location and dates. Now that I'm back in America and have my work and personal business back in order, I'd like to share with you my ROSCon 2022 experience.
With an attendance of approximately 800 ROS developers, ranging from absolute beginners to seasoned experts from industry and academia, there was something for everyone at ROSCon. The panels were particularly useful for better understanding the current state of ROS and ROS 2, future plans, and the concerns of the community. I also found the presentations on integrating CANopen with ROS 2 and on the development of a ROS 2 simulator built on Unreal Engine 4 interesting.
-
The death of the line of death | Emily M. Stark
The line of death, as Eric Lawrence explained in a classic blog post, is the idea that an application should separate trustworthy UI from untrusted content. The typical example is in a web browser, where untrustworthy web content appears below the browser toolbar UI. Trustworthy content provided by the web browser must appear either in the browser toolbar, or anchored to it or overlapping it. If this separation is maintained, then untrusted content can’t spoof the trustworthy browser UI to trick or attack the user.
Though the line of death has been an axiom of browser security for years, it’s losing relevance in modern browsers, and fortunately being replaced by more effective patterns for some attacks.
The line of death principle is a bit antiquated. First of all, I’m not aware of any research to support that it’s effective. In fact I’m not aware of much research about it at all. There’s plenty of research and practical experience to show that phishing is effective, picture-in-picture attacks are effective, and security indicators in the URL bar are misunderstood. There’s also some research on operating system equivalents to the line of death (thanks to Stuart Schechter for the pointer). But I’m not aware of any research that focuses on the line of death concept in browsers specifically. For example, I’d like to see a study looking at whether users perceive a dialog anchored to the browser toolbar differently than an identical dialog shown by web content. (Please send me pointers!) In the absence of usability studies, my intuition is that the line of death is simply a foreign, incomprehensible idea to many, many browser users.
-
R en Buenos Aires in 2023: Compiling a list of Latin American R packages - R Consortium
The R Consortium caught up with Elio Campitelli, organizer of the R en Buenos Aires Group in Buenos Aires, Argentina, to talk about their experience leading a group with almost 1,000 members. Elio discusses their early exposure to programming, the group’s special interest in R and social sciences, and plans on building a compiled list of Latin American R packages in 2023.
-
Hillshade, colors and marginal plots with tidyterra (II) | One world
This is the second post of the series “Hillshade, colors and marginal plots with tidyterra”. In this post I explore an approach for adding marginal plots to a ggplot2 map of a SpatRaster, showing the values by longitude and latitude. See the first post of the series here.
-
rOpenSci | rOpenSci News Digest, December 2022
We have recently started building HTML reference manuals for each package in the R-universe! For packages that have had an update in the past 3 weeks, the reference manual is now linked from the package homepage on R-universe.dev. All packages in the R-universe are rebuilt at least once per month, so soon all packages should have an online HTML manual. You can also find reference manuals for base-R packages.
-
What is R7? A New OOP System for R
This blog post aims to give a brief introduction to R7, a new R package for OOP in R. It’s not a tutorial on how to write code using R7 - the documentation provides great instructions for getting started if you’re already ready to start programming in R7.
-
Touching the 3rd Rail of Data Science: 'R or Python?' - Win Vector LLC
I’ve been seeing a lot of hot takes on whether one should do data science in R or in Python. I’ll comment generally on the topic, and then add my own myopic gear-head micro benchmark.
I’ll jump in: if learning the language is the big step, then you are a beginner in the data science field. So the right choice is: work with others and use the tools they are most able to teach you.
After that there are other considerations: what or whom you are working with or integrating with. If you are working with statisticians, they will likely want R. If you are working with software engineers, they will likely want Python. If you are actually adding value in terms of translating business needs, picking machine learning models, methods for organizing data, designing experiments, controlling for bias, reducing variance: then programming is the least of your worries.
-
Day 25: Rakudo 2022 Review - Raku Advent Calendar
In a year as eventful as 2022 was in the real world, it is a good idea to look back to see what one might have missed while life was messing with your (Raku) plans.
Rakudo saw about 1500 commits this year, about the same as the year before that. Many of these were bug fixes and performance improvements, which you would normally not notice. But there were also commits that actually added features to the Raku Programming Language. So it feels like a good idea to actually mention those more in depth.
So here goes! Unless otherwise noted, all of these changes are in language level 6.d, and available thanks to several Rakudo compiler releases during 2022.