Programming Leftovers
-
Infuse your awk scripts with Groovy | Opensource.com
Recently I wrote a series on using Groovy scripts to clean up the tags in my music files. I developed a framework that recognized the structure of my music directory and used it to iterate over the content files. In the final article of that series, I separated this framework into a utility class that my scripts could use to process the content files.
This separate framework reminded me a lot of the way awk works. For those of you unfamiliar with awk, you might benefit from Opensource.com's eBook, A practical guide to learning awk.
I have used awk extensively since 1984, when our little company bought its first "real" computer, which ran System V Unix. For me, awk was a revelation: it had associative memory (think arrays indexed by strings instead of numbers), it had regular expressions built in, it seemed designed to deal with data, especially data in columns, and it was compact and easy to learn. Finally, it was designed to work in Unix pipelines, reading its data from standard input or from files and writing to standard output, with no ceremony required: data just appeared in the input stream.
To say that awk has been an essential part of my day-to-day computing toolkit is an understatement. And yet there are a few things about how I use awk that leave me unsatisfied.
Probably the main issue is that awk is good at dealing with data presented in delimited fields but curiously bad at handling comma-separated value (CSV) files, which may embed the field delimiter inside a field as long as that field is quoted. Also, regular expressions have moved on since awk was invented, and needing to remember two sets of regular-expression syntax rules is not conducive to bug-free code. One set of such rules is bad enough.
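A small Python illustration (mine, not from the article) of the quoted-delimiter case: a CSV-aware parser such as Python's csv module keeps a quoted field whole, while naive splitting on commas breaks it apart.

```python
import csv
import io

line = 'id,name,notes\n1,"Doe, Jane","likes awk"\n'
rows = list(csv.reader(io.StringIO(line)))

# Naive splitting on "," breaks the quoted field into two pieces:
assert line.splitlines()[1].split(",")[1] == '"Doe'
# A CSV-aware parser keeps the quoted field intact:
assert rows[1] == ["1", "Doe, Jane", "likes awk"]
```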
-
Visit for a surprise – Eric Bailey
Could spoiling a joke be an accessibility issue? You better believe it.
-
How to Use the String join() Method in Python - Pi My Life Up
The join method makes it easy to combine multiple strings stored within an iterable such as a tuple, list, set, or dictionary. In addition, you can specify the characters to use as a separator, such as a space or a dash.
There are many more methods that you can use to manipulate strings within Python. For example, you can use the split method to split a string based on a specified separator.
The tutorial below covers the syntax of the join method and the various iterables you can use it with, including lists, dictionaries, sets, and tuples.
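As a quick sketch of what such a tutorial covers, here are a few str.join calls on different iterables (the example values are my own, not from the tutorial):

```python
# Join a list of words with a space separator:
words = ["Pi", "My", "Life", "Up"]
assert " ".join(words) == "Pi My Life Up"

# Any iterable of strings works, e.g. a tuple with a dash separator:
assert "-".join(("a", "b", "c")) == "a-b-c"

# Joining a dictionary iterates over its keys:
assert ", ".join({"red": 1, "green": 2}) == "red, green"

# Every element must already be a string; convert numbers first:
assert "/".join(str(n) for n in [2022, 8, 29]) == "2022/8/29"
```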
-
Update: jpegdump.py Version 0.0.10 | Didier Stevens
This update to jpegdump.py, my tool to analyze JPEG images, brings 2 small changes:
Data between segments can be selected with suffix d. Like this: -s 10d
This means: select the data between segments 9 and 10.
-
On interpolating stuff into pattern matches | Aristotle [blogs.perl.org]
Ironically, it’s qr objects which don’t get that benefit. On the machine I’m typing on, the following benchmark…
-
A proposal for capping exploding electricity spot market prices without subsidies or supply reduction | R-bloggers
At the EEX, German baseload electricity futures for the year 2023 trade at a price of 950 Euro / MWh and peak load futures at 1275 Euro / MWh. Futures prices for France are even higher. (Prices were looked up on 2022-08-28.)
-
Up and Running with R Markdown
-
23 New books added to Big Book of R | R-bloggers
Today we have another huge addition of books to the library, which now contains 350 R programming books! Thanks to Gary and Abraham for the additions!
-
When is TinyML too much?
When it comes to machine learning (ML), sometimes every problem looks like it needs a neural net, when in fact it just needs some statistics. This is especially true when it comes to running algorithms on microcontrollers and for industrial use cases such as predictive maintenance, according to Bernard Burg, director of AI and data science at Infineon.
I focus a lot on using machine learning at the edge — specifically the idea of running machine learning models on microcontrollers, known as TinyML — because there are clear benefits for the IoT. By analyzing incoming data where that data is created, engineers can reduce latency, lower bandwidth costs, and increase privacy while also saving on energy consumption. But one doesn’t always need TinyML. Sometimes using linear regression or anomaly detection will do.
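To illustrate the "just needs some statistics" point, here is a hypothetical z-score anomaly detector in plain Python; the vibration readings and threshold are invented for the example, not taken from the article.

```python
import math

def zscore_anomalies(readings, threshold=3.0):
    """Flag readings more than `threshold` standard deviations from the mean."""
    n = len(readings)
    mean = sum(readings) / n
    variance = sum((x - mean) ** 2 for x in readings) / n
    std = math.sqrt(variance)
    # If std is zero, every reading is identical and nothing is anomalous.
    return [x for x in readings if std and abs(x - mean) / std > threshold]

# A sensor trace with one faulty spike:
vibration = [0.9, 1.0, 1.1, 1.0, 0.95, 1.05, 9.8]
print(zscore_anomalies(vibration, threshold=2.0))  # flags the 9.8 spike
```

A few lines of arithmetic like this fit comfortably on a microcontroller, which is the article's point: reach for TinyML only when plain statistics fall short.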
-
pkgdown and GDPR – How to host a pkgdown site in Germany | R-bloggers
pkgdown is a great tool for generating a website with documentation for an R package.
Unfortunately, pkgdown uses CDNs (content delivery networks) like Cloudflare to embed often used JavaScript libraries into the generated website.
-
How a Local Community Produced the First Nation-wide R useR Group | R-bloggers
The R Consortium recently interviewed Szilard Pafka of the Real Data Science USA R Group (formerly known as the Los Angeles R User Group). The former Los Angeles R useR Group had been based in Los Angeles for more than 10 years, but after organizer Szilard moved to Texas, he kept the group going and even expanded it!
-
Sharing data between threads in PowerDNS Recursor | PowerDNS Blog
This is the third part of a series of blog posts we are publishing, mostly around recent developments with respect to PowerDNS Recursor. The first blog post was Refreshing Of Almost Expired Records: Keeping The Cache Hot, the second Probing DoT Support of Authoritative Servers: Just Try It.
In PowerDNS Recursor the actual resolving is done using mthreads: a lightweight cooperative thread-switching mechanism. This allows us to write the resolving code in a straightforward manner: we can program it as if we were resolving synchronously. The mthreads abstraction takes care of running another mthread when the resolving process for a particular query has to wait for incoming data from the network. Mthreads are mapped to POSIX threads, the thread abstraction the C++ runtime provides. Typically, a handful of POSIX threads run many mthreads, one for each in-progress query. Mthread switching happens only at specific points in the resolving process, basically whenever I/O is done.
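The cooperative model described above can be sketched in miniature with Python generators (an analogy only, not PowerDNS's actual C++ implementation): each task yields at its would-be I/O points, and a round-robin scheduler resumes another task in the meantime.

```python
from collections import deque

def resolve(name, io_waits):
    """A 'resolver' task that must wait on the network io_waits times."""
    for i in range(io_waits):
        # Pretend we sent a query and must now wait for a response;
        # yielding hands control back to the scheduler.
        yield f"{name}: waiting on I/O (step {i + 1})"
    return f"{name}: resolved"

def run(tasks):
    """Round-robin scheduler: switching happens only at explicit yields."""
    queue = deque(tasks)
    log = []
    while queue:
        task = queue.popleft()
        try:
            log.append(next(task))
            queue.append(task)          # task is waiting; resume it later
        except StopIteration as done:
            log.append(done.value)      # task finished
    return log

log = run([resolve("a.example", 2), resolve("b.example", 1)])
```

While `a.example` is "waiting", the scheduler makes progress on `b.example`, which is the essence of the mthreads design.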
-
Gamma regression in Stata and statsmodels
Generalised linear models with a gamma distribution and log link are frequently used to model non-negative right-skewed continuous data, such as costs [1].
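As a hedged sketch of the idea (the data are simulated; this is not code from the post), here is a gamma-style GLM with a log link fitted by iteratively reweighted least squares (IRLS) using only NumPy. For the gamma family with a log link the IRLS weights are constant, so each Fisher-scoring step reduces to an ordinary least-squares solve on the working response.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 5000
x = rng.uniform(0.0, 1.0, n)
X = np.column_stack([np.ones(n), x])
beta_true = np.array([1.0, 2.0])
mu = np.exp(X @ beta_true)            # log link: E[y] = exp(X @ beta)
shape = 3.0                           # gamma shape; scale = mu / shape
y = rng.gamma(shape, mu / shape)      # non-negative, right-skewed "costs"

# Start from an OLS fit on log(y), then iterate Fisher scoring.
beta, *_ = np.linalg.lstsq(X, np.log(y), rcond=None)
for _ in range(25):
    eta = X @ beta
    mu_hat = np.exp(eta)
    z = eta + (y - mu_hat) / mu_hat   # working response
    beta, *_ = np.linalg.lstsq(X, z, rcond=None)

print(beta)  # close to beta_true on this simulated data
```

In practice one would use a packaged implementation, such as statsmodels' GLM with a Gamma family and log link, which also provides standard errors and diagnostics.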