Back on December 13th, I posted a challenge on Mastodon: In a simple UTF-8 byte-driven finite automaton, how many states does it take to match the regular-expression construct “.”, i.e. “any character”? Commenter Anthony Williams responded, getting it almost right, I think, but I found his description a little hard to understand. In this piece I’m going to dig into what “.” actually means, and then how many states you need to match it.

The answer surprised me. Obviously this is of interest only to the faction of people who are interested in automaton wrangling, problematic characters, and the finer points of UTF-8. I expect close attention from all 17 of you!