Back End/Databases Leftovers
-
Drew Breunig ☛ Conflating Overture Places Using DuckDB, Ollama, Embeddings, and More
[Conflation] sounds simple enough, but datasets describe features inconsistently and are often riddled with errors. Conflation processes are complicated affairs with many stages, conditionals, and comparison methods. Even then, humans might be needed to review and solve the most stubborn joins.
Today we’re going to demonstrate a few different conflation methods, to illustrate the problem. The following is written for people with data processing or analysis experience, but little geospatial exposure. We’re currently experiencing a bit of a golden age in tooling and data for the geo-curious. It’s now possible to quickly assemble geo data and analyze it on your laptop, using freely available and easy-to-use tools. No complicated staging, no specialized databases, and (at least today) no map projection wrangling.
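To make the "comparison methods" idea concrete, here is a minimal Python sketch of one possible conflation pass. It is not the method from the linked article (which uses DuckDB, Ollama, and embeddings). It simply pairs records that are both close together and similarly named. The record layout (`id`, `name`, `lat`, `lon`), the function names, and the two thresholds are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher
from math import asin, cos, radians, sin, sqrt


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points."""
    earth_radius_m = 6371000.0
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * earth_radius_m * asin(sqrt(a))


def name_similarity(a, b):
    """Similarity in [0, 1] after trimming whitespace and lowercasing."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def conflate(left, right, max_distance_m=100.0, min_similarity=0.8):
    """For each left record, return (record, best match or None).

    A candidate must be within max_distance_m AND have a name similarity
    of at least min_similarity. Both thresholds are illustrative guesses.
    """
    results = []
    for l_rec in left:
        best, best_score = None, 0.0
        for r_rec in right:
            distance = haversine_m(l_rec["lat"], l_rec["lon"], r_rec["lat"], r_rec["lon"])
            if distance > max_distance_m:
                continue  # too far apart to be the same place
            score = name_similarity(l_rec["name"], r_rec["name"])
            if score >= min_similarity and score > best_score:
                best, best_score = r_rec, score
        results.append((l_rec, best))
    return results
```

Records that come back paired with `None` are the "stubborn joins" left for another method or for human review. The nested loop compares every pair, so this sketch only suits small inputs.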
-
[Old] BoringSQL ☛ We need to talk about ENUMs
Designing a database schema, whether for a new application or a new feature, always raises a lot of questions. The choices you make can have a big impact on how well your database performs and how easy it is to maintain and scale. Whether you’re just getting started with PostgreSQL or consider yourself a seasoned pro, it’s easy to rely on old habits or outdated advice. In this article, I want to take a fresh look at one of those topics that often sparks debate: the use of ENUMs in PostgreSQL.
-
[Old] BoringSQL ☛ How not to change PostgreSQL column type
One of the surprises that comes with developing applications and operating a database cluster behind them is the discrepancy between theory and practice, and between the development environment and production. A perfect example of such a mismatch is changing a column type.
-
Robert Haas ☛ PostgreSQL Hacking Workshop - October 2024
This month, I'll be hosting a discussion of Thomas Munro's 2024.pgconf.dev talk, Streaming I/O and vectored I/O. As usual, there will be three sessions, and you can use this form to sign up for the session you prefer. However, if you do want to attend, please sign up right away, because our first session is scheduled for this Thursday.
-
Phil Eaton ☛ Build a serverless ACID database with this one neat trick (atomic PutIfAbsent)
Thanks to its simplicity, in this post we'll implement a Delta Lake-inspired serverless ACID database in 500 lines of Go code with zero dependencies. It will support creating tables, inserting rows into a table, and scanning all rows in a table. All while allowing concurrent readers and writers and achieving snapshot isolation.
There are other critical parts of Delta Lake we'll ignore: updating rows, deleting rows, checkpointing the transaction metadata log, compaction, and probably much more I'm not aware of. We must start somewhere.
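The core trick can be sketched in a few lines. The example below is in Python rather than the post's Go, and it is not the author's code. It assumes a local directory stands in for object storage, and the names `put_if_absent`, `commit`, `latest_version`, and the zero-padded log file naming are hypothetical. The idea shown is that two writers racing to publish the same log version cannot both succeed.

```python
import json
import os
import tempfile


def put_if_absent(path, data):
    """Create path with data only if it does not exist yet.

    O_CREAT | O_EXCL makes the existence check and the creation one
    atomic step, so exactly one concurrent caller can win.
    """
    try:
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL)
    except FileExistsError:
        return False
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    return True


def latest_version(log_dir):
    """Highest committed version in the log, or -1 if the log is empty."""
    versions = [int(n.split(".")[0]) for n in os.listdir(log_dir) if n.endswith(".json")]
    return max(versions, default=-1)


def commit(log_dir, version, actions):
    """Try to publish a transaction as log entry number `version`.

    Returns False if another writer already claimed that version; the
    caller would then re-read the log and retry or abort.
    """
    path = os.path.join(log_dir, f"{version:020d}.json")
    return put_if_absent(path, json.dumps(actions).encode("utf-8"))


def demo():
    """Two writers both read version -1 and race to commit version 0."""
    with tempfile.TemporaryDirectory() as log_dir:
        next_version = latest_version(log_dir) + 1
        first = commit(log_dir, next_version, [{"add": "rows-from-writer-1"}])
        second = commit(log_dir, next_version, [{"add": "rows-from-writer-2"}])
        return first, second, latest_version(log_dir)
```

In this sketch only the claim on the file name is atomic: a reader could briefly see a file that exists but is not fully written. A real implementation would need to close that gap, for example by relying on a storage primitive that publishes the whole object at once.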