news
Databases: PostgreSQL and duckdb
-
PostgreSQL ☛ New PostgreSQL Contributors 2025
The PostgreSQL Contributors Page includes people who have made substantial, long-term contributions of time and effort to the PostgreSQL project.
The PostgreSQL Contributors Team is pleased to recognize 7 more individuals for their work in the community and on the codebase.
New PostgreSQL Contributors: [...]
-
Dirk Eddelbuettel ☛ ML quacks: Combining duckdb and mlpack
A side project I have been working on a little since last winter and which explores extending duckdb with mlpack is now public at the duckdb-mlpack repo.
duckdb is an excellent ‘small’ (as in ‘runs as a self-contained binary’) database engine with both a focus on analytical payloads (OLAP rather than OLTP) and an impressive number of already bolted-on extensions (for example for cloud data access) delivered as a single-build C++ executable (or of course as a library used from other front-ends). mlpack is an excellent C++ library containing many/most machine learning algorithms, also built in a self-contained manner (or library) making it possible to build compact yet powerful binaries, or to embed (as opposed to other ML framework accessed from powerful but not lightweight run-times such as Python or R). The compact build aspect as well as the common build tools (C++, cmake) make these two a natural candidate for combining them. Moreover, duckdb is a champion of data access, management and control—and the complementary machine learning insights and predictions offered by mlpack are fully complementary and hence fit this rather well.
[...]
duckdb-mlpack is right an “MVP”, i.e. a minimally viable product (or demo). It just runs the adaboost classifier but does so on any dataset fitting the ‘rectangular’ setup with columns of features (real valued) and a final column (integer valued) of labels. I had hope to use two select queries for both features and then labels but it turns a ‘table’ function (returning a table of data from a query) can only run one select *. So the basic demo, also on the repo README is now to run the following script (where the SELECT * FROM mlpack_adaboost((SELECT * FROM D)); is the key invocation of the added functionality): [...]