News About BSD and ZFS in Particular
-
Klara ☛ How Klara and TrueNAS Collaborated to Fix ZFS Deduplication
In the world of enterprise storage, few technologies have reshaped architectures and capabilities as profoundly as ZFS. When ZFS was introduced over two decades ago, it rewrote the rulebook for data integrity, scalability, and storage pool management. Yet, for all its innovations, one capability consistently lagged expectations: deduplication at scale.
Deduplication, the process of identifying and eliminating redundant copies of data, is not novel. It has been a staple of backup appliances and storage systems since the 1990s. The original ZFS implementation, while functionally correct, was often impractical in real-world enterprise environments because of its performance and memory overhead. The amount of memory required was unpredictable, and once it exceeded the available RAM, most write operations bottlenecked waiting for random reads from the dedup table, sending performance off an unexpected cliff. That all changed through a focused collaboration between Klara and TrueNAS, resulting in what the community now calls ZFS Fast Dedup.
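To get a feel for why the memory cost was so punishing, here is a back-of-the-envelope sketch. The ~320 bytes per in-core dedup-table entry is a commonly cited figure for the classic implementation, and the 128 KiB recordsize is an assumption; both are illustrative, not exact.

```python
# Rough worst-case estimate of classic (pre-Fast-Dedup) ZFS dedup
# table memory cost: one DDT entry per record, ~320 bytes of RAM each.
# Both numbers are commonly cited approximations, not exact figures.

def ddt_ram_bytes(pool_bytes, recordsize=128 * 1024, bytes_per_entry=320):
    """Worst case: every record is unique, so every record needs an entry."""
    entries = pool_bytes // recordsize
    return entries * bytes_per_entry

TiB = 1024 ** 4
GiB = 1024 ** 3

for tib in (1, 10, 100):
    ram = ddt_ram_bytes(tib * TiB)
    print(f"{tib:>3} TiB pool -> ~{ram / GiB:.1f} GiB of DDT in RAM")
```

At these rates a 100 TiB pool could need on the order of 250 GiB of RAM just for the dedup table, which is exactly the "unexpected cliff" described above once the table spills out of memory.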
-
Mariusz Zaborski ☛ Teaching ZFS about time
ZFS is a robust file system, in large part thanks to its copy-on-write design. Problems can still show up: flaky cables, dying drives, or the occasional cosmic ray flipping a bit. For exactly those situations ZFS has a feature called scrub. Scrub walks every used block starting from the uberblock and compares the stored checksum against the data on disk.
The catch is that scrubs are expensive, especially when the pool holds petabytes. An admin who hit a power-supply problem or a sudden shutdown usually wants to scrub the data written around that event, not the entire pool. That is the goal here: teach ZFS something about wall-clock time.
Internally, ZFS thinks only in transaction group (TXG) numbers. Administrators, inconveniently, think in dates. A TXG is just a uint64_t that increments each time a transaction group commits, and there has never been a clean way to map between the two.
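One rough way to approximate that mapping today is to scrape timestamps and TXG numbers out of `zpool history -il` output, which logs both for internal events. The exact line format varies across platforms and OpenZFS versions, so the sample lines and regex below are illustrative assumptions, not the blog post's actual mechanism:

```python
import re
from datetime import datetime

# Sketch: recover an approximate TXG <-> wall-clock mapping from
# `zpool history -il`-style output. The line format here is an
# illustrative assumption; real output varies by platform/version.
SAMPLE = """\
2024-03-01.09:15:02 [txg:120034] snapshot tank/home@daily
2024-03-01.21:40:17 [txg:121899] snapshot tank/home@nightly
2024-03-02.09:15:03 [txg:123755] snapshot tank/home@daily
"""

LINE = re.compile(r"^(\S+) \[txg:(\d+)\]")

def txg_timeline(history_text):
    """Return a list of (datetime, txg) pairs, oldest first."""
    out = []
    for line in history_text.splitlines():
        m = LINE.match(line)
        if m:
            when = datetime.strptime(m.group(1), "%Y-%m-%d.%H:%M:%S")
            out.append((when, int(m.group(2))))
    return out

def txg_at(history_text, target):
    """Latest logged TXG at or before `target` (None if before all entries)."""
    best = None
    for when, txg in txg_timeline(history_text):
        if when <= target:
            best = txg
    return best

print(txg_at(SAMPLE, datetime(2024, 3, 2, 0, 0)))  # -> 121899
```

The granularity is only as fine as the logged events, which is precisely why a first-class time-to-TXG mapping inside ZFS itself is attractive.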
-
Dan Langille ☛ newsyslog – telling it not to compress, for anything
In the above situation, ZFS is compressing the data on the fly. Then newsyslog comes along and compresses the rotated logs at the file level. ZFS then can't do much with the already-compressed data.
A goal: don’t do that. Just let ZFS compress the data. It will be much better at it.
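In newsyslog.conf(5), compression is chosen per entry via a flag in the flags column (Z for gzip, J for bzip2, X for xz, Y for zstd); dropping that flag leaves rotation in place without compression. The entry below is an illustrative sketch with example paths, sizes, and a hypothetical dataset name:

```
# Illustrative newsyslog.conf entries (path, mode, count, size are examples).

# before: rotate at 100 KiB, keep 5 archives, bzip2-compress them
#/var/log/example.log   644  5  100  *  JC

# after: same rotation, no compression -- let ZFS handle it
/var/log/example.log    644  5  100  *  C

# then enable (or confirm) compression on the dataset holding the logs,
# e.g. on a hypothetical dataset name:
#   zfs set compression=zstd zroot/var/log
```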
-
Alessandro Segala ☛ Unlocking Encrypted ZFS Volumes with a Passkey
If you run servers with ZFS, especially in a homelab, you’ve probably had to make peace with an awkward trade-off around disk encryption. ZFS native encryption is great: it protects data at rest, the compression and dedup machinery still work above the encryption layer, and you can encrypt individual datasets. The hard part has never been enabling encryption: the hard part is figuring out where to keep the key.
There are two common approaches to managing the encryption key for ZFS datasets, and neither one is fully satisfying.
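The two approaches can be sketched with ZFS's `keyformat` and `keylocation` properties; the pool and dataset names below are hypothetical:

```
# 1) Passphrase at the prompt: safe at rest, but someone must type it
#    in after every reboot before the dataset can be mounted.
zfs create -o encryption=on -o keyformat=passphrase \
           -o keylocation=prompt tank/secrets
zfs load-key tank/secrets     # prompts for the passphrase
zfs mount tank/secrets

# 2) Raw key in a file: unlocks unattended, but the key material now
#    lives on disk alongside the data it is supposed to protect.
dd if=/dev/urandom of=/root/tank-secrets.key bs=32 count=1
zfs create -o encryption=on -o keyformat=raw \
           -o keylocation=file:///root/tank-secrets.key tank/secrets2
```

Option 1 trades availability for security, option 2 the reverse, which is the unsatisfying trade-off the article sets out to escape.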