Performance of Solaris' ZFS LZ4 Compression
jomasoftmarcel.blogspot.com

Please note: Oracle does not implement the same version of ZFS as everyone else does. Sun chose the OpenZFS project as the steward of ZFS, and Oracle chose to never integrate OpenZFS upstream into their version of Solaris (which is itself an incompatible fork of the actual Solaris steward project, Illumos née OpenSolaris).
Since OpenZFS already implements LZ4 compression (and has done so for quite some time), this is yet another feature that, once enabled, will stop you from importing your incompatible pool into anything that actually implements ZFS.
That's not really accurate phrasing: Sun had no involvement in OpenZFS, because it started after Sun no longer existed. Sun maintained OpenSolaris, which served as the de facto implementation of ZFS. OpenZFS was started in response to Oracle discontinuing OpenSolaris and only doing further ZFS development in private.
You are correct about incompatible features. Sun and Oracle use a monotonically increasing integer to note new ZFS versions. OpenZFS instead incremented the version to 5000 and now uses feature flags so it is possible to coordinate individual feature enablement between all the operating systems that support OpenZFS.
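For the curious, the feature-flag scheme described above is visible from the command line. A sketch, assuming an OpenZFS pool named "tank" (the pool name is made up for the example):

```shell
# On a feature-flag pool, the legacy version property reports "-",
# and individual features show up as feature@ pool properties.
zpool get version tank                 # "-" on OpenZFS feature-flag pools
zpool get feature@lz4_compress tank    # e.g. "enabled" or "active"

# List every feature flag and its state at once:
zpool get all tank | grep feature@
```

This is how the operating systems coordinate: a pool can only be imported if the importing implementation supports all of its active features.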
While I am a fond user of an OpenZFS-derived implementation, OpenZFS postdates Sun's last gasps of existence by several years.
[1] has the OpenZFS launch announcement in September 2013, [2] dates Sun's acquisition to January 2010, [3] has the last OpenSolaris derived bits coming out of Sun in November 2010.
[1] - http://open-zfs.org/wiki/Announcement
[2] - https://www.cnet.com/news/oracle-buys-sun-becomes-hardware-c...
[3] - https://en.wikipedia.org/wiki/OpenSolaris (I'd cite osol-discuss, but that mailing list was shut down with the rest of sun.com)
I think it's a bit much to pretend that Oracle somehow doesn't have "real" ZFS and Solaris, even if you don't like what they have done with them and they are incompatible.
Meh. Whatever one calls "real", "incompatible with everyone else" was the real point, and it's a strong one.
This really is too brief a study (although it's obviously fine for someone to write a quick blog-post about whatever they want).
Most importantly, how fast is the disk? I suspect (but would benchmark if I really needed to know) that the effects of compression will be greatly different on an older 7,200 rpm spinning disk vs a modern SSD.
It's a very good question, because his copies are stupidly slow. Only 15MB/s. You could probably compress that in real time on a Raspberry Pi!
It's a very poor test.
I remember SandForce-based SSDs used to do compression in the disk firmware, are there any current SSDs that do the same?
Most people say lz4+ZFS is a net win and you should usually enable it by default.
The big "gap" is probably between lz4 and gzip. e.g., for compressing logs, where gzip compresses a lot more but is terribly slow.
I hope zstd could be used for this case someday: http://facebook.github.io/zstd/
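To make the lz4-vs-gzip trade-off concrete: ZFS compression is set per dataset, so you can mix algorithms in one pool. A sketch, assuming a pool "tank" with a "logs" dataset (both names hypothetical):

```shell
# lz4 as the general default: near-free CPU-wise, modest ratio.
zfs set compression=lz4 tank

# gzip-9 only where ratio matters more than write speed, e.g. logs.
# Child datasets not overridden inherit lz4 from the parent.
zfs set compression=gzip-9 tank/logs

# Check what each choice actually buys you:
zfs get compressratio tank tank/logs
```

Note that changing the property only affects newly written blocks; existing data keeps whatever compression it was written with.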
I imagine zstd's license will hamper its corporate adoption, especially among the big players.
https://github.com/facebook/zstd/blob/dev/PATENTS

"The license granted hereunder will terminate, automatically and without notice, if you (or any of your subsidiaries, corporate affiliates or agents) initiate directly or indirectly, or take a direct financial interest in, any Patent Assertion: (i) against Facebook or any of its subsidiaries or corporate affiliates..."

Intel's QuickAssist is about to be standard on Xeon E5 chipsets and can do very high-scale gzip at the cost of a PCIe round trip. Intel published some patches to ZoL for this.
It must be fine on a small test system, with CPU idling, etc.
I've worked with a few "ZFS appliances" from Sun (256-512TB range, NFS/iSCSI shares, 1-2k clients) and would never enable any advanced features on those (compression, dedup, etc). They were awfully unstable when we did that.
Granted, that was 5 years ago but I don't see any indication this technology has evolved significantly with all the drama surrounding Oracle, licensing, forks, etc. Just not worth the trouble these days, IMHO.
Compression is fine. Dedup has always been the problem, because it was rushed.
I don't think it's dedup being "rushed" that's the problem - dedup is often implemented "offline" (as in NTFS's implementation, or btrfs): the data gets written as unique at first, and then eventually something runs through, finds duplicates, and rewrites history to point all the duplicate instances at one copy.
But ZFS deeply hardcodes assumptions which mean you don't get to rewrite history like that, so it gets to do it synchronously (and keep all the ever-growing data structures required for this in memory for all writing).
I don't think an arbitrarily larger amount of time or money behind it would have permitted a better implementation, short of a ZFS2 and an in-place migration tool.
Dedup was rushed. This is supported by the original authors. I believe it was discussed in detail in a presentation by Matthew Ahrens. If I could remember the specific source I would link it, but it did not get the same level of testing and care as other features.
Dedup can be done right if the system has enough ram.
I don't know much about ZFS' deduplication, just heard that it requires a lot of memory, in a "hard minimum amount" way, to do it. This suggests, to me, that at least one design element of their deduplication engine is poor.
Efficient deduplication is design-wise a rather difficult problem with many trade-offs and issues which can blow your lower torso clean off when done wrong.
I don't think there is a system (beyond sheer coincidence, which seems rather unlikely given the complexity of the problem space) that can support good deduplication in an "added on later" way.
E.g. ext4 and btrfs have extent sharing which does work, but is completely inefficient (time). ZFS seems to be inefficient as well (space).
Off the cuff, I'm not aware of an open-source deduplicating filesystem that does not have these issues. There are the deduplicating archivers (borg, restic, some others), but these are neither meant nor want to be general-purpose filesystems (although borg offers a read-only FUSE FS with satisfactory performance).
Dragonflybsd's HAMMER filesystem seems to fit the bill nicely. There's even an option to limit the maximum amount of memory used for deduplication. Look up memlimit in the manual page: https://leaf.dragonflybsd.org/cgi/web-man?command=hammer&sec...
The dedup heuristic I've heard is 2-3GB of RAM per TB of raw storage.
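Plugging that heuristic into a quick back-of-envelope calculation (the 2-3GB/TB figure is the rule of thumb above, not an official number, and the pool size is invented for the example):

```shell
#!/bin/sh
# Back-of-envelope dedup-table sizing from the 2-3GB-of-RAM-per-TB heuristic.
raw_tb=100             # made-up example pool size, in TB of raw storage
low=$((raw_tb * 2))    # low end of the heuristic, in GB of RAM
high=$((raw_tb * 3))   # high end of the heuristic, in GB of RAM
echo "dedup table for ${raw_tb}TB raw: roughly ${low}-${high}GB of RAM"
```

Which is exactly why dedup tends to be unworkable on large pools: a 100TB appliance would want on the order of 200-300GB of RAM just to keep the dedup table resident.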
Totally correct. In most scenarios dedup is simply unworkable. And the main source of problems.
Compression, on the other hand, is very standard and no issue at all, from many, many years of ZFS experience. It's the default in many cases (i.e. on Nexenta).
And on many cloud systems, especially those with network disks like EBS, compression gives you some serious avg throughput wins.
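As a rough illustration of why: the logical throughput you see scales with the compression ratio, since fewer physical bytes cross the network to the volume. A sketch, with both numbers invented for the example (a ~125MB/s volume and a 2.00x compressratio):

```shell
#!/bin/sh
# Effective logical throughput = physical disk throughput x compressratio.
disk_mbps=125        # assumed network-volume throughput cap, MB/s
ratio_x100=200       # an assumed 2.00x compressratio, scaled to avoid floats
effective=$((disk_mbps * ratio_x100 / 100))
echo "${effective}MB/s effective logical throughput"   # prints "250MB/s ..."
```

The same cap on physical bytes per second turns into double the application-visible throughput, which is why compression is such an easy win on bandwidth-limited network disks.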