How LZ4 works (2016)

ticki.github.io

176 points by codeismightier 9 years ago · 15 comments


wolf550e 9 years ago

LZ4 is good engineering work, better than Google Snappy (zippy), which itself was better than LZO (both have somewhat improved by looking at each other's code since then). The current best general purpose compressor for decompression speed is Oodle Selkie, but it is not open source. I don't know if a fast compressor for Oodle Selkie exists - since it was developed for game assets, they don't care about compression speed.

nerdponx 9 years ago

It's amazing how something useful and innovative gets invented, and isn't so much as documented, let alone patented or published in a journal.

  • userbinator 9 years ago

    LZ4 is only one of the many variations on the basic principle of the LZ family of algorithms, which is to replace repeated sequences with references to where they were before.

    What does amaze me a little is that the rather more complex Huffman algorithm was published and implemented decades before LZ.

    The core idea of LZ, however, has been known for centuries: https://en.wikipedia.org/wiki/Iteration_mark
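The shared principle of the LZ family is small enough to sketch. Below is a toy, illustrative decoder (not LZ4's actual format): the stream is a mix of literal bytes and (offset, length) back-references into the output produced so far.

```python
def lz_decode(ops):
    """Decode a toy LZ77-style stream: each op is either a literal byte
    or an (offset, length) back-reference into the output so far."""
    out = bytearray()
    for op in ops:
        if isinstance(op, int):          # literal byte
            out.append(op)
        else:                            # (offset, length) back-reference
            offset, length = op
            for _ in range(length):      # byte-by-byte copy permits overlap
                out.append(out[-offset])
    return bytes(out)

# "abcabcabc" as three literals plus one overlapping back-reference
ops = [ord('a'), ord('b'), ord('c'), (3, 6)]
print(lz_decode(ops))  # b'abcabcabc'
```

Note that length may exceed offset: an overlapping copy is how LZ schemes express repeated runs, which is why the copy is byte-by-byte.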

    • hedora 9 years ago

      This is true; there is little algorithmic novelty in lz4.

      However, that's missing the point: lz4's decompressor is simultaneously simple and also the fastest thing around, at least at the moment.

      In fact, it is so fast that it can be used to accelerate local data transfers over "slow" links like bonded 10GBit Ethernet and arrays of PCIe SSDs.

      • Darkenetor 9 years ago

        > lz4's decompressor is simultaneously simple and also the fastest thing around, at least at the moment

        The proprietary LzTurbo [0] and free Lizard [1] both claim to be faster at decompression while having better compression ratios.

        [0] https://github.com/powturbo/TurboBench

        [1] https://github.com/inikep/lizard

      • kzrdude 9 years ago

        Doesn't that also explain why it came later? Trading compression ratio for speed is only useful nowadays, with today's large throughputs.

        • hedora 9 years ago

          Maybe. It might just be that the tricks it plays matter a lot on newer CPUs, and older fast compressors played older tricks.

          I think the ratio of CPU cycles to I/O bandwidth is what really matters. Presumably the optimal tradeoff between CPU throughput and compression ratio depends on that and varies over time.

        • TheDong 9 years ago

          The low compression ratio / fast speed trade-off has always been valuable.

          For example, block-level filesystem compression (what lz4 is used for with zfs) has been valuable for filesystems essentially forever.

          There's always been a need for very high throughput on the local filesystem.

    • ot 9 years ago

      Iteration mark is more similar to RLE than to LZ. LZ can copy from arbitrary points in the past.
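That distinction can be made concrete with a toy run-length encoder (illustrative only, not from any library discussed here): RLE only collapses runs of an immediately repeating symbol, so it gains nothing on input like `abcabc`, which a single LZ back-reference (offset 3, length 3) captures.

```python
from itertools import groupby

def rle_encode(data: bytes):
    # RLE describes only runs of the immediately repeating symbol:
    # each (symbol, count) pair covers one maximal run.
    return [(b, sum(1 for _ in g)) for b, g in groupby(data)]

print(rle_encode(b"aaabbb"))   # [(97, 3), (98, 3)]
# No symbol in b"abcabc" repeats adjacently, so RLE degenerates
# into six runs of length 1, while LZ can copy from offset 3.
print(rle_encode(b"abcabc"))
```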

  • beagle3 9 years ago

    LZ4 deserves a lot of respect, but it is merely a recent improvement on a very old idea.

    It wasn't novel when Lempel and Ziv described it in 1977 - the encoding idea itself is almost trivial, and had been described before. However, they did prove the conditions under which this compression is asymptotically optimal, which was NOT at all clear or trivial at the time - and it is therefore named after them.

    LZ4 is an implementation of the LZ77 idea that optimized run time first and compression ratio second. It is elegant and successful - but it has little novelty.
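The run-time-first design shows up directly in the block format the linked article describes: each sequence is a one-byte token (literal length in the high nibble, match length minus 4 in the low nibble), the literal bytes, a two-byte little-endian offset, and optional length-extension bytes. Here is a simplified decoder sketch, assuming a raw LZ4 block with no frame header and omitting the spec's end-of-block restrictions and bounds checks:

```python
def lz4_decompress_block(src: bytes) -> bytes:
    """Decode a raw LZ4 block (no frame header, no safety checks)."""
    out = bytearray()
    i = 0
    while i < len(src):
        token = src[i]; i += 1
        # High nibble: literal length; 15 means extra length bytes follow,
        # each adding its value, terminated by a byte below 255.
        lit_len = token >> 4
        if lit_len == 15:
            while True:
                b = src[i]; i += 1
                lit_len += b
                if b != 255:
                    break
        out += src[i:i + lit_len]; i += lit_len
        if i >= len(src):
            break                       # last sequence: literals only
        # Two-byte little-endian offset into already-decoded output.
        offset = src[i] | (src[i + 1] << 8); i += 2
        # Low nibble: match length minus 4 (minimum match is 4 bytes),
        # with the same 255-byte extension scheme.
        match_len = (token & 0x0F) + 4
        if (token & 0x0F) == 15:
            while True:
                b = src[i]; i += 1
                match_len += b
                if b != 255:
                    break
        for _ in range(match_len):      # byte-wise copy permits overlap
            out.append(out[-offset])
    return bytes(out)

# Hand-built block: token 0x44 = 4 literals + match of length 8,
# offset 4 (overlapping copy), then a final literals-only sequence.
block = bytes([0x44]) + b"abcd" + bytes([0x04, 0x00, 0x10]) + b"e"
print(lz4_decompress_block(block))  # b'abcdabcdabcde'
```

The format has no entropy coding stage at all; decompression is just memory copies driven by the token nibbles, which is where the speed comes from.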

  • mattnewport 9 years ago

    The author has actually written a lot of great material on his blog: https://fastcompression.blogspot.ca/p/lz4.html?m=1

    Not sure what you'd consider suitably documented beyond what is already out there. Not being patented or published in a journal looks like a positive to me.

faragon 9 years ago

LZ4 is beautiful.
