LZ4 (compression algorithm)

{{Short description|Lossless compression algorithm}}

{{use dmy dates|date=January 2021}}

{{Infobox software

| name = LZ4

| author = Yann Collet

| developer = Yann Collet

| operating system = Cross-platform

| genre = Data compression

| programming language = C

| license = Simplified BSD License

| title =

| logo =

| released = {{Start date|2011|04|24|df=y}}

| discontinued =

| latest release version = {{wikidata|property|preferred|references|edit|P348|P548=Q2804309}}

| latest release date = {{Start date and age|{{wikidata|qualifier|preferred|single|P348|P548=Q2804309|P577}}|df=yes}}

| latest preview version =

| platform = Portable

| size =

| website = {{Official URL}}

}}

{{Infobox file format

| name = LZ4 Frame Format

| genre = Data compression

| magic = 04 22 4d 18{{cite web |last1=Collet |first1=Yann |title=LZ4 Frame Format Description |website=GitHub |url=https://github.com/lz4/lz4/blob/master/doc/lz4_Frame_format.md |access-date=7 October 2020}}

| url = https://github.com/lz4/lz4/blob/master/doc/lz4_Frame_format.md

}}

LZ4 is a lossless data compression algorithm that prioritizes compression and decompression speed over compression ratio. It belongs to the LZ77 family of byte-oriented compression schemes.

Features

The LZ4 algorithm aims to provide a good trade-off between speed and compression ratio. Typically, it has a smaller (i.e., worse) compression ratio than the similar LZO algorithm, which in turn compresses less well than algorithms such as DEFLATE. However, LZ4 compression speed is similar to that of LZO and several times faster than that of DEFLATE, while its decompression speed is significantly faster than that of LZO.{{cite web

| url=https://www.phoronix.com/scan.php?page=news_item&px=MTI4NjM

| title=Support For Compressing The Linux Kernel With LZ4

| author=Michael Larabel

| date=2013-01-28

| work=Phoronix

| access-date=2015-08-28}}

Design

LZ4 uses only a dictionary-matching stage (LZ77) and, unlike other common compression algorithms, does not combine it with an entropy coding stage (e.g. Huffman coding in DEFLATE).{{cite web |url=https://github.com/lz4/lz4/blob/dev/doc/lz4_Block_format.md |title=LZ4 Block Format Description |last=Collet |first=Yann |date=2019-03-30 |website=GitHub |access-date=2020-07-09 |quote=There is no entropy encoder back-end nor framing layer.}}{{cite IETF |title=DEFLATE Compressed Data Format Specification version 1.3 |rfc= 1951 |publisher=IETF |access-date=2020-07-09}}

The LZ4 algorithm represents the data as a series of sequences. Each sequence begins with a one-byte token that is split into two 4-bit fields. The first field (the high four bits) gives the number of literal bytes to copy to the output. The second field (the low four bits) gives the number of bytes to copy from the already decoded output buffer, with 0 representing the minimum match length of 4 bytes. A value of 15 in either field indicates that the length is larger and that an extra byte follows whose value is added to the length. A value of 255 in such an extra byte indicates that yet another byte follows. Arbitrary lengths are therefore represented by a run of extra bytes containing the value 255, terminated by a byte with a smaller value. The string of literals comes after the token and any extra bytes needed to give the literal length. It is followed by a two-byte little-endian offset that indicates how far back in the output buffer to begin copying. The extra bytes (if any) of the match length come at the end of the sequence.{{cite web

| url=http://fastcompression.blogspot.com/2011/05/lz4-explained.html

| title=RealTime Data Compression

| author=Yann Collet

| date=2011-05-26

| access-date=2015-08-28}}{{cite web

| url=https://ticki.github.io/blog/how-lz4-works/

| title=How LZ4 works

| author=ticki

| date=2016-10-25

| access-date=2017-06-29}}
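
The sequence layout described above can be illustrated with a minimal decoder sketch in Python. It is not the reference implementation: the function name <code>lz4_block_decompress</code> is hypothetical, and the sketch assumes a raw block with no frame header, the literal length in the high four bits of the token, a two-byte little-endian offset, and a final sequence that carries literals only, as given in the LZ4 block format description.

```python
def lz4_block_decompress(block: bytes) -> bytes:
    """Decode one raw LZ4 block (no frame header). Illustrative sketch only."""
    out = bytearray()
    pos = 0
    n = len(block)

    def read_extra_length(pos, length):
        # A field value of 15 means extra bytes follow. Each extra byte is
        # added to the length; a byte equal to 255 means one more follows.
        while True:
            if pos >= n:
                raise ValueError("truncated length bytes")
            byte = block[pos]
            pos += 1
            length += byte
            if byte != 255:
                return pos, length

    while pos < n:
        token = block[pos]
        pos += 1
        literal_len = token >> 4    # first field: number of literal bytes
        match_len = token & 0x0F    # second field: match length minus 4

        # Literals follow the token and any extra literal-length bytes.
        if literal_len == 15:
            pos, literal_len = read_extra_length(pos, literal_len)
        if pos + literal_len > n:
            raise ValueError("truncated literals")
        out += block[pos:pos + literal_len]
        pos += literal_len

        # Assumption from the block format: the last sequence ends here.
        if pos >= n:
            break

        # Two-byte little-endian offset back into the decoded output.
        if pos + 2 > n:
            raise ValueError("truncated offset")
        offset = block[pos] | (block[pos + 1] << 8)
        pos += 2
        if offset == 0 or offset > len(out):
            raise ValueError("invalid offset")

        # Extra match-length bytes come at the end of the sequence.
        if match_len == 15:
            pos, match_len = read_extra_length(pos, match_len)
        match_len += 4              # a stored 0 means the minimum match of 4

        # Copy byte by byte: the match may overlap the bytes being written,
        # which is how short repeating patterns are encoded.
        start = len(out) - offset
        for i in range(match_len):
            out.append(out[start + i])

    return bytes(out)
```

For example, the eight-byte block <code>35 61 62 63 03 00 10 64</code> holds the token <code>0x35</code> (three literals, match length 5 + 4 = 9), the literals "abc", the offset 3, and a final literal-only sequence with "d". Under the assumptions above it decodes to "abcabcabcabcd".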

Compression can be carried out in a stream or in blocks. Higher compression ratios can be achieved by investing more effort in finding the best matches. This results in both a smaller output and faster decompression.

Implementation

The reference implementation in C by Yann Collet is licensed under a BSD license. There are ports and bindings in various languages, including Java, C#, Rust, and Python.{{Github|lz4/lz4|Extremely Fast Compression algorithm http://www.lz4.org}} The Apache Hadoop system uses this algorithm for fast compression. LZ4 has also been implemented natively in the Linux kernel since version 3.11.{{cite web

| url=https://lwn.net/Articles/557814/

| title=Kernel development

| author=Jonathan Corbet

| date=2013-07-19

| publisher=LWN.net

| access-date=2015-08-28}} The FreeBSD, Illumos, ZFS on Linux, and ZFS-OSX implementations of the ZFS filesystem support the LZ4 algorithm for on-the-fly compression.{{cite web

| url=https://www.freebsd.org/releases/9.2R/relnotes.html

| title=FreeBSD 9.2-RELEASE Release Notes

| date=2013-11-13

| publisher=FreeBSD

| access-date=2015-08-28}}{{cite web

| url=http://wiki.illumos.org/display/illumos/LZ4+Compression

| title=LZ4 Compression

| publisher=illumos

| access-date=2015-08-28

| archive-date=9 October 2018

| archive-url=https://web.archive.org/web/20181009050216/https://wiki.illumos.org/display/illumos/LZ4+Compression

| url-status=dead

}}{{Github|zfsonlinux/zfs/commit/9759c60|Illumos #3035 LZ4 compression support in ZFS and GRUB}}{{cite web

| url=http://www.open-zfs.org/wiki/Features#lz4_compression

| title=Features: lz4 compression

| publisher=OpenZFS

| access-date=2015-08-28}} Linux has supported LZ4 for SquashFS since 3.19-rc1.{{cite web

| url=https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=62421645bb702c077ee5a462815525106cb53bcf

| title=Squashfs: Add LZ4 compression configuration option

| author=Phillip Lougher

| date=2014-11-27

| access-date=2015-08-28}} LZ4 is also supported by the newer zstd command-line utility by Yann Collet, as well as by a 7-Zip fork called 7-Zip-zstd.[https://github.com/mcmilk/7-Zip-zstd 7-zip-zstd]

References

{{Reflist}}