NNCP is an experiment to build a practical lossless data compressor
with neural networks. The latest version uses a Transformer
model.
Fabrice Bellard - https://bellard.org/
The papers nncp_v2.1.pdf and nncp.pdf describe the algorithms and results of previous releases of NNCP.
The current release of NNCP is implemented in C and uses LibNC to get better performance than PyTorch.
Compression ratio
Result for enwik8:
| Program | Compr. size (bytes) | Ratio (bpb) |
|---|---|---|
| gzip | 36 445 248 | 2.92 |
| xz | 24 865 244 | 1.99 |
| NNCP (2023-10-21) | 14 915 298 | 1.19 |
| CMIX (v19) | 14 837 987 | 1.19 |
Result for enwik9:
| Program | Compr. size (bytes) | Ratio (bpb) | Program size (zip, bytes) | Total (bytes) |
|---|---|---|---|---|
| gzip | 322 591 995 | 2.58 | 38 801 | 322 630 796 |
| xz | 197 331 816 | 1.58 | 36 752 | 197 368 568 |
| CMIX (v19) | 111 470 932 | 0.892 | 223 485 | 111 694 417 |
| NNCP (2023-10-21) | 106 632 363 | 0.853 | 628 955 | 107 261 318 |
* The results for the other programs are from the Large Text Compression Benchmark.
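The Ratio (bpb) and Total columns can be reproduced from the sizes in the tables. The sketch below assumes that bpb means compressed bits per input byte and that enwik8 and enwik9 are 10^8 and 10^9 bytes long; the function and constant names are illustrative and are not part of NNCP.

```python
# Assumed input sizes: enwik8 and enwik9 are the first 10^8 and 10^9 bytes
# of an English Wikipedia dump.
ENWIK8_SIZE = 100_000_000
ENWIK9_SIZE = 1_000_000_000


def bits_per_byte(compressed_bytes: int, original_bytes: int) -> float:
    """Return the compression ratio in compressed bits per input byte."""
    return 8 * compressed_bytes / original_bytes


def total_size(compressed_bytes: int, program_bytes: int) -> int:
    """Return compressed size plus the zipped size of the program."""
    return compressed_bytes + program_bytes


if __name__ == "__main__":
    # NNCP (2023-10-21) row of the enwik9 table.
    print(f"{bits_per_byte(106_632_363, ENWIK9_SIZE):.3f} bpb")
    print(f"{total_size(106_632_363, 628_955)} bytes in total")
```

Counting the program size in the total keeps a compressor from hiding knowledge of the data inside its own executable.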
Download
- NNCP v3.3: Linux version (including CUDA support): nncp-2024-06-05.tar.gz (Changelog, readme.txt).
- NNCP v3.3: Precompiled Windows version (including CUDA support): nncp-2024-06-05-win64.zip.
- NNCP v2 (Python+PyTorch, GPU required): nncp_v2-2021-02-06-1.tar.gz
Related Links
- ts_zip: a practical text compression utility using a large language model.
- LibNC: C Library for Tensor Manipulation.
- NNCP thread on the encode.su forum.
- CMIX lossless data compression program.
- lstm-compress: lossless data compression with LSTM.
- Large Text Compression Benchmark.