ts_zip: Text Compression Using Large Language Models
(bellard.org)

This is interesting, if impractical.
It occurs to me that using LLMs for compression could, in principle, allow lossy compression of text. If a sequence of tokens happens to be costly to encode (in terms of bits), the compressor could replace it with a cheaper sequence that has a very similar meaning in context. I don't imagine it would be very useful for anything, but it's interesting to think about.
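To make the "costly to encode" idea concrete: under an ideal arithmetic coder driven by a model's predictions, a token's cost is -log2 of the probability the model assigns it. Here's a minimal sketch of that bookkeeping, using a hypothetical toy probability table in place of a real LLM's next-token distribution:

```python
import math

# Hypothetical stand-in for an LLM's next-token distribution:
# probability assigned to each token in some fixed context.
TOY_PROBS = {
    "cheap": 0.5,    # high probability -> 1 bit
    "costly": 0.01,  # low probability  -> ~6.6 bits
}

def bit_cost(tokens, probs):
    """Bits an ideal arithmetic coder needs: sum of -log2 p(token)."""
    return sum(-math.log2(probs[t]) for t in tokens)

# A lossy compressor could swap a costly sequence for a cheaper
# near-synonym whenever the bit savings seemed worth it.
print(bit_cost(["costly", "costly"], TOY_PROBS))  # ~13.3 bits
print(bit_cost(["cheap", "cheap"], TOY_PROBS))    # 2.0 bits
```

In a real system the probabilities would come from the LLM conditioned on all preceding tokens, so the same word can be cheap in one context and expensive in another.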