Fighting JPEG color banding

uploadcare.com

136 points by igoradamenko 4 years ago · 52 comments

RicoElectrico 4 years ago

I remember that almost 20 years ago, I played with a utility that allowed manual JPEG optimization by painting regions with desired quality settings. I think it was included on a CD bundled with "Magazyn internetowy WWW" [1]. Does anyone remember such a program?

[1] https://sprzedajemy.pl/www-magazyn-internetowy-2003-nr-04-03... - incidentally this must be the very issue, see the teaser about image optimization

lynguist 4 years ago

This is OK for icon sized images, but it hurts me when I read webpages that incorporate photographs or visualizations in low resolution. They keep saying “Web resolution” and talk about something like 200x200 images maximum.

Even for the past 20 years, when I read webpages with images, I inspect them closely. Often it’s something like a newspaper article with photographs or Wikipedia. I specifically set my Wikipedia settings to deliver me images at the highest possible resolution and now I can actually enjoy reading it. Specifically I use the Timeless skin and set the thumbnail size to the maximum.

Sometimes I come across webpages that describe something like historic trains and all they have are icon sized photographs of them. It’s so sad.

viraptor 4 years ago

Thing I'd love to try if I had time - a compressor which derives the best table for the image. I'm imagining a loop of: start with the default, compress, calculate the difference from the original, dct the error to see which patterns are missed, adjust the table, repeat. Stop on some given error/size-increase ratio. (yes, I'm trying to get someone else nerd sniped into doing this)

Edit: something like this https://www.imaging.org/site/PDFS/Papers/2003/PICS-0-287/849...

So from the older known ones there are the DCTune and DCTex methods, but it seems neither is available for download anywhere.
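
A rough sketch of that loop in Python, assuming Pillow's qtables save parameter and scipy's dctn (names like optimize_qtable are made up, there is no convergence tuning, it is luma-only, and it glosses over the zigzag-vs-natural ordering of the 64 table entries):

    import io
    import numpy as np
    from PIL import Image
    from scipy.fft import dctn

    def optimize_qtable(img, base_table, rounds=10, target_bytes=100_000):
        img = img.convert("L")                 # one channel, one table, to keep it short
        orig = np.asarray(img, dtype=np.float64)
        table = list(base_table)               # 64 ints (check Pillow's expected ordering)
        for _ in range(rounds):
            buf = io.BytesIO()
            img.save(buf, "JPEG", qtables=[table])
            if buf.tell() > target_bytes:      # stop on a size budget
                break
            buf.seek(0)
            decoded = np.asarray(Image.open(buf).convert("L"), dtype=np.float64)
            err = orig - decoded
            # DCT the error per 8x8 block to see which frequency patterns are missed
            acc = np.zeros((8, 8))
            h, w = err.shape
            for y in range(0, h - 7, 8):
                for x in range(0, w - 7, 8):
                    acc += np.abs(dctn(err[y:y + 8, x:x + 8], norm="ortho"))
            # relax the quantizer for the worst few frequencies and try again
            for i in np.argsort(acc.ravel())[-4:]:
                table[i] = max(1, table[i] - 1)
        return table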

  • jncraton 4 years ago

    Guetzli was already mentioned and roughly does what you are talking about.

    MozJPEG [1] includes several quantization tables that are optimized for different contexts (see the quant-table flag and source code for specific tables[2]), and the default quantization table has been optimized to outperform the recommended quantization tables in the original JPEG spec (Annex K).

    It's also worth noting that MozJPEG uses Trellis quantization [3] to help improve quality without a per-image quantization table search. Basically, rather than determining an optimal quantization table for the image, it minimizes a rate-distortion cost at the block level by tuning the quantized coefficients.

    Both the SSIM and PSNR tuned quantization tables (2 and 4) provided by MozJPEG use a lower value in the first position of the quantization table, just like this article suggests (9 and 12 vs the libjpeg default of 16).

    [1] https://github.com/mozilla/mozjpeg

    [2] https://github.com/mozilla/mozjpeg/blob/5c6a0f0971edf1ed3cf3...

    [3] https://en.wikipedia.org/wiki/Trellis_quantization
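
    If you want to check which table an encoder actually wrote into a file, Pillow exposes the parsed quantization tables (a quick, illustrative check; "photo.jpg" is a placeholder):

        from PIL import Image

        # The DC quantizer is element 0 in both natural and zigzag ordering,
        # so printing table[0] is safe regardless of the Pillow version.
        im = Image.open("photo.jpg")
        for table_id, table in im.quantization.items():
            print("table", table_id, "DC quantizer:", table[0])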

    • homm 4 years ago

      > MozJPEG uses a lower value in the first position of the quantization table

      It has a lower value in the first position of the base table, i.e. the table which is used for q=50. With lower qualities this value scales up. This delays color banding from q=50 to roughly q=40; after that the same effect appears.
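
      For reference, a sketch of the libjpeg-style quality scaling that is applied to the base table (the actual integer code lives in jcparam.c; this only reproduces its formula):

          # base table entries are the q=50 values; everything scales with q
          def scale_quant(base_value, quality):
              quality = min(max(quality, 1), 100)
              scale = 5000 // quality if quality < 50 else 200 - 2 * quality
              return min(max((base_value * scale + 50) // 100, 1), 255)

          # libjpeg's default DC entry (16) vs a lower base DC entry such as 9:
          for q in (90, 50, 40, 25):
              print(q, scale_quant(16, q), scale_quant(9, q))
          # -> 90: 3 2, 50: 16 9, 40: 20 11, 25: 32 18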

  • viraptor 4 years ago

    Turns out there is a project which does that: https://github.com/google/guetzli

    • duffyjp 4 years ago

      Guetzli is really hamstrung by its resource usage. When it first hit the news I tried it out, and compressing a full quality JPEG from my phone could take 20-30 minutes on an i7.

      • tambourine_man 4 years ago

        Yeah, I treat it as an interesting research project, not a tool. For a tool, it's MozJPEG.

      • viraptor 4 years ago

        It depends on the context. If I'm converting a whole library of photos I wouldn't use it. But I've got a big hero area JPEG that's loaded as one of the first resources - I'm happy to run this tool in the background for a day to make it 20% smaller.

hansword 4 years ago

I love this and would have dearly needed it like 5 years ago. Now, it is still a very interesting read.

But given what we have already seen from Nvidia on video compression [0], I think within the next few years, we will move everything to machine-learning-'compressed' images (aka transmitting a super-low-res seed image and some additional ASCII and having an ML model reconstruct and upscale it at the client side).

[0] https://www.dpreview.com/news/5756257699/nvidia-research-dev...

  • wongarsu 4 years ago

    Most images are still JPEG (3 decades old) or PNG (2.5 decades old). Countless better formats have been developed, but with the exception of WEBP we are still using the same image formats that existed during the dot-com bubble. Ubiquity trumps improvements in image size.

    Better encoders for JPEG or PNG are the main avenue for achieving improvements without compatibility problems, and I think that will stay true for another decade, if not more.

  • BiteCode_dev 4 years ago

    I fear that we will get used to some image patterns feeling right, rather than reflecting reality, if AI gets too involved.

  • dylan604 4 years ago

    honestly, this scares the shit out of me.

    lossy compression is one thing, but having an ML model suggest what the pixels should be vs. a mathematical formula are two totally different things.

    • dehrmann 4 years ago

      What if it's an ML model suggesting parameters for jpeg? It could still hallucinate to some degree, but it's also more limited.

      • dylan604 4 years ago

        Image -> mathematical formula to toss data -> reverse formula -> slightly altered image

        vs

        Image -> mathematical formula to toss data -> ML to recreate what it thinks is supposed to be there -> made up image based on "training" data not even from original image

        that's my problem

        • rasz 4 years ago

          When you think "ML to recreate what it thinks is supposed to be there" you probably automatically go to DeepDream or https://www.dkriesel.com/en/blog/2013/0802_xerox-workcentres...

          but the end result doesn't have to be the direct output of an ML hallucination. AI encodes a probability distribution; you can treat it like motion compensation in video codecs - what comes next is a correction by the encoded error between the predicted outcome and the ground truth.
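
          A toy numpy sketch of that split (prediction plus a coded residual; a plain quantizer stands in for real entropy coding):

              import numpy as np

              def encode(original, predicted, step=8):
                  # only the quantized prediction error is stored/transmitted
                  residual = original.astype(np.int16) - predicted.astype(np.int16)
                  return np.round(residual / step).astype(np.int16)

              def decode(predicted, coded_residual, step=8):
                  # the predictor's guess is corrected by the stored residual,
                  # so its "hallucination" is bounded by the quantization step
                  out = predicted.astype(np.int16) + coded_residual.astype(np.int16) * step
                  return np.clip(out, 0, 255).astype(np.uint8)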

          • dylan604 4 years ago

            So how is that different from motion estimation as it currently stands? That at least sees where pixels are and where they will be. So instead of storing all of that data, you just store where they start and end and then tween the diff. Isn't that what this "new" ML you just described does, just made "different" by slapping "trained ML/AI" onto it?

            • adgjlsfhk1 4 years ago

              the difference is that the better you can predict the motion, the less data you have to store, and ML models are much better than hand tuned heuristics at predicting motion. It's no different than the recent use of ML for chess programs. The search techniques remain pretty similar, but neural networks are often much better at evaluation of objective criteria than hand-coded heuristics.

        • adgjlsfhk1 4 years ago

          ML-based image compression doesn't generally let the net make up data; it uses the net as a prior to reduce the entropy of the data that's there.

        • dehrmann 4 years ago

          What I'm saying is I'm not sure how much ML can imagine just by changing coefficient precision.

lifthrasiir 4 years ago

The original JPEG, retroactively named JPEG 1, totally lacks loop filters, and the quantization factor of the DC coefficient matters much more than in modern formats. As an example, libjpeg q89 is noticeably worse than q90 because the DC quantization factor changes from 3 to 4 (a smaller factor means less quantization and thus higher quality), quite a big jump.
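
That jump can be reproduced from libjpeg's integer quality scaling applied to the default DC entry of 16 (a quick check of the formula, not libjpeg's actual code):

    def dc_quant(q):
        scale = 5000 // q if q < 50 else 200 - 2 * q
        return max((16 * scale + 50) // 100, 1)

    print(dc_quant(89), dc_quant(90))  # -> 4 3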

cabirum 4 years ago

There's also webp and heif and png and svg, and I believe the existing formats already solve the image compression problem. The difference of 18 kB vs 22 kB from hours of micro-optimisation is frankly irrelevant given how quickly networks are getting faster.

  • rpigab 4 years ago

    Don't tell that to NFT people who want to put everything on-chain but have to sell their house in gas fees to store a small jpg on Ethereum! It's like the 70s all over again, but with blockchain instead of floppy disks. (I've seen a few of them discuss better compression algorithms because they really felt it was extremely useful and meaningful to store the actual data on-chain and not an IPFS link like they usually do)

  • homm 4 years ago

    A simple question: if all the existing formats already solve the image compression problem, why do new image formats (WebP, HEIC, AVIF, JPEG XL, etc.) keep appearing?

    • mceachen 4 years ago

      As computers generally get faster, more expensive encoding can be considered.

      I'm using different tooling on an AMD 3900x for these conversions, so take these numbers with a grain of salt.

          $ gm identify test.tiff
          test.tiff TIFF 6240x4160+0+0 DirectClass 8-bit 1.2Mi 0.000u 0m:0.000001s
      
          $ time gm convert test.tiff test.jpg
      
          real 0m0.282s
          user 0m0.193s
          sys 0m0.089s
      
          $ time heif-enc test.jpg -o test.heif
      
          real 0m1.901s
          user 0m22.960s
          sys 0m0.180s
      
      
      So... that's literally 100x more CPU time to encode the HEIF than the JPEG. The JPEG is 1.1M, and the HEIF is 800K.

      In my prior tests AV1 is 2-5x slower than HEIF, and JPEG-XL is ~10x slower.

  • notum 4 years ago

    Not even that, but the chosen reference at the same file size (Q15+Fix vs QL18) is hardly distinguishable to me.

    In essence, the author discovered how to make JPEG images look better by increasing their file size.

    • rndgermandude 4 years ago

      >Q15+Fix vs QL18) is hardly distinguishable to me.

      What can I say, it is very distinguishable to me, at least for images that contain larger "homogeneous" areas like skies.

      >In essence, the author discovered how to make JPEG images look better by increasing their file size.

      This is not what the article is about. E.g. take the first image with the large blue sky area: the result was a file size that was halved for Q15+fix compared to the Q50 source, and the Q18 comparison image at the same file size as Q15+fix looks like crap.

      So the author got vastly more visual quality for the same or similar file size, while still producing valid jpegs.

      It might not matter that much in the grand scheme of things, but it probably matters a lot to the company he is working for, which, it seems, specializes in image processing, compression, and delivery as a service (bandwidth and traffic are cheap, but not free). And it will probably matter at least for some of their customers as well.

      It probably won't matter much if you are e.g. on reddit (or are reddit), and that post with that 90kb jpeg (which could have been maybe 50-60 kb with the optimizations mentioned in the article) pulled in 10.1 MB of other crap (wire size) in the 30 seconds the page was open. With an ad-blocker active. Yes, I just ran this very unscientific test.

      In the future, other formats like webp (already somewhat widely deployed), avif (browser support is getting there) or jpeg-xl (very promising results per watt compared to avif and sometimes webp, with a nifty lossless jpeg<->jxl mode) - but probably not heif because of the patent situation - might become more dominant, but for the time being a lot of images online and offline will remain jpeg and produced as jpegs.

      (png and svg the grandparent poster brought up are for other use cases, btw, and offer lousy to untenable compression for photographs)

yboris 4 years ago

Just use JPEG XL (aka .jxl) - https://jpegxl.info/

You can re-save a JXL image a thousand times without deterioration :)

rasz 4 years ago

Does this mean default quantization tables were badly picked after all? And someone only noticed after 30 years?

It sounds like the employed solution only modifies the first element, from [16, 17] to [10, 16].

  • homm 4 years ago

    They were not badly picked for regular use cases. It's just that before retina displays appeared, no one was interested in extremely low bitrates, and no one knew that different artifacts have a different impact at high pixel densities.

    This problem was at least highlighted by Kornel, one of the mozjpeg authors, here: https://github.com/mozilla/mozjpeg/issues/76

    > the employed solution only modifies first element from [16,17] to [10,16].

    Correction: 16 and 17 are values from the base tables, i.e. the tables used at q=50. With q=25 the table becomes [32, 22, 24, 28, 24, 20, 32, 28…] (in zigzag order). The employed solution is to always cap the first value at 10 regardless of q: [10, 22, 24, 28, 24, 20, 32, 28…]

    mozjpeg chose a different approach: it still scales all values based on q, but it significantly changed the default base table. This helps, but doesn't eliminate color banding completely (you can still see it in the example from issues/76).
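
    A minimal sketch of the capped-DC approach described above (grayscale only, to keep it to a single table), assuming Pillow's qtables parameter; file names are placeholders, and the ordering Pillow expects for the 64 values should be checked against the version in use:

        from PIL import Image

        BASE_LUMA = [  # JPEG Annex K luminance table, natural order
            16, 11, 10, 16, 24, 40, 51, 61,
            12, 12, 14, 19, 26, 58, 60, 55,
            14, 13, 16, 24, 40, 57, 69, 56,
            14, 17, 22, 29, 51, 87, 80, 62,
            18, 22, 37, 56, 68, 109, 103, 77,
            24, 35, 55, 64, 81, 104, 113, 92,
            49, 64, 78, 87, 103, 121, 120, 101,
            72, 92, 95, 98, 112, 100, 103, 99,
        ]

        def scaled_table(base, quality):
            # libjpeg-style scaling of the q=50 base table
            scale = 5000 // quality if quality < 50 else 200 - 2 * quality
            return [min(max((v * scale + 50) // 100, 1), 255) for v in base]

        def save_with_dc_cap(img, path, quality=15, dc_cap=10):
            table = scaled_table(BASE_LUMA, quality)
            table[0] = min(table[0], dc_cap)   # cap the DC quantizer, leave the rest
            img.convert("L").save(path, "JPEG", qtables=[table])

        save_with_dc_cap(Image.open("sky.jpg"), "sky_capped.jpg")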

  • GuB-42 4 years ago

    The tables were good for the time they were picked. At the beginning of the article, it is shown that at low resolutions both ringing and banding are unacceptable. At high resolutions, beyond what was considered normal when the original quantization tables were chosen, ringing becomes much less of a problem, so it makes sense now to change the quantization tables to prioritize banding over ringing.

    It is no secret that high resolution images and low resolution images compress differently, and modern codecs are optimized for high resolutions in a way that older codecs weren't. For example, going from H.261 to H.266 globally improves video compression at every step, but it is most apparent at higher resolutions.

    • hansword 4 years ago

      > At high resolutions, beyond what was considered normal when the original quantization tables were chosen, then ringing becomes much less of a problem

      Slightly more precise: ringing becomes much less of a problem in high resolution images when viewed on a high resolution screen (in Apple language: retina).

erlndgdt 4 years ago

How is this different from, say, what https://kraken.io is doing? When I upload images there I get a smaller file size and the image looks no different.

  • hansword 4 years ago

    The difference is that the kraken 'about technology' website [0] gives no useful information as to how they are doing it (I guess it's 'proprietary information'), while this article gives a very detailed description of how to compress jpegs.

    In other words: end users use kraken, developers read this article.

    [0] https://kraken.io/about/technology

    • ntoskrnl 4 years ago

      I can take a guess at that "proprietary information". Most image optimization sites are just thin wrappers over something like mozjpeg/zopflipng/etc. 1% tech, 99% marketing

  • homm 4 years ago

    It's for free.

saltminer 4 years ago

> Ok, but where does this table come from when we need to save a file? It would be a big complication if you had to construct, and transmit 64 independent numbers as a parameter. Instead, most encoders provide a simple interface to set all 64 values simultaneously. This is the well known “quality,” which value could be from 0 to 100. So, we just provide the encoder desired quality and it scales some “base” quantization table. The higher quality, the lower values in quantization table.

I never really thought about how that "quality" slider worked (besides making the compression lossier), but it makes perfect sense now! It always amazes me how much I take for granted.

I always treat compression like a black box: "-crf 23" for H264, PNG and FLAC are nice but MP3 320s and 90+ "quality" JPEGs are good compromises, etc. And that's just for the stuff I deal with, there's no telling how much lossy compression goes on behind the scenes on my own computers, let alone all the stuff served up over the internet. There's so much lossy compression in the world, from EP speed on VHS tapes to websites reencoding uploaded images to every online video ever, it's crazy to think about.

Firadeoclus 4 years ago

Though its impact may be limited, this is some nice work!

Now could someone look at how video codecs can produce excellent high-detail scenes and motion in 4k resolution while at the same time making a blocky mess out of soft gradients, especially in dark scenes with slow movement?

LordDragonfang 4 years ago

If you want a similar read on DCT-based encoding, I highly recommend this article which lays it out in an extremely digestible format:

https://sidbala.com/h-264-is-magic/

mcdonje 4 years ago

I ran into this problem exporting images from darktable. The solution I ended up going with was simply exporting as PNG instead of JPG.

hulitu 4 years ago

just use PNG. JPEG sucks.

  • gruturo 4 years ago

    For a screenshot? Yeah JPEG sucks. For a diagram with large blocks of uniform color and sharp edges? Yeah JPEG sucks.

    But for pictures, JPEG is in its comfort zone and it's amazing it is still doing comparatively well after the IT equivalent of 150 years. Only now are worthy alternatives starting to emerge (looking at you, JPEG XL), not for lack of trying (looking at you, WEBP). It's incredible it managed to stay relevant for so long, and while patents, sunk-cost fallacies, hardware implementations, and inertia surely played a part, none of this would have mattered if it hadn't been pretty good to start with.

    • zinekeller 4 years ago

      > none of this would have mattered if it hadn't been pretty good to start with.

      Considering that MPEG (and competitors) evolved due to its deficiencies (the current baseline today is H.264, not the original H.261 or even its immediate successor MPEG-1), I'm surprised that JPEG is only showing its deficiencies today and not in 2000. Actually, it's a compliment that, although multiple file formats were invented to handle lossy pictures, even WebP can't beat JPEG all the time (especially since WebP can only save up to 16k pixels per side while JPEG can handle up to 64k pixels).

  • bob1029 4 years ago

    JPEG absolutely destroys PNG in areas of performance and size when it comes to typical use cases.

    Lossy compression is not a bad thing. We need to get over this.

  • LocalH 4 years ago

    PNG sucks for some uses too. There is no universally perfect image format for all possible use cases.

    • jug 4 years ago

      WebP & JPEG XL compress losslessly better than PNG and lossy much better than JPEG. Perhaps not perfect either, but we finally do have formats that can do both — and better than either before.

  • igoradamenkoOP 4 years ago

    Well, if we're talking about the web, then it's better to use WebP rather than PNG.

    It's widely supported by vendors if you do not need to deal with IE: https://caniuse.com/?search=webp.

    • jeroenhd 4 years ago

      Do note that if you need to target macOS 10 + Safari, WebP is not available.

      WebP is available for macOS 11, even with overlapping versions of Safari, but Apple relies on the OS image library to render some images, and they haven't backported WebP support when they updated Safari.

  • martin_a 4 years ago

    That is only true for a subset of images, mostly the rather "simple" ones. Try saving photographs as PNG and see your filesize explode.

  • keyle 4 years ago

    Please, these kinds of comments are entirely counterproductive and don't add anything to the conversation. No matter the topic, presume some people are truly interested in it, and feel free to stay out of the discussion.

    There is a space for JPEG and for PNG. Here is a good base reference https://www.a2hosting.com/blog/pngs-vs-jpegs/

  • Markoff 4 years ago

    or JPEG XL
