Following Google’s announcement of WebP, there’s been a lot of (justified) criticism of it based on its merits as an image coder. As a compression geek, I’m interested, but as both a creator and consumer of digital media formats, my main criticism is something else entirely: The last thing we need is yet another image format. This was already true when JPEG 2000 and JPEG XR were released. Never mind that their purported big compression advantages were greatly exaggerated (and based on dubious metrics); even if those standards had delivered clear-cut improvements on that front, they would’ve been irrelevant. An improved compression ratio is not a feature. Not by itself, anyway.

As someone who deals with content, I care about compression ratio only as long as size is a limiting factor. As someone taking photos, I care whether all the pictures I shoot in one day fit onto my camera. Back when digital cameras were crappy, with low resolution and a tiny amount of storage, JPEG compression made a big difference. Nowadays, we have 8 megapixel cameras and multiple gigabytes of storage in our cellphones. For normal usage, I can shoot RAW photos all day long and still not exhaust the space available to me. It’s done, problem solved. I might care about the size reduction once I start archiving stuff, but again, the 10:1 I get with plain JPEG at decent quality is plenty – a guaranteed 30% improvement on top of that (which JPEG2k/JPEG-XR/WebP don’t deliver!) is firmly in “don’t care” territory. It doesn’t solve any real problem.
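For a rough sanity check on the photo numbers – assuming an 8 megapixel sensor, packed 12-bit raw data at roughly 1.5 bytes per pixel, 10:1 JPEG compression, and 16 GB of storage (all ballpark assumptions, not exact figures):

```python
# Rough photo-storage math: 8 MP sensor, packed 12-bit raw (~1.5 bytes/pixel),
# 10:1 JPEG compression, 16 GB of storage. All figures are ballpark assumptions.
MEGAPIXELS = 8e6
RAW_BYTES_PER_PIXEL = 1.5      # 12-bit Bayer data, packed
CARD_BYTES = 16e9              # assumed card/phone storage

raw_shot = MEGAPIXELS * RAW_BYTES_PER_PIXEL   # ~12 MB per raw photo
jpeg_shot = raw_shot / 10                     # ~1.2 MB at 10:1

print(f"raw photos per 16 GB:  {CARD_BYTES / raw_shot:,.0f}")    # ~1,300
print(f"jpeg photos per 16 GB: {CARD_BYTES / jpeg_shot:,.0f}")   # ~13,000
```

Over a thousand raw shots before the card fills up – which is why the extra headroom from a better codec stops mattering for everyday shooting.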
The story is the same way for audio. MP3 offered an reduction of about 10:1 for CD quality audio at the right time, and made digitally archiving music practical. And we still gladly take the 10:1 on everything, since fitting 10x more on our MP3 players is convenient. But again, that’s only a factor of storage limitations that are rapidly disappearing. MP3 players started getting popular because they were smaller than mobile CD players and could (gosh!) store multiple albums worth of music (of course, back then “multiple albums” was a number in the single digits, and only if you compressed them greatly). Nowadays, most people can easily fit their complete music collection onto an iPod (or, again, their cellphone). Give it another 5 years and you’ll have enough space to fit it in as uncompressed WAV files if you choose to (not going to happen since most people now get music directly in MP3/AAC format, but it would be possible). Again, problem solved. Better audio compression remains an interesting problem with lots of fascinating ties to perception and psychology, but there’s just no real practical need for better audio compression these days. Audio just isn’t that much data.
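The arithmetic behind those figures is straightforward – CD audio is 44.1 kHz, 16-bit, stereo PCM; the one-hour album below is just an illustrative assumption:

```python
# CD audio is 44.1 kHz, 16-bit, stereo PCM; check what 10:1 buys you.
SAMPLE_RATE = 44100    # samples per second
BYTES_PER_SAMPLE = 2   # 16 bits
CHANNELS = 2           # stereo

cd_rate = SAMPLE_RATE * BYTES_PER_SAMPLE * CHANNELS   # 176,400 bytes/s
hour_uncompressed = cd_rate * 3600                    # one hour of audio

print(f"CD bitrate:           {cd_rate * 8 / 1000:.0f} kbit/s")       # ~1411
print(f"at 10:1:              {cd_rate * 8 / 1000 / 10:.0f} kbit/s")  # ~141, i.e. typical MP3 territory
print(f"1 hour, uncompressed: {hour_uncompressed / 1e6:.0f} MB")      # ~635
print(f"1 hour, at 10:1:      {hour_uncompressed / 10 / 1e6:.0f} MB") # ~64
```

Even uncompressed, a whole music collection is measured in tens of gigabytes, not terabytes – small change on modern storage.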
Video is about the only mainstream type of content where the compression ratio still matters: 1080i60 video (as used in sports broadcasts, for example) is about 90 megabytes per second in the subsampled YUV 4:2:0 color space it typically comes in, and about 350MB/s in the de-interlaced RGB color space we use for display. That’s unwieldy enough to need some serious compression (and partial or complete hardware support for decompression). And even the compressed representations are large enough to be unwieldy (we stick them onto Blu-ray discs and worry about video streaming costs). So there’s still some room for innovation there, but even that window is rapidly closing. Blu-ray is probably the last disc format motivated partly by the need to store high-fidelity video. HDTV resolutions are gonna stay with us for a good while (HD is close enough to the internal resolutions used in movie production to make any further improvements subject to rapidly diminishing returns), and Blu-rays are already large enough to store HD content at good quality using current codec technology. BDXL is on the way; again, that’s just large enough for what we want to do with it.

The next generation of video codecs after H.264 is probably still going to matter, since video is still a shitload of data right now. But we’ve been getting immensely better at large-scale image and signal processing (mainly through sheer brute force), and Moore’s law works in our favor. Five years ago, digital video was something done mainly by professionals and enthusiasts willing to invest in expensive hardware. Nowadays, you can get cheap HD camcorders and do video postproduction on a normal laptop, if you’re willing to stomach the somewhat awkward workflow and long processing times. Ten years from now, a single 720p video will be something you can deal with as a matter of course on any device, just as you do with a single MP3 nowadays (…remember when encoding an MP3 took twice as long as its runtime? Yeah, that was just 10 years ago).
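For reference, here’s a back-of-the-envelope check of the raw 1080i60 bandwidth figures quoted above (the bytes-per-pixel and frame-rate numbers are the usual ones; the exact MB/s depends on how you round):

```python
# Raw bandwidth of 1080i60: 1920x1080, 60 fields/s = 30 full frames/s of pixels,
# YUV 4:2:0 at 1.5 bytes/pixel; de-interlaced for display to 60 progressive
# RGB frames/s at 3 bytes/pixel.
WIDTH, HEIGHT = 1920, 1080
pixels = WIDTH * HEIGHT

yuv420_rate = pixels * 1.5 * 30   # bytes/s as delivered
rgb_rate = pixels * 3 * 60        # bytes/s after de-interlacing for display

print(f"YUV 4:2:0, 1080i60:         {yuv420_rate / 2**20:.0f} MB/s")  # ~89
print(f"de-interlaced RGB, 1080p60: {rgb_rate / 2**20:.0f} MB/s")     # ~356
```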
And in the meantime, the last thing we need is yet more mutually incompatible formats with different feature sets and representations to make the lives of everyone dealing with this crap a living hell. If you don’t have any actual, useful features to offer (a standard lossy format with an alpha channel would be nice, as would HDR support in a mainstream format), just shut up already.