AI-Powered Nvidia RTX Video HDR Transforms Standard Video into HDR Video
blogs.nvidia.com

A bit of a letdown that the video demoing the SDR->HDR conversion is itself only published in SDR. Makes as much sense as demoing a colorization tool in a grayscale video!
At this point, with any new model I think it makes sense to wait until you can run the model on your own input before making any assumptions based on cherry picked examples.
If they were serious about showing this tech off, they should've provided a video file download, and indicated that it's an HDR file that should only be viewed on an HDR display. YouTube is just making this look bad, as people won't see a difference.
YouTube supports HDR video, no need for a separate download.
YouTube tends to post a downscaled SD version first, then they encode and post the higher-res versions when they get around to it. This can take days in some cases. Meanwhile the creator catches the flak...
Creators don't publish videos until the high-res versions are done processing.
You don't need high res for HDR on YouTube (144p HDR is a thing there, oddly enough), and the 4K version had already processed when I posted that comment (with no change in HDR availability since). Usually media announcements/large channels pre-upload the video so it's ready when they actually want to publish, to avoid that kind of issue though.
They probably use 144p for automated systems like ML recommendations, moderation and content ID.
4K processing takes just minutes, but HDR processing can take over a month to… never. There is no indication of this at all, no progress bar or ETA. Just check manually every few days!
This is why everyone is giving up on HDR, it’s just too painful because the content distributors are all so bad at it, with Netflix being the sole exception.
Sounds more like you should be giving up on YouTube.
HDR video playback in the browser is pretty unreliable unless you're on a Mac.
It's pretty unreliable on Mac too...
It's more reliable than on Linux though, and Windows has been doing "auto HDR" for videos for years, so it's kinda hard to tell whether something is HDR or not there.
In what way? I've been doing it without issue on PC longer than I've even owned a Mac.
Firefox has zero HDR support, as an example.
The product is only for Chromium based browsers on Windows so surely there's no harm in enabling the product video for that combo too, no?
FF on Mac shows HDR on full screen only, how strange.
That would be true on Mac also?
Firefox supports HDR video output on Mac since release 100.
What grinds my gears is that HDR output in a lot of software is "Mac only", even though Windows supports HDR video output just fine, and has had support for wide-color and HDR since Vista!
E.g.: DaVinci Resolve has HDR output capability only in its Mac version.
Similarly, generating a Dolby Vision file is basically impossible on Windows.
"Just buy a Mac" seems to be the best and most practical guidance I've seen for HDR workflows...
HDR through YouTube appears to work fine even on my non HDR certified HDR monitor.
I am frequently disappointed by such videos.
Ridiculous. Like when James Cameron promoted Avatar HDR with an SDR YouTube video, while YT is perfectly capable of HDR playback.
At least as of a couple of years ago, HDR support on YouTube has been pretty bad[1]. I know they've been working to improve things since, but I kind of don't blame people for walking away from that mess.
Thanks, will check out LTT's gripes, but I've been watching the following HDR channels forever and they look great:
https://www.youtube.com/@SeoulWalker https://www.youtube.com/@Rambalac https://www.youtube.com/@Relaxing.Scenes.Driving
I'm also glad that Rambalac is back, as he quit a few months ago. I've recently started uploading 4K60 HDR content to YouTube [1] myself, and the HDR version takes up to a week longer to encode than the SDR one. You can include your own LUT instead of YouTube's conversion, which seems to help. Here's an article and LUT [2] plus a video [3] with valuable info. They helped me get DJI Pocket 3 HLG recordings to HDR10.
[1] https://youtu.be/0S8hw8Lrvlk [2] https://www.wesleyknapp.com/blog/hdr [3] https://youtu.be/4izJfgRtkZE
Can't recommend Rambalac enough - I pretty much re-traced his steps multiple times during our Japan trip & it really helped with orientation. :)
Also, some of the walks are really interesting & really give you the context of various places in Japan. :)
The real issue is it's either HDR or good SDR, but not both at the same time
It’s still bad. Even as a totally amateur videographer making short clips of my holidays, YouTube is not good enough.
HDR processing takes months!
The SDR down conversion is “potato quality” even to my non-expert eyes, let alone a Hollywood colourist.
Etc…
Instead of YouTube's HDR->SDR conversion you are free to use your own conversion LUT with mkvmerge. As I posted above, here are some links for info:
- Workflow and a valuable LUT: https://www.wesleyknapp.com/blog/hdr
- Some DaVinci Resolve settings to use on SDR monitors: https://youtu.be/4izJfgRtkZE (though I upload 4K60 HDR at 37.5 Mbit/s, which is enough for my slow content).
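If it helps, here's a rough (untested) sketch of the mkvmerge step. The attachment name and MIME type below are my guesses, so check the Wesley Knapp article above for the exact requirements YouTube currently expects:

```python
# Untested sketch: remux an HDR master into MKV with a custom HDR->SDR LUT
# attached, so YouTube can use it instead of its own downconversion.
# Attachment name and MIME type are assumptions -- see the article linked above.
import subprocess

def attach_sdr_lut(hdr_master: str, lut_cube: str, output_mkv: str) -> None:
    subprocess.run(
        [
            "mkvmerge",
            "-o", output_mkv,                                # file you upload to YouTube
            "--attachment-name", "sdr-conversion.cube",      # assumed name
            "--attachment-mime-type", "application/x-cube",  # assumed MIME type
            "--attach-file", lut_cube,                       # your .cube HDR->SDR LUT
            hdr_master,                                      # HDR export from Resolve etc.
        ],
        check=True,
    )

attach_sdr_lut("hlg_master.mov", "hdr_to_sdr.cube", "upload_me.mkv")
```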
This only allows a single LUT for the entire video. For comparison, Resolve will perform Dolby Vision tone mapping from HDR to SDR on a clip-by-clip basis.
HLG still not good enough?
I guess. There's a lot of details we don't know that would change the calculus on this.
To use an analogy from a similar workflow, it would be like saying, "It's pointless to shoot video in 10-bit log if it's going to be displayed in Rec.709 at 8 bits." It completely leaves out the transforms and manipulations available in HDR that do have a noticeable impact even when SDR is the target.
Again, we can't know if it's important given the information that's available, but we can't know if it's pointless either.
I could see a future where this works really well. It doesn't seem to be the case right now though.
The "super resolution" showcased in the video seemed almost identical to adjusting the "sharpness" in any basic photo editing software. That is to say, perceived sharpness goes up, but actual conveyed details stays identical.
Note that YouTube is really bad for these demos due to the re-compression, even in zoomed in stills.
Allegedly the new OnePlus phone does this trick in real time, as well as upsampling and inter-frame motion interpolation. Mrwhosetheboss seems impressed, but I don't really trust his judgment on these things yet.
The iPhone has also done this, for a few years now. It was, surprisingly, a one sentence mention in the keynote/release notes.
Whatever special sauce the Nvidia Shield uses is honestly incredible. Real-time upscaling of any stream, and not just optimized for low-res sources; it's like a force multiplier on content that is already HD. Supposedly the Windows drivers do it as well, but the effect seems less noticeable to me in my tests.
I'm curious - what's the best open-source video upscaling library out there?
I looked back about a year ago, and it didn't seem like there were any good open-source solutions.
Topaz is light years ahead of any open source solution unfortunately.
An HN search of "Deep Space Nine" and "Topaz" will show some great discussions here covering the dearth of such upscaling solutions, as well as some huge efforts before commonplace AI.
I found this single discussion? https://news.ycombinator.com/item?id=19453745
And that's when leaving out the word "Topaz", for which I see no story results with discussions and not many comment results.
I can help (I added "remaster"):
https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...
The articles linked in those discussions describe some very enlightening efforts by Joel Hruska to find an upscaling solution related to what the person asked about. As somebody else mentioned, Topaz is out there and Hruska gave it a good shot, but it is not open source.
Right, I did find those, but only one had a discussion. Just making sure I'm not missing anything, so thanks for the clarification.
It's not exactly what you're after, as it's anime-specific and you need to process the video yourself (e.g. disassemble to frames, run the upscaler, then assemble back into a movie file), but Real-ESRGAN is very good for cleaning up old, low-resolution anime.
If you want to avoid manual processing, Anime4K runs in real time as a GLSL shader you can load into MPV (or Plex or something called IINA according to the readme) and still gives great results.
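For the manual Real-ESRGAN route, the loop looks roughly like this (untested sketch; it assumes ffmpeg plus the realesrgan-ncnn-vulkan binary, and the model name, frame rate, and file names are placeholders to adjust for your source):

```python
# Rough sketch of the manual pipeline: disassemble to frames, upscale each
# frame with Real-ESRGAN, then reassemble and copy the original audio over.
import subprocess
from pathlib import Path

src = "old_anime.mkv"  # placeholder input
Path("frames_in").mkdir(exist_ok=True)
Path("frames_out").mkdir(exist_ok=True)

# 1. Disassemble the video into numbered PNG frames.
subprocess.run(["ffmpeg", "-i", src, "frames_in/%08d.png"], check=True)

# 2. Upscale every frame (model name is just an example).
subprocess.run(
    ["realesrgan-ncnn-vulkan", "-i", "frames_in", "-o", "frames_out",
     "-n", "realesr-animevideov3"],
    check=True,
)

# 3. Reassemble at the source frame rate, taking audio from the original file.
subprocess.run(
    ["ffmpeg", "-framerate", "23.976", "-i", "frames_out/%08d.png", "-i", src,
     "-map", "0:v", "-map", "1:a",
     "-c:v", "libx264", "-crf", "18", "-pix_fmt", "yuv420p", "upscaled.mkv"],
    check=True,
)
```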
Thanks, hadn't come across that before. :)
It depends on what you mean by 'open source'. Along with training materials and the full setup? That will be hard to find. Upscaling was popular around 10 years back; that's why there is not much interest today. Training in the old style isn't that hard, but artifacts pop up in all the videos I've seen.
That seems like a gimmick and I actually prefer SDR video that is not upscaled. There is something ugly about those AI treated videos. They look fake.
The RTX video upscaling feature works really well, there's a bug in the Firefox implementation that allows you to switch between native and upscaled side by side and the difference is striking. I don't have an HDR monitor so I can't tell you how well this new HDR feature works.
They are fake. Ultimately it’s not recovering lost detail, it’s making shit up
This reminds me of the Samsung debacle where the camera recognized the moon and pasted a high-quality texture of it into the shot.
Exactly. This is akin to upscaling or frame rate interpolation. No consumers want this, they turn it off in settings.
I don't think making things up is the problem; it's whether it's believable. If it's indistinguishable to a viewer, then who cares. I never would have thought the HDR of the clouds was "made up".
Maybe I'm odd, but a big part of art to me is seeing things how the creator intended it to be seen.
So I calibrate all my media consumption displays etc. I could never see myself using some automated SDR -> HDR conversion like this.
Even if it looks natural, it doesn't look like it was supposed to, and I want to see it how it was supposed to look.
I would use it on every single video I've ever made myself, because intent had nothing to do with how my videos look. They were made with the best camera I had available, and HDR has only become available relatively recently.
This is a tool that I want to use. If nobody can tell I used it, that's a good thing for me. If you don't want to use it, then nobody is making you.
"I don't think making things up is the problem, it's if it's believable"
Depends of course if it's being passed off as reality. Slippery slope and all.
I recently had some old Super 8 films shot by my parents scanned into 1080p resolution in ProRes HQ. Because of the poor optics of the original camera, imperfect focus when shooting, poor lighting conditions, and general deterioration of the film stock, most of the footage won't get anywhere near what 1080p could deliver.
What I'd like to try at some point is to let some AI/ML model process the frames and, instead of necessarily scaling it up to 4K etc., 'just' add (aka magic in) missing detail in the 1080p version and generally unblur it.
Is there anything out there, even in the research phase, that can take existing video stock and then hallucinate into it detail that was never there to begin with? What Nvidia is demoing here seems like a step in that direction...
I did test out Topaz Video and DaVinci's built-in super resolution feature, both of which gave me a 4K video with some changes to the original, but not the magic I'm after.
I also restored some Super 8 footage recently and had great success. The biggest win I had wasn't resolution, but slowing down the speed to be correct in DaVinci, and interpolating frames to make it 60fps using the RIFE algorithm in FlowFrames. I then used Film9 to remove shake, colour-correct, sharpen and so on.
Correcting the speed and interpolating frames added an amazing amount of detail that wasn't perceptible to me in the originals (although it was there).
All of this processing does remove some of the charm of the medium, so I'll be keeping the original scans in any case.
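If anyone wants a command-line-only approximation of that speed fix plus interpolation, ffmpeg's built-in minterpolate filter can stand in for the FlowFrames/RIFE step (much simpler, noticeably worse on fast motion). Untested sketch, assuming the usual 18 fps Super 8 shooting speed and silent footage; adjust for your scan:

```python
# Untested sketch: re-declare the scan's frame rate so playback speed is
# correct, then motion-interpolate to 60 fps. Super 8 is silent, so audio
# is dropped. This uses ffmpeg's minterpolate filter, not RIFE.
import subprocess

scan = "super8_scan.mov"  # placeholder: scan as delivered, often tagged at the wrong rate

subprocess.run(
    ["ffmpeg",
     "-r", "18",                                 # treat the input as 18 fps (typical Super 8)
     "-i", scan,
     "-vf", "minterpolate=fps=60:mi_mode=mci",   # motion-compensated interpolation to 60 fps
     "-an", "-c:v", "libx264", "-crf", "18",
     "super8_60fps.mov"],
    check=True,
)
```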
How did you do the original scanning? I have a ton of Super 8 that needs to be scanned.
I bought one of the cheapish (€300) Super 8/8mm scanners on Amazon. It scans quite quickly while displaying the results on a small screen.
It's a nice convenient device, but I can't now unsee the artifacting and compression arising from it. If I were to do it again I'd just pay a service to scan properly, or build a rig to photograph the frames.
On the other hand, I'm very pleased to have scanned and archived the films given that they've been unseen for so long and can now be shared easily.
An interesting thing about Super 8: the resolution is generally very poor, but it can have quite the dynamic range. Also, with film in general (and video, but it's easier with film because you have a global shutter) you can compensate for motion blur and get out more detail, which isn't visible when you look at the film frame by frame. And none of this needs AI.
Regarding hallucination, I agree with the sibling comment, the problem is that faces change. And with video, I'm not even sure the same person would have the same face in various parts of the video...
There is AI tech to do this already. It has a slight problem, though: it adds detail to faces (this is marketing speak for "completely changes how people look").
Something like this will always change the original, as it's guessing what should be there as it upscales. Only time will improve the guessing.
You could look into RTX Video Super Resolution
The HDR transformation was really impressive. The upscaling, not so much. At least not on my monitor.
Speaking of which, Nvidia has built-in live AI upscaling on the Shield TV android box.
- Is there any stand-alone live AI upscaling / 'enhance' alternative for Android or any other platform?
The Shield is kind of an extreme outlier in today's environment. A device from 2015 that 9 years later is still one of the top tier choices in its (consumer) market is almost unheard of.
In fact it's reportedly the currently supported Android device with the longest support lifetime [0]; it's crazy that mine still gets updates.
[0]https://www.androidcentral.com/android-longest-support-life-...
It really is awesome. I also enjoy the UI that allows you to side by side compare a stream and the difference is insane.
I have been meaning to see how well it handles streaming a desktop via moonlight to the shield to real time upscale a second monitor's content. I assume it's trained for video footage and not static UI components. The RTX windows drivers don't seem to upscale as well as the shield.
Interested in this too. I replaced my Shield with a Steam Link to a desktop that does upscaling, which is very clunky.
So, should one buy a Shield TV today?
It’s pricey, and being so old, I fear it will soon be obsoleted…
I think they should rephrase. It makes SDR appear HDR. It's just making up information, no? It's not actually making it HDR; it just appears to be HDR?
Making up information? The same can be said for most commonly used modern compressed video formats. Just low-bitrate streams of data that get interpolated and predicted into producing what looks like high-resolution video. AV1 even has entire systems for synthesizing film grain.
The way I see it, if the AI-generated HDR looks good, why not? It wouldn't be more fake or made up than the rest of the video.
Now it will be absolutely impossible to accurately convey the artistic intent, when there's no way to know how it will look on consumer devices.
Consumer devices have never been known for color accuracy, and that goes back a very long way. The running joke in broadcast was that NTSC stood for "Never Twice the Same Color".
I think we lost that battle with motion interpolation on consumer TVs
Very easy solution: start filming in 120fps and then the TV motion interpolation does nothing.
Already happened with brightness and contrast controls
I can't get any two computer monitors that are not the same model to give me the same color.
I wonder if AI can be used to extrapolate 4:3 to 16:9 format or to create stereoscopic video (for use in VR or 3D TV's)
During the brief moment that 3DTV was popular, almost all 3DTVs had a mode that could "convert" 2D to 3D, based on movement in the scene and other pre-learned cues. "Things that look like people should be in front of things that look like scenery", and so on.
I miss 3D. I loved it, and I was sad that it didn't catch on. It enjoyed a longer life in Europe, where 3D blu-rays were produced for a few more years after they stopped selling them in the US, and I imported and enjoyed several.
Maybe Apple's VR headset will be a 3D renaissance.
The main reason at home 3D failed is because most people don't watch at home like they do at a theater.
At a theater you sit down knowing that you can't get up and leave until it's over. At home you are doing other things: eating, folding laundry, going to the bathroom, taking phone calls, answering the door, and so on. It's not conducive to wearing glasses.
Vision will have the same problem (as does any at home headset). I don't think it will lead to a 3D renaissance, at least not for a long time, until it becomes acceptable (and feasible) to walk around with it on all the time.
Otherwise we need to wait for holographic projectors that can make a 3D image without having to wear glasses that make it hard or impossible to look at other 3D objects.
I think that would be a problem with VR headsets, not these 3D glasses you could put on or take off in seconds?
My parents had a 3D TV. It was a huge problem. You probably don't realize how often you look away from your TV when you're watching your TV.
While watching a movie, look away - maybe. Get up and walk around and do chores - we always pause if needed. I think it's a matter of establishing that to watch a movie, one needs to set aside time and commit to focus just on it, but then it becomes yet another barrier.
Different story for TV shows which often are background though.
Yeah that's my point -- most people don't watch movies at home that way. They are just background distractions, even movies.
> My parents had a 3D TV. It was a huge problem.
A huge problem if you're not actually watching the movie, sure. But if you're doing something else, don't use the 3D mode.
That's my point -- most people don't actually watch the movie at home. That's why 3D TV failed.
I don't think that's why. TVs without 3D are just cheaper, the early 3D tech just wasn't very good and took a while to mature, thus souring the market, and 3D content was more expensive (or an extra expense, e.g. buying the 3D and non-3D versions of a movie), so people just went for the cheaper options overall. I've had an active 3D TV for 10+ years, and the 3D has not itself been a problem when I've watched with others.
The only time it's a problem is if someone currently experiencing a migraine is trying to watch; then they can get serious vertigo, but that's an issue caused by the migraine itself (visual auras and vertigo, generally).
All of those reasons certainly contributed, but the reality is that most people don't watch movies at home the way they watch in a theater, where they dedicate 2+ hours to the experience with no distractions.
I do that, but I have to wait for everyone to go to bed first and then turn off my cell phone. Most people aren't willing to do that.
A lot of people wrote it off as an unnecessary gimmick; I'd add that to the list of reasons. VR 3D blows it out of the water, but then requires more effort to use.
Possibly to some degree. They're doing crazy things with NeRF.
So now we need to stop making fun of cops pressing the "enhance" button in films...
We're going to have at least one episode of those lawyer shows where they pressed enhance, and the neural network hallucinated something that wasn't there.
The work I am interested in this broader domain is conversion (say, via some NeRF) of existing standard video into spatial video e.g. MV-HEVC for immersive experience on the Vision Pro etc.
This stuff is sick. If we had a real-time upscaler on a zoom telescope it would be a fantastic tool while traveling. I'd get a kick out of that.
And what would fake detail in the real world give to you?
It doesn't have to be "fake" detail; an AI can use multiple frames to gather much more information than is available in a single frame and composite them into a much more detailed image.
This makes sense; it's called super resolution, and you don't need any AI for that, though I see AI companies trying to hijack the term.
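For what it's worth, classical multi-frame super resolution is just "align and stack": upsample each frame, register it to a reference at sub-pixel accuracy, and average. A toy OpenCV sketch (no learned model; real pipelines add robust weighting and deconvolution):

```python
# Toy "shift-and-add" multi-frame super resolution: no AI involved.
import cv2
import numpy as np

def multiframe_superres(frames, scale=2):
    # Upsample every frame onto the target grid first.
    up = [cv2.resize(f, None, fx=scale, fy=scale, interpolation=cv2.INTER_CUBIC)
          for f in frames]
    ref_gray = cv2.cvtColor(up[0], cv2.COLOR_BGR2GRAY)
    acc = up[0].astype(np.float32)
    criteria = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 100, 1e-6)

    for frame in up[1:]:
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        warp = np.eye(2, 3, dtype=np.float32)
        # Estimate the (sub-pixel) shift between this frame and the reference.
        _, warp = cv2.findTransformECC(ref_gray, gray, warp,
                                       cv2.MOTION_TRANSLATION, criteria)
        aligned = cv2.warpAffine(frame.astype(np.float32), warp,
                                 (ref_gray.shape[1], ref_gray.shape[0]),
                                 flags=cv2.INTER_LINEAR | cv2.WARP_INVERSE_MAP)
        acc += aligned

    # Averaging the aligned frames cancels noise and recovers detail
    # that no single frame shows on its own.
    return (acc / len(up)).astype(np.uint8)
```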
I can pretty easily distinguish useful LLM output from non-useful LLM output though others on this website seem to have lots of trouble. I think I can pretty easily do the same for things in the visual field. To be honest, part of why I'm successful is that I can draw use out of imperfect tools.
Oh boy. Fake detail is well, fake. It doesn't allow you to learn anything about the world beyond self-deception. But of course, if this is what rocks your boat then be my guest.
Skill issue.
Traveling to a real destination so that you can look at fake AI generated crap on a screen instead of the actual surroundings.
The upscaling doesn't seem particularly convincing, however, the HL2 RTX video on the same page definitely is.
HL2 RTX also has newer textures and other assets
Feels like a misnomer; it's really "HDR-style" video. The source material does not have the dynamic range embedded; this is an effect filter.
> Using the power of Tensor Cores on GeForce RTX GPUs, RTX Video HDR allows gamers and creators to maximize their HDR panel’s ability to display vivid, dynamic colors, preserving intricate details that may be inadvertently lost due to video compression.
There is so much marketing BS in one small paragraph. For starters, generating (/hallucinating) data is IMHO the opposite of preserving anything. Then, HDR is less about "intricate details" and more about color reproduction. Finally, video compression is the one thing that usually does not have problems with HDR; even the now-venerable x264 can handle HDR content. Generally it's almost everything else that struggles.
Of course, in true marketing tradition, none of the claims are strictly false either. I'm sure there are many ways to weasel them.
They claim to preserve color detail that was lost due to compression of the dynamic range. What's wrong with that?
Not the OP, but you have to understand that 'compression of the dynamic range' is an artistic tool. Literally choosing the lighting ratio of an image is how you build out lighting for a scene. With AI overwriting these choices, you're looking at something more akin to colorization than upscaling.
Not really; half the battle with SDR video is tonemapping a high dynamic range to fit into SDR. That process is not artistic, it's about trying not to make it look bad.
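The mechanical part of that is basically a tone-mapping operator. A minimal sketch of the classic global Reinhard curve on linear values (numpy; the exposure and gamma here are placeholders, and real graders tweak this shot by shot, which is where the artistic choices come in):

```python
# Global Reinhard tonemap sketch: squeeze unbounded linear HDR values into
# [0, 1) for an SDR target, then gamma-encode for display.
import numpy as np

def reinhard_tonemap(hdr_linear: np.ndarray, exposure: float = 1.0) -> np.ndarray:
    x = np.maximum(hdr_linear * exposure, 0.0)  # linear scene-referred values
    sdr_linear = x / (1.0 + x)                  # Reinhard: [0, inf) -> [0, 1)
    return sdr_linear ** (1.0 / 2.2)            # rough SDR gamma encode
```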
I think you need to understand that it is not always an 'artistic tool' but a money or knowledge limitation.
I'm a filmmaker... I don't know a single DOP or director who wishes they could work in HDR but is limited by finances or knowledge. Again, shaping light is the essence of cinematography. Modern DSLRs far surpass the dynamic range (although not the effective resolution) of 35mm film, and yet the image they produce isn't comparable. When it comes to image quality, bit depth is enormously more important than dynamic range. When it comes to creating an artistic image, dynamic range hasn't been a limit for many decades.
Very informative comment! Could you please tell me what you mean by "effective resolution?" Is it the resolution in px or something to do with dynamic range? (I don't know anything about filmmaking.)
Probably in pixels; good-quality 35mm seems to top out around "12K" according to some of the people doing scans.
I'm a photographer and filmmaker. And now?
The world is huge btw.
You can’t preserve something that was lost (but perhaps you can recreate a substitute).