Exploring Neural Graphics Primitives (thenumb.at)

In the last section on hash tables, I was wondering how on earth they make hash tables work well on the GPU.
The answer:
“What do we do about hash collisions? Nothing—the training process will automatically find an encoding that is robust to collisions.”
Amazing. Makes one wonder what other classic data structures gain new properties when ML is mixed in.
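To make the quote concrete, here is a minimal sketch of a collision-agnostic spatial hash lookup in the spirit the article describes: coordinates are hashed into a fixed-size table of trainable feature vectors, and colliding cells simply share an entry (the primes and table size below are illustrative, not taken from the article):

```python
import numpy as np

PRIMES = (1, 2654435761)   # per-dimension hash primes (illustrative values)
TABLE_SIZE = 2 ** 14       # T entries; a tunable hyperparameter
FEATURE_DIM = 2            # trainable features stored per entry

rng = np.random.default_rng(0)
table = rng.uniform(-1e-4, 1e-4, size=(TABLE_SIZE, FEATURE_DIM))  # trainable

def hash_index(ix: int, iy: int) -> int:
    """XOR of coordinate * prime products, taken modulo the table size."""
    return ((ix * PRIMES[0]) ^ (iy * PRIMES[1])) % TABLE_SIZE

def lookup(ix: int, iy: int) -> np.ndarray:
    # No probing, no chaining: two distinct grid cells may map to the same
    # row, and training is simply expected to be robust to that sharing.
    return table[hash_index(ix, iy)]

feat = lookup(120, 345)
assert feat.shape == (FEATURE_DIM,)
```

The point is that the "collision handling" is nothing at all: colliding cells receive averaged gradients, and the optimizer sorts out an encoding that tolerates it.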
There was some stuff quite recently about using AI (neural networks I think) to help optimise lookups in SQL indexes[1]. I'm afraid I can't find it now.
[1] That caused some rebuttal papers to be published saying "not so fast"
I wonder if some basic pre-processing of the image would help.
Turning RGB into YCbCr is a pretty essential step for achieving efficiency gains in techniques like JPEG. Quantization works a lot better when we can treat luminance and chrominance separately.
If you want the AI to "see" more like a human, converting into this kind of color space is an important first step IMO.
RGB<->YCbCr is a linear operation, so a deep enough network shouldn't have trouble learning to do the transform if it's useful. Gamma correction might be important, though.
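To illustrate why a network could absorb this step: one common definition of the conversion (full-range BT.601; other variants exist) is just a 3x3 matrix plus a chroma offset, i.e. exactly what a single linear layer can represent:

```python
import numpy as np

# RGB -> YCbCr (full-range BT.601) is an affine map: a 3x3 matrix plus an
# offset that centers the chroma channels around 0.5.
M = np.array([[ 0.299,     0.587,     0.114   ],
              [-0.168736, -0.331264,  0.5     ],
              [ 0.5,      -0.418688, -0.081312]])
offset = np.array([0.0, 0.5, 0.5])

def rgb_to_ycbcr(rgb):
    return rgb @ M.T + offset

def ycbcr_to_rgb(ycc):
    # The matrix is invertible, so the round trip recovers the input
    # up to floating-point error.
    return (ycc - offset) @ np.linalg.inv(M).T

px = np.array([0.2, 0.6, 0.4])
assert np.allclose(ycbcr_to_rgb(rgb_to_ycbcr(px)), px)
```

Gamma correction, by contrast, is a per-channel power law, so it is genuinely nonlinear and not free for a linear layer to learn.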
Sure, but surely you'd rather not? I imagine you want the network encoding the details of the specific image, not wasting weights to learn about easy wins that apply to every image. You could burn a whole layer converting to YCbCr, and the net in the article only had 3.
You don't know whether that is actually the case; the whole appeal, originally, is that you don't need this kind of handcrafting of features. Now, clearly, people preprocess their data all the time, so that point is kind of moot.
However, if YCbCr was meaningfully better for the network, I think it would be pretty well-known as a preprocessing step. Also, the actual sensor data is RGB.
But the human visual system is insensitive to high-resolution blue/yellow and red/green information, so you can essentially downsample those channels to almost nothing before you even start. That's a fundamental trick, also used in another way by JPEG, but presumably not by this RGB-based algorithm.
If you extend the JPEG analogy all the way and consider the 4:2:0 subsampling mode, you could reduce your input sample count by exactly 50% by consuming a pre-converted image.
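The 50% figure follows from a quick back-of-the-envelope count: 4:2:0 keeps luma at full resolution but halves both chroma planes in each dimension, so each chroma plane carries a quarter of the samples (a sketch, assuming an even-dimensioned image):

```python
def sample_count_444(w: int, h: int) -> int:
    # Three full-resolution planes (R, G, B or unsubsampled Y, Cb, Cr).
    return 3 * w * h

def sample_count_420(w: int, h: int) -> int:
    # Full-resolution Y plus Cb and Cr at half resolution per dimension.
    return w * h + 2 * (w // 2) * (h // 2)

w, h = 1920, 1080
ratio = sample_count_420(w, h) / sample_count_444(w, h)
assert ratio == 0.5  # (1 + 2 * 1/4) / 3 = 1.5 / 3
```

So a 4:2:0 input halves the raw sample count before the network sees anything.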
A great summary of this line of research! I think it should have the (2022) label.