4D Gaussian Splatting for Real-Time Dynamic Scene Rendering
guanjunwu.github.io

This looks great! The main potential use for this must be for VR video with 6 degrees of freedom. What they have now does an incredible job of conveying space, but feels a bit limiting when your view doesn’t translate with you.
This is bad news for me. I am working on a similar project (Gaussian splatting + dynamic scene). Our method is different from the 4D Gaussian splatting mentioned here, but I am unsure whether I should continue or not.
Please continue working on it: being first doesn't imply being the best, all research is iterative.
There's nothing wrong with concurrently developing something similar. I can almost guarantee there will be something different enough about what you've developed to be considered novel (if you care about publication). If you don't care about publication, then definitely keep going! Hell, people still argue about which implementation of complete specifications is the best (for good reason).
Why is that bad news? If you're interested in the outcome - someone has saved you work.
If you're interested in the process - or exploring your specific approach then why stop?
Because of the sunk cost fallacy, which is only a fallacy if you ignore the emotions and trauma of having to abandon work you identify with or that is responsible for your self-esteem.
Don't see how the sunk cost has anything to do with it, OP clearly hoped to be first, as being first would almost guarantee a payoff of some form.
Coming out with a method second is much less likely to be rewarded by the community: not being rewarded for work completed is not sunk cost, it's just a straight-up loss.
If it’s a different method, it is definitely worth sharing.
The additional interest might actually be helpful.
Does anyone know if the pixel overdraw of the GS scene is consistent from every view angle? I'm asking because I would assume there is inconsistent GS density but the paper doesn't give a range of FPS measurements or 99th percentile or anything like that.
I'm pretty certain it is not - consider surfaces seen in steep angles, vs. ones seen perpendicularly. If we assume no culling or pruning occurs for the splats, steep angles result in way more overdraw.
This doesn't necessarily follow. If the splats are coplanar to the surface they are representing then viewing them at an angle wouldn't change how they overlap. But that said, I assume you're right.
This gives me hope that one day we'll have a holodeck. Holy crap! The applications for this are pretty broad. From safety (scene reconstruction from video sources) to real-estate, to hollywood and video games. I'm just blown away. Will we eventually see 4D GS AR/XR scenes we can walk about? I feel like that would make the perfect VR sherlock holmes game.
Why wouldn't you be able to walk about now? They already have examples with free camera movement. To make it an XR scene, you just need to render two cameras and pipe it into a headset.
One more step towards the next simulation level.
Aren't the scenes static?
Holy heck this is going to fundamentally change media production
After reconstruction, is there any way to scan for a particular condition in the model, and map it onto the 3D structure? For instance, find the broken cookie, or find a surface matching some input image.
I suspect typical point-cloud feature extraction techniques would work. Things like identify planar regions, from that join connecting planar regions into clusters, etc.
The time component is super interesting here though!
Seem fairly tractable to use Segment Anything or a similar method to derive plausible semantic clusters of splats.
Hard to believe the original Gaussian Splatting paper is still less than three months old, given the explosion of new techniques in recent weeks. It's wild to see the state of the art in environment and object capture suddenly advancing this quickly – beyond the obvious applications like real estate, I wonder what else GS will end up transforming.
At the risk of an "AcKchYuALly": Gaussian splatting has been around since at least the early 90s. There are even a few old games made with the technique.
The paper I think you're referring to made the interesting leap that a 3D radiance field could be re-rendered as a field of Gaussian splats, and that this would probably run faster in modern GPU pipelines for real-time performance. It looks like they also have the nice property of being able to be shifted around in memory quickly, hence the animation seen here.
If you want to be pedantic, the paper made the leap it did because of differentiable rendering which necessarily needs a differentiable representation of primitives - so they use Gaussians. It’s entirely novel and set in a nascent field (neural rendering). Gaussians happen to be further representable as easily rasterized primitives. Though some considerable work was put into making this performant. Everyone who keeps saying this has been around since the 90s is missing the context of the very modern differentiable rendering literature.
The point of Gaussian Splatting for me is that it is a learned representation. It's odd that others view it primarily as drawing sprites.
I'm curious, would you classify particle effects drawn with quads as 4D gaussian splatting too?
Well, in the old days, you just put the splats in your 3D space; they weren't really sprites (in the strict sense that they didn't use dedicated sprite hardware). The really interesting thing is that they're being used here to render the learned representation, but there's nothing particularly special or new or AI/ML about them.
You could "model" 3d objects with the gaussians by just putting a bunch together. It was a way to produce fast rendering 3d images without using a bunch of polygons. The results back then were...left behind by other techniques.
There's a massive back catalog of computer graphics work on the technique, it's usually just easiest to use the search tools and search back for all dates leading up to say...2021 and you'll find tons of normal old stuff like CS 302 - Computer Graphics courseware slides or whatever on the technique.
https://www.google.com/search?q=gaussian+splat+-site%3Apinte...
Being old and seeing the new generations amazed by the reapplication of what was discovered and used decades ago in a novel way amazes me.
Could you point us to some examples of old games using this technique? Would be awesome to see.
Ecstatica - https://www.youtube.com/watch?v=dnOXk3QJWN8
Is this really splatting gaussians? Or is it rendering ellipsoids?
It was just ellipsoids. I don’t know if any game specifically used Gaussians. But, the idea of splatting points, Gaussians, ellipsoids and a variety of other shapes has been around for at least 20 years.
The novelty of the paper was in using the differentiability of Gaussians to enable fitting splats to incrementally match all of the target photos simultaneously. So, it’s a new way to generate the splats from photos rather than modeling them by hand.
PlayStation Dreams used a very similar technique.
The backgrounds are static and prerendered! There's one sphere .bmp that's scaled and stretched. It comes with a depth offset map that populates a simple z-buffer to prevent overdraw. So rendering each frame becomes just a couple hundred dozen operations!
I remember a 4k demo that used translucent triangles (I think? my brain is showing me circles, so perhaps a fixed set of sizes and fast blit with alpha.) This created moving volumetric light and shadows around some geometric shapes, some pillars I think. Very smeary/ghostly with overdrawn shapes, but the effect was startling given it was on a 486. It didn't render full frames, but moved the model and just kept splatting.
Interesting! Can you please name some of these old games made with Gaussian Splatting? I would be interested to play, to get a sense why polygons won in that round (and likely to lose in this one).
I used additive gaussian fields (restricted by bounding regions) for this back in the late 90's for audio visualizations in a ripper/player called "Siren" (back when we actually thought we could charge money for something like that).
The technique worked well on non-accelerated (CPU only) hardware of the era, with the additive approach saving the pain of needing to keep a z buffer or fragment list.
Gaussian voxel reconstruction is useful in medical and GIS settings, which, if memory serves, is what Kyle Freeman from Novalogic drew on for his work on Comanche. As far as I know, that was the first commercial game with voxel rendering... It's been a bit since I played it, but the swimming jaggies make me think it was a Manhattan-distance height map offset by planar traversal (kinda like Doom raycasting) or some similar trick. I don't recall any intersections or overhangs, but, to be fair, I was a middle schooler when Comanche came out.
It also ran fine on my weak sauce PC.
Once acceleration hit, transformation of triangles with fixed-function pipelines took over: the ability to push textured triangles with minimal per-pixel value adjustment won out. Slowly but surely we've swung back to high ALU balance (albeit via massive stream parallelism). We've shifted from heavy list/vertex transformers to giant array multiply/add processors.
It's a pretty great time to be a processing nerd.
From another user:
This one was released today as well. Works out of the box: https://github.com/JonathonLuiten/Dynamic3DGaussians
We used to call this technique "vector balls" on the Amiga, here is one famous example: https://www.youtube.com/watch?v=gjKkUTlhIek . I remember implementing it myself for an unreleased demo.
I realize that a lot has happened since, but this is likely where it all started :)
Does anyone have a video or post that explains the optimization part of the original paper? I understand most of it except that part, and can’t seem to wrap my head around it.
Just glossed over the paper but it seems, in principle, simple enough (though rather brilliant IMHO).
Essentially they're doing what you do when you train a neural network, only that instead of adjusting weights connecting "neurons", you adjust the shape and position of gaussians, and the coefficients of spherical harmonics for the colors.
This requires the rendering step to be differentiable, so that you can back-propagate the error between the rendering and the ground-truth image.
The next key step is to adjust the number of Gaussians every N iterations: either fill in detail by cloning a Gaussian in an area which is under-covered, or split a Gaussian in an area which is over-covered.
They use the gradient of the view-space position to determine whether more detail is needed, i.e. Gaussians the optimizer wants to move significantly across the screen appear to be in regions without enough detail.
They then use the covariance of the Gaussians to determine whether to split or to clone. Gaussians with large variance get split, the others cloned.
They also remove gaussians which are almost entirely transparent, no point in keeping those around.
That's my understanding at least, after a first time gloss-through.
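To make that concrete, here's a rough PyTorch-style sketch of the loop as I understand it. The helper functions (`sample_training_view`, `render`, `densify_and_prune`), the parameter shapes, and all thresholds are placeholders of mine, not the paper's code; the actual renderer is a custom differentiable CUDA rasterizer.

```python
import torch

N = 100_000  # number of Gaussians; the paper seeds them from the SfM point cloud

# Per-Gaussian parameters, all directly optimized by gradient descent.
params = {
    "positions": torch.randn(N, 3, requires_grad=True),
    "scales":    torch.zeros(N, 3, requires_grad=True),     # extents of the covariance axes
    "rotations": torch.randn(N, 4, requires_grad=True),     # quaternions
    "opacities": torch.zeros(N, 1, requires_grad=True),
    "sh_colors": torch.zeros(N, 16, 3, requires_grad=True), # spherical-harmonic color coefficients
}
optimizer = torch.optim.Adam(list(params.values()), lr=1e-3)

for step in range(30_000):
    camera, gt_image = sample_training_view()   # placeholder: pick a training photo and its pose
    image = render(params, camera)               # placeholder: differentiable rasterization
    loss = (image - gt_image).abs().mean()       # the paper mixes an L1 term with D-SSIM
    loss.backward()                              # gradients flow back into every Gaussian
    optimizer.step()
    optimizer.zero_grad()

    if step % 100 == 0:
        # Adjust the *number* of Gaussians: clone/split where the view-space position
        # gradients are large, and drop Gaussians that are nearly transparent.
        densify_and_prune(params)
```

The important bit is that `render` is differentiable end to end, so `loss.backward()` produces a gradient for every Gaussian's position, shape, opacity and color.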
You:
> Essentially they're doing what you do when you train a neural network, only that instead of adjusting weights connecting "neurons", you adjust the shape and position of gaussians, and the coefficients of spherical harmonics for the colors.
My brain:
> They're providing inverse reactive current to generate unilateral phase detractors, automatically synchronizing cardinal gram meters.
Heh. For those that haven't dabbled much with neural nets, the key aspect here is the backpropagation[1]. If you want to optimize a process, you typically change the parameters (turn a knob or change a number) and see how the output reacts. If it changed too much you reduce the parameter etc. This is a forwards process.
The idea in backpropagation is instead to mathematically relate a change in output to a change in the parameters. You figure out how much you need to change the parameters to change the output a desired amount. Hence the "back" in the name: you want to control the output, "steering" it in the direction you want, and to do so you go backwards through the process to figure out how much you need to change the parameters.
Instead of "if I turn the knob 15 degrees the temperature goes up 20 degrees", you want "in order to increase the temperature 20 degrees the knob must be turned 15 degrees".
By comparing the output with a reference, you get how much the output needs to change to match the reference, and by using the backpropagation technique you can then relate that to how much you need to change the parameters.
In neural nets the parameters are the so-called weights of the connections between the layers in the model. However the idea is quite general so here they've applied it to optimizing the size, shape, position and color of (gaussian) blobs, which when rendered on top of each other blend to form an image.
Changing a blob's position, say, might make it better for one pixel but worse for another. So instead of making one big change to the parameters, you take small iterative steps. This is the so-called training phase. Over time the hope is that the output error decreases steadily.
edit: while backpropagation is quite general as such, as I alluded to earlier, it does require that the operation behaves sufficiently nice, so to speak. That's one reason for using gaussians over say spheres. Gaussians have nice smooth properties. Spheres have an edge, the surface, which introduces a sudden change. Backpropagation works best with smooth changes.
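As a tiny, self-contained toy of that idea (my own example, not the paper's pipeline): render a single 2D Gaussian blob, compare it against a reference image, and let autograd work out how the blob's position and size should change.

```python
import torch

H = W = 32
ys, xs = torch.meshgrid(torch.arange(H, dtype=torch.float32),
                        torch.arange(W, dtype=torch.float32), indexing="ij")

def render_blob(cx, cy, sigma):
    # A smooth, differentiable "splat": intensity falls off with distance from the center.
    return torch.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))

# Reference image: a blob at (20, 12) with size 3.
target = render_blob(torch.tensor(20.0), torch.tensor(12.0), torch.tensor(3.0))

# Start our blob in the wrong place with the wrong size.
cx = torch.tensor(16.0, requires_grad=True)
cy = torch.tensor(16.0, requires_grad=True)
sigma = torch.tensor(5.0, requires_grad=True)
optimizer = torch.optim.Adam([cx, cy, sigma], lr=0.3)

for step in range(300):
    loss = ((render_blob(cx, cy, sigma) - target) ** 2).mean()
    loss.backward()        # "backwards": how should each parameter move to reduce the error?
    optimizer.step()       # take a small step in that direction
    optimizer.zero_grad()

print(cx.item(), cy.item(), sigma.item())  # should end up close to (20, 12, 3)
```

The real method does exactly this, just simultaneously for millions of 3D Gaussians projected through a camera and alpha-blended by a custom rasterizer.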
Just to add some detail regarding the "blob optimization" phase.
The algorithm that recovers the camera positions from the reference images also gives you a sparse cloud of points (it places pixels from the images in 3D space). Use those as the centers of the initial blobs, and give each blob an initial size. This is almost certainly not enough detail, but it's a start.
Then you run the "training" for a while, optimizing the position and shape of the blobs. Then you try to optimize the number of blobs. The key aspect here is to determine where more detail is needed.
In order to do so, they exploit the fact that they already have derivatives of several properties, including the screen position of each blob. If the previous training pass tries to move a given blob a significant distance on the screen, they take that as a signal that the backpropagation is struggling to cover an area.
They then decide to add blobs either by duplication or by splitting, depending on whether the blob is large or not.
If it's small, they assume there's detail it can't fill in, so they duplicate the blob and move the copy slightly in the direction the source blob wanted to move, so the two don't overlap exactly.
If the blob is large they assume the detail is too fine and is overcovered by the blob, hence they split it up, calculating the properties of the new blobs so that they best cover the volume the source blob covered.
This process of training followed by blob optimization is repeated until the error is low enough or stops changing much, suggesting it has converged or failed to converge, respectively.
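Expressed as code, the clone-or-split decision might look roughly like this; the tensors, names and thresholds are illustrative stand-ins of mine, not values from the paper.

```python
import torch

num_blobs = 1000
# Stand-ins for quantities the training loop would accumulate per blob:
grad_norm = torch.rand(num_blobs)        # how far recent passes "wanted" to move each blob on screen
max_scale = torch.rand(num_blobs) * 0.1  # the largest axis of each blob

grad_threshold  = 0.5    # "this blob keeps wanting to move a lot" => the region needs more detail
scale_threshold = 0.05   # boundary between "small" and "large" blobs

needs_detail = grad_norm > grad_threshold
clone_mask = needs_detail & (max_scale <= scale_threshold)  # small blob: duplicate it and nudge the
                                                            # copy along its gradient direction
split_mask = needs_detail & (max_scale > scale_threshold)   # large blob: replace it with smaller
                                                            # blobs placed inside its footprint

print(clone_mask.sum().item(), "to clone,", split_mask.sum().item(), "to split")
```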
Thank you. This was much more approachable for someone like myself that has little background (a few undergrad courses) in both machine learning and computer vision concepts.
I was just about to ask: why not use a sphere? Since it could be thought of as a NN, it will probably be folded into NNs someday. I guess the splitting and merging can then be compared with dropout.
I'm no expert, but my immediate thoughts are that evaluating a gaussian blob is very simple, it's just an exponential of a distance. The edge of a sphere makes it more complicated to compute, hence slower.
For backpropagation, the differentials of a Gaussian are smooth while a sphere's are not, again because of the edge.
Now, if you want to use a sphere, you'd probably do something like adding an opacity falloff similar to ReLU[1], making it transparent at the edge.
This should make it smooth enough, I guess, but I imagine you'd still have the more complicated rendering. Though I may be mistaken.
[1]: https://en.wikipedia.org/wiki/Rectifier_(neural_networks)
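A quick toy comparison of the gradients (again my own illustration, not from the paper): the Gaussian falloff gives the optimizer a useful signal for resizing the blob, while a hard-edged disk gives essentially none.

```python
import torch

d = torch.linspace(0.0, 2.0, 5)  # distances of a few sample pixels from the primitive's center

# Gaussian footprint: smooth falloff, informative gradient w.r.t. its size.
sigma = torch.tensor(0.5, requires_grad=True)
torch.exp(-0.5 * (d / sigma) ** 2).sum().backward()
print(sigma.grad)   # non-zero: the optimizer "knows" how resizing changes the image

# Hard-edged disk: inside = 1, outside = 0. torch.sign's gradient is defined as zero,
# so the radius receives no signal at all from the pixels.
r = torch.tensor(0.5, requires_grad=True)
torch.clamp(torch.sign(r - d), 0.0, 1.0).sum().backward()
print(r.grad)       # zero: a hard cutoff is useless to gradient descent
```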
I still continue to read comments like those though - there is a chance I might make sense of a word! But I did find myself laughing as I read the original post thinking about how this sounds like a word salad.
The objects being optimized are the parameters of a 3D Gaussian: just imagine a blob changing shape. Those are optimized instead of the weights of a neural network.
What parts confuse you? There are a few steps in the optimization. There are lots of papers on differentiable rendering, but for the pruning of Gaussians and the actual treatment of the Gaussians, I don't think there's a blog post.
Can someone help me understand what this is actually doing?
After the scene is filmed/photographed then one can re-position and re-point a virtual camera and have it correctly render the scene. And do so with higher quality results than photogrammetry and NeRF techniques.
Thanks!
With tech like this I'm starting to wonder if realistic games are going to become normalized and what will happen as a result.
Also has anyone been working on solving the "blurry" look these splats have up close?
But if I'm not mistaken, this technique still requires a ton of pictures from many angles? It's fine for visiting an apartment or watching a cooking video in 3D, but how could you possibly apply this to a video game that has many more degrees of freedom? Are you going to scan an entire city with a drone to create a GTA-like?
> We introduce the 4D Gaussian Splatting (4D-GS) to achieve real-time dynamic scene rendering while also enjoying high training and storage efficiency.
This seems to be a rendering efficiency innovation, not particular to scanning.
That means it applies to artificially generated environments, whether photo realistic or stylized, and whether based on a real environment or a completely fictional one.
But of course, any photorealistic, extremely faithful to the smallest detail, rendering of a real place is going to involve a lot of scanning. That is true for any kind of rendering.
Each Gaussian "splat" is literally a little blurry blob. The way to make it sharper is to increase the resolution - i.e. increase the number of splats, decrease the size of each one. This increases both training time and render time though.
> realistic games
That said, games don't have to be super realistic to be fun. E.g. I could imagine a game based on GS at "Minecraft resolution".
I'd love to see a machine learning model trained on the resulting data of this. It'd be crazy to see if it can effectively learn and generate realistic looking video as an output.
Can someone explain to me how it is possible, using Gaussians, to have different reflections based on the angle of view like in the demos? I'm finding it hard to grasp.
I believe that is due to the use of Spherical Harmonics.
That seems more complex to store and render than everything else about a Gaussian splat; how are these used efficiently?
There is a bit of explanation here https://aras-p.info/blog/2023/09/05/Gaussian-Splatting-is-pr... (found via Google).
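For intuition, here's a small sketch of what evaluating degree-0/1 spherical harmonics into a view-dependent color looks like. The constants are the standard real SH basis factors; the sign convention and the +0.5 offset follow my reading of the reference 3DGS implementation, so treat those details as assumptions rather than gospel.

```python
import torch

SH_C0 = 0.28209479177387814   # degree-0 (constant) basis factor
SH_C1 = 0.4886025119029199    # degree-1 basis factor

def sh_to_rgb(sh, view_dir):
    """sh: (4, 3) coefficients (1 DC term + 3 degree-1 terms, per RGB channel).
    view_dir: unit-length (3,) direction from the Gaussian toward the camera."""
    x, y, z = view_dir
    color = (SH_C0 * sh[0]
             - SH_C1 * y * sh[1]
             + SH_C1 * z * sh[2]
             - SH_C1 * x * sh[3])
    return torch.clamp(color + 0.5, 0.0, 1.0)

sh = torch.randn(4, 3) * 0.2             # one Gaussian's color coefficients
front = sh_to_rgb(sh, torch.tensor([0.0, 0.0, 1.0]))
side  = sh_to_rgb(sh, torch.tensor([1.0, 0.0, 0.0]))
print(front, side)  # same Gaussian, different RGB depending on the viewing direction
```

Higher SH degrees add more direction-dependent terms, which is what lets a splat's color shift like a glossy highlight as the camera moves.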
Feel like this changes everything, trying it out right now...
Interesting that the original publication this is based on (that won the SIGGRAPH 2023 best paper award) didn't get a lot of attention on HN at the time:
Great video I saw a while ago on this: https://www.youtube.com/watch?v=HVv_IQKlafQ (albeit for 3d, not 4d).
His editing is hilarious too.
I've been slowly building my own rendering and training on a non-CUDA library (trying with Vulkan/SPIR-V). I'm curious how many cameras they used here, though.
Reminds me of the Deja Vu movie and how they maneuver the angles.
Anyone know how well this technique deals with mesh/lattice type structures? For example, fences, ladders, climbing frames, etc.
Gaussian Splatting in general, or this specific approach to animation? Can't comment on the latter, but fine detail renders very nicely in still scenes.
https://lumalabs.ai/capture/ed9d985b-9cc1-49e0-a39c-88afa203...
https://lumalabs.ai/capture/83e9aae8-7023-448e-83a6-53ccb377...
https://lumalabs.ai/capture/7f8df9c9-c548-4a47-9892-e945637c...
https://lumalabs.ai/capture/076fcfdc-ea80-4fdc-8159-c9fed831...
Wow, that's impressive. Thank you for these.
Combine this with state of the art VR tech (something with good eye tracking and 4k per eye) and we're living in the future.
Wondering when this technique will be used for meal calorie counters
Not seeing how that is related?
The wobbling made me think of photogrammetry/estimating volume with a camera, paired with some visual model to detect peas or whatever. Without a concrete dimension though (e.g. from lidar), I'm not sure how accurate it would be.
That cookie looks delicious
Can someone also explain the implications of this on gaming?
YouTube recommended this video to me, which concisely explains splatting. The information needed to weigh the trade-offs is there, but the trade-off itself is left to the viewer.
The key drawback that isn't highlighted is that you need a physical space to be a close approximation of what you want to render. So if you want to make a few counter strike maps based off of your workplace (not recommended) then this would be a good technology, but if you want to make an open world on an alien planet you're likely better off with traditional rendering.
This is just incredible technology
I like how galaxies look like ellipsoids if you zoom out.
Mind blowing stuff.
Well, Rule 34 is about to happen. And "splatting" is already a decent name...
Ha! Or Rule 34a, "every sufficiently observed phenomenon, has just become somebody's new fetish".
Although actually, and on a slightly more innocent (but just as edgy!) note, the thing that immediately popped into my head upon reading "4D Gaussian Splatting" was the music from the 1992 Future Crew demo Unreal, and the image of its inter-scene title screens. ["IYKYK", but basically, that famous old PC demo consists of several short sections, each showcasing a particular coding/graphical technique, each section prefaced by a title screen naming the effect being showcased.]
YT of Unreal demo, as citation for this highly-important observation : https://www.youtube.com/watch?v=InrGJ7C9B3s
The German demo group Farbrausch pioneered this.