Nerfstudio: A collaboration friendly studio for NeRFs
docs.nerf.studio
It would be helpful if that site didn't just assume all visitors already know what a NeRF is.
Exactly, can somebody ELI5 what a NeRF is in this context?
At least in the original paper [1], the key idea is that instead of training a neural network to treat "images" as arrays of pixels, you train a network to map a camera location and direction (in 3D space) to a distance and a color.
For example, if you tell a NeRF that you have a camera at location (x, y, z) and pointing in direction (g, h, j), then the network will output the distance at which the ray emitted from the camera is expected to "hit" something and the RGB color of whatever is hit.
Doing things in this way enables rendering images at arbitrary resolutions (though rendering can be slow), and is naturally conducive to producing rotated views of objects or exploring 3D space. Also, at least theoretically, it should allow for more "compact" network architectures, as it does not need to output, say, a 512x512x3 image.
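To make that concrete, here's a rough PyTorch sketch of the simplified ray-to-(distance, color) mapping described above. This is my own toy illustration, not the architecture from the paper: the actual NeRF MLP takes a 3D point plus a viewing direction and predicts a density and color, which are then integrated along the ray.

    import torch
    import torch.nn as nn

    class TinyRayNet(nn.Module):
        # Maps (ray origin, ray direction) -> (distance, RGB).
        # Illustrative only; the real NeRF MLP predicts density/color
        # per sampled 3D point, not one distance per ray.
        def __init__(self, hidden=256):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(6, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, 4),  # 1 distance + 3 color channels
            )

        def forward(self, origin, direction):
            x = torch.cat([origin, direction], dim=-1)
            out = self.net(x)
            dist = torch.relu(out[..., :1])      # distances are non-negative
            rgb = torch.sigmoid(out[..., 1:])    # colors in [0, 1]
            return dist, rgb

    # Query one ray: camera at (x, y, z) looking along (g, h, j).
    model = TinyRayNet()
    dist, rgb = model(torch.tensor([[0.0, 0.0, 0.0]]),
                      torch.tensor([[0.0, 0.0, -1.0]]))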
A NeRF is a deep neural network architecture that takes a set of photos of a given scene from multiple camera positions as input and outputs a volume representation of that scene. Volumes allow for many common volume-rendering interactions, like repositioning the camera to any angle or even lighting effects.
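The "volume" part comes down to classic alpha compositing along each camera ray. Here's a small NumPy sketch of that rendering step with made-up sample values; in a real NeRF the densities and colors would come from the trained network.

    import numpy as np

    def composite_ray(sigmas, colors, deltas):
        # Standard NeRF-style compositing along one ray:
        #   alpha_i = 1 - exp(-sigma_i * delta_i)
        #   T_i     = prod_{j<i} (1 - alpha_j)    (transmittance)
        #   C       = sum_i T_i * alpha_i * c_i   (final pixel color)
        alphas = 1.0 - np.exp(-sigmas * deltas)
        trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))
        weights = trans * alphas
        return (weights[:, None] * colors).sum(axis=0)

    # Toy example: 4 samples along a single ray.
    sigmas = np.array([0.0, 0.5, 3.0, 0.1])          # densities at samples
    colors = np.array([[1, 0, 0], [0, 1, 0],
                       [0, 0, 1], [1, 1, 0]], float)  # RGB at samples
    deltas = np.array([0.1, 0.1, 0.1, 0.1])          # spacing between samples
    print(composite_ray(sigmas, colors, deltas))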
Like the Blade Runner camera!
Neural Photogrammetry. "Novel View Synthesis"
For those curious like me, NeRFs are Neural Radiance Fields: https://www.matthewtancik.com/nerf
Funny, I just discovered that a few days ago while listening to the engineers talk at Tesla’s AI Day. They mentioned it (I think for their occupancy stuff), I didn’t know what it was and found the same site.
NeRF applied to Maps soon
https://twitter.com/jeffdean/status/1524612819160735745?lang...
Another paper toward Earth-sized, multi-scale models (5 minutes)
So a bunch of news sites have reported this feature has already rolled out. For example: https://www.engadget.com/google-maps-aerial-view-landmarks-0...
And lo and behold I see something that looks like a 3D-rendered view at the top of the entry for "Tower of London".
Except it's just a video: a rendered video of a 360° view. Tried on Android and iPad. There's no freedom of movement; it plays a single fixed sequence and even says "report this video" when you click the three dots.
Are the news reports wrong - or do the writers not know a video when they see one?
Just saw this on twitter and the results look pretty awesome https://twitter.com/akanazawa/status/1577686321119645696?s=2...
I tried it today. Nerfstudio is amazing. I also like the pragmatic approach to its UI: launch a command-line / text UI program, publish a localhost URL, access a rich WebUI in the browser.
NeRF rendering latency is extremely low, and frames are delivered over WebRTC, which means there's no fundamental obstacle to streaming the WebUI over the Internet.
Nerfstudio already supports all the leading NeRF flavours and also includes their own. It instantly becomes a staple for all future NeRF research.
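For reference, the basic flow from the docs is just a couple of commands; here's a sketch via subprocess (folder names are placeholders, and the exact flags may have changed since I tried it):

    import subprocess

    # 1. Run COLMAP on a folder of photos and write a Nerfstudio-format dataset.
    subprocess.run(["ns-process-data", "images",
                    "--data", "my_photos", "--output-dir", "processed"], check=True)

    # 2. Train the default "nerfacto" model; it prints a localhost viewer URL
    #    you open in the browser to watch training in real time.
    subprocess.run(["ns-train", "nerfacto", "--data", "processed"], check=True)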
I am extremely excited for this technology to mature to a point where realtime novel view synthesis becomes possible. The idea of being able to take a few photographs and be able to recreate an appreciably decent 6DOF viewing experience is a very powerful enabling technology.
Nerfstudio looks like something that is finally accessible enough that I will be able to start experimenting without having to spend enormous amounts of effort on tooling. Can't wait to check it out! It looks like some of the processing pipelines might be a good fit for merging with WebODM to make it even more friendly to work with.
Real-time novel view synthesis is already possible. In fact, I recently saw someone demo a version that ran in WebGL2 shaders on smartphones, so it's not terribly taxing.
Thanks very much for the reference. I'm curious about this but unfortunately not enough to keep up with the nooks and crannies where it's being worked on!
> The idea of being able to take a few photographs and be able to recreate an appreciably decent 6DOF viewing experience is a very powerful enabling technology.
My understanding is that you can't just take a few photographs, but you have to label them with their position and orientation, which is often not trivial.
But perhaps this difficulty has been solved already (?)
Usually, you can just run COLMAP ([1]) and it will find the poses and camera intrinsics. This is indeed what Nerfstudio recommends doing in the docs ([2]).
1. https://colmap.github.io/
2. https://docs.nerf.studio/en/latest/quickstart/custom_dataset...
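If you'd rather drive COLMAP from Python than its GUI/CLI, the pycolmap bindings expose the same pipeline. A sketch from memory (function names may differ across pycolmap versions, and "my_photos" is a placeholder folder):

    import pathlib
    import pycolmap  # COLMAP's Python bindings

    image_dir = pathlib.Path("my_photos")      # placeholder: folder of input photos
    out_dir = pathlib.Path("colmap_out")
    out_dir.mkdir(exist_ok=True)
    database = out_dir / "database.db"

    pycolmap.extract_features(database, image_dir)   # detect image features
    pycolmap.match_exhaustive(database)              # match all image pairs
    maps = pycolmap.incremental_mapping(database, image_dir, out_dir)  # run SfM

    rec = maps[0]                                    # first reconstruction
    print(f"registered {len(rec.images)} images with estimated poses and intrinsics")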
GTC 2022 Re-imagining Robot Autonomy with Neural Environment Representations, with Q&A from EMEA region [A41181b] https://register.nvidia.com/flow/nvidia/gtcfall2022/attendee...
I didn’t know what NeRFs were so I had to look it up. This article seems like a good introduction for anyone else that’s out of the loop like me: https://www.matthewtancik.com/nerf
Actually kind of excited by this tech; it would be fun to play CoD in a realistic downtown SF or LA.
Speaking of NeRFs, here's a Google Imagen-powered text-to-NeRF model:
Good demo video: https://www.youtube.com/watch?v=nSFsugarWzk
If you're going to make an app like this, at least use Electron or something similar and create an integrated package instead of relying on a Python UI over a bunch of command-line tools.
Also, how does it compare to Nvidia Instant NeRF (performance etc.)?
Electron is a literal ecological disaster because of how much CPU and RAM JavaScript runtimes use. How many chunks of coal have been burned just to keep Electron going, when a faster language would have used less CPU time for the millions of users who run Electron apps?