Recent advances in 3D content understanding (ai.facebook.com)
Kinda boggles my mind that there hasn't been a stronger push for hardware support of 3D imaging in phones. It clearly provides more useful information for analysis. Even when we look at a photo, we map it to a 3D projection in our mind.
Think about the data explosion when all our photos and video store the whole light field.
A 2.5D model is probably less data?
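For a rough sense of scale, here's a back-of-envelope comparison; every number is an assumption picked for illustration, not a measurement from any real device:

    # Back-of-envelope storage comparison; all figures are assumptions.
    width, height = 4000, 3000            # 12 MP still
    bytes_per_px_rgb = 3                  # 8-bit RGB
    bytes_per_px_depth = 2                # 16-bit depth map

    rgb = width * height * bytes_per_px_rgb
    rgbd = rgb + width * height * bytes_per_px_depth   # "2.5D": color + depth

    # A sampled light field stores the scene from many viewpoints,
    # e.g. a hypothetical 9x9 grid of sub-aperture views.
    light_field = rgb * 9 * 9

    print(f"RGB only:    {rgb / 1e6:7.1f} MB")
    print(f"RGB + depth: {rgbd / 1e6:7.1f} MB")
    print(f"Light field: {light_field / 1e6:7.1f} MB (uncompressed)")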
There have been a few phones which provided that, notably the HTC Evo 3D. It had dual cameras and an autostereoscopic (glasses-free 3D) display.
The value provided was, IMO, quite minimal. It was not easier to use or better in any discernible user-facing way; the 3D stuff just felt like a gimmick. 3D photos taken by the phone and viewed on its screen did not feel more lifelike. The color depth and image quality were poor even by the standards of other phones of its era.
While it was very cool and felt very futuristic, it did not feel worth the cost.
Not sure if this qualifies, but an interesting development that was new to me not long ago: the iPhone X can measure things with the camera. The Measure app lets you designate a point in the scene, draw a line, and add another point, and it will measure the distance. It also has nice snapping features and can take a picture of the scene with the measurements superimposed on top.
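I don't know how Apple actually implements this, but the underlying geometry is roughly: unproject the two tapped pixels into camera space using their estimated depths and the camera intrinsics, then take the Euclidean distance. A minimal sketch with made-up intrinsics and depths:

    import numpy as np

    def unproject(u, v, depth, fx, fy, cx, cy):
        # Back-project pixel (u, v) at a known depth (meters) into camera
        # space, assuming a simple pinhole model with no distortion.
        return np.array([(u - cx) * depth / fx,
                         (v - cy) * depth / fy,
                         depth])

    # Hypothetical intrinsics and two user-tapped points with depths.
    fx = fy = 1500.0
    cx, cy = 960.0, 540.0
    p1 = unproject(800, 500, 1.20, fx, fy, cx, cy)
    p2 = unproject(1100, 560, 1.25, fx, fy, cx, cy)

    print(f"Measured distance: {np.linalg.norm(p2 - p1):.3f} m")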
Phones with multiple cameras are starting to become common. Time-of-flight sensors that use projected infrared patterns aren't trivial to put in a phone, so the demand needs to be there. Still, there have been tablets that integrated Intel's depth cameras.
Nit: Projected patterns are structured light, not time-of-flight. As far as I’m aware (would love to be wrong!) you can’t do ToF with a traditional CCD or CMOS sensor and resolution is invariably woeful.
What kind of sensors do they use in time-of-flight? Because I think they do use gated CCD or CMOS...?
Hm, last time I checked I thought the sensors were individual diodes but it looks like you're right, and 'flash LIDAR' cameras do use some funky kind of CCD. Still relatively low resolution but good to know, thanks!
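For anyone curious about the math: continuous-wave ToF recovers per-pixel depth from the phase shift of a modulated IR signal. A toy sketch (the 20 MHz modulation frequency is just an assumed example):

    import math

    C = 299_792_458.0     # speed of light, m/s
    F_MOD = 20e6          # assumed modulation frequency, 20 MHz

    def tof_depth(phase_shift_rad):
        # Depth from the measured phase shift of the modulated signal.
        return C * phase_shift_rad / (4 * math.pi * F_MOD)

    # Beyond this range the phase wraps around and depth becomes ambiguous.
    print(f"Unambiguous range: {C / (2 * F_MOD):.2f} m")
    print(f"Depth at pi/2 phase shift: {tof_depth(math.pi / 2):.2f} m")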
The videos showing the algorithm in practice are really nice demos.
I’m curious how big of a step forward this is from the previous state of the art, and at what computational cost.
Also curious if the technique scales well with multiple cameras with overlapping fields of view. That is to say, I assume accuracy can be increased through sensor fusion in the basic sense of averaging errors, but actually building a cohesive 3D view of a 360° environment means understanding that an object at the edge of one camera's frame is the same object, seen from a different perspective, at the edge of another camera's frame.
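For the basic averaging case, here's a sketch of inverse-variance fusion of depth estimates of the same point from overlapping cameras. It deliberately skips the hard part described above, namely associating the object across views and re-projecting into a shared frame:

    import numpy as np

    def fuse_depth(estimates, variances):
        # Inverse-variance weighted fusion of independent depth estimates
        # of the same 3D point seen by overlapping cameras.
        w = 1.0 / np.asarray(variances, dtype=float)
        fused = np.sum(w * np.asarray(estimates, dtype=float)) / np.sum(w)
        return fused, 1.0 / np.sum(w)

    # Two cameras see the same object; camera B has a worse view, so more noise.
    depth, var = fuse_depth([4.9, 5.3], [0.05, 0.20])
    print(f"Fused depth: {depth:.2f} m (variance {var:.3f})")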
Obviously this seems like it should be extremely useful for Autopilot. Compared to the relative inaccuracy of the positional information for adjacent cars on the Autopilot guidance display we have today, this seems like a big step forward.
I think it’s interesting how the RNN is identifying specific types of objects and then depth mapping them. I assume it can’t just depth map the whole image without that first classification step? I’m thinking like for the Smart Summon application where depth mapping everything around you is pretty crucial and obviously not entirely working at this point.
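I don't know whether classification is actually a prerequisite in their pipeline, but once you do have a dense depth map plus a per-object instance mask, summarizing depth per object is straightforward. A toy sketch:

    import numpy as np

    def object_depth_stats(depth_map, instance_mask):
        # Summarize depth over the pixels of one detected object.
        # depth_map: HxW float array (meters); instance_mask: HxW bool array.
        d = depth_map[instance_mask]
        d = d[np.isfinite(d)]                 # drop invalid/missing depth
        return {"median_m": float(np.median(d)),
                "min_m": float(d.min()),
                "max_m": float(d.max()),
                "pixels": int(d.size)}

    # Toy example: a 4x4 depth map with a 2x2 object in one corner.
    depth = np.full((4, 4), 10.0)
    depth[2:, 2:] = 3.2
    mask = np.zeros((4, 4), dtype=bool)
    mask[2:, 2:] = True
    print(object_depth_stats(depth, mask))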
I do photogrammetry as a hobby, and would love to see more RGBD cameras on the market. I've even considered hacking together my own... anyone have pointers for cost-conscious options?
A few of the models I've shrunk down and posted: https://sketchfab.com/darkphibre
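Not a hardware pointer, but once you have an RGBD camera, turning a depth frame into a point cloud for this kind of work is only a few lines of NumPy. The intrinsics below are made-up, Kinect-ish values:

    import numpy as np

    def depth_to_point_cloud(depth, fx, fy, cx, cy):
        # Convert a depth image (meters) into an Nx3 point cloud,
        # assuming an ideal pinhole camera with no distortion.
        h, w = depth.shape
        u, v = np.meshgrid(np.arange(w), np.arange(h))
        valid = depth > 0
        z = depth[valid]
        x = (u[valid] - cx) * z / fx
        y = (v[valid] - cy) * z / fy
        return np.stack([x, y, z], axis=-1)

    # Hypothetical 640x480 frame with Kinect-like intrinsics.
    depth = np.random.uniform(0.5, 4.0, size=(480, 640)).astype(np.float32)
    cloud = depth_to_point_cloud(depth, fx=525.0, fy=525.0, cx=319.5, cy=239.5)
    print(cloud.shape)   # (307200, 3)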
The new Azure Kinect is really impressive.
I've used the OG Kinect, Kinect v2, and Intel RealSense D435, and it was much more accurate than all of those.
Sorry for derailing the thread, but how was the new Azure Kinect's performance outdoors? I've had trouble finding this information anywhere.
np, I have yet to test it outdoors.
Very impressive research. Just scared of what invasive things Facebook will use it for.
I hope the researchers are advocating for its ethical use.
It really seems like we should be close to having robots that can navigate a room.
You can get pretty far building one yourself using the out-of-the-box ROS (https://www.ros.org/) navigation stack with a Kinect, an off-the-shelf motor controller, and something like an NVIDIA Jetson.
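For the navigation part specifically, once the stack is running you mostly just send it goals. A minimal sketch against the standard move_base action interface (topic and frame names here are the common defaults and may not match a given setup):

    #!/usr/bin/env python
    import rospy
    import actionlib
    from move_base_msgs.msg import MoveBaseAction, MoveBaseGoal

    def go_to(x, y):
        # Send a single pose goal to the ROS navigation stack and wait.
        client = actionlib.SimpleActionClient('move_base', MoveBaseAction)
        client.wait_for_server()

        goal = MoveBaseGoal()
        goal.target_pose.header.frame_id = 'map'
        goal.target_pose.header.stamp = rospy.Time.now()
        goal.target_pose.pose.position.x = x
        goal.target_pose.pose.position.y = y
        goal.target_pose.pose.orientation.w = 1.0   # face along +x

        client.send_goal(goal)
        client.wait_for_result()
        return client.get_state()

    if __name__ == '__main__':
        rospy.init_node('simple_nav_goal')
        print(go_to(1.0, 0.5))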