Robust Conditional 3D Shape Generation from Casual Captures

facebookresearch.github.io

54 points by lastdong a day ago · 6 comments

fxtentacle 15 hours ago

This turns point clouds into meshes.

That means it doesn’t need depth. Depth is helpful for getting good point locations, but SLAM on multiple frames should also work.

I’m guessing that they are researching this for AR or robot navigation. Otherwise, the focus on accurately dividing the scene into objects wouldn’t make sense to me.

  • KaiserPro 13 hours ago

    It's much deeper than that.

    Segmentation in 2D is mostly a solved problem (Segment Anything is pretty fucking great), and segmentation in 3D is also fairly well done. You can use DINOv2 to do 3D object detection and segmentation.

    The difficult part _after_ that is interacting with the object. Sparse and semi-dense point clouds can be generated and refined in real time, but they are point clouds, not meshes. This means that interacting with the object accurately is super hard, because it's not a simple mesh that can be tested/interacted with; it's a bunch of points around the edges.

    Where this is useful is that it lets you generate a mostly plausible, simple 3D model that can act as a stand-in for any further interactions. In VR you can use it as a collision object for physics. For robotics you can use it to plan interactions (e.g. placing objects on the table).

    It's also a step in the direction of answering "whose" object it is, rather than "what" the object is. "Whose water bottle is this?" is much, much harder for machines to answer (without markers) than "is this a water bottle?" or "where is the water bottle in this scene?"
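To illustrate the mesh-vs-point-cloud point in the comment above: a mesh supports exact queries like ray intersection (the basis of collision tests in VR or grasp planning), which a bare set of edge points does not. A minimal sketch, not from the paper, using the standard Möller–Trumbore ray-triangle test:

```python
# Minimal ray-triangle intersection (Moller-Trumbore), pure Python.
# Shows why a mesh supports exact collision queries while a raw point
# cloud does not. Illustrative sketch only; not ShapeR's code.

def sub(a, b): return (a[0]-b[0], a[1]-b[1], a[2]-b[2])
def cross(a, b): return (a[1]*b[2]-a[2]*b[1], a[2]*b[0]-a[0]*b[2], a[0]*b[1]-a[1]*b[0])
def dot(a, b): return a[0]*b[0] + a[1]*b[1] + a[2]*b[2]

def ray_hits_triangle(origin, direction, v0, v1, v2, eps=1e-9):
    """Return the ray parameter t of the hit, or None if the ray misses."""
    e1, e2 = sub(v1, v0), sub(v2, v0)
    p = cross(direction, e2)
    det = dot(e1, p)
    if abs(det) < eps:          # ray parallel to the triangle plane
        return None
    inv = 1.0 / det
    s = sub(origin, v0)
    u = dot(s, p) * inv
    if u < 0.0 or u > 1.0:      # outside the triangle in barycentric u
        return None
    q = cross(s, e1)
    v = dot(direction, q) * inv
    if v < 0.0 or u + v > 1.0:  # outside in barycentric v
        return None
    t = dot(e2, q) * inv
    return t if t > eps else None

# A ray pointing straight down hits the unit triangle at z=0 after t=1:
t = ray_hits_triangle((0.25, 0.25, 1.0), (0.0, 0.0, -1.0),
                      (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
print(t)  # 1.0
```

With only a point cloud, the same query needs a density threshold or a surface fit first, which is exactly the gap a generated mesh closes.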

nico 20 hours ago

Does this need depth data capture as well? The phrase “casual captures” makes it seem like it only needs images, but apparently they use depth data too.

Also, can it run on Apple silicon?

  • KaiserPro 17 hours ago

    Nope, only needs depth for ground truth.

    It's designed to run on top of a SLAM system that outputs a sparse point cloud.

    On page 4, top right, you can see how the point cloud is used to feed into the object generator: https://cdn.jsdelivr.net/gh/facebookresearch/ShapeR@main/res...
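The conditioning step this comment describes can be sketched roughly: a SLAM system emits sparse landmarks, and the subset of points falling inside a detected object's 3D bounding box is what gets handed to the shape generator. The function name and box format below are hypothetical, not ShapeR's actual interface:

```python
# Schematic of conditioning a shape generator on a sparse SLAM point cloud:
# crop the cloud to an object's axis-aligned 3D bounding box. Hypothetical
# sketch; not ShapeR's real API.

def crop_to_box(points, box_min, box_max):
    """Keep the sparse points inside an axis-aligned object bounding box."""
    return [p for p in points
            if all(lo <= c <= hi for c, lo, hi in zip(p, box_min, box_max))]

# Sparse SLAM landmarks (metres); two fall inside the box around one object.
cloud = [(0.1, 0.2, 0.5), (0.12, 0.22, 0.48), (2.0, 1.0, 3.0)]
object_points = crop_to_box(cloud, box_min=(0.0, 0.0, 0.4),
                            box_max=(0.2, 0.3, 0.6))
print(len(object_points))  # 2
```

The real pipeline would then feed those object points (plus image features) into the generator to produce a watertight mesh in place of the sparse points.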

  • lastdong (OP) 17 hours ago

    I think it does use depth data, going by the parameters in the docs: python infer_shape.py --input_pkl <sample.pkl> (the depth is possibly obtainable using software like MapAnything). I believe it's CUDA only.

    • efskap 15 hours ago

      Yeah, they confirm that at the bottom of the linked page:

      > Furthermore, by leveraging tools like MapAnything to generate metric points, ShapeR can even produce metric 3D shapes from monocular images without retraining.
