Single image 3D reconstruction using MCC, SAM, and ZoeDepth

This example project combines several popular computer vision methods and uses Rerun to visualize the results and how the pieces fit together.

Visual project walkthrough

By combining Meta AI's Segment Anything Model (SAM) and Multiview Compressive Coding (MCC), we can reconstruct a 3D object from a single image.

The basic idea is to use SAM to create a generic object mask so we can exclude the background.
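
A minimal sketch of the masking step, assuming SAM has already returned a stack of candidate boolean masks with one quality score each (the predictor in the `segment_anything` package returns masks of shape (C, H, W) plus C scores when asked for multiple masks). The helper names `pick_best_mask` and `remove_background` are hypothetical: the code keeps the highest-scoring mask and blanks out every pixel outside it.

```python
import numpy as np


def pick_best_mask(masks: np.ndarray, scores: np.ndarray) -> np.ndarray:
    """Return the candidate mask (H, W) with the highest predicted quality score."""
    if masks.ndim != 3 or len(masks) != len(scores):
        raise ValueError("expected masks of shape (N, H, W) and N scores")
    return masks[int(np.argmax(scores))].astype(bool)


def remove_background(rgb: np.ndarray, mask: np.ndarray, fill: int = 0) -> np.ndarray:
    """Copy an (H, W, 3) image and set every pixel outside the mask to `fill`."""
    if rgb.shape[:2] != mask.shape:
        raise ValueError("image and mask must have the same height and width")
    out = rgb.copy()
    out[~mask] = fill  # a 2D boolean index selects whole pixels (all channels)
    return out


# Toy usage: two 2x2 candidate masks; the second has the higher score.
masks = np.array([[[1, 0], [0, 0]], [[0, 1], [1, 1]]], dtype=bool)
scores = np.array([0.3, 0.9])
rgb = np.full((2, 2, 3), 200, dtype=np.uint8)
foreground = remove_background(rgb, pick_best_mask(masks, scores))
```

Copying the image keeps the original frame intact, which is handy if you also want to log the unmasked view for comparison.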

The next step is to generate a depth image. Here we use the awesome ZoeDepth model to estimate metric depth from the color image.
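
Below is a sketch of how the depth step could be wired up; it is not the example's actual code. `make_zoedepth_fn` follows the `torch.hub` entry point (`isl-org/ZoeDepth`, model `ZoeD_N`) and the `infer_pil` method shown in the ZoeDepth README; it needs `torch`, Pillow, and network access, so it is only defined here, not called. `estimate_depth` is a hypothetical wrapper that accepts any RGB-to-depth function and checks that the result is a finite, positive depth map at the input resolution, which is what the unprojection step relies on.

```python
from typing import Callable

import numpy as np

DepthFn = Callable[[np.ndarray], np.ndarray]


def make_zoedepth_fn(variant: str = "ZoeD_N") -> DepthFn:
    """Load ZoeDepth through torch.hub (requires torch, Pillow and network access)."""
    import torch
    from PIL import Image

    model = torch.hub.load("isl-org/ZoeDepth", variant, pretrained=True).eval()

    def depth_fn(rgb: np.ndarray) -> np.ndarray:
        # infer_pil takes a PIL image and returns a NumPy depth map in meters.
        return model.infer_pil(Image.fromarray(rgb))

    return depth_fn


def estimate_depth(depth_fn: DepthFn, rgb: np.ndarray) -> np.ndarray:
    """Run a depth estimator and sanity-check its (H, W) metric output."""
    depth = np.asarray(depth_fn(rgb), dtype=np.float32)
    if depth.shape != rgb.shape[:2]:
        raise ValueError(f"depth shape {depth.shape} != image shape {rgb.shape[:2]}")
    if not np.all(np.isfinite(depth)) or np.any(depth <= 0):
        raise ValueError("depth must be finite and strictly positive")
    return depth


# Toy usage with a stand-in estimator that puts everything 1.5 m away.
rgb = np.zeros((4, 6, 3), dtype=np.uint8)
depth = estimate_depth(lambda img: np.full(img.shape[:2], 1.5), rgb)
```

Injecting the estimator as a plain function keeps the rest of the pipeline testable without downloading model weights.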

With depth, color, and an object mask, we have everything needed to create a colored point cloud of the object from a single view.
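
One common way to build that point cloud is pinhole unprojection: a masked pixel (u, v) with depth z maps to x = (u - cx) z / fx, y = (v - cy) z / fy in camera space. A single photo usually comes without calibration, so the intrinsics `fx`, `fy`, `cx`, `cy` here are assumptions the caller must supply (for example a guessed focal length and the image center), and `masked_point_cloud` is a hypothetical helper name.

```python
import numpy as np


def masked_point_cloud(depth, rgb, mask, fx, fy, cx, cy):
    """Unproject masked pixels into camera-space points with per-point colors.

    depth: (H, W) metric depth, rgb: (H, W, 3), mask: (H, W) bool.
    Returns points (N, 3) and colors (N, 3) for the N pixels inside the mask.
    """
    v, u = np.nonzero(mask)  # row (v) and column (u) indices of object pixels
    z = depth[v, u].astype(np.float64)
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1)
    colors = rgb[v, u]
    return points, colors


# Toy usage: a 2x3 image at constant 2 m depth with a two-pixel mask.
depth = np.full((2, 3), 2.0)
rgb = np.arange(18, dtype=np.uint8).reshape(2, 3, 3)
mask = np.array([[False, True, False], [False, False, True]])
points, colors = masked_point_cloud(depth, rgb, mask, fx=2.0, fy=2.0, cx=1.0, cy=0.5)
```

Only pixels inside the mask are unprojected, so the background never enters the point cloud that MCC encodes.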

MCC encodes the colored points and then creates a reconstruction by sweeping through the volume, querying the network for occupancy and color at each point.
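
The sweep can be pictured as evaluating a dense grid of 3D query points in chunks and keeping those whose predicted occupancy clears a threshold. In the sketch below, `query_fn` is a hypothetical stand-in for the MCC decoder already conditioned on the encoded colored points: it maps an (N, 3) array of points to N occupancy probabilities and N RGB colors. The grid bounds, resolution, threshold, and chunk size are illustrative assumptions, not MCC's actual settings.

```python
from typing import Callable, Sequence, Tuple

import numpy as np

QueryFn = Callable[[np.ndarray], Tuple[np.ndarray, np.ndarray]]


def sweep_volume(
    query_fn: QueryFn,
    bounds_min: Sequence[float],
    bounds_max: Sequence[float],
    resolution: int,
    threshold: float = 0.5,
    chunk: int = 4096,
) -> Tuple[np.ndarray, np.ndarray]:
    """Query a regular 3D grid and return the occupied points with their colors."""
    axes = [np.linspace(lo, hi, resolution) for lo, hi in zip(bounds_min, bounds_max)]
    grid = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1).reshape(-1, 3)

    kept_points, kept_colors = [], []
    for start in range(0, len(grid), chunk):
        pts = grid[start:start + chunk]
        occupancy, colors = query_fn(pts)  # shapes (n,) and (n, 3)
        keep = occupancy > threshold
        kept_points.append(pts[keep])
        kept_colors.append(colors[keep])
    return np.concatenate(kept_points), np.concatenate(kept_colors)


# Toy usage: an analytic "network" that reports a red sphere of radius 0.6.
def sphere_query(pts: np.ndarray):
    occupancy = (np.linalg.norm(pts, axis=1) < 0.6).astype(float)
    colors = np.tile([1.0, 0.0, 0.0], (len(pts), 1))
    return occupancy, colors


points, colors = sweep_volume(sphere_query, (-1, -1, -1), (1, 1, 1), resolution=5, chunk=16)
```

Querying in chunks keeps memory bounded, since the number of grid points grows with the cube of `resolution`.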

This is a really great example of how a lot of cool solutions are built these days: by stringing together more targeted pre-trained models. The details of the three building blocks can be found in the respective papers:

Segment Anything by Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alexander C. Berg, Wan-Yen Lo, Piotr Dollár, and Ross Girshick
Multiview Compressive Coding for 3D Reconstruction by Chao-Yuan Wu, Justin Johnson, Jitendra Malik, Christoph Feichtenhofer, and Georgia Gkioxari
ZoeDepth: Zero-shot Transfer by Combining Relative and Metric Depth by Shariq Farooq Bhat, Reiner Birkl, Diana Wofk, Peter Wonka, and Matthias Müller