Segment Anything 2 (SAM 2) is follow-up work to Segment Anything that extends its state-of-the-art segmentation capabilities to video. It does this by adding a per-session memory module that captures information about the target object, allowing SAM 2 to track that object across all frames of a video, even if it temporarily disappears from view, since the model retains context from earlier frames. Depth Anything 2 is a monocular depth estimation model trained on a large amount of synthetic and real data to achieve state-of-the-art depth estimation. Combining the two models makes it possible to track an object in 3D from just a single monocular video!
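To make the combination concrete, below is a minimal sketch (NumPy only) of the lifting step: given a binary object mask for one frame (as SAM 2 would produce) and a per-pixel depth map (as Depth Anything 2 would produce), the masked pixels can be back-projected into a 3D point cloud with a pinhole camera model. This is an illustration of the idea, not the repository's actual implementation; the function name and the intrinsics (`fx`, `fy`, `cx`, `cy`) are placeholder assumptions:

```python
import numpy as np

def mask_depth_to_points(mask: np.ndarray, depth: np.ndarray,
                         fx: float, fy: float, cx: float, cy: float) -> np.ndarray:
    """Back-project pixels where `mask` is True into camera-space 3D points."""
    v, u = np.nonzero(mask)          # pixel rows (v) and columns (u) inside the mask
    z = depth[v, u]                  # depth at those pixels
    x = (u - cx) / fx * z            # pinhole model: X = (u - cx) * Z / fx
    y = (v - cy) / fy * z            # pinhole model: Y = (v - cy) * Z / fy
    return np.stack([x, y, z], axis=-1)  # (N, 3) point cloud for this frame

# Synthetic stand-ins for one frame's model outputs (placeholder values).
h, w = 480, 640
mask = np.zeros((h, w), dtype=bool)
mask[200:280, 300:340] = True                   # "tracked object" region
depth = np.full((h, w), 2.0, dtype=np.float32)  # flat 2 m depth, for illustration

points = mask_depth_to_points(mask, depth, fx=525.0, fy=525.0, cx=w / 2, cy=h / 2)
print(points.shape)  # (3200, 3) -- one 3D point per masked pixel
```

Repeating this step for every frame, with SAM 2's memory keeping the mask locked onto the same object, yields the object's 3D shape and position over time.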
This is an external example. Check the repository for more information.
You can try the example in its Hugging Face Space here.
It is highly recommended to run this example locally by cloning the repository above and running the following commands (make sure you have Pixi installed first):
```bash
git clone https://github.com/pablovela5620/sam2-depthanything.git
cd sam2-depthanything
pixi run app
```