Human pose tracking

Use the MediaPipe Pose Landmark Detection solution to detect and track a human pose in video.

Used Rerun types

Image, Points2D, Points3D, ClassDescription, AnnotationContext, SegmentationImage

Background

Human pose tracking is a computer vision task that focuses on identifying key body locations, analyzing posture, and categorizing movements. At its core is a pre-trained machine-learning model that processes the visual input and recognizes body landmarks in both image coordinates and 3D world coordinates. Applications include human-computer interaction, sports analysis, gaming, virtual and augmented reality, and health.

In this example, the MediaPipe Pose Landmark Detection solution is used to detect and track human pose landmarks and to produce segmentation masks for people in the frame. Rerun is used to visualize the MediaPipe output over time, which makes the model's behavior easy to analyze.

Logging and visualizing with Rerun

The visualizations in this example were created with the following Rerun code.

Timelines

For each processed video frame, all data sent to Rerun is associated with the two timelines time and frame_idx.

rr.set_time("time", duration=bgr_frame.time)
rr.set_time("frame_idx", sequence=bgr_frame.idx)
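As an illustration of where these two values could come from, here is a minimal sketch that pairs each frame index with a timestamp derived from a fixed frame rate. The `VideoFrame` class, the `iter_frames` helper, and the constant-frame-rate assumption are hypothetical and not taken from the example's source; a real reader would more likely take the timestamp from the video decoder.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass
class VideoFrame:
    """Hypothetical container mirroring the `idx` and `time` fields used above."""

    idx: int  # sequence number, suitable for a sequence timeline
    time: float  # seconds since the start of the video, suitable for a duration timeline


def iter_frames(num_frames: int, fps: float) -> Iterator[VideoFrame]:
    """Yield frame metadata, assuming a constant frame rate (an assumption)."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    for idx in range(num_frames):
        yield VideoFrame(idx=idx, time=idx / fps)


frames = list(iter_frames(num_frames=4, fps=25.0))
for frame in frames:
    # In the example, values like these are what gets passed to rr.set_time(...).
    print(f"frame_idx={frame.idx} time={frame.time:.2f}s")
```

Keeping both a sequence and a duration timeline lets the viewer scrub either by frame number or by elapsed time.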

Video

The input video is logged as a sequence of Image objects to the video/rgb entity. Each frame is JPEG-compressed before logging to reduce the amount of data sent.

rr.log(
    "video/rgb",
    rr.Image(rgb).compress(jpeg_quality=75)
)

Segmentation mask

The segmentation result is logged through a combination of two archetypes. The segmentation image itself is logged as a SegmentationImage and contains the class id for each pixel. The color and label shown for each id are determined by the AnnotationContext, which is logged with static=True because it applies to the whole sequence.

Label mapping

rr.log(
    "video/mask",
    rr.AnnotationContext(
        [
            rr.AnnotationInfo(id=0, label="Background"),
            rr.AnnotationInfo(id=1, label="Person", color=(0, 0, 0)),
        ]
    ),
    static=True,
)

Segmentation image

rr.log("video/mask", rr.SegmentationImage(binary_segmentation_mask.astype(np.uint8)))
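The variable `binary_segmentation_mask` is not defined in the snippet above. A minimal sketch of one way to obtain it, assuming the model returns a per-pixel float confidence in [0, 1] and that 0.5 is an acceptable cut-off (both are assumptions, as is the helper name `to_class_id_mask`):

```python
import numpy as np


def to_class_id_mask(confidence: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Turn per-pixel confidences into class ids: 0 = Background, 1 = Person."""
    binary_mask = confidence > threshold  # boolean array, True where a person is likely
    return binary_mask.astype(np.uint8)  # uint8 ids matching the AnnotationInfo ids above


# Tiny synthetic 2x3 "mask" standing in for real model output.
confidence = np.array(
    [[0.1, 0.7, 0.9],
     [0.4, 0.5, 0.6]],
    dtype=np.float32,
)
class_ids = to_class_id_mask(confidence)
print(class_ids)
```

Because the resulting values are exactly the ids declared in the annotation context, the viewer can look up the label and color for every pixel.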

Body pose points

Logging the body pose as a skeleton involves specifying the connectivity of its keypoints (i.e., pose landmarks), extracting the pose landmarks, and logging them as points to Rerun. In this example, both the 2D and 3D estimates from MediaPipe are visualized.

The skeletons are logged through a combination of two archetypes. First, a static ClassDescription is logged; it maps keypoint ids to labels and defines how the keypoints are connected. Given these connections, Rerun automatically draws lines between the keypoints. MediaPipe provides the POSE_CONNECTIONS variable, which contains the list of (from, to) landmark indices that define the connections. Second, the actual keypoint positions are logged in 2D and 3D as Points2D and Points3D archetypes, respectively.

Label mapping and keypoint connections

rr.log(
    "/",
    rr.AnnotationContext(
        rr.ClassDescription(
            info=rr.AnnotationInfo(id=1, label="Person"),
            keypoint_annotations=[
                rr.AnnotationInfo(id=lm.value, label=lm.name) for lm in mp_pose.PoseLandmark
            ],
            keypoint_connections=mp_pose.POSE_CONNECTIONS,
        )
    ),
    static=True,
)
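To illustrate what the connection list encodes, the sketch below turns (from, to) index pairs into line segments given keypoint positions. This is roughly the work a viewer has to do when drawing a skeleton; it is not Rerun's implementation, and the three-keypoint `connections` set and the positions are made-up stand-ins for `POSE_CONNECTIONS` and real landmarks.

```python
from typing import Dict, Iterable, List, Tuple

Point = Tuple[float, float]


def skeleton_segments(
    positions: Dict[int, Point],
    connections: Iterable[Tuple[int, int]],
) -> List[Tuple[Point, Point]]:
    """Return one (start, end) segment per connection whose endpoints are both present."""
    segments = []
    for start_id, end_id in sorted(connections):  # sorted for a deterministic order
        if start_id in positions and end_id in positions:
            segments.append((positions[start_id], positions[end_id]))
    return segments


# Hypothetical toy skeleton: shoulder (0) -> elbow (1) -> wrist (2).
connections = frozenset({(0, 1), (1, 2)})
positions = {0: (10.0, 20.0), 1: (15.0, 30.0), 2: (18.0, 42.0)}
segments = skeleton_segments(positions, connections)
print(segments)
```

Since the connections refer to keypoint ids rather than coordinates, they only need to be logged once, while the positions change every frame.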

2D points

rr.log(
    "video/pose/points",
    rr.Points2D(landmark_positions_2d, class_ids=1, keypoint_ids=mp_pose.PoseLandmark)
)
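`landmark_positions_2d` is not defined in the snippet above. MediaPipe reports image landmarks with x and y normalized by the image width and height, so they need to be scaled to pixel coordinates before being overlaid on the video. A minimal sketch, where `Landmark` is a hypothetical stand-in for MediaPipe's landmark type and `to_pixel_coordinates` is an illustrative helper:

```python
from typing import List, NamedTuple, Tuple


class Landmark(NamedTuple):
    """Hypothetical stand-in for a landmark with normalized coordinates."""

    x: float  # horizontal position as a fraction of image width
    y: float  # vertical position as a fraction of image height


def to_pixel_coordinates(
    landmarks: List[Landmark], width: int, height: int
) -> List[Tuple[float, float]]:
    """Scale normalized (x, y) landmarks to pixel coordinates."""
    return [(width * lm.x, height * lm.y) for lm in landmarks]


landmarks = [Landmark(0.5, 0.25), Landmark(0.0, 1.0)]
landmark_positions_2d = to_pixel_coordinates(landmarks, width=640, height=480)
print(landmark_positions_2d)
```

The resulting list has one (x, y) pair per landmark, in the same order as the keypoint ids passed alongside it.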

3D points

rr.log(
    "person/pose/points",
    rr.Points3D(landmark_positions_3d, class_ids=1, keypoint_ids=mp_pose.PoseLandmark),
)

Run the code

To run this example, make sure you have the Rerun repository checked out and the latest SDK installed:

pip install --upgrade rerun-sdk  # install the latest Rerun SDK
git clone git@github.com:rerun-io/rerun.git  # Clone the repository
cd rerun
git checkout latest  # Check out the commit matching the latest SDK release

Install the example package together with its dependencies:

pip install -e examples/python/human_pose_tracking

To experiment with the provided example, simply execute the main Python script:

python -m human_pose_tracking # run the example

To use a different video, limit the number of processed frames, or explore other options, run the script with the --help flag:

python -m human_pose_tracking --help