Rerun Hub

Infrastructure that powers your data loop

The production backend for the Rerun data layer. Catalog, byte-range indexing, and retrieval that turn your object stores into a queryable, streamable foundation. Run transforms on the edge or close to the data.

Book a meeting

Petabyte-scale

Across your object storage, any bucket or region

Direct streaming

Byte-range reads, straight from your storage to compute

Built for physical data

Multi-rate, multimodal data, kept in its native shape

Hub sits under your whole loop

Build your loop your way with the open-source SDK. Hub handles the hard parts: consistency and scale for physical data.

Experiment loop

Teams win by iterating fast on data composition and modeling while scaling data and compute.

Collect

Refine

Train

Deploy

Enabled by Rerun SDK

The same open-source SDK drives every stage, on your own compute and tools. It connects to Hub over an open protocol.

Visualization

SQL / dataframe queries

Post-processing

GPU training

Rerun Hub

The catalog, schema management, byte-range indexing, and streaming that keep physical data consistent and queryable at scale.

Rerun Hub

Object storage

RRD
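To make byte-range indexing concrete, here is a minimal sketch of the general idea, not Hub's implementation. It assumes a hypothetical layout where column chunks are packed back to back in one object, and it uses an in-memory buffer as a stand-in for a bucket. All names (`chunks`, `build_blob_and_index`, `read_range`) are illustrative only.

```python
import io

# Hypothetical column chunks packed into a single stored object.
chunks = {
    "camera/rgb": b"RGBRGBRGB",
    "imu/accel": b"AAAA",
    "joint/state": b"JJJJJJ",
}


def build_blob_and_index(chunks):
    """Concatenate chunks and record (offset, length) for each one."""
    blob = bytearray()
    index = {}
    for name, payload in chunks.items():
        index[name] = (len(blob), len(payload))
        blob.extend(payload)
    return bytes(blob), index


def read_range(store, offset, length):
    """Stand-in for a ranged read: fetch only the requested bytes."""
    store.seek(offset)
    return store.read(length)


blob, index = build_blob_and_index(chunks)
store = io.BytesIO(blob)  # assumption: stands in for an object in a bucket

# Look up one column in the index and read just that slice.
offset, length = index["imu/accel"]
imu_bytes = read_range(store, offset, length)
print(offset, length, imu_bytes)
```

With an index like this, a reader asks storage for one slice instead of downloading the whole object.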

Everything Hub does with your data

Four capabilities on one catalog over your object storage, already handling petabytes of robot data today.

Query

Query into your recordings with SQL

Run any SQL or dataframe query across your catalog, down into the columns, time ranges, and values inside your recordings, not just their metadata.
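As a rough illustration of querying values inside recordings rather than only their metadata, the sketch below uses Python's built-in `sqlite3` as a stand-in engine. The table `samples` and its columns (`recording_id`, `t_sec`, `gripper_force`) are hypothetical and say nothing about Hub's actual schema or SQL dialect.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: one row per logged sample inside a recording.
conn.execute(
    "CREATE TABLE samples (recording_id TEXT, t_sec REAL, gripper_force REAL)"
)
conn.executemany(
    "INSERT INTO samples VALUES (?, ?, ?)",
    [
        ("rec_a", 0.0, 1.2), ("rec_a", 0.5, 9.8), ("rec_a", 1.0, 2.1),
        ("rec_b", 0.0, 0.9), ("rec_b", 0.5, 1.1),
        ("rec_c", 0.2, 12.4), ("rec_c", 3.0, 15.0),
    ],
)

# Find recordings whose force exceeded 5.0 within the first second.
query = """
SELECT recording_id, MAX(gripper_force) AS peak_force
FROM samples
WHERE t_sec BETWEEN 0.0 AND 1.0
GROUP BY recording_id
HAVING MAX(gripper_force) > 5.0
ORDER BY recording_id
"""
rows = conn.execute(query).fetchall()
print(rows)
```

The filter combines a time range with a value threshold, which is the kind of question file-level metadata alone cannot answer.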

Transform

Refine your data without copies

Add derived columns and evolve schemas without breaking history. You run the transforms with the SDK; Hub keeps the derived data and your raw recordings organized together.
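One way to picture derived columns without copies is a read-only raw layer with a separate derived layer on top. This is a sketch of the pattern under assumed names (`raw`, `derived`, `finite_difference`), using only the Python standard library. It is not the SDK's API.

```python
from collections import ChainMap
from types import MappingProxyType

# Raw recording columns, exposed read-only so history cannot be altered.
raw = MappingProxyType({
    "t_sec": (0.0, 0.5, 1.0),
    "position_m": (0.0, 0.1, 0.4),
})


def finite_difference(times, values):
    """Backward-difference velocity; the first sample is set to 0.0."""
    out = [0.0]
    for i in range(1, len(values)):
        out.append((values[i] - values[i - 1]) / (times[i] - times[i - 1]))
    return tuple(out)


# Derived columns live in their own layer; raw data is never rewritten.
derived = {"velocity_mps": finite_difference(raw["t_sec"], raw["position_m"])}

# A combined view looks up derived columns first, then falls back to raw.
view = ChainMap(derived, raw)
print(sorted(view))
print(view["velocity_mps"])
```

The raw layer stays untouched, so a derived column can be recomputed or dropped without affecting the original recording.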

Train

Train without an export step

Express a dataset mix as a query and stream it to your GPUs. The dataloader is column-aware and video-codec-aware, so you train directly on your recordings.
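A dataset mix can be thought of as weighted sampling over query results. The sketch below is an assumption-laden illustration in plain Python: the source names, the weights, and the `stream_mix` generator are hypothetical, and the string IDs stand in for whatever a real dataloader would decode.

```python
import random
from itertools import islice

# Hypothetical query results: clip IDs grouped by task.
sources = {
    "pick_place": [f"pick_place/{i}" for i in range(1000)],
    "drawer_open": [f"drawer_open/{i}" for i in range(200)],
}

# The mix: roughly 70% pick-and-place, 30% drawer opening.
mix = {"pick_place": 0.7, "drawer_open": 0.3}


def stream_mix(sources, mix, seed=0):
    """Yield (source, clip_id) pairs forever, sampled by mix weight."""
    rng = random.Random(seed)
    names = list(mix)
    weights = [mix[name] for name in names]
    while True:
        name = rng.choices(names, weights=weights, k=1)[0]
        yield name, rng.choice(sources[name])


# Pull one small batch from the stream, as a training loop might.
batch = list(islice(stream_mix(sources, mix), 8))
print(batch)
```

Changing the mix means changing two numbers, not re-exporting a dataset.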

Everyone works from the same data

One viewer, the same recordings, shared across the team. Explore, annotate, and trace a failure back to the data that caused it.

Your region. Your storage.

Single-tenant isolation

Your own isolated Hub deployment, run for you in the cloud region you choose.

Your data stays in your buckets

Any S3-compatible bucket, across regions. You decide where it lives.

Enterprise-ready with broad SSO support, self-managed storage options, and all the controls your security team expects. Designed for low friction at any scale: from empowering small research teams to facilitating secure cross-org data sharing.

See Hub work on real robot data

The quickest way to understand Hub is to watch it. In five minutes, Nick connects a notebook to Hub and explores the DROID dataset, streaming just the slices he needs into the viewer and querying with plain SQL to surface the recordings that matter. The same query then scales unchanged from a handful of clips to 70,000 across terabytes, with no export step in between.

Ready to scale your data layer?

Book a meeting to see Hub against your stack: your data, your storage, your training cluster.

Book a meeting

Explore the SDK
