The Data Layer for Physical AI

The data primitives to build, understand, and improve your data loop. Designed for multi-rate, multimodal data, from the first recording to massive scale.

10K+

GitHub stars

3 supported languages

End-to-end

Collection to training

End-to-end learning needs end-to-end data

End-to-end models learn from multi-rate, multimodal sequences. So every stage of the loop works on that same data: collect, refine, train, deploy. Rerun is one layer underneath all of it.
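To make "multi-rate" concrete, here is a minimal sketch of aligning two streams logged at different rates with a latest-at lookup: for each query time, take the most recent sample from each stream. It uses only the Python standard library and is not Rerun's API; the stream names, timestamps, and the `latest_at` and `align` helpers are hypothetical and exist only for illustration.

```python
from bisect import bisect_right

def latest_at(timestamps, values, t):
    """Return the most recent value logged at or before time t, or None."""
    i = bisect_right(timestamps, t)
    return values[i - 1] if i > 0 else None

# Hypothetical streams (times in milliseconds): a ~30 Hz camera
# and a 100 Hz joint encoder reporting integer ticks.
camera_t = [0, 33, 66, 100]
camera_v = ["frame0", "frame1", "frame2", "frame3"]
joint_t = list(range(0, 101, 10))
joint_v = list(range(100, 111))

def align(query_times):
    """Build one row per query time with the latest sample from each stream."""
    return [
        (t, latest_at(camera_t, camera_v, t), latest_at(joint_t, joint_v, t))
        for t in query_times
    ]

rows = align([5, 35, 70])
for row in rows:
    print(row)
```

Each row pairs a query time with whatever each sensor last reported, so streams never need to share a clock rate to end up in the same training sample.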

Experiment loop

Teams win by iterating fast on data composition and modeling while scaling data and compute.

Collect

Teleop, fleet logs, human-data rigs, sim, and web video, all in one format.

Refine

Add columns and run CV transforms. Curate with dataframes or SQL.

Train

Stream dataset mixes straight to GPUs. No export jobs, no idle waiting.

Deploy

Evaluate rollouts and trace failures back to the data that caused them.
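As a rough illustration of the "curate with SQL" idea in the Refine step, the sketch below filters a small table of episode metadata down to a training subset. It assumes nothing about Rerun's own query interface: it uses Python's built-in `sqlite3` module, and the `episodes` table, its columns, and its rows are all made up for the example.

```python
import sqlite3

# In-memory database holding hypothetical per-episode metadata.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE episodes ("
    "episode_id TEXT, source TEXT, duration_s REAL, success INTEGER)"
)
conn.executemany(
    "INSERT INTO episodes VALUES (?, ?, ?, ?)",
    [
        ("ep_001", "teleop", 42.0, 1),
        ("ep_002", "sim", 3.5, 1),      # too short to be useful
        ("ep_003", "teleop", 58.2, 0),  # failed rollout
        ("ep_004", "fleet", 61.0, 1),
    ],
)

# Keep successful episodes that last at least 10 seconds.
curated = [
    row[0]
    for row in conn.execute(
        "SELECT episode_id FROM episodes "
        "WHERE success = 1 AND duration_s >= 10 "
        "ORDER BY episode_id"
    )
]
conn.close()
print(curated)
```

The same filter could equally be written as a dataframe expression; the point is only that curation becomes a query over metadata rather than a manual pass over files.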

Rerun SDK

Open-source SDK for building the applications and compute pipelines you need to create physical intelligence.

df.select(...).where(...)

for batch in dataloader:

Rerun Hub

Data catalog and backend for large-scale storage, access, and streaming from object storage. Makes the Rerun SDK scale.

Two parts. One data layer.

The Rerun SDK is the open-source toolchain physical AI teams build on. Rerun Hub is the production backend that connects it to your data at scale.

Open source

Rerun SDK

Open-source SDK for logging, storing, querying, visualizing, and training on multi-rate, multimodal data. Includes the viewer, query library, visualization framework, CLI, and the file format the data layer is built on.

pip install rerun-sdk
rerun
  • Interactive viewer + visualization framework
  • Dataframe query library built in
  • Multi-rate, multimodal, spatial native
  • Python, Rust, and C++ APIs
  • Open source. Start in two minutes.

Commercial

Rerun Hub

The production backend for the Rerun data layer. Catalog, byte-range indexing, and retrieval that turn your object stores into a queryable, streamable foundation. Run transforms on the edge or close to the data.

  • Query across multiple object stores
  • Byte-range indexing for fast lookups
  • Stream dataset mixes directly to training
  • Catalog and metadata management
  • Deployed to your preferred cloud region

How they fit together

You log, query, visualize, transform, and train with the same open-source SDK and the same APIs either way. Hub doesn't change what you do. It changes what runs underneath.

Your data
  • Rerun SDK: Local .rrd files on one machine
  • Rerun Hub: Your object storage, streamed at petabyte scale

The catalog
  • Rerun SDK: In-process, for local work
  • Rerun Hub: Persistent and managed, indexed across your object stores

Access
  • Rerun SDK: Just you
  • Rerun Hub: Your team, with SSO and authenticated share links

Running it
  • Rerun SDK: pip install, self-hosted
  • Rerun Hub: Run for you, single-tenant, in your cloud region of choice

Iterate faster on robot learning data

Build, understand, and improve your data loop on one set of primitives, from your first recording to massive scale. Start with the SDK, or talk to us about Hub.