This page walks through streaming Rerun recordings directly into a PyTorch DataLoader, without an intermediate export step, using the bundled LeRobot ACT training example end-to-end.
For an explanation of the dataloader API itself — windowed action chunks, GOP-aware video decoding, DDP partitioning — see Train PyTorch models with Rerun.
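As a rough illustration of what "windowed action chunks" means for an ACT-style policy, the sketch below pairs each timestep with the next few actions from the same episode. It is plain Python and does not use the Rerun dataloader API; the function name `action_chunks` and the choice to drop incomplete windows at the end of an episode are assumptions made for illustration only.

```python
from typing import List, Sequence, Tuple, TypeVar

A = TypeVar("A")


def action_chunks(actions: Sequence[A], chunk_size: int) -> List[Tuple[int, List[A]]]:
    """Pair each start index t with the chunk actions[t : t + chunk_size].

    Hypothetical helper: windows that would run past the end of the
    episode are dropped here (an assumption; real loaders may pad instead).
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    last_start = len(actions) - chunk_size
    return [(t, list(actions[t:t + chunk_size])) for t in range(last_start + 1)]
```

For a four-step episode and a chunk size of two, this yields three windows starting at steps 0, 1, and 2.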
Run the example
The example trains a LeRobot ACT policy on the rerun/so101-pick-and-place dataset from HuggingFace.
1. Grab the example
Sparse-checkout just the example directory, without the rest of the Rerun repo:
git clone --filter=blob:none --sparse https://github.com/rerun-io/rerun.git
cd rerun
git sparse-checkout set examples/python/dataloader
cd examples/python/dataloader
2. Install
The example has its own uv project because LeRobot pins an incompatible rerun-sdk.
The additional arguments to uv sync allow you to run just this example without the full rerun repo setup.
uv sync --no-sources --no-dev
If you have the full Rerun monorepo checked out and want to develop against your local Rerun build, run instead:
RERUN_ALLOW_MISSING_BIN=1 uv sync
uv pip install ../../../rerun_py/rerun_dev_fixup
3. Start a catalog server
In a separate terminal:
rerun server
4. Prepare and register the dataset
This step downloads the dataset from HuggingFace, splits it into per-episode RRDs, and registers them with the catalog:
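The per-episode split can be pictured as grouping frames by an episode identifier before each group is written to its own file. This is a minimal sketch, not the contents of prepare_dataset.py: the `episode_index` key, the helper names, and the `episode_NNNNNN.rrd` naming scheme are all assumptions, and no Rerun or HuggingFace calls are made.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Mapping


def split_by_episode(
    frames: Iterable[Mapping[str, Any]],
    key: str = "episode_index",  # assumed column name, for illustration
) -> Dict[int, List[Mapping[str, Any]]]:
    """Group frames by episode, preserving the order within each episode."""
    episodes: Dict[int, List[Mapping[str, Any]]] = defaultdict(list)
    for frame in frames:
        episodes[frame[key]].append(frame)
    return dict(episodes)


def episode_filename(episode: int) -> str:
    """Hypothetical naming scheme for one recording file per episode."""
    return f"episode_{episode:06d}.rrd"
```

The bundled script does the real work, including registration with the catalog; run it with: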
uv run python prepare_dataset.py
5. Train
uv run python train.py
The script streams batches from the catalog, trains an ACT policy for a few epochs, and saves a checkpoint to act_checkpoint/.
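To make "streams batches" concrete, the sketch below groups an arbitrary sample stream into fixed-size batches without loading the whole stream into memory, which is the general pattern behind iterable-style data loading. It is plain Python rather than train.py or the Rerun dataloader; `stream_batches` is a hypothetical helper, and keeping the final short batch is an assumption.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def stream_batches(samples: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Lazily yield lists of up to batch_size samples from a stream.

    Hypothetical helper: only one batch is held in memory at a time,
    and the last batch may be shorter than batch_size.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(samples)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch
```

A training loop would iterate over such batches once per epoch, recreating the stream at the start of each epoch.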
References
- Example source: prepare_dataset.py and train.py
- rerun/so101-pick-and-place: LeRobot dataset on HuggingFace
- Train PyTorch models with Rerun: full how-to covering windowing, video decoding, iterable vs. map style, DDP