Refine: register, enrich, query
One converted file is a demo. A folder of them is a dataset, and once it's registered you stop opening files one at a time and start asking questions of all of them at once.
Register once, open consistently
pixi run serve starts a local catalog; the same code points at Rerun Hub unchanged once your data outgrows the laptop. The demo's catalog.py step registers every recording with that catalog and registers a default blueprint with the dataset.
That's why every episode (yours, a teammate's, one from three months ago) opens with the same dashboard instead of an arbitrary heuristic viewport.
The catalog tracks each recording as a segment you can list, filter, view, and query.
Enrich with layers, never mutating raw data
Refining means adding derived data: a per-frame blur score, operator metadata, model outputs, quality verdicts. In Rerun these attach as layers: new data carrying the same recording id as their source, registered under a layer name. The raw recording is never modified, and a bad pass is fixed by re-registering the layer with corrected data. Anything you compute in the future lands the same way.
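To make the layer idea concrete, here is a toy model in plain Python. It is not the Rerun API: the LayeredDataset class, its method names, and the "blur" layer are all hypothetical. It only illustrates the contract described above: raw data is written once, layers are keyed by recording id plus layer name, and re-registering a layer replaces the earlier pass.

```python
from dataclasses import dataclass, field


@dataclass
class LayeredDataset:
    """Toy stand-in for a catalog dataset (hypothetical, not the Rerun API)."""

    raw: dict = field(default_factory=dict)     # recording_id -> raw rows
    layers: dict = field(default_factory=dict)  # (recording_id, layer_name) -> derived rows

    def register_raw(self, recording_id, rows):
        # Raw data is written exactly once; a second write is an error.
        if recording_id in self.raw:
            raise ValueError(f"raw recording {recording_id!r} already registered")
        self.raw[recording_id] = tuple(rows)

    def register_layer(self, recording_id, layer_name, rows):
        # A layer must carry the id of an existing recording.
        if recording_id not in self.raw:
            raise KeyError(recording_id)
        # Re-registering under the same name replaces the earlier pass.
        self.layers[(recording_id, layer_name)] = tuple(rows)

    def view(self, recording_id):
        # Merged view: raw rows plus every layer attached to this recording.
        merged = {"raw": self.raw[recording_id]}
        for (rid, name), rows in self.layers.items():
            if rid == recording_id:
                merged[name] = rows
        return merged


dataset = LayeredDataset()
dataset.register_raw("episode_001", [{"frame": 0}, {"frame": 1}])

# First pass of a hypothetical blur score, with a bug (all zeros).
dataset.register_layer("episode_001", "blur", [0.0, 0.0])
# Fix the pass by re-registering the same layer name.
dataset.register_layer("episode_001", "blur", [0.12, 0.87])

merged = dataset.view("episode_001")
print(merged["blur"])  # the corrected scores
print(merged["raw"])   # the raw rows, untouched by either pass
```

The design point is that a derived pass never needs write access to the raw rows, so a mistake in enrichment cannot corrupt the source recording.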
Query the whole dataset
With the dataset registered, questions become single queries instead of ten file-opens.
The demo's notebooks/ show two flavors over the same data, side by side:
- the DataFrame API (Python), in analyze_dataframe.ipynb: e.g. rank operators by success rate, or read a joint's value at each episode's final timestamp across the dataset;
- SQL, in analyze_sql.ipynb: the same questions for people who'd rather write a query directly.
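As a rough illustration of what those two questions look like as queries, the sketch below uses Python's built-in sqlite3 module over made-up tables. It does not use the Rerun catalog or reproduce the demo notebooks; the table names, column names, and rows are assumptions chosen only to show the shape of each query.

```python
import sqlite3

# In-memory database standing in for a registered dataset (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE episodes (episode_id TEXT, operator TEXT, success INTEGER)")
conn.execute("CREATE TABLE joint_samples (episode_id TEXT, t REAL, joint_value REAL)")

conn.executemany(
    "INSERT INTO episodes VALUES (?, ?, ?)",
    [
        ("ep1", "alice", 1),
        ("ep2", "alice", 1),
        ("ep3", "bob", 0),
        ("ep4", "bob", 1),
        ("ep5", "carol", 0),
    ],
)
conn.executemany(
    "INSERT INTO joint_samples VALUES (?, ?, ?)",
    [
        ("ep1", 0.0, 0.1),
        ("ep1", 1.0, 0.4),
        ("ep2", 0.0, 0.2),
        ("ep2", 2.0, -0.3),
    ],
)

# Question 1: rank operators by success rate.
ranking = conn.execute(
    """
    SELECT operator, AVG(success) AS success_rate, COUNT(*) AS episodes
    FROM episodes
    GROUP BY operator
    ORDER BY success_rate DESC, operator
    """
).fetchall()

# Question 2: the joint's value at each episode's final timestamp.
final_values = conn.execute(
    """
    SELECT s.episode_id, s.joint_value
    FROM joint_samples AS s
    JOIN (
        SELECT episode_id, MAX(t) AS t_final
        FROM joint_samples
        GROUP BY episode_id
    ) AS f
      ON s.episode_id = f.episode_id AND s.t = f.t_final
    ORDER BY s.episode_id
    """
).fetchall()

for operator, rate, n in ranking:
    print(f"{operator}: {rate:.2f} over {n} episodes")
for episode_id, value in final_values:
    print(f"{episode_id}: final joint value {value}")
```

Both questions are answered by one statement each, no matter how many episodes the tables hold, which is the point of querying the dataset rather than opening files.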
Results can be written back as tables that persist next to the dataset and show up in the viewer, so analysis becomes part of the data layer, not a notebook artifact someone loses. The query how-tos live in Query & transform docs.
Explore and extend
- In the embedded viewer, pick a robot link and scrub, then imagine asking "across every episode, how often does this joint exceed its velocity limit?" That's one query.
- Open the demo's notebooks/ (analyze_dataframe.ipynb and analyze_sql.ipynb) and read the DataFrame and SQL versions of the same question side by side.
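The velocity-limit question from the first bullet can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the demo's query: the episode data, the 2.0 rad/s limit, and the exceed_fraction helper are hypothetical, and velocity is estimated by a simple finite difference between consecutive samples.

```python
def exceed_fraction(samples, limit):
    """Fraction of consecutive-sample steps whose |velocity| exceeds `limit`.

    `samples` is a list of (time_seconds, joint_position) pairs sorted by time.
    """
    steps = list(zip(samples, samples[1:]))
    if not steps:
        return 0.0
    over = sum(
        1
        for (t0, q0), (t1, q1) in steps
        if abs((q1 - q0) / (t1 - t0)) > limit
    )
    return over / len(steps)


# Hypothetical joint trajectories, one list per episode.
episodes = {
    "ep1": [(0.0, 0.0), (1.0, 1.0), (2.0, 4.0), (3.0, 5.0)],
    "ep2": [(0.0, 0.0), (1.0, 0.5), (2.0, 1.0)],
}
VELOCITY_LIMIT = 2.0  # rad/s, an assumed limit for illustration

per_episode = {
    name: exceed_fraction(samples, VELOCITY_LIMIT)
    for name, samples in episodes.items()
}

# Episodes that exceeded the limit at least once.
offenders = sorted(name for name, frac in per_episode.items() if frac > 0)

for name, frac in per_episode.items():
    print(f"{name}: {frac:.0%} of steps over the limit")
print("offenders:", offenders)
```

In ep1 only the middle step (3.0 rad/s) is over the limit, so one of its three steps counts; ep2 stays at 0.5 rad/s throughout.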
Next
The dataset can now describe its own quality. Train: filter it by a derived signal and stream the survivors straight into training. The training set is a query.