Analyze data via Open Source Server

The Rerun Cloud offering builds on the open-source core. To that end, the Open Source Server provides small-scale local analysis through a similar API surface. This supports a workflow where you develop or debug locally against a single recording, then scale that same workflow up to the cloud for production use.

Open source server

Launching the server

The server runs as a separate process, so launch it in its own terminal window using the Rerun CLI:

rerun server

For full details, run:

rerun server --help

The most common option opens a directory of .rrd files as a dataset on the server:

rerun server -d <directory>

Connecting to the server

When the server launches, the CLI prints the host and port it is listening on (by default, localhost:51234).

From the viewer

Either specify the network location with the CLI at launch:

rerun connect localhost:51234

or, after the viewer opens, open the command palette, select "Open Redap server", set the scheme to http, and enter the hostname and port.

From the SDK

import rerun as rr

# Connect to the server using the host and port printed at launch
CATALOG_URL = "rerun+http://localhost:51234"
client = rr.catalog.CatalogClient(CATALOG_URL)

Querying the server

Everything below assumes that the server has been launched and a client has been constructed as described above.

Datasets overview

A dataset is a collection of recordings that can be queried. If we have already created the dataset, we can retrieve it:

dataset = client.get_dataset_entry(name="oss_demo")

otherwise, we can create it:

dataset = client.create_dataset(
    name="oss_demo",
)
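
If a script may run more than once, a get-or-create pattern is convenient. A minimal sketch, assuming get_dataset_entry raises when the dataset does not exist (check the exact exception type in your SDK version):

try:
    dataset = client.get_dataset_entry(name="oss_demo")
except Exception:  # assumption: a failed lookup surfaces as an exception
    dataset = client.create_dataset(name="oss_demo")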

To add recordings to a dataset, we use the register API:

import os

# For the OSS server, you must register files that are local to your machine.
# Synchronously register a single recording:
dataset.register(f"file://{os.path.abspath('oss_demo.rrd')}")

# Asynchronously register many recordings:
timeout_seconds = 100
tasks = dataset.register_batch([f"file://{os.path.abspath('oss_demo.rrd')}"])
tasks.wait(timeout_seconds)
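
register_batch also makes it easy to register a whole local directory of recordings. A short sketch, assuming your .rrd files live under recordings/ (a hypothetical path):

from pathlib import Path

# Collect every .rrd file in the directory and register them in one batch
rrd_uris = [f"file://{p.resolve()}" for p in sorted(Path("recordings").glob("*.rrd"))]
tasks = dataset.register_batch(rrd_uris)
tasks.wait(100)  # seconds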

Inspecting datasets

Ultimately, we will end up with the data as a DataFusion DataFrame. However, there is an intermediate step that allows for some optimization: constructing a DataFrameQueryView. The DataFrameQueryView selects the subset of the dataset we are interested in (the index column and content columns), filters to specific time ranges, and manages the sparsity of the data (fill_latest_at). All of these operations happen on the server before subsequent queries are evaluated, avoiding unnecessary computation.

# record_of_interest, start_of_interest, and end_of_interest are
# placeholders for values of your own.
view = (
    dataset
    .dataframe_query_view(index="log_time", contents="/**")
    # Select a single recording or a subset of recordings
    .filter_partition_id(record_of_interest)
    # Restrict to a time range
    .filter_range_nanos(start=start_of_interest, end=end_of_interest)
    # Forward-fill to align sparse data in time
    .fill_latest_at()
)
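
The contents argument is not limited to the catch-all /**. A sketch of a narrower view, assuming the recordings contain an entity subtree at /camera (a hypothetical path):

view = (
    dataset
    .dataframe_query_view(
        index="log_time",
        # Hypothetical entity subtree -- substitute a path from your recordings
        contents="/camera/**",
    )
    .fill_latest_at()
)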

Once we have identified the data we want, we can get a DataFrame:

df = view.df()

DataFusion provides a Pythonic dataframe interface to your data, as well as SQL. After performing a series of operations, the dataframe can be materialized into common data formats:

pandas_df = df.to_pandas()
polars_df = df.to_polars()
arrow_table = df.to_arrow_table()
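
Before materializing, you can push work into DataFusion itself. A small sketch of dataframe operations using the datafusion Python package; the column names here are assumptions, so inspect df.schema() for the real ones:

from datafusion import col, functions as f

# See which columns the view actually produced
print(df.schema())

# Group by recording and count rows per recording.
# "rerun_partition_id" is an assumed column name -- adjust to your schema.
counts = df.aggregate(
    [col("rerun_partition_id")],
    [f.count(col("log_time")).alias("rows")],
)
print(counts.to_pandas())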