Micro Batching
The Rerun SDK automatically handles micro-batching in a background thread in order to find a sweet spot between latency and throughput, reducing metadata overhead and thus improving both bandwidth and CPU usage.
Flushing is triggered by either a time or a space threshold, whichever is reached first.
This is very similar to, and has many parallels with, the compaction mechanism running on the datastore side.
You can configure these thresholds using the following environment variables:
RERUN_FLUSH_TICK_SECS
Sets the duration of the periodic tick that triggers the time threshold, in seconds.
Defaults to RERUN_FLUSH_TICK_SECS=0.2
(200ms), unless the recording stream uses a networking sink, in which case it defaults to RERUN_FLUSH_TICK_SECS=0.008
(8ms).
RERUN_FLUSH_NUM_BYTES
Sets the size limit that triggers the space threshold, in bytes.
Defaults to RERUN_FLUSH_NUM_BYTES=1048576
(1MiB).
RERUN_FLUSH_NUM_ROWS
Sets the number of rows that drives the space threshold.
Defaults to RERUN_FLUSH_NUM_ROWS=18446744073709551615
(u64::MAX).
Or directly from code, as in this Python example:
"""
Shows how to configure micro-batching directly from code.
Check out <https://rerun.io/docs/reference/sdk/micro-batching> for more information.
"""
import rerun as rr
# Equivalent to configuring the following environment:
# * RERUN_FLUSH_NUM_BYTES=<+inf>
# * RERUN_FLUSH_NUM_ROWS=10
config = rr.ChunkBatcherConfig(
flush_num_bytes=2**63,
flush_num_rows=10,
)
rec = rr.RecordingStream("rerun_example_micro_batching", batcher_config=config)
rec.spawn()
# These 10 log calls are guaranteed be batched together, and end up in the same chunk.
for i in range(10):
rec.log("logs", rr.TextLog(f"log #{i}"))
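A roughly equivalent configuration can be applied from the environment instead, without touching the code. This is a hypothetical invocation: my_script.py stands in for your own Rerun script, and the byte threshold is simply set high enough (i64::MAX here) that the row threshold always wins.

```shell
# Hypothetical: override the batcher thresholds for a single run,
# flushing every 10 rows instead of every 1MiB.
RERUN_FLUSH_NUM_BYTES=9223372036854775807 \
RERUN_FLUSH_NUM_ROWS=10 \
python my_script.py
```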