
Node buffers & spill

Most operators stream — a record comes in, a record goes out. But a blocking operator, like a sort or a large aggregation, has to hold records while it works: you can’t emit the smallest row until you’ve seen them all. If the held set is bigger than memory allows, it must spill to disk. Clinker models that buffer as a three-state enum — in memory, spilled, or mixed — and the enum is the whole state machine. This is one of the cleanest “an enum is a state machine” examples in the engine.

You’ll be able to: read the NodeBuffer state enum, explain its transitions, and describe how records spill to disk and drain back in order.

A node’s held records are a NodeBuffer, and its variants are the three places those records can live:

clinker-exec ·node_buffer.rs ·NodeBuffer type @47d2e12
pub(crate) enum NodeBuffer {
    /// All events live in memory — records and punctuations in arrival order.
    Memory(Vec<StreamEvent>),
    /// Every record lives on disk, as spill files paired with row counts.
    /// Punctuations never spill — they wait in the `pending_puncts` sidecar.
    Spilled {
        chunks: Vec<(SpillFile<u64>, u64)>,
        pending_puncts: Vec<Punctuation>,
    },
    /// A memory tail accumulated after a partial spill.
    Mixed {
        mem: Vec<StreamEvent>,
        spills: Vec<(SpillFile<u64>, u64)>,
        pending_puncts: Vec<Punctuation>,
    },
}

The enum makes illegal states unrepresentable: a buffer is in exactly one of these shapes, and because a Rust match must be exhaustive, every reader has to handle all three. Memory is the cheap default. Spilled means the records have been pushed to disk, so they no longer count against the memory budget. Mixed is what you get when new records arrive after a spill: a fresh in-memory tail riding on top of the on-disk chunks.

   Memory                         Spilled                       Mixed
┌──────────┐                 ┌───────────────┐     ┌─────────────────────────────┐
│ Vec in   │                 │  spill files  │     │ mem tail    │  spill files  │
│ RAM      │                 │  on disk      │     │ in RAM      │  on disk      │
└──────────┘                 └───────────────┘     └─────────────────────────────┘
     │ spill (under memory pressure) ▲                            ▲
     └───────────────────────────────┘                            │ push after a spill
                                                                  │ (new mem tail)

Transitions: std::mem::replace moves a state forward


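Pushing a record onto a Spilled buffer promotes it to Mixed, carrying the existing spill chunks across. You can’t just mutate in place: push_event only holds &mut self, so it cannot move the owned fields out of the old variant and leave self empty. std::mem::replace swaps a cheap placeholder into self and hands back the old value, which can then be destructured and rebuilt as the new variant. It is the standard Rust move-out-then-replace trick: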

crates/clinker-exec/src/executor/node_buffer.rs
pub(crate) fn push_event(&mut self, event: StreamEvent) {
    match self {
        Self::Memory(v) => v.push(event),
        Self::Mixed { mem, .. } => mem.push(event),
        Self::Spilled { .. } => {
            // move the chunks + puncts out of the old `Spilled`,
            // leaving a temporary empty `Memory` placeholder in `self`
            let (chunks, puncts) = match std::mem::replace(self, Self::Memory(Vec::new())) {
                Self::Spilled { chunks, pending_puncts } => (chunks, pending_puncts),
                _ => unreachable!(),
            };
            // overwrite the placeholder with the promoted `Mixed` state
            *self = Self::Mixed { mem: vec![event], spills: chunks, pending_puncts: puncts };
        }
    }
}

The Memory → Spilled transition happens elsewhere — at the admission boundary, when the memory arbitrator (next lesson) says to spill. The point for now is that the type enforces the rules: each transition consumes the old state and produces a valid new one.

When a buffer spills, its rows are written through a SpillWriter:

clinker-exec ·node_buffer_spill.rs ·spill_node_buffer fn @47d2e12
pub(crate) fn spill_node_buffer(
    rows: Vec<(Record, u64)>,
    spill_dir: Option<&Path>,
    compress: bool,
) -> Result<Option<(SpillFile<u64>, u64)>, PipelineError> {
    // writes each (record, row_number) pair through a SpillWriter, returns the file + count
}
clinker-exec ·spill.rs ·SpillWriter type @47d2e12

The on-disk format is a leading tag byte (so the reader knows whether the rest is LZ4-compressed), a JSON schema header, then length-prefixed postcard record frames. A SpillFile is backed by a tempfile::TempPath, so the file is deleted when the SpillFile is dropped. This RAII cleanup pattern (Phase 2’s Drop) means a spilled buffer leaves no garbage behind even if the run aborts with an error, because destructors still run as the error propagates out.
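The layout described above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not Clinker's actual format: the tag value, the 4-byte little-endian length prefix, the length-prefixed header, and the names TAG_RAW, write_spill, write_frame, read_frame and read_spill are all hypothetical. Frames carry raw bytes rather than postcard-encoded records, and the compressed path is left out.

```rust
use std::io::{self, Cursor, Read, Write};

// Hypothetical tag value; the real tag bytes are not shown in this lesson.
const TAG_RAW: u8 = 0;

/// Write one frame: an assumed 4-byte little-endian length, then the payload.
fn write_frame<W: Write>(out: &mut W, payload: &[u8]) -> io::Result<()> {
    out.write_all(&(payload.len() as u32).to_le_bytes())?;
    out.write_all(payload)
}

/// Write the tag byte, the header frame, then one frame per record.
fn write_spill<W: Write>(mut out: W, header: &str, records: &[Vec<u8>]) -> io::Result<()> {
    out.write_all(&[TAG_RAW])?;
    write_frame(&mut out, header.as_bytes())?;
    for rec in records {
        write_frame(&mut out, rec)?;
    }
    Ok(())
}

/// Read one frame, or `None` at end of input.
/// (A truncated length prefix is also treated as end of input in this sketch.)
fn read_frame<R: Read>(input: &mut R) -> io::Result<Option<Vec<u8>>> {
    let mut len = [0u8; 4];
    match input.read_exact(&mut len) {
        Ok(()) => {}
        Err(e) if e.kind() == io::ErrorKind::UnexpectedEof => return Ok(None),
        Err(e) => return Err(e),
    }
    let mut payload = vec![0u8; u32::from_le_bytes(len) as usize];
    input.read_exact(&mut payload)?;
    Ok(Some(payload))
}

/// Read the tag, the header, then every record frame until end of input.
fn read_spill<R: Read>(mut input: R) -> io::Result<(String, Vec<Vec<u8>>)> {
    let mut tag = [0u8; 1];
    input.read_exact(&mut tag)?;
    if tag[0] != TAG_RAW {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "unsupported tag in this sketch"));
    }
    let header = read_frame(&mut input)?
        .ok_or_else(|| io::Error::new(io::ErrorKind::UnexpectedEof, "missing header"))?;
    let header = String::from_utf8(header)
        .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))?;
    let mut records = Vec::new();
    while let Some(rec) = read_frame(&mut input)? {
        records.push(rec);
    }
    Ok((header, records))
}

fn main() -> io::Result<()> {
    let records = vec![b"row-1".to_vec(), b"row-2".to_vec()];
    let mut bytes = Vec::new();
    write_spill(&mut bytes, r#"{"fields":["id"]}"#, &records)?;

    let (header, back) = read_spill(Cursor::new(bytes))?;
    assert_eq!(header, r#"{"fields":["id"]}"#);
    assert_eq!(back, records);
    println!("round-tripped {} records", back.len());
    Ok(())
}
```

Putting the tag first lets a reader pick a decoding path before touching anything else, and length prefixes let it stream one record at a time without scanning for delimiters.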

Draining unifies all three states into one iterator, in a fixed order: memory events first, then each spill file streamed back via a SpillReader, then the trailing punctuations. A downstream operator consumes a Mixed buffer exactly as it would a Memory one — the spill is invisible at the drain interface. That uniformity is what lets the rest of the engine ignore whether a buffer fit in RAM.
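The drain order can be sketched as a chain of iterators. This is an illustration under simplifying assumptions, not the engine's code: Event, Buffer and drain are hypothetical stand-ins, punctuations are plain string labels, and each spill chunk is a Vec<u64> held in memory rather than a file streamed back through a SpillReader.

```rust
#[derive(Debug, PartialEq)]
enum Event {
    Record(u64),
    Punct(&'static str),
}

// Simplified stand-in for the three-state buffer.
enum Buffer {
    Memory(Vec<Event>),
    Spilled { chunks: Vec<Vec<u64>>, pending_puncts: Vec<&'static str> },
    Mixed { mem: Vec<Event>, spills: Vec<Vec<u64>>, pending_puncts: Vec<&'static str> },
}

impl Buffer {
    /// One iterator for all three states:
    /// memory events, then each spill chunk, then the pending punctuations.
    fn drain(self) -> impl Iterator<Item = Event> {
        // Normalize every state to the same three parts; absent parts are empty.
        let (mem, chunks, puncts) = match self {
            Buffer::Memory(v) => (v, Vec::new(), Vec::new()),
            Buffer::Spilled { chunks, pending_puncts } => (Vec::new(), chunks, pending_puncts),
            Buffer::Mixed { mem, spills, pending_puncts } => (mem, spills, pending_puncts),
        };
        mem.into_iter()
            .chain(chunks.into_iter().flatten().map(Event::Record))
            .chain(puncts.into_iter().map(Event::Punct))
    }
}

fn main() {
    let buffers = vec![
        Buffer::Memory(vec![Event::Record(1), Event::Punct("p0")]),
        Buffer::Spilled { chunks: vec![vec![1, 2]], pending_puncts: vec!["p1"] },
        Buffer::Mixed {
            mem: vec![Event::Record(30)],
            spills: vec![vec![10, 11], vec![20]],
            pending_puncts: vec!["p2"],
        },
    ];
    // The consumer's loop is identical no matter which state it was handed.
    for buf in buffers {
        let out: Vec<Event> = buf.drain().collect();
        println!("{out:?}");
    }
}
```

Because every state is normalized to the same three parts before chaining, the caller sees a single iterator type and never has to match on the buffer's state itself.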

The three-state shape fits in one screen. Note the std::mem::replace move on the Spilled → Mixed edge — the same trick the real push_event uses:
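A minimal sketch of that state machine, under simplifying assumptions: events are plain u64 values, a "spill file" is just a Vec<u64> kept in memory, and punctuations are left out. The Buf type and its spill method are hypothetical stand-ins for illustration, not Clinker's API.

```rust
#[derive(Debug)]
enum Buf {
    Memory(Vec<u64>),
    Spilled { chunks: Vec<Vec<u64>> },
    Mixed { mem: Vec<u64>, spills: Vec<Vec<u64>> },
}

impl Buf {
    /// Push a value; a `Spilled` buffer is promoted to `Mixed`.
    fn push(&mut self, x: u64) {
        match self {
            Buf::Memory(v) => v.push(x),
            Buf::Mixed { mem, .. } => mem.push(x),
            Buf::Spilled { .. } => {
                // Swap in a cheap placeholder so the old variant can be taken by value.
                let chunks = match std::mem::replace(self, Buf::Memory(Vec::new())) {
                    Buf::Spilled { chunks } => chunks,
                    _ => unreachable!(),
                };
                // Overwrite the placeholder with the promoted state.
                *self = Buf::Mixed { mem: vec![x], spills: chunks };
            }
        }
    }

    /// Stand-in for a spill: move the in-memory rows into a new "chunk".
    fn spill(&mut self) {
        *self = match std::mem::replace(self, Buf::Memory(Vec::new())) {
            Buf::Memory(v) => Buf::Spilled { chunks: vec![v] },
            Buf::Mixed { mem, mut spills } => {
                spills.push(mem);
                Buf::Spilled { chunks: spills }
            }
            spilled @ Buf::Spilled { .. } => spilled,
        };
    }
}

fn main() {
    let mut buf = Buf::Memory(Vec::new());
    buf.push(1);
    buf.push(2);
    println!("{buf:?}"); // Memory([1, 2])

    buf.spill();
    println!("{buf:?}"); // Spilled { chunks: [[1, 2]] }

    buf.push(3);
    println!("{buf:?}"); // Mixed { mem: [3], spills: [[1, 2]] }
}
```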


Run it and watch the buffer walk Memory → Spilled → Mixed. The compiler guarantees you handled every state in push; std::mem::replace lets you move owned data out of the old variant safely. That’s the whole spill state machine, minus the disk IO.

// quick check

What does the Mixed variant of NodeBuffer represent?

A buffer spills when memory gets tight — but who decides it’s tight, across all the operators running at once, and how do they coordinate without stepping on each other? That’s the memory arbitrator, and it’s the engine’s sharpest lesson in interior mutability.