Node buffers & spill
Most operators stream — a record comes in, a record goes out. But a blocking operator, like a sort or a large aggregation, has to hold records while it works: you can’t emit the smallest row until you’ve seen them all. If the held set is bigger than memory allows, it must spill to disk. Clinker models that buffer as a three-state enum — in memory, spilled, or mixed — and the enum is the whole state machine. This is one of the cleanest “an enum is a state machine” examples in the engine.
You’ll be able to: read the NodeBuffer state enum, explain its transitions, and describe how
records spill to disk and drain back in order.
A buffer with three states
A node’s held records are a NodeBuffer, and its variants are the three places those records can
live:
clinker-exec · node_buffer.rs · NodeBuffer type @47d2e12

    pub(crate) enum NodeBuffer {
        /// All events live in memory — records and punctuations in arrival order.
        Memory(Vec<StreamEvent>),
        /// Every record lives on disk, as spill files paired with row counts.
        /// Punctuations never spill — they wait in the `pending_puncts` sidecar.
        Spilled {
            chunks: Vec<(SpillFile<u64>, u64)>,
            pending_puncts: Vec<Punctuation>,
        },
        /// A memory tail accumulated after a partial spill.
        Mixed {
            mem: Vec<StreamEvent>,
            spills: Vec<(SpillFile<u64>, u64)>,
            pending_puncts: Vec<Punctuation>,
        },
    }

The enum makes the illegal states unrepresentable: a buffer is in exactly one of these shapes, and
every reader matches all three. Memory is the cheap default; Spilled means the records have
been pushed to disk so they no longer count against the memory budget; Mixed is what you get when
new records arrive after a spill — a fresh in-memory tail riding on top of the on-disk chunks.
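To make "every reader matches all three" concrete, here is a minimal sketch of two readers over a simplified buffer. `Event`, `Chunk`, `resident_events`, and `spilled_rows` are hypothetical stand-ins invented for illustration; they are not the engine's types or methods.

```rust
// Hypothetical, simplified stand-ins for StreamEvent and a spill file.
#[derive(Debug)]
enum Event {
    Record(u64),
    Punct,
}

/// Stand-in for a spill file paired with its row count.
#[derive(Debug)]
struct Chunk {
    rows: u64,
}

#[derive(Debug)]
enum Buffer {
    Memory(Vec<Event>),
    Spilled { chunks: Vec<Chunk>, pending_puncts: Vec<Event> },
    Mixed { mem: Vec<Event>, spills: Vec<Chunk>, pending_puncts: Vec<Event> },
}

impl Buffer {
    /// Events held in RAM (sidecar punctuations are ignored in this sketch).
    fn resident_events(&self) -> usize {
        match self {
            Buffer::Memory(v) => v.len(),
            Buffer::Spilled { .. } => 0,
            Buffer::Mixed { mem, .. } => mem.len(),
        }
    }

    /// Rows that live on disk, summed over all chunks.
    fn spilled_rows(&self) -> u64 {
        match self {
            Buffer::Memory(_) => 0,
            Buffer::Spilled { chunks, .. } => chunks.iter().map(|c| c.rows).sum(),
            Buffer::Mixed { spills, .. } => spills.iter().map(|c| c.rows).sum(),
        }
    }
}

fn main() {
    let mixed = Buffer::Mixed {
        mem: vec![Event::Record(7), Event::Punct],
        spills: vec![Chunk { rows: 100 }, Chunk { rows: 50 }],
        pending_puncts: vec![],
    };
    println!("in RAM: {}, on disk: {}", mixed.resident_events(), mixed.spilled_rows());
    println!("{:?}", mixed);
}
```

Delete any arm from either `match` and the compiler rejects the function, which is the practical payoff of modelling the states as one enum.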
       Memory              Spilled                       Mixed
    ┌──────────┐      ┌───────────────┐      ┌───────────────────────────┐
    │ Vec in   │      │ spill files   │      │ mem tail    │ spill files │
    │ RAM      │      │ on disk       │      │ in RAM      │ on disk     │
    └──────────┘      └───────────────┘      └───────────────────────────┘
         │                ▲       │                        ▲
         └────────────────┘       └────────────────────────┘
    spill (under memory pressure)   push after a spill (new mem tail)

Transitions: std::mem::replace moves a state forward
Pushing a record onto a Spilled buffer promotes it to Mixed, carrying the existing spill
chunks across. You can’t just mutate in place — you need to move the old variant’s owned fields
into the new variant, and std::mem::replace is the standard Rust move-out-then-replace trick:
    pub(crate) fn push_event(&mut self, event: StreamEvent) {
        match self {
            Self::Memory(v) => v.push(event),
            Self::Mixed { mem, .. } => mem.push(event),
            Self::Spilled { .. } => {
                // move the chunks + puncts out of the old `Spilled`, leaving a temporary
                let (chunks, puncts) = match std::mem::replace(self, Self::Memory(Vec::new())) {
                    Self::Spilled { chunks, pending_puncts } => (chunks, pending_puncts),
                    _ => unreachable!(),
                };
                *self = Self::Mixed { mem: vec![event], spills: chunks, pending_puncts: puncts };
            }
        }
    }

The Memory → Spilled transition happens elsewhere, at the admission boundary, when the memory
arbitrator (next lesson) says to spill. The point for now is that the type enforces the rules:
each transition consumes the old state and produces a valid new one.
Spilling to disk, and draining back
When a buffer spills, its rows are written through a SpillWriter:
clinker-exec · node_buffer_spill.rs · spill_node_buffer fn @47d2e12

    pub(crate) fn spill_node_buffer(
        rows: Vec<(Record, u64)>,
        spill_dir: Option<&Path>,
        compress: bool,
    ) -> Result<Option<(SpillFile<u64>, u64)>, PipelineError> {
        // writes each (record, row_number) pair through a SpillWriter, returns the file + count
    }

clinker-exec · spill.rs · SpillWriter type @47d2e12
The on-disk format is a leading tag byte (so the reader knows whether the rest is LZ4-compressed),
a JSON schema header, then length-prefixed postcard record frames.
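That layout (tag byte, header, framed records) can be sketched with the standard library alone. Everything specific here is an assumption made for illustration: the tag value `TAG_PLAIN`, the u32 little-endian length prefixes, the header being length-prefixed too, and raw byte payloads standing in for postcard-encoded records. The sketch does no compression and is not Clinker's actual wire format.

```rust
use std::io::{self, Cursor, Read, Write};

// Hypothetical tag value meaning "not compressed".
const TAG_PLAIN: u8 = 0;

/// Write one frame: a u32 little-endian length (assumed width), then the bytes.
fn write_frame<W: Write>(w: &mut W, payload: &[u8]) -> io::Result<()> {
    if payload.len() > u32::MAX as usize {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "frame too large"));
    }
    w.write_all(&(payload.len() as u32).to_le_bytes())?;
    w.write_all(payload)
}

/// Read one frame, or `None` at end of input.
/// (A truncated length prefix is also treated as end of input in this sketch.)
fn read_frame<R: Read>(r: &mut R) -> io::Result<Option<Vec<u8>>> {
    let mut len_buf = [0u8; 4];
    match r.read_exact(&mut len_buf) {
        Ok(()) => {}
        Err(e) if e.kind() == io::ErrorKind::UnexpectedEof => return Ok(None),
        Err(e) => return Err(e),
    }
    let mut payload = vec![0u8; u32::from_le_bytes(len_buf) as usize];
    r.read_exact(&mut payload)?;
    Ok(Some(payload))
}

/// Tag byte, then the schema header as a frame, then one frame per record.
fn write_spill<W: Write>(w: &mut W, schema_json: &str, records: &[Vec<u8>]) -> io::Result<()> {
    w.write_all(&[TAG_PLAIN])?;
    write_frame(w, schema_json.as_bytes())?;
    for rec in records {
        write_frame(w, rec)?;
    }
    Ok(())
}

/// Read the header and the record frames back, in write order.
fn read_spill<R: Read>(r: &mut R) -> io::Result<(String, Vec<Vec<u8>>)> {
    let mut tag = [0u8; 1];
    r.read_exact(&mut tag)?;
    if tag[0] != TAG_PLAIN {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "compressed input not handled here"));
    }
    let header = read_frame(r)?
        .ok_or_else(|| io::Error::new(io::ErrorKind::UnexpectedEof, "missing schema header"))?;
    let schema = String::from_utf8(header)
        .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))?;
    let mut records = Vec::new();
    while let Some(frame) = read_frame(r)? {
        records.push(frame);
    }
    Ok((schema, records))
}

fn main() -> io::Result<()> {
    let mut bytes = Vec::new();
    write_spill(&mut bytes, r#"{"fields":["id"]}"#, &[vec![1, 2, 3], vec![4]])?;
    let (schema, records) = read_spill(&mut Cursor::new(bytes))?;
    println!("{} -> {} record frames", schema, records.len());
    Ok(())
}
```

Length prefixes are what let a reader stream records back one frame at a time without scanning for delimiters or loading the whole file.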
A SpillFile is backed by a tempfile::TempPath, so it auto-deletes when dropped — the RAII
cleanup pattern (Phase 2’s Drop) means a spilled buffer leaves no garbage behind even if the run
aborts.
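The cleanup-on-drop idea can be sketched without the tempfile crate. `ScratchFile` below is a hypothetical stand-in that plays the role of a `TempPath`-backed spill file; it is not the engine's `SpillFile`, and the file name is made up for the example.

```rust
use std::fs;
use std::io::Write;
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for a spill file that owns its path on disk.
struct ScratchFile {
    path: PathBuf,
}

impl ScratchFile {
    /// Create the file at `path` and write `bytes` into it.
    fn create(path: PathBuf, bytes: &[u8]) -> std::io::Result<Self> {
        let mut f = fs::File::create(&path)?;
        f.write_all(bytes)?;
        Ok(ScratchFile { path })
    }

    fn path(&self) -> &Path {
        &self.path
    }
}

impl Drop for ScratchFile {
    fn drop(&mut self) {
        // Best effort: Drop cannot return an error, so a failed delete is ignored.
        let _ = fs::remove_file(&self.path);
    }
}

fn main() -> std::io::Result<()> {
    let path = std::env::temp_dir().join(format!("spill-sketch-{}.bin", std::process::id()));
    {
        let file = ScratchFile::create(path.clone(), b"rows")?;
        println!("exists while owned: {}", file.path().exists());
    } // `file` goes out of scope here, and its Drop removes the file
    println!("exists after drop: {}", path.exists());
    Ok(())
}
```

Drop runs on normal scope exit, on early returns through `?`, and while a panic unwinds. It does not run if the process is killed outright or the program is built to abort on panic.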
Draining unifies all three states into one iterator, in a fixed order: memory events first, then
each spill file streamed back via a SpillReader, then the trailing punctuations. A downstream
operator consumes a Mixed buffer exactly as it would a Memory one — the spill is invisible at
the drain interface. That uniformity is what lets the rest of the engine ignore whether a buffer
fit in RAM.
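A minimal sketch of that drain order, with in-memory vectors standing in for spill files so no disk IO is involved. `Event`, `Chunk`, and `drain` are hypothetical names for illustration; the real drain streams each file back through a SpillReader.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Event {
    Record(u64),
    Punct(u64),
}

/// Stand-in for an on-disk spill file: its rows are kept in a Vec instead.
struct Chunk(Vec<u64>);

enum Buffer {
    Memory(Vec<Event>),
    Spilled { chunks: Vec<Chunk>, pending_puncts: Vec<u64> },
    Mixed { mem: Vec<Event>, spills: Vec<Chunk>, pending_puncts: Vec<u64> },
}

impl Buffer {
    /// Consume the buffer and yield memory events, then spilled rows chunk by
    /// chunk, then the trailing punctuations.
    fn drain(self) -> impl Iterator<Item = Event> {
        // Normalize all three states into the same three parts.
        let (mem, chunks, puncts) = match self {
            Buffer::Memory(v) => (v, Vec::new(), Vec::new()),
            Buffer::Spilled { chunks, pending_puncts } => (Vec::new(), chunks, pending_puncts),
            Buffer::Mixed { mem, spills, pending_puncts } => (mem, spills, pending_puncts),
        };
        mem.into_iter()
            .chain(chunks.into_iter().flat_map(|c| c.0.into_iter().map(Event::Record)))
            .chain(puncts.into_iter().map(Event::Punct))
    }
}

fn main() {
    let buffers = vec![
        Buffer::Memory(vec![Event::Record(1), Event::Punct(1)]),
        Buffer::Spilled { chunks: vec![Chunk(vec![1, 2])], pending_puncts: vec![1] },
        Buffer::Mixed {
            mem: vec![Event::Record(30)],
            spills: vec![Chunk(vec![10, 11]), Chunk(vec![20])],
            pending_puncts: vec![2],
        },
    ];
    // The caller's loop is identical for all three states.
    for buf in buffers {
        let events: Vec<Event> = buf.drain().collect();
        println!("{:?}", events);
    }
}
```

The `match` happens once, inside `drain`. Callers only ever see a single iterator type, which is why the spill stays invisible to them.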
Build the state machine
The three-state shape fits in one screen. Note the std::mem::replace move on the
Spilled → Mixed edge — the same trick the real push_event uses:
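A minimal, disk-free sketch under stated assumptions: rows are plain `u32` values, a "spill file" is just a `Vec<u32>`, punctuations are left out, and the `spill` and `state` methods are hypothetical helpers added so the walk can be driven by hand. In the engine the spill is triggered at the admission boundary, not by a method like this.

```rust
#[derive(Debug)]
enum Buffer {
    Memory(Vec<u32>),
    Spilled { chunks: Vec<Vec<u32>> },
    Mixed { mem: Vec<u32>, spills: Vec<Vec<u32>> },
}

impl Buffer {
    /// Name of the current state, for printing.
    fn state(&self) -> &'static str {
        match self {
            Buffer::Memory(_) => "Memory",
            Buffer::Spilled { .. } => "Spilled",
            Buffer::Mixed { .. } => "Mixed",
        }
    }

    /// Push a row; a Spilled buffer is promoted to Mixed.
    fn push(&mut self, row: u32) {
        match self {
            Buffer::Memory(v) => v.push(row),
            Buffer::Mixed { mem, .. } => mem.push(row),
            Buffer::Spilled { .. } => {
                // Move the chunks out of the old variant, leaving a temporary behind.
                let chunks = match std::mem::replace(self, Buffer::Memory(Vec::new())) {
                    Buffer::Spilled { chunks } => chunks,
                    _ => unreachable!(),
                };
                *self = Buffer::Mixed { mem: vec![row], spills: chunks };
            }
        }
    }

    /// Hypothetical spill step: a Memory buffer becomes Spilled with one chunk.
    /// Re-spilling a Mixed buffer is left out of this sketch.
    fn spill(&mut self) {
        *self = match std::mem::replace(self, Buffer::Memory(Vec::new())) {
            Buffer::Memory(rows) => Buffer::Spilled { chunks: vec![rows] },
            other => other,
        };
    }
}

fn main() {
    let mut buf = Buffer::Memory(Vec::new());
    buf.push(1);
    buf.push(2);
    println!("{}: {:?}", buf.state(), buf);

    buf.spill();
    println!("{}: {:?}", buf.state(), buf);

    buf.push(3);
    println!("{}: {:?}", buf.state(), buf);
}
```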
Run it and watch the buffer walk Memory → Spilled → Mixed. The compiler guarantees you handled
every state in push; std::mem::replace lets you move owned data out of the old variant safely.
That’s the whole spill state machine, minus the disk IO.
// quick check
What does the Mixed variant of NodeBuffer represent?
Mixed is the post-spill state: earlier rows are in on-disk chunks, and a new in-memory tail accumulates on top. Draining yields the memory tail first, then the spill chunks, then trailing punctuations.
Inspect the buffer
A buffer spills when memory gets tight, but who decides it’s tight, across all the operators running at once, and how do they coordinate without stepping on each other? That’s the memory arbitrator, and it’s the engine’s sharpest lesson in interior mutability.