
Benchmark & measure memory

All of Phase 4 has rested on a claim: this is cheaper. A 24-byte FieldStr saves an allocation; the arbitrator keeps memory bounded; spilling trades RAM for disk. None of that is worth anything unless you can measure it. This final lesson is about clinker’s measurement tools — and they come in two flavours that are easy to conflate: a fast estimate the runtime uses to budget memory, and an exact count the benchmarks use to verify it.

You’ll be able to: explain the heap_size cost model and why it’s an estimate, read a criterion benchmark, and explain how a custom GlobalAlloc counts allocated bytes.

Recall lessons 4.4–4.5: operators charge bytes against the memory budget. What is a record’s byte cost? The engine estimates it with heap_size — owned heap bytes, accounted per Value variant:

clinker-record ·value.rs ·heap_size fn @47d2e12
/// Estimated heap bytes owned by this value (excludes the enum itself).
pub fn heap_size(&self) -> usize {
    match self {
        Value::String(s) => s.heap_size(), // 0 if inline, else byte length
        Value::Array(arr) => {
            arr.capacity() * std::mem::size_of::<Value>()
                + arr.iter().map(Value::heap_size).sum::<usize>()
        }
        Value::Map(m) => m.iter().map(|(k, v)| k.len() + v.heap_size()).sum(),
        _ => 0, // scalars live inline, no heap
    }
}

Two things to read here. Scalars (Integer, Bool, dates) return 0 — they live inline in the 32-byte Value, owning no heap. And a short inline FieldStr also returns 0, which is the 24-byte type from last lesson paying off in the cost model directly. Crucially, this is an estimate, computed by walking the value — not a real allocator measurement. It has to be: charging runs on the hot path, per record, per operator, and you cannot attach an allocator probe to every value without wrecking the throughput you’re trying to bound. A cheap, consistent estimate is the right tool for a runtime budget.
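The per-variant rules can be exercised in a small, self-contained toy. ToyValue and ToyStr below are hypothetical stand-ins, not clinker's types, and the 22-byte inline threshold is an assumption. What carries over is the walk itself: scalars and inline strings cost 0, a boxed string costs its byte length, and an array costs its buffer (capacity times the enum size) plus whatever its elements own.

```rust
use std::mem::size_of;

// Assumed inline threshold for this toy; not clinker's FieldStr layout.
const INLINE_CAP: usize = 22;

/// Hypothetical small string: short strings live inline, long ones are boxed.
#[allow(dead_code)] // payloads are never read back in this toy
enum ToyStr {
    Inline { len: u8, buf: [u8; INLINE_CAP] },
    Heap(Box<str>),
}

impl ToyStr {
    fn new(s: &str) -> Self {
        if s.len() <= INLINE_CAP {
            let mut buf = [0u8; INLINE_CAP];
            buf[..s.len()].copy_from_slice(s.as_bytes());
            ToyStr::Inline { len: s.len() as u8, buf }
        } else {
            ToyStr::Heap(s.into())
        }
    }

    /// 0 if inline, else the byte length (mirrors the comment in value.rs).
    fn heap_size(&self) -> usize {
        match self {
            ToyStr::Inline { .. } => 0,
            ToyStr::Heap(s) => s.len(),
        }
    }
}

/// Hypothetical value enum with the same heap_size rules as the excerpt.
#[allow(dead_code)] // scalar payloads are never read back in this toy
enum ToyValue {
    Integer(i64),
    Bool(bool),
    String(ToyStr),
    Array(Vec<ToyValue>),
}

impl ToyValue {
    fn heap_size(&self) -> usize {
        match self {
            ToyValue::String(s) => s.heap_size(),
            // The Vec's buffer (capacity, not len) plus what each element owns.
            ToyValue::Array(arr) => {
                arr.capacity() * size_of::<ToyValue>()
                    + arr.iter().map(ToyValue::heap_size).sum::<usize>()
            }
            ToyValue::Integer(_) | ToyValue::Bool(_) => 0, // scalars: no heap
        }
    }
}

fn main() {
    let long = "x".repeat(40); // 40 bytes: too long to inline
    let mut items = Vec::with_capacity(4);
    items.push(ToyValue::Integer(7));
    items.push(ToyValue::Bool(true));
    items.push(ToyValue::String(ToyStr::new("short"))); // inline: 0
    items.push(ToyValue::String(ToyStr::new(&long))); // boxed: 40
    let slots = items.capacity();
    let record = ToyValue::Array(items);
    // Estimate = slots * size_of::<ToyValue>() for the buffer, plus 40.
    println!(
        "slots = {slots}, size_of::<ToyValue>() = {}, estimate = {} bytes",
        size_of::<ToyValue>(),
        record.heap_size()
    );
}
```

The array term uses capacity rather than length on purpose: spare slots are still owned heap, so a len-based estimate would undercount a pre-sized Vec.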

For the other job — proving a change actually made things faster or smaller — clinker uses criterion benchmarks. They live in each crate’s benches/ directory: record_ops (record create/get/set/clone, value_heap_size), arbitration_poll, and more.

clinker-record ·record_ops.rs ·bench_value_heap_size bench @47d2e12
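// crates/clinker-record/benches/record_ops.rs: the criterion shape
use std::hint::black_box;
use criterion::{criterion_group, criterion_main, Criterion};
use clinker_record::Value; // assumed import path

fn bench_value_heap_size(c: &mut Criterion) {
    let string = Value::String("a medium length string".into());
    c.bench_function("value_heap_size/string", |b| {
        // Opaque input stops hoisting the call out of the loop; opaque output stops dead-code removal.
        b.iter(|| black_box(black_box(&string).heap_size()));
    });
}
criterion_group!(benches, bench_value_heap_size);
criterion_main!(benches);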

black_box is the load-bearing detail: it hides its argument from the optimizer, so the compiler can’t “see through” the benchmark and delete the work you’re trying to time. You run these with cargo bench -p clinker-record --bench record_ops, or cargo bench -p clinker-exec --bench arbitration_poll for the arbitrator-poll cost. The arbitration_poll suite, for instance, times should_spill across consumer-registry sizes to keep the arbitrator’s per-poll cost from regressing as pipelines deepen.
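What a benchmark does can be approximated, very crudely, with the standard library alone: time many iterations and keep the work opaque with std::hint::black_box. The sketch assumes a hypothetical should_spill that sums per-consumer charges against a budget; it is not clinker’s arbitrator, and it skips everything criterion adds (warm-up, sampling statistics, outlier detection, comparison against a saved baseline).

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for an arbitrator poll: spill if total charges exceed the budget.
fn should_spill(charges: &[usize], budget: usize) -> bool {
    charges.iter().sum::<usize>() > budget
}

/// Average time per call of `should_spill` over a registry of `n` consumers.
fn time_poll(n: usize, iters: u32) -> Duration {
    let charges: Vec<usize> = (0..n).map(|i| i * 64).collect();
    let budget = 1 << 20;
    let start = Instant::now();
    for _ in 0..iters {
        // Opaque inputs prevent constant-folding; the opaque result prevents dead-code removal.
        black_box(should_spill(black_box(charges.as_slice()), black_box(budget)));
    }
    start.elapsed() / iters
}

fn main() {
    // Sweep registry sizes, the same shape as a per-size benchmark suite.
    for n in [1, 16, 256, 4096] {
        println!("consumers = {n:>5}: ~{:?} per poll", time_poll(n, 10_000));
    }
}
```

In criterion itself, a sweep like this is usually written as a benchmark group with one bench_with_input call per size, which also gives each size its own statistics and regression report.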

The exact count: a custom global allocator

An estimate budgets; a count verifies. To measure the real bytes a section allocates, clinker ships a counting global allocator behind the bench-alloc feature:

clinker-bench-support ·alloc.rs ·AccountingAlloc type @47d2e12
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

pub struct AccountingAlloc { allocs: AtomicUsize, bytes_alloc: AtomicUsize, /* ... */ }

// SAFETY: every call forwards to the System allocator after counting.
unsafe impl GlobalAlloc for AccountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        self.allocs.fetch_add(1, Ordering::Relaxed);
        self.bytes_alloc.fetch_add(layout.size(), Ordering::Relaxed);
        unsafe { System.alloc(layout) } // delegate the real work
    }
    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        /* count, then */ unsafe { System.dealloc(ptr, layout) }
    }
}

Install it with #[global_allocator] and every allocation in the process flows through it, bumping atomic counters (the interior-mutability pattern from 4.4, and the unsafe impl discipline from 4.6). A scoped Region snapshots the counters before and after a block to report exactly how many bytes it allocated; that’s how the executor’s per-stage heap_delta_bytes metric is captured under the feature. The cost is real: every allocation now performs atomic read-modify-writes on counters shared by all threads, which adds contention, so it’s a measurement build, not the production path.
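A scoped region can be sketched as a plain struct that snapshots the counters when created and diffs against them on request. Counting, Region, and RegionStats are hypothetical names, not clinker’s API, and reporting a net figure (allocated minus freed) alongside the gross one is an assumption about what a per-stage heap delta would want.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::hint::black_box;
use std::sync::atomic::{AtomicUsize, Ordering::Relaxed};

/// Hypothetical counting allocator: count the request, then forward to `System`.
struct Counting {
    bytes_alloc: AtomicUsize,
    bytes_freed: AtomicUsize,
}

// SAFETY: both methods pass their arguments to `System` unchanged.
unsafe impl GlobalAlloc for Counting {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        self.bytes_alloc.fetch_add(layout.size(), Relaxed);
        unsafe { System.alloc(layout) }
    }
    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        self.bytes_freed.fetch_add(layout.size(), Relaxed);
        unsafe { System.dealloc(ptr, layout) }
    }
}

#[global_allocator]
static GLOBAL: Counting = Counting {
    bytes_alloc: AtomicUsize::new(0),
    bytes_freed: AtomicUsize::new(0),
};

#[derive(Debug, PartialEq)]
struct RegionStats {
    bytes_allocated: usize, // gross bytes requested since the region began
    net_delta: isize,       // allocated minus freed: what is still live
}

/// Snapshot taken at construction; `stats()` diffs against the current counters.
struct Region {
    start_alloc: usize,
    start_freed: usize,
}

impl Region {
    fn new() -> Self {
        Region {
            start_alloc: GLOBAL.bytes_alloc.load(Relaxed),
            start_freed: GLOBAL.bytes_freed.load(Relaxed),
        }
    }

    fn stats(&self) -> RegionStats {
        let alloc = GLOBAL.bytes_alloc.load(Relaxed) - self.start_alloc;
        let freed = GLOBAL.bytes_freed.load(Relaxed) - self.start_freed;
        RegionStats { bytes_allocated: alloc, net_delta: alloc as isize - freed as isize }
    }
}

fn main() {
    let region = Region::new();
    // black_box keeps both buffers observable so the optimizer cannot elide them.
    let kept: Vec<u64> = black_box(Vec::with_capacity(100)); // 800 bytes, still live
    drop(black_box(Vec::<u8>::with_capacity(200))); // 200 bytes, allocated then freed
    let stats = region.stats(); // expected: bytes_allocated = 1000, net_delta = 800
    println!("{stats:?}, kept.capacity() = {}", kept.capacity());
}
```

Because the counters are process-wide, a region also picks up allocations made by other threads during the same window, which is one more reason this belongs in a measurement build.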

So the two tools divide the labour cleanly: heap_size estimates, cheaply, for the live budget; AccountingAlloc counts, exactly, for offline verification. Confusing the two — budgeting off real allocation, or benchmarking off the estimate — would get you the worst of each.

The Rust playground lets you install a global allocator, so you can build a miniature AccountingAlloc right here — same unsafe impl GlobalAlloc, same forward-to-System pattern:
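A minimal version of that exercise might look like this. CountingAlloc, snapshot, and measure are illustrative names rather than clinker’s, and measure stands in for a one-shot Region. The standard library’s GlobalAlloc documentation warns that the optimizer may remove unused allocations, so the result is routed through black_box before the second snapshot.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::hint::black_box;
use std::sync::atomic::{AtomicUsize, Ordering};

/// A miniature AccountingAlloc: count the request, then forward it to `System`.
struct CountingAlloc {
    allocs: AtomicUsize,
    bytes_alloc: AtomicUsize,
}

// SAFETY: `alloc` and `dealloc` pass their arguments to `System` unchanged;
// the counters never touch the memory being handed out.
unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        self.allocs.fetch_add(1, Ordering::Relaxed);
        self.bytes_alloc.fetch_add(layout.size(), Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

#[global_allocator]
static ALLOC: CountingAlloc = CountingAlloc {
    allocs: AtomicUsize::new(0),
    bytes_alloc: AtomicUsize::new(0),
};

/// Current (allocation count, bytes requested) since the process started.
fn snapshot() -> (usize, usize) {
    (
        ALLOC.allocs.load(Ordering::Relaxed),
        ALLOC.bytes_alloc.load(Ordering::Relaxed),
    )
}

/// Run `f` and report how many allocations and bytes it requested.
fn measure<R>(f: impl FnOnce() -> R) -> (usize, usize, R) {
    let (a0, b0) = snapshot();
    // Keep the result opaque so the optimizer cannot elide the allocation.
    let out = black_box(f());
    let (a1, b1) = snapshot();
    (a1 - a0, b1 - b0, out)
}

fn main() {
    let (allocs, bytes, v) = measure(|| Vec::<u8>::with_capacity(1024));
    println!("allocs = {allocs}, bytes = {bytes}, capacity = {}", v.capacity());
}
```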

The Vec::with_capacity(1024) shows up as a 1024-byte allocation in the delta — an exact count, not an estimate. This is clinker’s AccountingAlloc in miniature: a GlobalAlloc impl that counts and forwards. Swap with_capacity(1024) for a String::from("...") or a vec![0u8; 100] and watch the bytes track precisely.

Quick check: why does the runtime budget charge memory using heap_size (an estimate) instead of the exact AccountingAlloc count?

That’s Phase 4 — Execution & Memory. You’ve gone from “the plan is validated” to the live machine that runs it: closed-enum dispatch, one thread per source feeding bounded channels, buffers that spill, an arbitrator that bounds memory through interior mutability, the unsafe core of the field string, and the tools that measure all of it. The thread tying Phases 3 and 4 together: push proof to the boundary, then run hard inside it — with the type system, the budget, and the benchmarks each enforcing a different guarantee. Phase 5 turns you from reader to contributor: adding an operator, a format, a CXL builtin, and passing the review gauntlet.