Testing strategy
Welcome to Phase 5 — Extending & Contributing. Phases 0–4 taught you to read the engine; this phase turns you toward changing it. And the first thing a contributor needs is not a clever feature — it’s a way to know they didn’t break anything. This lesson is the engine’s answer to “how do I prove a change is correct?”
You’ll be able to: name the two tiers where clinker’s tests live, explain what “testing to the boundary” buys you, read a snapshot test and a golden-baseline regression test, and read the one property test that checks a fast band-join against a brute-force oracle.
Two tiers: unit tests next to the code, integration tests at the seam
Clinker’s tests live in two clearly separated places, and the split is the first thing to internalize:
- Inline unit tests: a #[cfg(test)] mod tests block at the bottom of a source file, testing that file’s internals. They can reach private functions.
- Integration tests: files under crates/<crate>/tests/, compiled as separate crates that can only call the crate’s public API.
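To make the visibility difference concrete, here is a minimal sketch of the inline tier. The function names are hypothetical and are not clinker’s real coercion API; the only point is that the test module can call a private function that a file under tests/ could not.

```rust
// Hypothetical library code: names are illustrative, not clinker's real API.

/// Private helper: only code in this file (including `mod tests`) can call it.
fn parse_digits(s: &str) -> Option<i64> {
    s.trim().parse::<i64>().ok()
}

/// Public entry point: the only function an integration test under `tests/`
/// would be able to reach.
pub fn coerce_to_int_or_zero(s: &str) -> i64 {
    parse_digits(s).unwrap_or(0)
}

#[cfg(test)]
mod tests {
    use super::*; // brings private items of the parent module into scope

    #[test]
    fn private_helper_rejects_non_numeric_text() {
        // Calls the private function directly; an integration test cannot.
        assert_eq!(parse_digits("abc"), None);
        assert_eq!(parse_digits(" 42 "), Some(42));
    }
}
```

An integration test for the same code would have to go through coerce_to_int_or_zero and could only observe the fallback value, not the intermediate Option.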
A small inline unit test, right beside the coercion logic it covers:
clinker-record ·coercion.rs ·test_coerce_string_to_int_valid test @47d2e12
    #[cfg(test)]
    mod tests {
        use super::*;

        #[test]
        fn test_coerce_string_to_int_valid() {
            // exercises coerce_to_int directly: a private-ish unit of behavior
        }
    }

The naming is deliberate and worth copying: tests read as
module::tests::scenario::behavior, so a failure name tells you what broke before you
open the file. The testing-commands doc records the convention and the canonical run
commands:
clinker ·50_TESTING_AND_COMMANDS.md doc @47d2e12
    # Fast signal after an edit: does it still compile?
    cargo check --workspace --locked --offline

    # Run ONE test exactly (the ::-separated path is the full test name):
    cargo test -p clinker-exec --lib --offline \
      executor::tests::spill_dir_unavailable_midrun::unarmed_seam... -- --exact

    # The whole suite. The ulimit prefix is load-bearing: the default 1024-fd
    # limit makes clinker-exec's spill tests fail with "Too many open files".
    ulimit -n 4096 && cargo test --workspace --locked --offline

Testing to the boundary
An integration test under tests/ can’t see internals, so it’s forced to drive the
engine the way a user does: feed YAML and input bytes to the public executor, then
assert on the output bytes and the run report. That’s “testing to the boundary,”
and it’s the engine’s most valuable kind of test, because it survives any internal
refactor that keeps the public behavior the same.
clinker-exec ·aggregate_integration.rs ·test_e2e_group_by_sum_count test @47d2e12
    // End-to-end: CSV in → aggregate(group_by:[dept], sum + count) → CSV out.
    let csv = "dept,salary\neng,100\neng,200\nsales,50\n";
    let (report, output) = run_single(yaml, csv);

    assert_eq!(report.dlq_entries.len(), 0);
    assert_eq!(report.counters.ok_count, 2, "two output groups");
    assert_eq!(
        sorted_body_lines(&output),
        vec!["eng,300,2".to_string(), "sales,50,1".to_string()],
    );

Nothing here names a private type. If someone rewrites the aggregation operator’s
internals tomorrow, this test still passes as long as eng,100 plus eng,200 still
sums to eng,300,2. That’s the whole point: the assertion is pinned to the contract,
not the implementation.
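The excerpt above depends on clinker’s own helpers, so here is a self-contained sketch of the same idea. Everything in it is an assumption made for illustration: run_group_by_sum_count is a hypothetical stand-in for the public executor, the output header is invented, and sorted_body_lines is a guess at what a helper of that name might do.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for the public executor: CSV text in, CSV text out.
/// Groups rows by the first column and emits `key,sum,count` per group.
pub fn run_group_by_sum_count(csv: &str) -> String {
    let mut groups: BTreeMap<String, (i64, u64)> = BTreeMap::new();
    for line in csv.lines().skip(1) {
        // Skip malformed rows instead of failing; a real engine would report them.
        let Some((key, value)) = line.split_once(',') else { continue };
        let Ok(value) = value.trim().parse::<i64>() else { continue };
        let entry = groups.entry(key.to_string()).or_insert((0, 0));
        entry.0 += value;
        entry.1 += 1;
    }
    let mut out = String::from("dept,total,n\n"); // header names are invented
    for (key, (sum, count)) in &groups {
        out.push_str(&format!("{key},{sum},{count}\n"));
    }
    out
}

/// Body lines (header dropped), sorted so the assertion ignores row order.
pub fn sorted_body_lines(output: &str) -> Vec<String> {
    let mut lines: Vec<String> = output.lines().skip(1).map(str::to_string).collect();
    lines.sort();
    lines
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn group_by_sum_count_matches_contract() {
        let output = run_group_by_sum_count("dept,salary\neng,100\neng,200\nsales,50\n");
        // The assertion mentions only input bytes and output bytes.
        assert_eq!(sorted_body_lines(&output), vec!["eng,300,2", "sales,50,1"]);
    }
}
```

Swapping the BTreeMap for a HashMap changes the row order of the raw output, yet the test stays green because it sorts the body lines before comparing. That is the kind of internal change a boundary test is meant to tolerate.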
Snapshot tests: assert on a big blob without hand-writing it
Some outputs are too large to hand-author an assert_eq! for, like the full text of an
execution plan from --explain. Clinker uses the insta crate: you write the test,
run it once, and insta records the output as a committed .snap file. Later runs
compare against that file; an intentional change is reviewed and re-accepted.
clinker-exec ·cull_explain_snapshot.rs ·explain_renders_cull_two_output_ports test @47d2e12
    #[test]
    fn explain_renders_cull_two_output_ports() {
        let text = render_explain(yaml);
        // a few structural asserts first (these document intent) ...
        assert!(text.contains("FORK [cull] 'drop_bad'"));
        // ... then snapshot the whole rendered plan under a stable name:
        insta::assert_snapshot!("explain_cull_two_output_ports", text);
    }

The committed snapshot it locks against starts with an insta header and then the
captured value:
    ---
    source: crates/clinker-exec/tests/cull_explain_snapshot.rs
    expression: text
    ---
    === Execution Plan ===

    Mode: Streaming
    ...

When you intentionally change --explain output, the snapshot test fails, you eyeball
the diff, and accept it with cargo insta review (or INSTA_UPDATE=always). The
discipline: a snapshot diff in a PR is a visible, reviewable record of an
output-format change — it can’t sneak through silently.
Golden-baseline regression seeds
The strongest refactor net in the codebase is a corpus of golden baselines: real
pipelines whose exact output bytes are committed under tests/fixtures/baselines/
(e.g. csv_transform_sink.expected.csv). A driver runs each pipeline and compares the
fresh output to the committed file, byte for byte:
clinker-exec ·pre_lift_baselines.rs ·compare_or_write fn @47d2e12
    fn compare_or_write(baseline_name: &str, actual: &str) {
        let p = baseline_root().join(baseline_name);
        if update_mode() || !p.exists() {
            // First run (or UPDATE_BASELINES=1): capture the golden.
            std::fs::write(&p, actual.as_bytes()).unwrap();
            return;
        }
        let expected = std::fs::read_to_string(&p).unwrap();
        assert_eq!(actual, expected, "byte-mismatch against baseline {}", p.display());
    }

Read the control flow carefully: it’s the whole regression-seed pattern in ten lines.
The first time a fixture runs (or whenever you deliberately set UPDATE_BASELINES=1),
the current output is written as the new golden. Every run after that compares.
So the seed is captured once, then frozen; any future change that alters a single byte
of any baseline pipeline’s output trips a named failure. (Note: the corpus is keyed by
fixture name, not by issue number — clinker doesn’t tag regression tests with bug IDs.)
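The capture, freeze, and trip life cycle can be walked through in a scratch directory. This is a sketch under stated assumptions: compare_or_write_in is a hypothetical variant that takes the baseline directory as a parameter, update_mode is a guess at how the UPDATE_BASELINES=1 switch might be read, and the demo assumes that variable is not set when it runs.

```rust
use std::fs;
use std::path::{Path, PathBuf};

/// Assumption: update mode is enabled by UPDATE_BASELINES=1 in the environment.
fn update_mode() -> bool {
    std::env::var("UPDATE_BASELINES").map(|v| v == "1").unwrap_or(false)
}

/// Hypothetical variant of the driver that takes the baseline directory explicitly.
pub fn compare_or_write_in(root: &Path, baseline_name: &str, actual: &str) {
    let p: PathBuf = root.join(baseline_name);
    if update_mode() || !p.exists() {
        // First run (or update mode): capture the current output as the golden.
        fs::create_dir_all(root).unwrap();
        fs::write(&p, actual.as_bytes()).unwrap();
        return;
    }
    let expected = fs::read_to_string(&p).unwrap();
    assert_eq!(actual, expected, "byte-mismatch against baseline {}", p.display());
}

/// Walks capture, then compare, then a one-byte drift.
/// Returns true when the drift is detected (the comparison panics).
pub fn demo_seed_lifecycle() -> bool {
    let root = std::env::temp_dir().join(format!("baseline_demo_{}", std::process::id()));
    let _ = fs::remove_dir_all(&root); // start from a clean scratch directory

    compare_or_write_in(&root, "demo.expected.csv", "a,b\n1,2\n"); // run 1: captures
    compare_or_write_in(&root, "demo.expected.csv", "a,b\n1,2\n"); // run 2: compares, passes

    // Run 3: a single byte differs, so the assert_eq! inside should panic.
    let tripped = std::panic::catch_unwind(|| {
        compare_or_write_in(&root, "demo.expected.csv", "a,b\n1,3\n");
    })
    .is_err();

    let _ = fs::remove_dir_all(&root);
    tripped
}
```

One consequence worth noticing: because a missing file is silently captured, deleting a baseline by accident turns the next run into a capture rather than a failure, so baseline deletions deserve the same review attention as baseline edits.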
One property test: fast algorithm vs. slow oracle
Most tests check fixed examples. A property test instead generates hundreds of random
inputs and checks an invariant on every one. Clinker uses proptest for exactly one
high-value case: its fast band-join (iejoin_numeric) must agree with a dead-simple,
obviously-correct nested-loop join on every random input.
clinker-exec ·iejoin.rs ·proptest_iejoin_matches_nested_loop test @47d2e12
    proptest! {
        #![proptest_config(ProptestConfig::with_cases(256))]
        #[test]
        fn proptest_iejoin_matches_nested_loop((left, right, op1, op2) in arb_inputs()) {
            let actual: HashSet<(usize, usize)> =
                iejoin_numeric(&left, &right, op1.to_range(), op2.to_range())
                    .into_iter().collect();
            let expected = nested_loop(&left, &right, op1, op2); // the slow oracle
            prop_assert_eq!(actual, expected);
        }
    }

This is the oracle pattern: you have a fast implementation you’re unsure about and a
slow implementation you trust, and you assert they always agree. It’s worth the
machinery precisely because the fast path (coarse-filter striding, permutation
indexing) is the kind of code that’s easy to get subtly wrong. For straightforward
behavior, a handful of example tests is cheaper and clearer — don’t reach for proptest
by default. (The repo also has a combine_iejoin_prop.rs scaffold; the live property
test is the inline one shown here.)
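For readers who want to see the oracle pattern without the proptest machinery, here is a dependency-free sketch. It is not the IEJoin algorithm and it does not use proptest: the fast side is a simple sort-plus-binary-search join on a single `<` predicate, and a small xorshift generator stands in for proptest’s input strategies. All names are hypothetical.

```rust
use std::collections::HashSet;

/// Slow oracle: check every pair. Obviously correct, O(n * m).
pub fn nested_loop_less_than(left: &[i64], right: &[i64]) -> HashSet<(usize, usize)> {
    let mut out = HashSet::new();
    for (i, l) in left.iter().enumerate() {
        for (j, r) in right.iter().enumerate() {
            if l < r {
                out.insert((i, j));
            }
        }
    }
    out
}

/// Faster candidate: sort right-side indices once, then binary-search per left row.
/// A simple stand-in for a real band-join, not the IEJoin algorithm.
pub fn sorted_less_than(left: &[i64], right: &[i64]) -> HashSet<(usize, usize)> {
    let mut order: Vec<usize> = (0..right.len()).collect();
    order.sort_by_key(|&j| right[j]);
    let mut out = HashSet::new();
    for (i, &l) in left.iter().enumerate() {
        // First sorted position whose value is strictly greater than `l`.
        let start = order.partition_point(|&j| right[j] <= l);
        for &j in &order[start..] {
            out.insert((i, j));
        }
    }
    out
}

/// Tiny deterministic generator (xorshift64); the seed must be non-zero.
pub struct XorShift(pub u64);

impl XorShift {
    pub fn next_u64(&mut self) -> u64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        self.0
    }

    /// Random vector of length 0..=max_len with values in -10..=10,
    /// a narrow range chosen so duplicates and ties are common.
    pub fn small_vec(&mut self, max_len: u64) -> Vec<i64> {
        let len = self.next_u64() % (max_len + 1);
        (0..len).map(|_| (self.next_u64() % 21) as i64 - 10).collect()
    }
}

/// Runs `cases` random comparisons; returns the first disagreeing input, if any.
pub fn first_disagreement(cases: u32, seed: u64) -> Option<(Vec<i64>, Vec<i64>)> {
    let mut rng = XorShift(seed);
    for _ in 0..cases {
        let (left, right) = (rng.small_vec(8), rng.small_vec(8));
        if sorted_less_than(&left, &right) != nested_loop_less_than(&left, &right) {
            return Some((left, right));
        }
    }
    None
}
```

Returning the disagreeing input rather than a bare bool matters in practice: the counterexample is what you debug. A framework such as proptest goes further and shrinks the counterexample to a smaller failing case, which this hand-rolled loop does not attempt.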
Match the test to what you’re changing
Quick check
You refactor the internals of the aggregation operator but intend zero change to its output. Which test most directly protects you, and why?
The boundary test, for example test_e2e_group_by_sum_count. Testing to the boundary means asserting on the public output, not internals, so a refactor that preserves behavior leaves that test green, and any drift in behavior turns it red immediately. A unit test on a renamed private helper may not even compile after the refactor, and the proptest only applies to the band-join.
Investigate the suite
You can now prove a change is correct. Next, make your first real change: add a builtin to CXL, the engine’s expression language.