Testing strategy

Welcome to Phase 5 — Extending & Contributing. Phases 0–4 taught you to read the engine; this phase turns you toward changing it. And the first thing a contributor needs is not a clever feature — it’s a way to know they didn’t break anything. This lesson is the engine’s answer to “how do I prove a change is correct?”

You’ll be able to: name the two tiers where clinker’s tests live, explain what “testing to the boundary” buys you, read a snapshot test and a golden-baseline regression test, and read the one property test that checks a fast band-join against a brute-force oracle.

Two tiers: unit tests next to the code, integration tests at the seam

Clinker’s tests live in two clearly separated places, and the split is the first thing to internalize:

  • Inline unit tests — a #[cfg(test)] mod tests block at the bottom of a source file, testing that file’s internals. They can reach private functions.
  • Integration tests — files under crates/<crate>/tests/, compiled as separate crates that can only call the crate’s public API.

A small inline unit test, right beside the coercion logic it covers:

clinker-record ·coercion.rs ·test_coerce_string_to_int_valid test @47d2e12
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_coerce_string_to_int_valid() {
        // exercises coerce_to_int directly: an internal unit of behavior,
        // not part of the crate's public API
    }
}
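The excerpt above shows only the test's shape. The sketch below is self-contained and makes the point concrete. It assumes a hypothetical private `coerce_to_int(&str) -> Result<i64, String>`; clinker's real signature and error type are not shown in this lesson. The function is not `pub`, yet the inline tests still call it through `use super::*`.

```rust
/// Hypothetical stand-in for clinker's coercion helper (real signature not shown).
/// Deliberately private: only tests in this same file can call it.
fn coerce_to_int(s: &str) -> Result<i64, String> {
    s.parse::<i64>()
        .map_err(|e| format!("cannot coerce {s:?} to int: {e}"))
}

#[cfg(test)]
mod tests {
    use super::*; // brings the private fn into scope

    #[test]
    fn test_coerce_string_to_int_valid() {
        assert_eq!(coerce_to_int("42"), Ok(42));
    }

    #[test]
    fn test_coerce_string_to_int_rejects_text() {
        assert!(coerce_to_int("forty-two").is_err());
    }
}
```

An integration test under `tests/` could not call this function at all. That is exactly the split between the two tiers.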

The naming is deliberate and worth copying: tests read as module::tests::scenario::behavior, so a failure name tells you what broke before you open the file. The testing-commands doc records the convention and the canonical run commands:

clinker ·50_TESTING_AND_COMMANDS.md doc @47d2e12
# Fast signal after an edit — does it still compile?
cargo check --workspace --locked --offline
# Run ONE test exactly (the ::-separated path is the full test name):
cargo test -p clinker-exec --lib --offline \
  executor::tests::spill_dir_unavailable_midrun::unarmed_seam... -- --exact
# The whole suite. The ulimit prefix is load-bearing: the default 1024-fd
# limit makes clinker-exec's spill tests fail with "Too many open files".
ulimit -n 4096 && cargo test --workspace --locked --offline
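The `scenario::behavior` part of a test name comes from nesting modules inside `tests`. Here is a sketch with invented names. The `executor` module and `spill_file_name` below are hypothetical, not clinker code.

```rust
// Hypothetical module: names are invented to illustrate the naming scheme only.
mod executor {
    /// Builds a spill-file name for a run (illustrative helper).
    pub fn spill_file_name(run_id: &str, part: u32) -> String {
        format!("{run_id}-spill-{part:04}.bin")
    }

    #[cfg(test)]
    mod tests {
        use super::*;

        // Scenario module: groups the behavior checks for one situation.
        mod spill_file_naming {
            use super::*;

            // Full test name: executor::tests::spill_file_naming::pads_part_to_four_digits
            #[test]
            fn pads_part_to_four_digits() {
                assert_eq!(spill_file_name("r1", 7), "r1-spill-0007.bin");
            }
        }
    }
}
```

Passing that full name with `-- --exact` selects just this one test. Without `--exact`, the argument is only a substring filter and may match several tests.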

An integration test under tests/ can’t see internals — so it’s forced to drive the engine the way a user does: feed YAML and input bytes to the public executor, then assert on the output bytes and the run report. That’s “testing to the boundary,” and it’s the engine’s most valuable kind of test, because it survives any internal refactor that keeps the public behavior the same.

clinker-exec ·aggregate_integration.rs ·test_e2e_group_by_sum_count test @47d2e12
// End-to-end: CSV in → aggregate(group_by:[dept], sum + count) → CSV out.
let csv = "dept,salary\neng,100\neng,200\nsales,50\n";
let (report, output) = run_single(yaml, csv);
assert_eq!(report.dlq_entries.len(), 0);
assert_eq!(report.counters.ok_count, 2, "two output groups");
assert_eq!(
    sorted_body_lines(&output),
    vec!["eng,300,2".to_string(), "sales,50,1".to_string()],
);
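`sorted_body_lines` is a helper local to the test file and is not shown. Below is a plausible version. It assumes `output` is the sink's CSV text as a `&str` whose first line is a header. It drops the header and sorts the data lines, so the assertion does not depend on the order in which groups are emitted.

```rust
/// Hypothetical version of the test helper: drop the CSV header row,
/// keep non-empty data lines, and sort them so group order cannot matter.
fn sorted_body_lines(output: &str) -> Vec<String> {
    let mut lines: Vec<String> = output
        .lines()
        .skip(1) // header row (its exact column names are not shown in the lesson)
        .filter(|l| !l.is_empty())
        .map(str::to_string)
        .collect();
    lines.sort();
    lines
}
```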

Nothing here names a private type. If someone rewrites the aggregation operator’s internals tomorrow, this test still passes as long as the two eng rows (100 and 200) still come out as eng,300,2 and the lone sales row as sales,50,1. That’s the whole point: the assertion is pinned to the contract, not the implementation.
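The contract can be stated without reference to any operator. The sketch below is a minimal reference model, hypothetical and not clinker's aggregation code. It computes the lines that grouping by `dept` with a sum and a count must produce from the same input. Any internal rewrite has to keep emitting exactly these lines.

```rust
use std::collections::BTreeMap;

/// Reference model of group_by [dept] with sum(salary) and count.
/// Illustrative only; BTreeMap keeps the groups in sorted key order.
fn reference_group_sum_count(csv: &str) -> Vec<String> {
    let mut groups: BTreeMap<&str, (i64, u64)> = BTreeMap::new();
    for line in csv.lines().skip(1).filter(|l| !l.is_empty()) {
        let (dept, salary) = line.split_once(',').expect("two columns");
        let salary: i64 = salary.parse().expect("numeric salary");
        let entry = groups.entry(dept).or_insert((0, 0));
        entry.0 += salary; // running sum
        entry.1 += 1; // running count
    }
    groups
        .into_iter()
        .map(|(dept, (sum, count))| format!("{dept},{sum},{count}"))
        .collect()
}
```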

Snapshot tests: assert on a big blob without hand-writing it

Some outputs are too large to hand-author an assert_eq! for, such as the full text of an execution plan from --explain. Clinker uses the insta crate: you write the test, run it once, review and accept the captured output, and commit the resulting .snap file. Later runs compare against that file; an intentional change is reviewed and re-accepted.

clinker-exec ·cull_explain_snapshot.rs ·explain_renders_cull_two_output_ports test @47d2e12
#[test]
fn explain_renders_cull_two_output_ports() {
    let text = render_explain(yaml);
    // a few structural asserts first (these document intent) ...
    assert!(text.contains("FORK [cull] 'drop_bad'"));
    // ... then snapshot the whole rendered plan under a stable name:
    insta::assert_snapshot!("explain_cull_two_output_ports", text);
}

The committed snapshot it locks against starts with an insta header and then the captured value:

---
source: crates/clinker-exec/tests/cull_explain_snapshot.rs
expression: text
---
=== Execution Plan ===
Mode: Streaming
...
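A `.snap` file consists of a header block between two `---` lines, followed by the captured value. The sketch below splits one apart. `split_snap` is a hypothetical reading aid, not part of insta's API; insta parses its own files.

```rust
/// Split a .snap-style file into (header, body), assuming the layout shown above:
/// an opening "---" line, metadata lines, a closing "---" line, then the value.
fn split_snap(snap: &str) -> Option<(&str, &str)> {
    const FENCE: &str = "\n---\n";
    let rest = snap.strip_prefix("---\n")?; // opening fence
    let end = rest.find(FENCE)?; // closing fence
    let header = &rest[..end];
    let body = &rest[end + FENCE.len()..];
    Some((header, body))
}
```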

When you intentionally change --explain output, the snapshot test fails, you eyeball the diff, and accept it with cargo insta review (or INSTA_UPDATE=always). The discipline: a snapshot diff in a PR is a visible, reviewable record of an output-format change — it can’t sneak through silently.

The strongest refactor net in the codebase is a corpus of golden baselines: real pipelines whose exact output bytes are committed under tests/fixtures/baselines/ (e.g. csv_transform_sink.expected.csv). A driver runs each pipeline and compares the fresh output to the committed file, byte for byte:

clinker-exec ·pre_lift_baselines.rs ·compare_or_write fn @47d2e12
fn compare_or_write(baseline_name: &str, actual: &str) {
    let p = baseline_root().join(baseline_name);
    if update_mode() || !p.exists() {
        // First run (or UPDATE_BASELINES=1): capture the golden.
        std::fs::write(&p, actual.as_bytes()).unwrap();
        return;
    }
    let expected = std::fs::read_to_string(&p).unwrap();
    assert_eq!(actual, expected, "byte-mismatch against baseline {}", p.display());
}

Read the control flow carefully — it’s the whole regression-seed pattern in ten lines. The first time a fixture runs (or whenever you deliberately set UPDATE_BASELINES=1), the current output is written as the new golden. Every run after that compares. So the seed is captured once, then frozen; any future change that alters a single byte of any baseline pipeline’s output trips a named failure. (Note: the corpus is keyed by fixture name, not by issue number — clinker doesn’t tag regression tests with bug IDs.)
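The same control flow can be exercised without clinker's helpers. In this sketch, `check_baseline` and `Baseline` are hypothetical. The path and the update flag become parameters, replacing `baseline_root()` and `update_mode()`. A mismatch is returned rather than asserted, so each branch can be observed.

```rust
use std::fs;
use std::path::Path;

/// Outcome of one golden-baseline check (hypothetical type).
#[derive(Debug, PartialEq)]
enum Baseline {
    Seeded,  // file missing, or update mode: output captured as the new golden
    Matched, // file existed and matched byte for byte
}

/// Same branches as compare_or_write, with a mismatch returned as Err.
fn check_baseline(path: &Path, actual: &str, update: bool) -> Result<Baseline, String> {
    if update || !path.exists() {
        fs::write(path, actual.as_bytes()).map_err(|e| e.to_string())?;
        return Ok(Baseline::Seeded);
    }
    let expected = fs::read_to_string(path).map_err(|e| e.to_string())?;
    if actual == expected {
        Ok(Baseline::Matched)
    } else {
        Err(format!("byte-mismatch against baseline {}", path.display()))
    }
}
```

The `!path.exists()` branch has a consequence in both versions. Deleting a baseline file does not fail the suite: the next run silently re-seeds it from whatever the code produces now. A deleted `.expected` file in a diff therefore deserves the same scrutiny as a changed one.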

One property test: fast algorithm vs. slow oracle

Most tests check fixed examples. A property test instead generates hundreds of random inputs and checks an invariant on every one. Clinker uses proptest for exactly one high-value case: its fast band-join (iejoin_numeric) must agree with a dead-simple, obviously-correct nested-loop join on every random input.

clinker-exec ·iejoin.rs ·proptest_iejoin_matches_nested_loop test @47d2e12
proptest! {
    #![proptest_config(ProptestConfig::with_cases(256))]
    #[test]
    fn proptest_iejoin_matches_nested_loop((left, right, op1, op2) in arb_inputs()) {
        let actual: HashSet<(usize, usize)> =
            iejoin_numeric(&left, &right, op1.to_range(), op2.to_range())
                .into_iter().collect();
        let expected = nested_loop(&left, &right, op1, op2); // the slow oracle
        prop_assert_eq!(actual, expected);
    }
}

This is the oracle pattern: you have a fast implementation you’re unsure about and a slow implementation you trust, and you assert they always agree. It’s worth the machinery precisely because the fast path (coarse-filter striding, permutation indexing) is the kind of code that’s easy to get subtly wrong. For straightforward behavior, a handful of example tests is cheaper and clearer — don’t reach for proptest by default. (The repo also has a combine_iejoin_prop.rs scaffold; the live property test is the inline one shown here.)
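The oracle pattern does not depend on proptest's machinery. Below is a dependency-free sketch of the same idea, with two deliberate substitutions:

- A sort-plus-binary-search band join on one numeric column (`|right - left| <= width`) stands in for `iejoin_numeric`.
- A tiny xorshift generator replaces proptest's strategies, so failing inputs are not shrunk.

None of these names are clinker's.

```rust
use std::collections::HashSet;

/// Slow oracle: test every (left, right) pair. Obviously correct.
fn nested_loop_band(left: &[i64], right: &[i64], width: i64) -> HashSet<(usize, usize)> {
    let mut out = HashSet::new();
    for (i, &l) in left.iter().enumerate() {
        for (j, &r) in right.iter().enumerate() {
            if (r - l).abs() <= width {
                out.insert((i, j));
            }
        }
    }
    out
}

/// Fast path: sort right indices by value once, then binary-search each band.
fn sorted_band(left: &[i64], right: &[i64], width: i64) -> HashSet<(usize, usize)> {
    let mut order: Vec<usize> = (0..right.len()).collect();
    order.sort_by_key(|&j| right[j]);
    let mut out = HashSet::new();
    for (i, &l) in left.iter().enumerate() {
        // first position whose value is >= l - width
        let start = order.partition_point(|&j| right[j] < l - width);
        // first position whose value is > l + width
        let end = order.partition_point(|&j| right[j] <= l + width);
        for &j in &order[start..end] {
            out.insert((i, j));
        }
    }
    out
}

/// Minimal xorshift PRNG (seed must be nonzero) so no external crates are needed.
struct XorShift(u64);

impl XorShift {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
    fn below(&mut self, n: u64) -> u64 {
        self.next_u64() % n
    }
}

/// Run `cases` random inputs; return the first case where fast and oracle disagree.
fn check_against_oracle(cases: usize, seed: u64) -> Option<(Vec<i64>, Vec<i64>, i64)> {
    let mut rng = XorShift(seed);
    for _ in 0..cases {
        let n = rng.below(12) as usize;
        let m = rng.below(12) as usize;
        // Small value range on purpose: duplicates and exact band edges are common.
        let left: Vec<i64> = (0..n).map(|_| rng.below(20) as i64 - 10).collect();
        let right: Vec<i64> = (0..m).map(|_| rng.below(20) as i64 - 10).collect();
        let width = rng.below(5) as i64;
        if sorted_band(&left, &right, width) != nested_loop_band(&left, &right, width) {
            return Some((left, right, width));
        }
    }
    None
}
```

The important design choice is keeping the value range small. It forces duplicate values and values sitting exactly on band edges, which is where `<` versus `<=` mistakes hide.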

// quick check

You refactor the internals of the aggregation operator but intend zero change to its output. Which test most directly protects you, and why?

You can now prove a change is correct. Next: make your first real change — add a builtin to CXL, the engine’s expression language.