Read a plan with --explain
In lesson 0.1 you ran --explain and saw a few lines of output. Let’s actually read
it. It’s the single best way to understand a pipeline before it runs — and your first
glimpse of an idea that shapes the whole engine: Clinker plans a job before it runs
it.
You’ll be able to: read an execution plan, name the parts of the DAG, and spot how two pipelines differ in shape — all without executing anything.
--explain compiles, then stops
cd examples/pipelines
cargo run -p clinker -- run customer_etl.yaml --explain

--explain takes the YAML all the way through compilation (parsing, validation,
typechecking the CXL, building the graph) and then prints the result instead of
executing it. Nothing is read from the source; nothing is written. Here’s the heart
of the output:
=== Resolved Outputs ===
'results' → ./output/customers.csv

=== Execution Plan ===
Mode: Streaming
Transforms: 2
Output projections: 1
DAG nodes: 4
arbitration: BackPressurePreferred -> Priority

Source DAG:
Tier 0: customers

Transform 'active_only':
Parallelism: Stateless
Transform 'final_flag':
Parallelism: Stateless

Read it top to bottom:
- Resolved Outputs shows where results will land (./output/customers.csv).
- DAG nodes: 4 counts the source, the two transforms, and the output, wired together into a directed acyclic graph.
- Mode: Streaming means records can flow straight through without the engine having to collect them all first.
- Parallelism: Stateless (reported per transform) means each row is processed independently, so these transforms carry no cross-record state.
- arbitration: BackPressurePreferred -> Priority names the memory policy the run would use if it came under pressure. You’re seeing the engine’s bounded-memory machinery named before a single byte is allocated.
--explain even prints a Physical Properties section with each node’s predicted
peak memory (for this tiny job, predicted_peak=345B — the size of the input CSV).
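To make the "DAG nodes" and "Tier 0" lines a little more concrete, here is a minimal Rust sketch of how a planner could check that a graph really is acyclic and group its nodes into tiers. It assumes the four customer_etl nodes are wired in a straight line (customers → active_only → final_flag → results) and that a tier means "one step past the deepest input". The function tiers and that tier rule are illustrative assumptions, not Clinker’s planner code.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical planner helper (not Clinker's code): assigns every node a tier,
/// where sources (no inputs) are tier 0 and any other node sits one tier past
/// its deepest input. Returns None if an edge names an unknown node or if the
/// graph contains a cycle. Node names are assumed to be unique.
fn tiers<'a>(nodes: &[&'a str], edges: &[(&'a str, &'a str)]) -> Option<HashMap<&'a str, usize>> {
    // Count incoming edges per node and remember each node's children.
    let mut indegree: HashMap<&'a str, usize> = nodes.iter().map(|n| (*n, 0)).collect();
    let mut children: HashMap<&'a str, Vec<&'a str>> = HashMap::new();
    for &(from, to) in edges {
        if !indegree.contains_key(from) {
            return None;
        }
        *indegree.get_mut(to)? += 1;
        children.entry(from).or_default().push(to);
    }

    // Kahn's algorithm: start from the sources and release a node only once
    // all of its inputs have been processed.
    let mut queue: VecDeque<&'a str> =
        nodes.iter().copied().filter(|n| indegree[n] == 0).collect();
    let mut tier: HashMap<&'a str, usize> = queue.iter().map(|n| (*n, 0)).collect();
    let mut visited = 0;

    while let Some(node) = queue.pop_front() {
        visited += 1;
        let t = tier[node];
        for &child in children.get(node).into_iter().flatten() {
            // A child's tier is one past its deepest parent seen so far.
            let entry = tier.entry(child).or_insert(0);
            *entry = (*entry).max(t + 1);
            let remaining = indegree.get_mut(child)?;
            *remaining -= 1;
            if *remaining == 0 {
                queue.push_back(child);
            }
        }
    }

    // Any node never released sits on, or downstream of, a cycle.
    if visited == nodes.len() { Some(tier) } else { None }
}

fn main() {
    // Assumed wiring for customer_etl: a straight line from source to output.
    let nodes = ["customers", "active_only", "final_flag", "results"];
    let edges = [
        ("customers", "active_only"),
        ("active_only", "final_flag"),
        ("final_flag", "results"),
    ];
    match tiers(&nodes, &edges) {
        Some(t) => {
            let mut rows: Vec<(&str, usize)> = t.into_iter().collect();
            rows.sort_by_key(|&(_, tier)| tier);
            for (name, tier) in rows {
                println!("Tier {tier}: {name}");
            }
        }
        None => println!("not a DAG: unknown node or cycle"),
    }
}
```

Kahn’s algorithm is a natural fit for this job: if the queue empties before every node has been visited, the graph contains a cycle, so the same pass that builds the tiers also proves the "acyclic" in DAG. With the straight-line wiring assumed above, customers lands in tier 0 and results in tier 3.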
Why a separate plan at all?
That a plan exists as its own thing (printable, inspectable, produced before execution) is not an accident. Compilation turns the YAML into a typed, validated artifact, and only that artifact is handed to the runtime:
clinker-plan · compiled.rs · CompiledPlan type @47d2e12

pub struct CompiledPlan {
    dag: ExecutionPlanDag,       // the lowered execution graph
    config: PipelineConfig,      // the validated configuration
    artifacts: CompileArtifacts, // typechecked CXL, bound schemas
    // ...
}

We won’t unpack CompiledPlan yet; that’s a Phase 3 lesson. For now, just hold the
shape of the idea: plan first, run second. --explain is you stepping in between
the two.
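"Plan first, run second" can be sketched in Rust as a plan type that only a compile step can construct. Everything here is hypothetical: the module plan, the functions compile, explain and execute, and the single nodes field are illustrative stand-ins, not Clinker’s API. The real CompiledPlan carries the DAG, config and artifacts listed in the excerpt.

```rust
mod plan {
    /// Hypothetical stand-in for Clinker's CompiledPlan. The field is private,
    /// so code outside this module can only obtain a plan through `compile`.
    #[derive(Debug)]
    pub struct CompiledPlan {
        nodes: Vec<String>,
    }

    #[derive(Debug, PartialEq)]
    pub enum CompileError {
        EmptyPipeline,
    }

    /// Hypothetical compile step: validates the input and freezes it into a plan.
    pub fn compile(stages: &[&str]) -> Result<CompiledPlan, CompileError> {
        if stages.is_empty() {
            return Err(CompileError::EmptyPipeline);
        }
        Ok(CompiledPlan {
            nodes: stages.iter().map(|s| s.to_string()).collect(),
        })
    }

    impl CompiledPlan {
        /// The --explain path: borrows the plan and only describes it.
        pub fn explain(&self) -> String {
            format!("DAG nodes: {}", self.nodes.len())
        }

        /// The run path: consumes the plan, so it cannot be executed twice.
        /// Returns the number of nodes visited.
        pub fn execute(self) -> usize {
            for node in &self.nodes {
                println!("running {node}");
            }
            self.nodes.len()
        }
    }
}

fn main() {
    let stages = ["customers", "active_only", "final_flag", "results"];
    let explain_only = std::env::args().any(|arg| arg == "--explain");

    // Compilation always happens; the flag only decides what to do with the plan.
    match plan::compile(&stages) {
        Ok(p) if explain_only => println!("{}", p.explain()),
        Ok(p) => println!("ran {} nodes", p.execute()),
        Err(e) => eprintln!("compile failed: {e:?}"),
    }
}
```

Because the struct’s field is private, nothing outside the plan module can build a CompiledPlan by hand: the only road to execute runs through compile, and the --explain branch borrows the plan without ever running it.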
See the difference
A bounded peek at a real run is --dry-run:

cargo run -p clinker -- run customer_etl.yaml --dry-run -n 10

Not every pipeline is a simple stream, though. Try --explain on two that have to
accumulate state before they can emit:

cargo run -p clinker -- run scd_type2.yaml --explain
cargo run -p clinker -- run tumbling_clicks.yaml --explain

Compare their plans to customer_etl’s. The number of DAG nodes, the mode, and the
arbitration policy all shift with what the job actually has to do.
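If you’d rather compare plans mechanically than by eye, one option is to save each --explain run to a file (assuming the plan goes to stdout, e.g. cargo run -p clinker -- run scd_type2.yaml --explain > scd.txt) and diff the summary fields. The helper below is hypothetical and not part of Clinker. It assumes each field is printed on its own line under === Execution Plan ===, and it only looks at the five summary keys shown in customer_etl’s output.

```rust
use std::collections::BTreeMap;

/// Summary fields seen in customer_etl's --explain output.
const KEYS: [&str; 5] = ["Mode", "Transforms", "Output projections", "DAG nodes", "arbitration"];

/// Hypothetical helper (not part of Clinker): pull the summary fields out of the
/// "=== Execution Plan ===" section, assuming one "Key: Value" pair per line.
fn plan_fields(explain_output: &str) -> BTreeMap<String, String> {
    let mut fields = BTreeMap::new();
    let mut in_plan = false;
    for line in explain_output.lines() {
        let line = line.trim();
        if line.starts_with("===") {
            // A new section header: only the Execution Plan section counts.
            in_plan = line == "=== Execution Plan ===";
            continue;
        }
        if !in_plan {
            continue;
        }
        // split_once stops at the first ':', so "arbitration: A -> B" keeps its value whole.
        if let Some((key, value)) = line.split_once(':') {
            let key = key.trim();
            if KEYS.iter().any(|k| *k == key) {
                fields.insert(key.to_string(), value.trim().to_string());
            }
        }
    }
    fields
}

/// Return the summary keys whose values differ between two plans.
fn plan_diff(left: &BTreeMap<String, String>, right: &BTreeMap<String, String>) -> Vec<String> {
    KEYS.iter()
        .filter(|key| left.get(**key) != right.get(**key))
        .map(|key| key.to_string())
        .collect()
}

fn main() -> std::io::Result<()> {
    // Usage: plan_diff <left.txt> <right.txt>, each holding saved --explain output.
    let args: Vec<String> = std::env::args().skip(1).collect();
    if args.len() != 2 {
        eprintln!("usage: plan_diff <left.txt> <right.txt>");
        return Ok(());
    }
    let left = plan_fields(&std::fs::read_to_string(&args[0])?);
    let right = plan_fields(&std::fs::read_to_string(&args[1])?);
    for key in plan_diff(&left, &right) {
        println!("{key}: {:?} -> {:?}", left.get(&key), right.get(&key));
    }
    Ok(())
}
```

Matching only a fixed list of keys keeps the helper from tripping over per-transform lines such as "Parallelism: Stateless", which repeat once per transform and would otherwise overwrite each other.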
Checkpoint
That’s Phase 0. You can build the engine, run a pipeline, work the compiler loop, navigate the crates, and read a plan. Next comes Phase 1: A Record’s Journey, where we follow one record all the way through that four-node DAG and start opening the doors.