Read a plan with --explain

In lesson 0.1 you ran --explain and saw a few lines of output. Let’s actually read it. It’s the single best way to understand a pipeline before it runs — and your first glimpse of an idea that shapes the whole engine: Clinker plans a job before it runs it.

You’ll be able to: read an execution plan, name the parts of the DAG, and spot how two pipelines differ in shape — all without executing anything.

`--explain` compiles, then stops

cd examples/pipelines
cargo run -p clinker -- run customer_etl.yaml --explain

--explain takes the YAML all the way through compilation — parsing, validation, typechecking the CXL, building the graph — and then prints the result instead of executing it. Nothing is read from the source; nothing is written. Here’s the heart of the output:

=== Resolved Outputs ===
  'results' → ./output/customers.csv

=== Execution Plan ===
Mode: Streaming
Transforms: 2
Output projections: 1
DAG nodes: 4
arbitration: BackPressurePreferred -> Priority

Source DAG:
  Tier 0: customers

Transform 'active_only':
  Parallelism: Stateless
Transform 'final_flag':
  Parallelism: Stateless

Here's what the key lines mean:

Resolved Outputs — where results will land (./output/customers.csv).
DAG nodes: 4 — the source, two transforms, and the output, wired into a directed acyclic graph.
Mode: Streaming — records can flow straight through without the engine having to collect them all first.
Parallelism: Stateless (per transform) — neither transform carries state from one record to the next, so each row can be processed independently.
arbitration: BackPressurePreferred -> Priority — the memory policy the run would use if it came under pressure. You’re seeing the engine’s bounded-memory machinery named, before a byte is allocated.
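To make Streaming and Stateless concrete, here's a minimal Rust sketch. It is not Clinker's code. The Customer type, keep_active (a filter in the spirit of active_only, whose actual CXL we haven't looked at), and tumbling_counts are all hypothetical. A stateless filter can decide each record the moment it arrives. A tumbling-window count, the kind of job the name tumbling_clicks.yaml suggests, has to hold state until its window closes.

```rust
// Hypothetical record type for illustration; Clinker's real records differ.
#[derive(Debug)]
struct Customer {
    id: u32,
    active: bool,
}

/// Stateless: the decision for each record depends on that record alone,
/// so records can stream through one at a time with nothing buffered.
fn keep_active(input: impl Iterator<Item = Customer>) -> impl Iterator<Item = Customer> {
    input.filter(|c| c.active)
}

/// Stateful: counts events per fixed (tumbling) window of `window_secs` seconds.
/// A window's count can only be emitted once a later timestamp shows that the
/// window has closed, so state must be carried across records.
/// Assumes timestamps arrive in non-decreasing order and `window_secs > 0`.
fn tumbling_counts(timestamps: &[u64], window_secs: u64) -> Vec<(u64, usize)> {
    let mut out = Vec::new();
    let mut current: Option<(u64, usize)> = None; // (window start, count so far)
    for &ts in timestamps {
        let start = ts - ts % window_secs; // start of the window containing ts
        match current {
            Some((s, n)) if s == start => current = Some((s, n + 1)),
            Some(closed) => {
                out.push(closed); // a new window began, so the previous one is complete
                current = Some((start, 1));
            }
            None => current = Some((start, 1)),
        }
    }
    if let Some(last) = current {
        out.push(last); // end of input closes the final window
    }
    out
}

fn main() {
    let customers = vec![
        Customer { id: 1, active: true },
        Customer { id: 2, active: false },
        Customer { id: 3, active: true },
    ];
    let kept: Vec<u32> = keep_active(customers.into_iter()).map(|c| c.id).collect();
    println!("{:?}", kept); // [1, 3]

    // Events at t=5, 20, 59 fall in window [0, 60); t=61, 90 fall in [60, 120).
    println!("{:?}", tumbling_counts(&[5, 20, 59, 61, 90], 60)); // [(0, 3), (60, 2)]
}
```

The filter never remembers anything between records, which is the property a plan labels Stateless. The window counter must carry a running (window, count) pair forward and can only emit once a window is known to be complete, which is why jobs like it can't simply stream every record straight through.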

--explain even prints a Physical Properties section with each node’s predicted peak memory (for this tiny job, predicted_peak=345B — the size of the input CSV).

Why a separate plan at all?

That a plan exists as its own thing — printable, inspectable, produced before execution — is not an accident. Compilation turns the YAML into a typed, validated artifact, and only that artifact is handed to the runtime:

From crate clinker-plan, file compiled.rs, type CompiledPlan (@47d2e12):

pub struct CompiledPlan {
    dag: ExecutionPlanDag,        // the lowered execution graph
    config: PipelineConfig,       // the validated configuration
    artifacts: CompileArtifacts,  // typechecked CXL, bound schemas
    // ...
}

We won’t unpack CompiledPlan yet — that’s a Phase 3 lesson. For now, just hold the shape of the idea: plan first, run second. --explain is you stepping in between the two.
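As a rough mental model of that split, here's a deliberately tiny Rust sketch. Every name in it (PipelineSpec, Plan, compile, explain, execute) is hypothetical and far simpler than CompiledPlan, and its printed format only loosely echoes the real plan. The point is the shape: one compile step that touches no data, followed by either printing the plan or executing it.

```rust
// Hypothetical, heavily simplified model of "plan first, run second".
// None of these names are Clinker's API; the real CompiledPlan holds far more.

/// What the YAML describes, assumed already parsed for this sketch.
struct PipelineSpec {
    source: String,
    transforms: Vec<String>,
    output: String,
}

/// The compiled artifact: validated and inspectable before anything runs.
struct Plan {
    nodes: Vec<String>, // source, transforms, output
}

/// Compile: validate the spec and lower it to a plan. Touches no data.
fn compile(spec: &PipelineSpec) -> Result<Plan, String> {
    // Stand-in for real validation and typechecking.
    if spec.transforms.iter().any(|t| t.is_empty()) {
        return Err("transform with an empty name".to_string());
    }
    let mut nodes = vec![spec.source.clone()];
    nodes.extend(spec.transforms.iter().cloned());
    nodes.push(spec.output.clone());
    Ok(Plan { nodes })
}

/// Explain: render the plan as text instead of executing it.
fn explain(plan: &Plan) -> String {
    format!("DAG nodes: {}\n{}", plan.nodes.len(), plan.nodes.join(", "))
}

/// Execute: in a real engine, the only step that reads the source and writes output.
fn execute(plan: &Plan) {
    println!("executing {} nodes", plan.nodes.len());
}

fn main() {
    let spec = PipelineSpec {
        source: "customers".into(),
        transforms: vec!["active_only".into(), "final_flag".into()],
        output: "results".into(),
    };
    let explain_only = std::env::args().any(|a| a == "--explain");
    match compile(&spec) {
        // Both paths share the same compile step; only what follows differs.
        Ok(plan) if explain_only => println!("{}", explain(&plan)),
        Ok(plan) => execute(&plan),
        Err(e) => eprintln!("compile error: {e}"),
    }
}
```

Run it with --explain and it prints "DAG nodes: 4" followed by the four node names; run it without the flag and the very same plan is handed to execute instead. A compile error stops both paths before any data would be touched.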

See the difference

For a bounded peek at a real run, use --dry-run:

cargo run -p clinker -- run customer_etl.yaml --dry-run -n 10

Not every pipeline is a simple stream, though. Try --explain on pipelines that have to accumulate state before they can emit:

cargo run -p clinker -- run scd_type2.yaml --explain
cargo run -p clinker -- run tumbling_clicks.yaml --explain

Compare their plans to customer_etl’s. The number of DAG nodes, the mode, and the arbitration policy all shift with what the job actually has to do.
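If you'd rather compare plans field by field than by eye, here's a small Rust sketch that is not part of Clinker. It pulls the "Key: Value" lines out of the Execution Plan block and prints the ones that differ. It assumes you saved each --explain run to a file (for example with shell redirection, assuming the plan is printed to stdout) and that the block keeps the layout shown above for customer_etl. The plandiff name and the .plan file names are hypothetical.

```rust
use std::collections::BTreeMap;
use std::{env, fs};

/// Collect the "Key: Value" lines of the "=== Execution Plan ===" block.
/// Assumes the layout shown for customer_etl; stops at the first blank line.
fn plan_summary(text: &str) -> BTreeMap<String, String> {
    let mut fields = BTreeMap::new();
    let mut in_plan = false;
    for line in text.lines() {
        if line.trim() == "=== Execution Plan ===" {
            in_plan = true;
            continue;
        }
        if !in_plan {
            continue;
        }
        if line.trim().is_empty() {
            break; // end of the summary block
        }
        if let Some((key, value)) = line.split_once(':') {
            fields.insert(key.trim().to_string(), value.trim().to_string());
        }
    }
    fields
}

/// List the summary fields whose values differ between two plans ("-" = absent).
fn diff(a: &BTreeMap<String, String>, b: &BTreeMap<String, String>) -> Vec<String> {
    let keys: std::collections::BTreeSet<&String> = a.keys().chain(b.keys()).collect();
    let missing = "-".to_string();
    keys.into_iter()
        .filter_map(|k| {
            let (va, vb) = (a.get(k).unwrap_or(&missing), b.get(k).unwrap_or(&missing));
            (va != vb).then(|| format!("{k}: {va} | {vb}"))
        })
        .collect()
}

fn main() {
    // Hypothetical usage: plandiff customer_etl.plan scd_type2.plan
    let args: Vec<String> = env::args().skip(1).collect();
    if args.len() != 2 {
        eprintln!("usage: plandiff <plan-a> <plan-b>");
        return;
    }
    let a = plan_summary(&fs::read_to_string(&args[0]).expect("read plan A"));
    let b = plan_summary(&fs::read_to_string(&args[1]).expect("read plan B"));
    for line in diff(&a, &b) {
        println!("{line}");
    }
}
```

Each printed line names a field and shows its value in the first plan, then in the second, which makes shifts in Mode, DAG nodes, or arbitration easy to spot.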

Checkpoint


What does `--explain` actually do?

--explain runs the full compile — parse, validate, typecheck CXL, build the DAG — then prints the resulting plan instead of executing it. No source is read, no output written.

You should be able to:

Name the four DAG nodes in customer_etl's plan
Explain, in a sentence, what `--explain` does and does not do
Compare the plan of a streaming pipeline with that of a stateful one

Verify in the checkout:

cargo run -p clinker -- run customer_etl.yaml --explain
cargo run -p clinker -- run scd_type2.yaml --explain

That’s Phase 0. You can build the engine, run a pipeline, work the compiler loop, navigate the crates, and read a plan. Next comes Phase 1 — A Record’s Journey, where we follow one record all the way through that four-node DAG and start opening the doors.