Read a plan with --explain

In lesson 0.1 you ran --explain and saw a few lines of output. Now let’s actually read them. Reading the plan is the single best way to understand a pipeline before it runs, and it is your first glimpse of an idea that shapes the whole engine: Clinker plans a job before it runs it.

You’ll be able to: read an execution plan, name the parts of the DAG, and spot how two pipelines differ in shape — all without executing anything.

cd examples/pipelines
cargo run -p clinker -- run customer_etl.yaml --explain

--explain takes the YAML all the way through compilation — parsing, validation, typechecking the CXL, building the graph — and then prints the result instead of executing it. Nothing is read from the source; nothing is written. Here’s the heart of the output:

=== Resolved Outputs ===
'results' → ./output/customers.csv
=== Execution Plan ===
Mode: Streaming
Transforms: 2
Output projections: 1
DAG nodes: 4
arbitration: BackPressurePreferred -> Priority
Source DAG:
Tier 0: customers
Transform 'active_only':
Parallelism: Stateless
Transform 'final_flag':
Parallelism: Stateless

Read it top to bottom:

  • Resolved Outputs — where results will land (./output/customers.csv).
  • DAG nodes: 4 — the source, two transforms, and the output, wired into a directed acyclic graph.
  • Mode: Streaming — records can flow straight through without the engine having to collect them all first.
  • Parallelism: Stateless (per transform) — each row is independent, so these transforms have no cross-record state.
  • arbitration: BackPressurePreferred -> Priority — the memory policy the run would use if it came under pressure. You’re seeing the engine’s bounded-memory machinery named, before a byte is allocated.
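To make "DAG nodes: 4" concrete, here is a minimal sketch that models those four nodes as a tiny graph and checks that it is acyclic with Kahn's algorithm. The node names come from the plan above; the `Kind` enum, the helper functions, and the linear wiring (customers, then active_only, then final_flag, then results) are assumptions for illustration, not Clinker's actual types or guaranteed topology.

```rust
use std::collections::{HashMap, VecDeque};

// Hypothetical node kinds for illustration; not Clinker's real types.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Kind {
    Source,
    Transform,
    Output,
}

type Node = (&'static str, Kind);
type Edge = (&'static str, &'static str);

/// Node names come from the plan above; the linear wiring is an assumption.
fn customer_etl() -> (Vec<Node>, Vec<Edge>) {
    let nodes = vec![
        ("customers", Kind::Source),
        ("active_only", Kind::Transform),
        ("final_flag", Kind::Transform),
        ("results", Kind::Output),
    ];
    let edges = vec![
        ("customers", "active_only"),
        ("active_only", "final_flag"),
        ("final_flag", "results"),
    ];
    (nodes, edges)
}

/// Kahn's algorithm: a dependency order if the graph is acyclic, else None.
fn topo_order(nodes: &[Node], edges: &[Edge]) -> Option<Vec<&'static str>> {
    let mut indegree: HashMap<&'static str, usize> =
        nodes.iter().map(|(name, _)| (*name, 0)).collect();
    for (_, to) in edges {
        *indegree.get_mut(to)? += 1; // an edge to an unknown node is rejected
    }
    // Start from nodes with no incoming edges, in declaration order.
    let mut ready: VecDeque<&'static str> = nodes
        .iter()
        .map(|(name, _)| *name)
        .filter(|name| indegree[name] == 0)
        .collect();
    let mut order = Vec::new();
    while let Some(current) = ready.pop_front() {
        order.push(current);
        for (from, to) in edges {
            if *from == current {
                let remaining = indegree.get_mut(to)?;
                *remaining -= 1;
                if *remaining == 0 {
                    ready.push_back(*to);
                }
            }
        }
    }
    // Nodes on a cycle never reach indegree zero, so they are missing here.
    (order.len() == nodes.len()).then_some(order)
}

fn main() {
    let (nodes, edges) = customer_etl();
    let transforms = nodes.iter().filter(|(_, k)| *k == Kind::Transform).count();
    println!("DAG nodes: {}, transforms: {}", nodes.len(), transforms);
    println!("order: {:?}", topo_order(&nodes, &edges));
}
```

Adding an edge from results back to customers would make `topo_order` return `None`, which is the "acyclic" part of the name: a plan with a loop has no valid execution order.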

--explain even prints a Physical Properties section with each node’s predicted peak memory (for this tiny job, predicted_peak=345B — the size of the input CSV).

That a plan exists as its own thing — printable, inspectable, produced before execution — is not an accident. Compilation turns the YAML into a typed, validated artifact, and only that artifact is handed to the runtime:

clinker-plan · compiled.rs · CompiledPlan type @47d2e12
pub struct CompiledPlan {
    dag: ExecutionPlanDag,        // the lowered execution graph
    config: PipelineConfig,       // the validated configuration
    artifacts: CompileArtifacts,  // typechecked CXL, bound schemas
    // ...
}

We won’t unpack CompiledPlan yet — that’s a Phase 3 lesson. For now, just hold the shape of the idea: plan first, run second. --explain is you stepping in between the two.
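The "plan first, run second" shape can be sketched in a few lines. This is an illustration of the control flow only: `Plan`, `Outcome`, `compile`, `describe`, `execute`, and `run` are hypothetical names invented here, and the validation rules are placeholders, not what Clinker's compiler actually checks.

```rust
// All names here are hypothetical stand-ins, not Clinker's real API.
#[derive(Debug)]
struct Plan {
    nodes: Vec<String>,
}

#[derive(Debug, PartialEq)]
enum Outcome {
    Explained(String),
    Ran { nodes_executed: usize },
}

/// Compile step: validate up front and touch no data.
fn compile(stages: &[&str]) -> Result<Plan, String> {
    if stages.is_empty() {
        return Err("pipeline has no stages".to_string());
    }
    if let Some(pos) = stages.iter().position(|s| s.trim().is_empty()) {
        return Err(format!("stage {pos} has an empty name"));
    }
    Ok(Plan {
        nodes: stages.iter().map(|s| s.to_string()).collect(),
    })
}

/// What an explain-style flag would print instead of running.
fn describe(plan: &Plan) -> String {
    format!("DAG nodes: {}", plan.nodes.len())
}

/// Stand-in for the runtime: it only ever sees a plan that compiled.
fn execute(plan: &Plan) -> usize {
    plan.nodes.len()
}

fn run(stages: &[&str], explain: bool) -> Result<Outcome, String> {
    let plan = compile(stages)?; // errors surface here, before any execution
    if explain {
        Ok(Outcome::Explained(describe(&plan)))
    } else {
        Ok(Outcome::Ran {
            nodes_executed: execute(&plan),
        })
    }
}

fn main() {
    let stages = ["customers", "active_only", "final_flag", "results"];
    println!("{:?}", run(&stages, true));
    println!("{:?}", run(&stages, false));
    println!("{:?}", run(&[], true));
}
```

The point of the split is that `execute` takes a `&Plan` rather than raw input, so an invalid pipeline cannot reach it; the explain branch simply stops after the same compile step a real run would take.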

For a bounded peek at a real run, use --dry-run:

cargo run -p clinker -- run customer_etl.yaml --dry-run -n 10

Not every pipeline is a simple stream. Try --explain on two that have to accumulate state before they can emit:

cargo run -p clinker -- run scd_type2.yaml --explain
cargo run -p clinker -- run tumbling_clicks.yaml --explain

Compare their plans to customer_etl’s. The number of DAG nodes, the mode, and the arbitration policy all shift with what the job actually has to do.
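If you want to compare plans mechanically rather than by eye, one option is to save each --explain output to a file and diff the three summary fields. The sketch below assumes only the "key: value" line format shown in the customer_etl plan earlier in this lesson; the `summary` and `diff` helpers are hypothetical, and the second plan in `main` is an invented placeholder, not real output from scd_type2.yaml or tumbling_clicks.yaml.

```rust
use std::collections::BTreeMap;

/// Pull the three summary fields out of saved --explain text.
fn summary(explain_output: &str) -> BTreeMap<String, String> {
    let wanted = ["Mode", "DAG nodes", "arbitration"];
    let mut fields = BTreeMap::new();
    for line in explain_output.lines() {
        // Split at the first colon only, so values may contain "->" or ":".
        if let Some((key, value)) = line.trim().split_once(':') {
            if wanted.contains(&key) {
                fields.insert(key.to_string(), value.trim().to_string());
            }
        }
    }
    fields
}

/// List the fields whose values differ between two summaries.
fn diff(a: &BTreeMap<String, String>, b: &BTreeMap<String, String>) -> Vec<String> {
    let mut lines = Vec::new();
    for (key, left) in a {
        match b.get(key) {
            Some(right) if right != left => lines.push(format!("{key}: {left} | {right}")),
            None => lines.push(format!("{key}: {left} | (absent)")),
            _ => {}
        }
    }
    for (key, right) in b {
        if !a.contains_key(key) {
            lines.push(format!("{key}: (absent) | {right}"));
        }
    }
    lines
}

fn main() {
    // Lines copied from the customer_etl plan shown in this lesson.
    let customer_etl = "Mode: Streaming\nTransforms: 2\nDAG nodes: 4\n\
                        arbitration: BackPressurePreferred -> Priority";
    // Invented placeholder values; replace with real saved output.
    let other = "Mode: PLACEHOLDER_MODE\nDAG nodes: 0\n\
                 arbitration: PLACEHOLDER_POLICY";
    for line in diff(&summary(customer_etl), &summary(other)) {
        println!("{line}");
    }
}
```

Each printed line shows one field with the left plan's value and the right plan's value separated by a bar; identical fields are omitted, so an empty result means the two plans share the same summary shape.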

That’s Phase 0. You can build the engine, run a pipeline, work the compiler loop, navigate the crates, and read a plan. Next comes Phase 1 — A Record’s Journey, where we follow one record all the way through that four-node DAG and start opening the doors.