From YAML to a plan

You wrote customer_etl.yaml: a few nodes and some CXL. But YAML is just text. In lesson 0.4, --explain printed an execution plan instead of running the pipeline, so something turned your text into that plan. This lesson follows that transformation at a shallow level; the deep version is Phase 3.

You’ll be able to: describe the steps that turn YAML into a runnable plan, and explain why Clinker plans before it runs.

The journey from text to plan

Before a single record moves, your YAML goes through a pipeline of its own:

YAML text
   │  parse (span-aware — every value remembers its line & column)
   ▼
config  (typed nodes, still just "what you asked for")
   │  validate  (no cycles? all inputs exist? paths allowed?)
   ▼
   │  typecheck the CXL  (does `lifetime_value.to_int()` make sense?)
   ▼
   │  lower to a graph
   ▼
CompiledPlan  ──►  handed to the runtime

The parse step is span-aware: each parsed value carries its source location, so a mistake can be reported with a precise line and column rather than a vague “something’s wrong.” That location-carrying wrapper is Spanned:

clinker-plan · yaml.rs · Spanned type @47d2e12

pub struct Spanned<T> {
    pub value: T,
    pub referenced: Location,   // where this value is used in the source
    pub defined: Location,      // where it was defined (e.g. a YAML anchor)
}
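
To make the payoff of span-aware parsing concrete, here is a minimal sketch of a check that reports a problem at the position where the value was written. It is an illustration under assumptions, not Clinker's code: the Location fields (line, column) and the check_node_name function are hypothetical, and the real types in clinker-plan may look different.

```rust
// Hypothetical stand-in: the real `Location` in clinker-plan may differ.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Location {
    pub line: usize,
    pub column: usize,
}

// Same field names as the excerpt above, redeclared so the sketch compiles alone.
#[derive(Debug, Clone)]
pub struct Spanned<T> {
    pub value: T,
    pub referenced: Location,
    pub defined: Location,
}

/// Hypothetical check: reject an empty node name and point at where it was written.
pub fn check_node_name(name: &Spanned<String>) -> Result<(), String> {
    if name.value.trim().is_empty() {
        let at = name.referenced;
        return Err(format!(
            "{}:{}: node name must not be empty",
            at.line, at.column
        ));
    }
    Ok(())
}

fn main() {
    let loc = Location { line: 12, column: 9 };
    let bad = Spanned {
        value: String::new(),
        referenced: loc,
        defined: loc,
    };
    // The error message carries the line and column taken from the span.
    if let Err(message) = check_node_name(&bad) {
        eprintln!("{message}");
    }
}
```

Because the location travels with the value, any later stage can produce a precise message without re-reading the YAML file.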

The output is a `CompiledPlan`

Everything upstream converges on one typed artifact — the CompiledPlan. It bundles the lowered graph, the validated config, and the typechecked CXL:

clinker-plan · compiled.rs · CompiledPlan type @47d2e12

pub struct CompiledPlan {
    dag: ExecutionPlanDag,        // the lowered execution graph
    config: PipelineConfig,       // the validated configuration
    artifacts: CompileArtifacts,  // typechecked CXL, bound schemas
    // ...
}

When you ran --explain, the engine did all of this (parse, validate, typecheck, lower) and then printed the result instead of executing it. The execution plan you read was this CompiledPlan, rendered as text.

Why plan first?

Two reasons, both of which you can already feel:

Errors surface before any data moves. If a column name is misspelled or a CXL expression is nonsense, you find out at plan time — not halfway through a million-row file with a half-written output.
The runtime gets a proof, not a wish. The executor never receives raw YAML. It only ever accepts a CompiledPlan — an artifact that, by existing, proves the pipeline is well-formed. The boundary between “planning” and “running” is one of Clinker’s defining design decisions; we open it up in Phase 3.
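
The "proof, not a wish" idea can be sketched with ordinary Rust visibility rules. The example below is a simplified illustration, not Clinker's implementation: the module layout, the single nodes field, and the compile and execute functions are all hypothetical. The point it shows is that when a type's fields are private, the only way to obtain a value is through the function that performs the checks.

```rust
mod plan {
    // Hypothetical, heavily simplified stand-in for the real configuration type.
    pub struct PipelineConfig {
        pub nodes: Vec<String>,
    }

    // The field is private, so code outside this module cannot build a
    // CompiledPlan by hand; `compile` is the only way to get one.
    #[derive(Debug)]
    pub struct CompiledPlan {
        order: Vec<String>,
    }

    impl CompiledPlan {
        pub fn order(&self) -> &[String] {
            &self.order
        }
    }

    // Stand-in for parse + validate + typecheck + lower: one trivial check.
    pub fn compile(config: PipelineConfig) -> Result<CompiledPlan, String> {
        if config.nodes.is_empty() {
            return Err("pipeline has no nodes".to_string());
        }
        Ok(CompiledPlan { order: config.nodes })
    }
}

// The executor's signature names CompiledPlan, so unchecked config cannot reach it.
fn execute(plan: &plan::CompiledPlan) -> usize {
    plan.order().len()
}

fn main() {
    let config = plan::PipelineConfig {
        nodes: vec!["read".to_string(), "write".to_string()],
    };
    match plan::compile(config) {
        Ok(compiled) => println!("ran {} nodes", execute(&compiled)),
        Err(error) => eprintln!("plan error: {error}"),
    }
}
```

In this sketch, writing `plan::CompiledPlan { order: vec![] }` outside the module is a compile error, which is what lets the existence of a plan stand in for "the checks passed."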

On the Rust side, this whole stage runs on Result: parsing, validation, and typechecking each return either a success value or a typed error instead of throwing an exception. Errors are values you handle, and they roll up into one error vocabulary (PipelineError, covered in a Phase 3 lesson).
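
As a rough picture of how stage errors can roll up into one type, the sketch below chains two stages with the ? operator and From conversions. Everything in it is invented for illustration: SketchError stands in for PipelineError (whose real shape is a Phase 3 topic), and the parse and validate functions are toy versions that treat each line of text as a node name.

```rust
// Hypothetical per-stage error types.
#[derive(Debug, PartialEq)]
struct ParseError(String);

#[derive(Debug, PartialEq)]
struct ValidateError(String);

// Hypothetical roll-up type, standing in for the role PipelineError plays.
#[derive(Debug, PartialEq)]
enum SketchError {
    Parse(ParseError),
    Validate(ValidateError),
}

impl From<ParseError> for SketchError {
    fn from(error: ParseError) -> Self {
        SketchError::Parse(error)
    }
}

impl From<ValidateError> for SketchError {
    fn from(error: ValidateError) -> Self {
        SketchError::Validate(error)
    }
}

// Toy parse stage: one node name per non-empty line.
fn parse(text: &str) -> Result<Vec<String>, ParseError> {
    if text.trim().is_empty() {
        return Err(ParseError("empty document".to_string()));
    }
    Ok(text
        .lines()
        .map(|line| line.trim().to_string())
        .filter(|line| !line.is_empty())
        .collect())
}

// Toy validate stage: node names must be unique.
fn validate(nodes: Vec<String>) -> Result<Vec<String>, ValidateError> {
    for (index, name) in nodes.iter().enumerate() {
        if nodes[..index].contains(name) {
            return Err(ValidateError(format!("duplicate node `{name}`")));
        }
    }
    Ok(nodes)
}

// `?` returns early on the first error and converts it via the From impls.
fn plan_pipeline(text: &str) -> Result<usize, SketchError> {
    let nodes = parse(text)?;
    let nodes = validate(nodes)?;
    Ok(nodes.len())
}

fn main() {
    println!("{:?}", plan_pipeline("read\nwrite"));
    println!("{:?}", plan_pipeline("read\nread"));
}
```

The caller of plan_pipeline handles a single error type, yet each variant still records which stage failed.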

// quick check

What does the executor actually receive to run?

The executor only ever accepts a CompiledPlan. Parsing, validation, and CXL typechecking all happen first; the executor consumes the proof, not the YAML. That separation is Phase 3's main subject.

Checkpoint

Re-run --explain and find, in its output, the evidence that validation and lowering already happened (the DAG, the resolved outputs).

You should be able to:

List the steps: parse → validate → typecheck CXL → lower → CompiledPlan
Give one reason planning happens before running
Connect `--explain`'s output to the CompiledPlan

Verify in the checkout:

cd examples/pipelines
cargo run -p clinker -- run customer_etl.yaml --explain

You have a plan. Next: how a record actually travels through the graph that plan describes.