
From YAML to a plan

You wrote customer_etl.yaml: a few nodes and some CXL. But YAML is just text. In lesson 0.4, --explain printed an execution plan instead of running the pipeline. Something turned your text into that plan. This lesson traces that transformation at a shallow level; the deep version is Phase 3.

You’ll be able to: describe the steps that turn YAML into a runnable plan, and explain why Clinker plans before it runs.

Before a single record moves, your YAML goes through a pipeline of its own:

YAML text
  │  parse (span-aware: every value remembers its line & column)
  ▼
config (typed nodes, still just "what you asked for")
  │  validate (no cycles? all inputs exist? paths allowed?)
  │  typecheck the CXL (does `lifetime_value.to_int()` make sense?)
  │  lower to a graph
  ▼
CompiledPlan ──► handed to the runtime
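Two of the validate step's questions ("all inputs exist?" and "no cycles?") are graph questions that a standard topological sort answers. The sketch below uses Kahn's algorithm over a hypothetical `Node` type (a name plus the names of its inputs, assumed unique). `Node`, `ValidateError`, and `validate` are illustrative names, not Clinker's API, and the real validator also checks things this sketch ignores, such as allowed paths.

```rust
use std::collections::{HashMap, VecDeque};

// Hypothetical, simplified node: a name plus the names of its inputs.
// Node names are assumed unique. Clinker's real config types are richer.
#[derive(Debug)]
pub struct Node {
    pub name: String,
    pub inputs: Vec<String>,
}

#[derive(Debug, PartialEq)]
pub enum ValidateError {
    // A node lists an input that no node defines.
    MissingInput { node: String, input: String },
    // The graph contains a cycle, so no valid execution order exists.
    Cycle,
}

// Answers "all inputs exist?" and "no cycles?". On success, returns an
// order in which every node appears after all of its inputs.
pub fn validate(nodes: &[Node]) -> Result<Vec<String>, ValidateError> {
    // In-degree: how many inputs each node is still waiting on.
    let mut indegree: HashMap<&str, usize> = HashMap::new();
    // Reverse edges: for each node, the nodes that consume its output.
    let mut consumers: HashMap<&str, Vec<&str>> = HashMap::new();
    for n in nodes {
        indegree.insert(n.name.as_str(), n.inputs.len());
    }
    for n in nodes {
        for input in &n.inputs {
            if !indegree.contains_key(input.as_str()) {
                return Err(ValidateError::MissingInput {
                    node: n.name.clone(),
                    input: input.clone(),
                });
            }
            consumers.entry(input.as_str()).or_default().push(n.name.as_str());
        }
    }
    // Kahn's algorithm: start from nodes with no inputs.
    let mut ready: VecDeque<&str> = nodes
        .iter()
        .filter(|n| n.inputs.is_empty())
        .map(|n| n.name.as_str())
        .collect();
    let mut order = Vec::new();
    while let Some(name) = ready.pop_front() {
        order.push(name.to_string());
        // Releasing this node satisfies one input of each consumer.
        for &next in consumers.get(name).map(|v| v.as_slice()).unwrap_or(&[]) {
            let pending = indegree.get_mut(next).unwrap();
            *pending -= 1;
            if *pending == 0 {
                ready.push_back(next);
            }
        }
    }
    // Any node never released is stuck behind a cycle.
    if order.len() == nodes.len() {
        Ok(order)
    } else {
        Err(ValidateError::Cycle)
    }
}
```

A side benefit: on success the function returns an execution order in which every node comes after its inputs.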

The parse step is span-aware: each parsed value carries its source location, so a mistake can be reported with a precise line and column rather than a vague “something’s wrong.” That location-carrying wrapper is Spanned:

clinker-plan · yaml.rs · Spanned type @47d2e12
pub struct Spanned<T> {
    pub value: T,
    pub referenced: Location, // where this value is used in the source
    pub defined: Location,    // where it was defined (e.g. a YAML anchor)
}
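To see why carrying a location pays off, here is a minimal sketch of span-aware error reporting. It reuses the `Spanned<T>` shape above, but the excerpt does not show `Location`, so the `Location { line, column }` struct here is a hypothetical stand-in, as are the `locate` and `report` helpers.

```rust
use std::fmt;

// Hypothetical stand-in for Clinker's Location (not shown in the excerpt):
// a 1-based line and column in the YAML source.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Location {
    pub line: usize,
    pub column: usize,
}

impl fmt::Display for Location {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}:{}", self.line, self.column)
    }
}

// Same shape as the Spanned<T> excerpt above.
pub struct Spanned<T> {
    pub value: T,
    pub referenced: Location,
    pub defined: Location,
}

// Converts a byte offset (which must fall on a char boundary) into a
// 1-based line and column, counting the column in characters.
pub fn locate(source: &str, offset: usize) -> Location {
    let before = &source[..offset];
    let line = before.matches('\n').count() + 1;
    let line_start = before.rfind('\n').map_or(0, |i| i + 1);
    let column = before[line_start..].chars().count() + 1;
    Location { line, column }
}

// Builds a message that points at the exact spot, e.g. "3:12: ...".
pub fn report<T: fmt::Display>(s: &Spanned<T>, problem: &str) -> String {
    format!("{}: {} `{}`", s.referenced, problem, s.value)
}
```

For a misspelled input on line 3 of a file, `report` produces a message starting with `3:12:` rather than a bare "something's wrong."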

Everything upstream converges on one typed artifact — the CompiledPlan. It bundles the lowered graph, the validated config, and the typechecked CXL:

clinker-plan · compiled.rs · CompiledPlan type @47d2e12
pub struct CompiledPlan {
    dag: ExecutionPlanDag,        // the lowered execution graph
    config: PipelineConfig,       // the validated configuration
    artifacts: CompileArtifacts,  // typechecked CXL, bound schemas
    // ...
}

When you ran --explain, you saw the engine do all of this — parse, validate, typecheck, lower — and then print the result instead of executing. The execution plan you read was this CompiledPlan, rendered as text.
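Printing a plan instead of running it only requires the plan to know how to render itself, which in Rust is usually a `Display` impl. The `ToyPlan` and `Step` types and the text format below are invented for illustration; Clinker's actual `--explain` output is not reproduced here.

```rust
use std::fmt;

// Hypothetical, minimal plan: steps already in execution order.
pub struct Step {
    pub name: String,
    pub inputs: Vec<String>,
}

pub struct ToyPlan {
    pub steps: Vec<Step>,
}

// Renders the plan as numbered text lines instead of executing it.
// The "name <- inputs" format is made up for this sketch.
impl fmt::Display for ToyPlan {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        for (i, step) in self.steps.iter().enumerate() {
            write!(f, "{}. {}", i + 1, step.name)?;
            if !step.inputs.is_empty() {
                write!(f, " <- {}", step.inputs.join(", "))?;
            }
            writeln!(f)?;
        }
        Ok(())
    }
}
```

Because rendering and executing both read the same value, what you see in the text is exactly what would have run.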

Why plan before running? Two reasons, both of which you can already feel:

  1. Errors surface before any data moves. If a column name is misspelled or a CXL expression is nonsense, you find out at plan time — not halfway through a million-row file with a half-written output.
  2. The runtime gets a proof, not a wish. The executor never receives raw YAML. It only ever accepts a CompiledPlan — an artifact that, by existing, proves the pipeline is well-formed. The boundary between “planning” and “running” is one of Clinker’s defining design decisions; we open it up in Phase 3.
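The "proof, not a wish" idea maps onto a common Rust pattern: give the plan type private fields so the only way to obtain one is through a fallible constructor. The sketch below uses hypothetical `compile` and `run` functions with toy checks; it illustrates the pattern, not Clinker's actual executor.

```rust
mod plan {
    // Private field: code outside this module cannot construct a
    // CompiledPlan directly; `compile` is the only way to get one.
    pub struct CompiledPlan {
        steps: Vec<String>,
    }

    impl CompiledPlan {
        // Read-only access for the executor.
        pub fn steps(&self) -> &[String] {
            &self.steps
        }
    }

    // Hypothetical compile step with toy checks. Because it is the only
    // constructor, every CompiledPlan that exists has passed them.
    pub fn compile(steps: &[&str]) -> Result<CompiledPlan, String> {
        if steps.is_empty() {
            return Err("pipeline has no nodes".to_string());
        }
        if steps.iter().any(|s| s.trim().is_empty()) {
            return Err("node with an empty name".to_string());
        }
        Ok(CompiledPlan {
            steps: steps.iter().map(|s| s.to_string()).collect(),
        })
    }
}

// The signature is the guarantee: the executor takes a CompiledPlan,
// never raw YAML text. A real runtime would execute the graph; this
// sketch only reports how many steps it would run.
fn run(compiled: &plan::CompiledPlan) -> usize {
    compiled.steps().len()
}
```

Outside the `plan` module, writing `CompiledPlan { steps: ... }` by hand fails to compile, so a malformed plan cannot reach `run`.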

On the Rust side, this whole stage runs on Result: parsing, validation, and typechecking each return success or a typed error rather than throwing. Errors are values you handle, and they roll up into one error vocabulary (PipelineError, a Phase 3 lesson).
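Here is a minimal sketch of how `Result` and the `?` operator chain stages together and roll per-stage errors into one enum. All names (`ParseError`, `TypeError`, `PlanError`, `parse`, `typecheck`, `compile`) and the toy parsing and typing rules are hypothetical; `PlanError` only imitates the role PipelineError plays.

```rust
use std::fmt;

// Hypothetical per-stage error types.
#[derive(Debug, PartialEq)]
pub struct ParseError {
    pub line: usize,
}

#[derive(Debug, PartialEq)]
pub struct TypeError {
    pub expr: String,
}

// One umbrella error, in the spirit of PipelineError but not its definition.
#[derive(Debug, PartialEq)]
pub enum PlanError {
    Parse(ParseError),
    Type(TypeError),
}

// These From impls let `?` convert a stage error into PlanError.
impl From<ParseError> for PlanError {
    fn from(e: ParseError) -> Self {
        PlanError::Parse(e)
    }
}

impl From<TypeError> for PlanError {
    fn from(e: TypeError) -> Self {
        PlanError::Type(e)
    }
}

impl fmt::Display for PlanError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            PlanError::Parse(e) => write!(f, "line {}: expected `name: expression`", e.line),
            PlanError::Type(e) => write!(f, "unknown column in `{}`", e.expr),
        }
    }
}

// Toy parser: each non-blank line must look like `name: expression`.
fn parse(text: &str) -> Result<Vec<(String, String)>, ParseError> {
    let mut fields = Vec::new();
    for (i, line) in text.lines().enumerate() {
        if line.trim().is_empty() {
            continue;
        }
        let (name, expr) = line.split_once(':').ok_or(ParseError { line: i + 1 })?;
        fields.push((name.trim().to_string(), expr.trim().to_string()));
    }
    Ok(fields)
}

// Toy typecheck: the part of each expression before the first '.'
// must be a column in the given schema.
fn typecheck(fields: &[(String, String)], schema: &[&str]) -> Result<usize, TypeError> {
    for (_, expr) in fields {
        let column = expr.split('.').next().unwrap_or("").trim();
        if !schema.contains(&column) {
            return Err(TypeError { expr: expr.clone() });
        }
    }
    Ok(fields.len())
}

// Each `?` either yields the success value or returns early with the
// error, converted into PlanError via From.
fn compile(text: &str, schema: &[&str]) -> Result<usize, PlanError> {
    let fields = parse(text)?;
    let checked = typecheck(&fields, schema)?;
    Ok(checked)
}
```

If any stage fails, `compile` returns early and later stages never run: the error surfaces at plan time, with no data touched.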

// quick check

What does the executor actually receive to run?

You have a plan. Next: how a record actually travels through the graph that plan describes.