From YAML to a plan
You wrote customer_etl.yaml: a few nodes and some CXL. But YAML is just text. In
lesson 0.4, running with --explain printed an execution plan instead of running the
pipeline, so something had already turned your text into that plan. This lesson follows
that transformation at a high level; the deep version is Phase 3.
You’ll be able to: describe the steps that turn YAML into a runnable plan, and explain why Clinker plans before it runs.
The journey from text to plan
Before a single record moves, your YAML goes through a pipeline of its own:
```
YAML text
   │  parse (span-aware: every value remembers its line & column)
   ▼
config (typed nodes, still just "what you asked for")
   │  validate (no cycles? all inputs exist? paths allowed?)
   ▼
   │  typecheck the CXL (does `lifetime_value.to_int()` make sense?)
   ▼
   │  lower to a graph
   ▼
CompiledPlan ──► handed to the runtime
```

The parse step is span-aware: each parsed value carries its source location, so a
mistake can be reported with a precise line and column rather than a vague “something’s
wrong.” That location-carrying wrapper is Spanned. It records both where a value is used
and where it was defined, which differ when the value comes from a YAML anchor:
clinker-plan · yaml.rs · Spanned type @47d2e12

```rust
pub struct Spanned<T> {
    pub value: T,
    pub referenced: Location, // where this value is used in the source
    pub defined: Location,    // where it was defined (e.g. a YAML anchor)
}
```

The output is a CompiledPlan
Everything upstream converges on one typed artifact: the CompiledPlan. It bundles the
lowered graph, the validated config, and the typechecked CXL:
clinker-plan · compiled.rs · CompiledPlan type @47d2e12

```rust
pub struct CompiledPlan {
    dag: ExecutionPlanDag,       // the lowered execution graph
    config: PipelineConfig,      // the validated configuration
    artifacts: CompileArtifacts, // typechecked CXL, bound schemas
    // ...
}
```

When you ran --explain, the engine did all of this (parse, validate, typecheck, lower)
and then printed the result instead of executing it. The execution plan you read was
this CompiledPlan, rendered as text.
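To make the stages concrete, here is a toy version of the parse → validate → lower chain in plain Rust. It is a sketch, not Clinker's code: the line format `name: input1, input2` stands in for YAML; `Span`, `NodeDecl`, `PlanError`, `ToyPlan`, and `compile` are hypothetical names; this `Spanned` keeps one location where Clinker's keeps two; and CXL typechecking is left out. It illustrates two ideas from this section. Every parsed value carries its line and column, so an error can point at it. Each stage also returns a `Result`, so the first failure stops the chain before any plan exists.

```rust
use std::collections::HashMap;

// Hypothetical source position: 1-based line and column (byte-based, so ASCII input is assumed).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Span { line: usize, col: usize }
// A value plus where it appeared. Simplified: Clinker's `Spanned` tracks two locations.
#[derive(Debug, Clone)]
struct Spanned<T> { value: T, span: Span }
// One declared node: its name and the names of the nodes it reads from.
#[derive(Debug)]
struct NodeDecl { name: Spanned<String>, inputs: Vec<Spanned<String>> }
#[derive(Debug, PartialEq)]
enum PlanError { Syntax(Span), UnknownInput { name: String, at: Span }, Cycle }
// The toy "compiled plan": node names ordered so every input runs before its readers.
#[derive(Debug)]
struct ToyPlan { steps: Vec<String> }

// Stage 1: parse lines of the form `name: input1, input2`, recording a span for every value.
fn parse(src: &str) -> Result<Vec<NodeDecl>, PlanError> {
    let mut decls = Vec::new();
    for (i, line) in src.lines().enumerate() {
        let line_no = i + 1;
        if line.trim().is_empty() { continue; } // blank lines are ignored
        let colon = line.find(':').ok_or(PlanError::Syntax(Span { line: line_no, col: 1 }))?;
        let raw_name = &line[..colon];
        let name = raw_name.trim();
        if name.is_empty() { return Err(PlanError::Syntax(Span { line: line_no, col: colon + 1 })); }
        let name_col = raw_name.len() - raw_name.trim_start().len() + 1;
        let mut inputs = Vec::new();
        let mut offset = colon + 1; // byte offset where the current comma-separated part starts
        for part in line[colon + 1..].split(',') {
            let trimmed = part.trim();
            if !trimmed.is_empty() {
                let lead = part.len() - part.trim_start().len();
                let span = Span { line: line_no, col: offset + lead + 1 };
                inputs.push(Spanned { value: trimmed.to_string(), span });
            }
            offset += part.len() + 1; // step past this part and its comma
        }
        let name = Spanned { value: name.to_string(), span: Span { line: line_no, col: name_col } };
        decls.push(NodeDecl { name, inputs });
    }
    Ok(decls)
}

// Stage 2: every input must name a declared node, and the graph must be acyclic.
// Kahn's algorithm yields a topological order; nodes left over sit on a cycle.
fn validate(decls: &[NodeDecl]) -> Result<Vec<usize>, PlanError> {
    let index: HashMap<&str, usize> =
        decls.iter().enumerate().map(|(i, d)| (d.name.value.as_str(), i)).collect();
    let mut indegree = vec![0usize; decls.len()];
    let mut readers: Vec<Vec<usize>> = vec![Vec::new(); decls.len()];
    for (i, d) in decls.iter().enumerate() {
        for input in &d.inputs {
            let &src = index.get(input.value.as_str()).ok_or_else(|| PlanError::UnknownInput {
                name: input.value.clone(),
                at: input.span,
            })?;
            indegree[i] += 1;
            readers[src].push(i);
        }
    }
    let mut ready: Vec<usize> = (0..decls.len()).filter(|&i| indegree[i] == 0).collect();
    let mut order = Vec::new();
    while let Some(n) = ready.pop() {
        order.push(n);
        for &r in &readers[n] {
            indegree[r] -= 1;
            if indegree[r] == 0 { ready.push(r); }
        }
    }
    if order.len() == decls.len() { Ok(order) } else { Err(PlanError::Cycle) }
}

// Stage 3: lower the validated declarations into the plan a runtime would receive.
fn lower(decls: &[NodeDecl], order: &[usize]) -> ToyPlan {
    ToyPlan { steps: order.iter().map(|&i| decls[i].name.value.clone()).collect() }
}

// The whole chain: `?` stops at the first failing stage, so no plan exists unless all pass.
fn compile(src: &str) -> Result<ToyPlan, PlanError> {
    let decls = parse(src)?;
    let order = validate(&decls)?;
    Ok(lower(&decls, &order))
}

fn main() {
    println!("{:?}", compile("customers:\nclean: customers\nreport: clean"));
    println!("{:?}", compile("customers:\nreport: cleaned"));
}
```

In this sketch, `validate` checks that inputs exist and rejects cycles, and the topological order it produces is what `lower` turns into the plan. The second call in `main` fails with `UnknownInput` at line 2, column 9, which is exactly where the misspelled `cleaned` sits.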
Why plan first?
Two reasons, both of which you can already feel:
- Errors surface before any data moves. If a column name is misspelled or a CXL expression is nonsense, you find out at plan time — not halfway through a million-row file with a half-written output.
- The runtime gets a proof, not a wish. The executor never receives raw YAML. It only ever accepts a CompiledPlan: an artifact that, by existing, proves the pipeline is well-formed. The boundary between “planning” and “running” is one of Clinker’s defining design decisions; we open it up in Phase 3.
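Rust's type system can enforce the “proof, not a wish” guarantee directly. What follows is a minimal sketch of the general pattern, not Clinker's actual code. A plan type keeps its fields private to its module, so the only way to obtain one is through a `compile` function that runs the checks. The executor's signature then accepts nothing else. `CheckedPlan`, `compile`, and `execute` are hypothetical names, and the two checks (non-empty, no duplicate node names) are placeholders for real validation.

```rust
mod plan {
    use std::collections::HashSet;

    // Hypothetical stand-in for a compiled plan. `steps` is private, so code
    // outside this module can only obtain a value through `compile`.
    pub struct CheckedPlan {
        steps: Vec<String>,
    }

    impl CheckedPlan {
        // Read-only access for the runtime.
        pub fn steps(&self) -> &[String] {
            &self.steps
        }
    }

    // The only public constructor: every check must pass before a plan exists.
    pub fn compile(nodes: &[&str]) -> Result<CheckedPlan, String> {
        if nodes.is_empty() {
            return Err("pipeline has no nodes".to_string());
        }
        let mut seen = HashSet::new();
        for name in nodes {
            if !seen.insert(*name) {
                return Err(format!("duplicate node name: {name}"));
            }
        }
        Ok(CheckedPlan { steps: nodes.iter().map(|n| n.to_string()).collect() })
    }
}

mod runtime {
    use super::plan::CheckedPlan;

    // The executor cannot be handed raw text: its parameter type is the guarantee.
    pub fn execute(plan: &CheckedPlan) -> usize {
        for step in plan.steps() {
            println!("running {step}");
        }
        plan.steps().len()
    }
}

fn main() {
    match plan::compile(&["extract", "transform", "load"]) {
        Ok(p) => println!("ran {} steps", runtime::execute(&p)),
        Err(e) => eprintln!("rejected at plan time: {e}"),
    }
    // Does not compile: `steps` is private outside `plan`.
    // let forged = plan::CheckedPlan { steps: Vec::new() };
}
```

Uncommenting the last line of `main` is a compile error. That is the property being described: if you hold a `CheckedPlan`, `compile` must have accepted it.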
On the Rust side, this whole stage runs on Result: parsing, validation, and
typechecking each return either a success value or a typed error, where many other
languages would throw an exception. Errors are values you handle, and they roll up into
one error vocabulary (PipelineError, the subject of a Phase 3 lesson).
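The usual Rust way to roll several stages' errors into one vocabulary is a shared enum plus `From` conversions, which let the `?` operator translate each stage's error automatically. The sketch below shows only that general pattern. `ParseError`, `ValidateError`, and `CompileError` are hypothetical stand-ins, not Clinker's `PipelineError`, and the parsing and validation rules are toys.

```rust
use std::fmt;

// Hypothetical per-stage error types.
#[derive(Debug, PartialEq)]
struct ParseError { line: usize, col: usize }
#[derive(Debug, PartialEq)]
struct ValidateError { missing_node: String }

// One error vocabulary that every stage's error converts into.
#[derive(Debug, PartialEq)]
enum CompileError {
    Parse(ParseError),
    Validate(ValidateError),
}

impl From<ParseError> for CompileError {
    fn from(e: ParseError) -> Self { CompileError::Parse(e) }
}
impl From<ValidateError> for CompileError {
    fn from(e: ValidateError) -> Self { CompileError::Validate(e) }
}

impl fmt::Display for CompileError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            CompileError::Parse(e) => write!(f, "parse error at line {}, column {}", e.line, e.col),
            CompileError::Validate(e) => write!(f, "missing required node `{}`", e.missing_node),
        }
    }
}
impl std::error::Error for CompileError {}

// Toy parse rule: one node name per line; names may not contain spaces.
fn parse(src: &str) -> Result<Vec<String>, ParseError> {
    src.lines()
        .enumerate()
        .map(|(i, l)| match l.find(' ') {
            Some(pos) => Err(ParseError { line: i + 1, col: pos + 1 }),
            None => Ok(l.to_string()),
        })
        .collect()
}

// Toy validation rule: the pipeline must declare a node named `source`.
fn validate(nodes: &[String]) -> Result<(), ValidateError> {
    if nodes.iter().any(|n| n.as_str() == "source") {
        Ok(())
    } else {
        Err(ValidateError { missing_node: "source".to_string() })
    }
}

// `?` converts each stage's error into `CompileError` via the `From` impls.
fn compile(src: &str) -> Result<usize, CompileError> {
    let nodes = parse(src)?;
    validate(&nodes)?;
    Ok(nodes.len())
}

fn main() {
    for src in ["source\nclean", "source\nbad name", "clean"] {
        match compile(src) {
            Ok(n) => println!("ok: {n} nodes"),
            Err(e) => println!("error: {e}"),
        }
    }
}
```

With the `From` impls in place, `compile` never names `CompileError::Parse` or `CompileError::Validate`. The `?` operator does the wrapping, and callers handle a single error type.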
// quick check
What does the executor actually receive to run?
The executor only ever accepts a CompiledPlan. Parsing, validation, and CXL typechecking all happen first; the executor consumes the proof, not the YAML. That separation is Phase 3's main subject.
Checkpoint
You have a plan. Next: how a record actually travels through the graph that plan describes.