CXL: a staged language
A Clinker config can contain expressions: coalesce(email, "n/a"), age > 18, a computed
column. Each runs on every record — millions of times. That puts two demands in tension. A
typo or a nonsense comparison (a_string > a_date) must be caught before the job starts, so
a three-hour run doesn’t die at row nine million. And evaluation must be fast per record.
CXL — Clinker’s expression language — meets both by being a staged language: it separates
“understand the formula” from “run the formula,” and does the understanding exactly once. This
is the whole of Phase 3 in one subsystem, so it’s a fitting place to finish.
You’ll be able to: name CXL’s stages and their boundary types, explain why typechecking happens before any record is processed, and describe what “compile once, evaluate per record” means in terms of closures.
A pipeline of typed stages
A formula moves through CXL as a sequence of stages, each producing a distinct type:
"price + 10" ──parse──▶ Program ──typecheck──▶ TypedProgram ──compile──▶ CompiledProgram
                        (an AST)               (AST + types)              (closures)
Those aren’t just phases in a function; they’re different Rust types, and the compiler enforces
the order. You cannot hand an un-typechecked Program to the evaluator, because the evaluator’s
input type is TypedProgram, which only the typechecker produces. It’s the same proof-handle
idea as ValidatedPath (lesson 19) and CompiledPlan (lesson 21): each stage’s output type is
evidence that the stage ran.
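The following is a minimal sketch of how a type can serve as proof. It assumes a toy one-node language in a module named stages. TypedProgram's fields are private, so code outside the module can obtain one only through type_check, and evaluate accepts nothing else. The names echo the diagram, but the bodies are hypothetical simplifications, not CXL's code.

```rust
mod stages {
    /// Stage 1 output: an untyped AST (here, one comparison of two literals).
    pub enum Program {
        GreaterThan(Literal, Literal),
    }

    pub enum Literal {
        Int(i64),
        Str(&'static str),
    }

    /// Stage 2 output. Its fields are private, so only this module can build one:
    /// holding a TypedProgram is evidence that type_check succeeded.
    pub struct TypedProgram {
        lhs: i64,
        rhs: i64,
    }

    pub fn type_check(p: Program) -> Result<TypedProgram, String> {
        match p {
            Program::GreaterThan(Literal::Int(lhs), Literal::Int(rhs)) => Ok(TypedProgram { lhs, rhs }),
            Program::GreaterThan(..) => Err("`>` needs two Int operands".to_string()),
        }
    }

    /// The evaluator accepts only a TypedProgram, never a bare Program.
    pub fn evaluate(t: &TypedProgram) -> bool {
        t.lhs > t.rhs
    }
}

use stages::{evaluate, type_check, Literal, Program};

fn main() {
    let ok = type_check(Program::GreaterThan(Literal::Int(20), Literal::Int(18)));
    let bad = type_check(Program::GreaterThan(Literal::Str("Ada"), Literal::Int(18)));
    // evaluate(&Program::GreaterThan(..)) would be a compile error: wrong input type.
    // Writing TypedProgram { lhs: 1, rhs: 0 } out here is rejected too: private fields.
    println!("{:?}", ok.as_ref().map(evaluate)); // Ok(true)
    println!("{:?}", bad.as_ref().map(evaluate)); // Err("`>` needs two Int operands")
}
```

There is no runtime check in evaluate. The guarantee comes entirely from who is allowed to construct its argument.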
Stage 1 — parse to an AST
Parsing turns the formula text into an abstract syntax tree: a tree of Expr nodes. This is
the closed-enum story from Phase 2, now describing a language rather than a value:
cxl ·ast.rs ·Expr type @47d2e12
/// CXL expression — the core of the language. All variants carry a NodeId and Span.
pub enum Expr {
    Binary { op: BinOp, lhs: Box<Expr>, rhs: Box<Expr>, /* … */ },
    Unary { op: UnaryOp, operand: Box<Expr>, /* … */ },
    Literal { value: LiteralValue, /* … */ },
    FieldRef { name: Box<str>, /* … */ },
    MethodCall { receiver: Box<Expr>, method: Box<str>, args: Vec<Expr>, /* … */ },
    Coalesce { lhs: Box<Expr>, rhs: Box<Expr>, /* … */ },
    // ... ~20 variants in all
}
Two Phase-2 ideas show up at once. It’s a closed enum: a fixed set of expression shapes, so
every later stage can match exhaustively and the compiler guarantees no shape is forgotten
(lesson 10). And the children are Box<Expr> — a tree is recursive, and a recursive type
needs the indirection a Box provides, or its size would be infinite (lesson 09’s boxing,
lesson 15’s smart pointers). price + 10 parses to Binary { op: Add, lhs: FieldRef("price"), rhs: Literal(10) } — a little tree. Every node also carries a Span, so a later error can point
back at the source (lesson 20).
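A toy version makes both ideas concrete. This sketch assumes three variants and a byte-range Span. CXL's Expr has around twenty variants and also carries a NodeId. The render function must handle every variant. Deleting a Box from Binary makes rustc reject the enum as a recursive type of infinite size.

```rust
/// Byte range in the source text, so an error can point back at the formula.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Span {
    pub start: usize,
    pub end: usize,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum BinOp {
    Add,
    Gt,
}

/// A hypothetical, cut-down Expr: a closed enum whose nodes all carry a Span.
#[derive(Debug)]
pub enum Expr {
    // Box gives the recursive children a fixed size (one pointer each).
    Binary { op: BinOp, lhs: Box<Expr>, rhs: Box<Expr>, span: Span },
    Literal { value: i64, span: Span },
    FieldRef { name: Box<str>, span: Span },
}

/// An exhaustive match: adding a variant to Expr makes this fail to compile
/// until the new shape is handled.
pub fn render(e: &Expr) -> String {
    match e {
        Expr::Binary { op, lhs, rhs, .. } => {
            let sym = match op {
                BinOp::Add => "+",
                BinOp::Gt => ">",
            };
            format!("({} {} {})", render(lhs), sym, render(rhs))
        }
        Expr::Literal { value, .. } => value.to_string(),
        Expr::FieldRef { name, .. } => name.to_string(),
    }
}

/// The tree that `price + 10` parses to, built by hand.
pub fn price_plus_ten() -> Expr {
    Expr::Binary {
        op: BinOp::Add,
        lhs: Box::new(Expr::FieldRef { name: "price".into(), span: Span { start: 0, end: 5 } }),
        rhs: Box::new(Expr::Literal { value: 10, span: Span { start: 8, end: 10 } }),
        span: Span { start: 0, end: 10 },
    }
}

fn main() {
    let tree = price_plus_ten();
    println!("{}", render(&tree)); // prints (price + 10)
}
```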
Stage 2 — typecheck, before any record runs
Now the formula has structure but no guarantees. Typechecking walks the AST against the record
schema and assigns every node a type, drawn from a closed set that mirrors the nine Value
shapes:
cxl ·types.rs ·Type type @47d2e12
pub enum Type {
    Null, Bool, Int, Float, String, Date, DateTime, Array, Map,
    Numeric, // the int/float union
    Any,     // unknown
    Nullable(Box<Type>),
}
The typechecker’s signature is the pedagogical crux. On success it yields a TypedProgram; on
failure, a list of diagnostics and nothing runnable:
cxl ·pass.rs ·type_check fn @47d2e12
pub fn type_check(
    resolved: ResolvedProgram,
    schema: &Row,
) -> Result<TypedProgram, Vec<TypeDiagnostic>> { /* … */ }
Read what the Result guarantees. If your formula compares a String to a Date, type_check
returns Err(...). No TypedProgram is ever built. And since the evaluator only accepts a
TypedProgram, evaluation of a type-incorrect formula is literally unreachable — not
“checked again at run time,” but unrepresentable. That’s how the closed Value set you met in
lesson 09 pays off: because the shapes are fixed and known, a formula over them can be fully
typechecked at plan time, and a type error fails the job before record one — exactly the
“validate once at the boundary” discipline of lesson 21, now for expressions.
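The following is a minimal typechecker in the same spirit. It assumes a hypothetical four-type subset and a schema given as field-name/type pairs. It walks the tree once, assigns each node a type, and collects diagnostics instead of stopping at the first one. A comparison of a String with a Date comes back as Err, so nothing runnable is produced.

```rust
use std::collections::HashMap;

/// A hypothetical four-type subset of CXL's Type enum.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Type {
    Bool,
    Int,
    String,
    Date,
}

/// A cut-down AST: field references, integer literals and `>`.
pub enum Expr {
    Field(&'static str),
    Int(i64),
    Gt(Box<Expr>, Box<Expr>),
}

/// Assigns a type to `e`, pushing a diagnostic for each problem it finds.
/// Returns None when the node's type cannot be determined.
fn infer(e: &Expr, schema: &HashMap<&str, Type>, diags: &mut Vec<String>) -> Option<Type> {
    match e {
        Expr::Int(_) => Some(Type::Int),
        Expr::Field(name) => {
            let ty = schema.get(name).copied();
            if ty.is_none() {
                diags.push(format!("unknown field `{name}`"));
            }
            ty
        }
        Expr::Gt(lhs, rhs) => {
            // Check both operands so every error is reported, not only the first.
            let l = infer(lhs, schema, diags);
            let r = infer(rhs, schema, diags);
            match (l, r) {
                (Some(l), Some(r)) if l == r => Some(Type::Bool),
                (Some(l), Some(r)) => {
                    diags.push(format!("cannot compare {l:?} with {r:?}"));
                    None
                }
                _ => None, // an operand already produced a diagnostic
            }
        }
    }
}

/// Ok(type of the whole formula) or every diagnostic; never both.
pub fn type_check(e: &Expr, schema: &HashMap<&str, Type>) -> Result<Type, Vec<String>> {
    let mut diags = Vec::new();
    match infer(e, schema, &mut diags) {
        Some(ty) if diags.is_empty() => Ok(ty),
        _ => Err(diags),
    }
}

fn main() {
    let schema: HashMap<&str, Type> =
        [("age", Type::Int), ("name", Type::String), ("signup", Type::Date)].into_iter().collect();
    let ok = Expr::Gt(Box::new(Expr::Field("age")), Box::new(Expr::Int(18)));
    let bad = Expr::Gt(Box::new(Expr::Field("name")), Box::new(Expr::Field("signup")));
    println!("{:?}", type_check(&ok, &schema)); // Ok(Bool)
    println!("{:?}", type_check(&bad, &schema)); // Err(["cannot compare String with Date"])
}
```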
Stage 3 — compile once to closures, evaluate per record
The TypedProgram is correct but still a tree — and walking a tree, re-matching every Expr
variant, on every record, would be slow. So CXL does what lesson 21 did for the whole plan, one
level down: it lowers the typed AST once into a tree of closures, then runs that per record.
The module doc names the strategy outright — “Compile-once-to-closures evaluator.”
cxl ·compiled.rs ·CompiledProgram type @47d2e12
// each lowered node IS a closure
type CompiledExpr<S> = Box<dyn Fn(&mut Frame<S>) -> Result<Value, EvalError> + Send + Sync>;
pub(crate) struct CompiledProgram<S: RecordStorage + 'static> {
    statements: Vec<CompiledStmt<S>>,
}
// lowered exactly once, then reused across every record (and every thread)
pub(crate) fn compile<S>(typed: &TypedProgram) -> CompiledProgram<S> { /* lower each stmt once */ }
Each AST node becomes a boxed closure that captures its already-compiled children by value.
A literal bakes its Value once at lowering instead of re-reading the AST per record; a field
reference captures its resolved name. Per-record evaluation is then just calling closures: a
chain of indirect calls through the boxed closures, with no central dispatch match over Expr
variants and no re-walking of the AST.
Because the compiled program is immutable, one copy is shared across records and threads (the
per-record bookkeeping lives in a separate state value the caller threads in). This is the same
generic S: RecordStorage from lesson 18: the compiled closures are monomorphized over the
storage, so although each closure is reached through a vtable, the field reads inside it are
statically dispatched.
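Here is a sketch of that arrangement, with some simplifications. The RecordStorage trait here is a hypothetical one-method stand-in. Values are plain i64 rather than Value, and there is no Frame or EvalError.

The sketch still shows three properties of the real design:
- Field names are resolved to column indexes during lowering.
- The closures are Send + Sync, so one compiled program can sit behind an Arc and be called from several threads.
- S is a concrete type inside every closure, so row.get is a static call.

```rust
use std::sync::Arc;
use std::thread;

/// Hypothetical stand-in for CXL's storage trait: read a column by index.
pub trait RecordStorage {
    fn get(&self, column: usize) -> i64;
}

/// One concrete storage: a row held as a plain vector.
pub struct VecRow(pub Vec<i64>);

impl RecordStorage for VecRow {
    fn get(&self, column: usize) -> i64 {
        self.0[column]
    }
}

pub enum Expr {
    Lit(i64),
    Field(String),
    Add(Box<Expr>, Box<Expr>),
}

/// Each lowered node is a boxed closure, generic over the storage type S.
pub type Compiled<S> = Box<dyn Fn(&S) -> i64 + Send + Sync>;

/// Lowered once. Field names become column indexes here, not per record.
pub fn compile<S: RecordStorage + 'static>(e: &Expr, columns: &[&str]) -> Result<Compiled<S>, String> {
    let compiled: Compiled<S> = match e {
        Expr::Lit(v) => {
            let v = *v; // baked in at lowering time
            Box::new(move |_row: &S| v)
        }
        Expr::Field(name) => {
            let idx = columns
                .iter()
                .position(|c| *c == name.as_str())
                .ok_or_else(|| format!("unknown field `{name}`"))?;
            // S is concrete in each monomorphized copy, so row.get is a static call.
            Box::new(move |row: &S| row.get(idx))
        }
        Expr::Add(l, r) => {
            let (l, r) = (compile::<S>(l, columns)?, compile::<S>(r, columns)?);
            Box::new(move |row: &S| l(row) + r(row))
        }
    };
    Ok(compiled)
}

fn main() -> Result<(), String> {
    let columns = ["price", "qty"];
    let expr = Expr::Add(Box::new(Expr::Field("price".into())), Box::new(Expr::Lit(10)));
    // One immutable compiled program, shared by reference count across threads.
    let program: Arc<Compiled<VecRow>> = Arc::new(compile::<VecRow>(&expr, &columns)?);

    let handles: Vec<_> = [vec![5, 1], vec![90, 2]]
        .into_iter()
        .map(|values| {
            let program = Arc::clone(&program);
            thread::spawn(move || program(&VecRow(values)))
        })
        .collect();
    for h in handles {
        println!("{}", h.join().unwrap()); // prints 15, then 100
    }
    Ok(())
}
```

Because the per-call state is just the row borrowed for one call, nothing in the compiled closures needs a lock.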
Build a staged evaluator yourself
Parse to an AST, compile once to closures, evaluate per record. The pattern needs nothing beyond
std, and it has the same shape as CompiledExpr: each node becomes a closure that captures its
compiled children.
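Here is a std-only sketch of the pattern. It assumes a toy grammar of integer literals, a single field named price, and `+`.
- parse builds the AST.
- compile lowers the AST once into a closure tree.
- The loop in main is the per-record stage.

The Record type and the grammar are hypothetical. The closure signature is simplified to `Box<dyn Fn(&Record) -> i64>`, unlike CXL's fallible, Frame-based one.

```rust
/// Stage 1 output: a toy AST for sums of integer literals and one field, `price`.
enum Expr {
    Lit(i64),
    Price,
    Add(Box<Expr>, Box<Expr>),
}

/// A hypothetical record with a single field.
struct Record {
    price: i64,
}

/// Stage 2 output: every node is a boxed closure from a record to a value.
type Compiled = Box<dyn Fn(&Record) -> i64>;

/// Stage 1: parse formulas like "price + 10 + 2", folding `+` to the left.
fn parse(src: &str) -> Result<Expr, String> {
    let mut ast: Option<Expr> = None;
    for term in src.split('+') {
        let node = match term.trim() {
            "price" => Expr::Price,
            t => Expr::Lit(t.parse::<i64>().map_err(|_| format!("bad term `{t}`"))?),
        };
        ast = Some(match ast {
            None => node,
            Some(lhs) => Expr::Add(Box::new(lhs), Box::new(node)),
        });
    }
    ast.ok_or_else(|| "empty formula".to_string())
}

/// Stage 2: lower the AST once. Each closure owns its already-compiled children.
fn compile(e: &Expr) -> Compiled {
    match e {
        Expr::Lit(v) => {
            let v = *v; // the literal is baked in at lowering time
            Box::new(move |_rec: &Record| v)
        }
        Expr::Price => Box::new(|rec: &Record| rec.price),
        Expr::Add(l, r) => {
            let (l, r) = (compile(l), compile(r));
            Box::new(move |rec: &Record| l(rec) + r(rec))
        }
    }
}

fn main() {
    // Parsing and lowering happen once, before any record is read.
    let program = compile(&parse("price + 10").expect("formula should parse"));
    // Stage 3: per record, just call the closure tree.
    let records = [Record { price: 5 }, Record { price: 90 }];
    for rec in &records {
        println!("{}", program(rec)); // prints 15, then 100
    }
}
```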
compile runs once and returns a closure tree; the per-record loop just calls it. Add a
Sub variant or a second field and you’ll feel the shape of the real evaluator: one lowering
pass, then cheap repeated calls. (CXL also has language-level closures — it => body for
array operations like filter/map — which lower to a host loop over a separately compiled
body. Same mechanism, exposed to the user.)
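Here is a sketch of that lowering. It assumes a hypothetical three-node body language and integer arrays. The body of `it => it > 18` is compiled once into its own closure, and filter becomes a plain loop that calls it per element. This lesson does not show CXL's real lambda representation, so Body, Val, compile_body and compile_filter are illustrative names only.

```rust
/// Hypothetical body AST for `it => <body>`: just enough for `it > 18`.
enum Body {
    It,
    Lit(i64),
    Gt(Box<Body>, Box<Body>),
}

/// Values the toy body can produce.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Val {
    Int(i64),
    Bool(bool),
}

type CompiledBody = Box<dyn Fn(i64) -> Val>;

/// The lambda body is compiled once, separately from the expression around it.
fn compile_body(b: &Body) -> CompiledBody {
    match b {
        Body::It => Box::new(|it: i64| Val::Int(it)),
        Body::Lit(v) => {
            let v = *v;
            Box::new(move |_it: i64| Val::Int(v))
        }
        Body::Gt(l, r) => {
            let (l, r) = (compile_body(l), compile_body(r));
            Box::new(move |it: i64| match (l(it), r(it)) {
                (Val::Int(a), Val::Int(b)) => Val::Bool(a > b),
                _ => Val::Bool(false), // a typechecker would reject this case up front
            })
        }
    }
}

/// `filter(it => body)` lowers to an ordinary host loop over the compiled body.
fn compile_filter(body: &Body) -> impl Fn(&[i64]) -> Vec<i64> {
    let body = compile_body(body);
    move |items: &[i64]| {
        let mut out = Vec::new();
        for &it in items {
            if body(it) == Val::Bool(true) {
                out.push(it);
            }
        }
        out
    }
}

fn main() {
    // it => it > 18
    let adults = compile_filter(&Body::Gt(Box::new(Body::It), Box::new(Body::Lit(18))));
    let ages: Vec<i64> = vec![12, 30, 18, 45];
    println!("{:?}", adults(ages.as_slice())); // [30, 45]
}
```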
// quick check
Why can a type-incorrect CXL formula never be evaluated at run time?
The stages are distinct types. A type error means no TypedProgram is built, and the evaluator's input type is TypedProgram — so the bad formula is caught at plan time and a run over it cannot be expressed, not merely avoided.
Trace the stages
That’s Phase 3 — Planning & Expressions. You’ve seen the engine’s two dispatch strategies
(trait objects for the open IO seam, generics for the hot path), its family of typed proof
handles (ValidatedPath, the strict/spanned parse, CompiledPlan), its failure taxonomy
(PipelineError, abort versus quarantine), and the staged language that runs over every record.
A single thread runs through all of it: push the work and the proof to plan time, and let the
type system carry the guarantee into run time. Phase 4 — Execution & Memory — descends into
that run time: how the DAG executor dispatches nodes, moves records across threads, and stays
inside a memory budget.