CXL: a staged language

A clinker config can contain expressions: coalesce(email, "n/a"), age > 18, a computed column. Each runs on every record — millions of times. That puts two demands in tension. A typo or a nonsense comparison (a_string > a_date) must be caught before the job starts, so a three-hour run doesn’t die at row nine million. And evaluation must be fast per record. CXL — Clinker’s expression language — meets both by being a staged language: it separates “understand the formula” from “run the formula,” and does the understanding exactly once. This is the whole of Phase 3 in one subsystem, so it’s a fitting place to finish.

You’ll be able to: name CXL’s stages and their boundary types, explain why typechecking happens before any record is processed, and describe what “compile once, evaluate per record” means in terms of closures.

A formula moves through CXL as a sequence of stages, each producing a distinct type:

"price + 10" ──parse──▶ Program ──typecheck──▶ TypedProgram ──compile──▶ CompiledProgram
                        (an AST)               (AST + types)             (closures)

Those aren’t just phases in a function — they’re different Rust types, and the compiler enforces the order. You cannot hand an un-typechecked Program to the evaluator, because the evaluator’s input type is TypedProgram, which only the typechecker produces. It’s the same proof-handle idea as ValidatedPath (lesson 19) and CompiledPlan (lesson 21): each stage’s output type is evidence that the stage ran.
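
To make the proof-handle idea concrete, here is a minimal sketch of stage types enforcing stage order. Everything in it is a hypothetical stand-in: the `stages` module, the string-only `Program` and `TypedProgram`, the toy "reject empty formulas" rule, and `evaluate` are assumptions for illustration, not CXL's real definitions. The point is only that a private field makes `type_check` the sole way to obtain a `TypedProgram`.

```rust
// Hypothetical stand-ins for CXL's stage types; the real ones carry far more.
mod stages {
    /// Output of parsing. Anyone can build one from text.
    pub struct Program {
        pub source: String,
    }

    /// Output of typechecking. The field is private, so code outside this
    /// module cannot construct a TypedProgram without calling `type_check`.
    pub struct TypedProgram {
        source: String,
    }

    pub fn parse(text: &str) -> Program {
        Program { source: text.trim().to_string() }
    }

    /// Toy rule standing in for real type rules: reject empty formulas.
    pub fn type_check(p: Program) -> Result<TypedProgram, String> {
        if p.source.is_empty() {
            Err("empty formula".to_string())
        } else {
            Ok(TypedProgram { source: p.source })
        }
    }

    /// Accepts only the typechecked handle.
    pub fn evaluate(t: &TypedProgram) -> usize {
        t.source.len()
    }
}

use stages::{evaluate, parse, type_check};

fn main() {
    let program = parse("price + 10");
    // evaluate(&program); // would not compile: expected &TypedProgram, found &Program
    let typed = type_check(program).expect("formula should typecheck");
    println!("evaluated over {} chars", evaluate(&typed));
}
```

Uncommenting the `evaluate(&program)` line turns the ordering mistake into a compile error, which is the guarantee the paragraph above describes.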

Parsing turns the formula text into an abstract syntax tree: a tree of Expr nodes. This is the closed-enum story from Phase 2, now describing a language rather than a value:

cxl · ast.rs · Expr type @47d2e12
/// CXL expression — the core of the language. All variants carry a NodeId and Span.
pub enum Expr {
    Binary { op: BinOp, lhs: Box<Expr>, rhs: Box<Expr>, /* … */ },
    Unary { op: UnaryOp, operand: Box<Expr>, /* … */ },
    Literal { value: LiteralValue, /* … */ },
    FieldRef { name: Box<str>, /* … */ },
    MethodCall { receiver: Box<Expr>, method: Box<str>, args: Vec<Expr>, /* … */ },
    Coalesce { lhs: Box<Expr>, rhs: Box<Expr>, /* … */ },
    // ... ~20 variants in all
}

Two Phase-2 ideas show up at once. It’s a closed enum — a fixed set of expression shapes, so every later stage can match exhaustively and the compiler guarantees no shape is forgotten (lesson 10). And the children are Box<Expr> — a tree is recursive, and a recursive type needs the indirection a Box provides, or its size would be infinite (lesson 09’s boxing, lesson 15’s smart pointers). price + 10 parses to Binary { op: Add, lhs: FieldRef("price"), rhs: Literal(10) } — a little tree. Every node also carries a Span, so a later error can point back at the source (lesson 20).
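
As a small illustration of those two ideas, the sketch below builds the `price + 10` tree by hand and walks it with exhaustive matches. It is a simplified assumption, not the real AST: only three variants, an `i64` literal instead of `LiteralValue`, a `Span` reduced to a byte range, and the helper names `node_count`, `span_of`, and `price_plus_ten` are invented for the example.

```rust
// Simplified, hypothetical mirror of the Expr excerpt above.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Span {
    start: usize,
    end: usize,
}

#[allow(dead_code)]
#[derive(Debug, Clone, Copy)]
enum BinOp {
    Add,
}

#[allow(dead_code)]
#[derive(Debug)]
enum Expr {
    // Children are boxed: without the indirection the type would be infinitely sized.
    Binary { op: BinOp, lhs: Box<Expr>, rhs: Box<Expr>, span: Span },
    Literal { value: i64, span: Span },
    FieldRef { name: Box<str>, span: Span },
}

/// Exhaustive match: adding a variant to Expr makes this fail to compile
/// until the new shape is handled.
fn node_count(e: &Expr) -> usize {
    match e {
        Expr::Binary { lhs, rhs, .. } => 1 + node_count(lhs) + node_count(rhs),
        Expr::Literal { .. } | Expr::FieldRef { .. } => 1,
    }
}

/// Every node can report where it came from in the source text.
fn span_of(e: &Expr) -> Span {
    match e {
        Expr::Binary { span, .. }
        | Expr::Literal { span, .. }
        | Expr::FieldRef { span, .. } => *span,
    }
}

/// Hand-built tree for the text `price + 10` (this sketch has no parser).
fn price_plus_ten() -> Expr {
    Expr::Binary {
        op: BinOp::Add,
        lhs: Box::new(Expr::FieldRef { name: "price".into(), span: Span { start: 0, end: 5 } }),
        rhs: Box::new(Expr::Literal { value: 10, span: Span { start: 8, end: 10 } }),
        span: Span { start: 0, end: 10 },
    }
}

fn main() {
    let tree = price_plus_ten();
    println!("{} nodes, root span {:?}", node_count(&tree), span_of(&tree));
}
```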

Stage 2 — typecheck, before any record runs

Now the formula has structure but no guarantees. Typechecking walks the AST against the record schema and assigns every node a type, drawn from a closed set that mirrors the nine Value shapes:

cxl · types.rs · Type type @47d2e12
pub enum Type {
    Null, Bool, Int, Float, String, Date, DateTime, Array, Map,
    Numeric, // the int/float union
    Any, // unknown
    Nullable(Box<Type>),
}

The typechecker’s signature is the pedagogical crux. On success it yields a TypedProgram; on failure, a list of diagnostics and nothing runnable:

cxl · pass.rs · type_check fn @47d2e12
pub fn type_check(
    resolved: ResolvedProgram,
    schema: &Row,
) -> Result<TypedProgram, Vec<TypeDiagnostic>> { /* … */ }

Read what the Result guarantees. If your formula compares a String to a Date, type_check returns Err(...). No TypedProgram is ever built. And since the evaluator only accepts a TypedProgram, evaluation of a type-incorrect formula is literally unreachable — not “checked again at run time,” but unrepresentable. That’s how the closed Value set you met in lesson 09 pays off: because the shapes are fixed and known, a formula over them can be fully typechecked at plan time, and a type error fails the job before record one — exactly the “validate once at the boundary” discipline of lesson 21, now for expressions.
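
The shape of that guarantee can be sketched in a few lines. This is an assumption-heavy miniature, not the real pass: the schema is a plain `HashMap` rather than `&Row`, the input is a two-variant AST rather than a `ResolvedProgram`, only four types exist, and `infer` and `sample_schema` are hypothetical helpers. What it keeps is the signature's contract: success yields a `TypedProgram`, failure yields diagnostics and nothing runnable.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq)]
enum Type { Int, String, Date, Bool }

// Hypothetical mini-AST: field references and a `>` comparison only.
enum Expr {
    Field(&'static str),
    Gt(Box<Expr>, Box<Expr>),
}

#[derive(Debug, PartialEq)]
struct TypeDiagnostic { message: String }

/// Proof handle: built only by `type_check`, and only on success.
#[derive(Debug)]
struct TypedProgram { result_type: Type }

fn infer(e: &Expr, schema: &HashMap<&str, Type>, diags: &mut Vec<TypeDiagnostic>) -> Option<Type> {
    match e {
        Expr::Field(name) => match schema.get(name) {
            Some(t) => Some(*t),
            None => {
                diags.push(TypeDiagnostic { message: format!("unknown field `{name}`") });
                None
            }
        },
        Expr::Gt(lhs, rhs) => {
            // Check both sides first so one pass reports every problem.
            let (l, r) = (infer(lhs, schema, diags), infer(rhs, schema, diags));
            match (l, r) {
                (Some(l), Some(r)) if l == r => Some(Type::Bool),
                (Some(l), Some(r)) => {
                    diags.push(TypeDiagnostic { message: format!("cannot compare {l:?} > {r:?}") });
                    None
                }
                _ => None,
            }
        }
    }
}

fn type_check(e: &Expr, schema: &HashMap<&str, Type>) -> Result<TypedProgram, Vec<TypeDiagnostic>> {
    let mut diags = Vec::new();
    match infer(e, schema, &mut diags) {
        Some(result_type) if diags.is_empty() => Ok(TypedProgram { result_type }),
        _ => Err(diags),
    }
}

fn sample_schema() -> HashMap<&'static str, Type> {
    HashMap::from([("name", Type::String), ("born", Type::Date), ("age", Type::Int)])
}

fn main() {
    let schema = sample_schema();
    let bad = Expr::Gt(Box::new(Expr::Field("name")), Box::new(Expr::Field("born")));
    let good = Expr::Gt(Box::new(Expr::Field("age")), Box::new(Expr::Field("age")));
    for expr in [&bad, &good] {
        match type_check(expr, &schema) {
            Ok(typed) => println!("ok: {:?}", typed.result_type),
            Err(diags) => diags.iter().for_each(|d| println!("error: {}", d.message)),
        }
    }
}
```

The `bad` formula never produces a `TypedProgram`, so there is no value to pass to a later stage; the failure is reported before any record exists.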

Stage 3 — compile once to closures, evaluate per record

The TypedProgram is correct but still a tree — and walking a tree, re-matching every Expr variant, on every record, would be slow. So CXL does what lesson 21 did for the whole plan, one level down: it lowers the typed AST once into a tree of closures, then runs that per record. The module doc names the strategy outright — “Compile-once-to-closures evaluator.”

cxl · compiled.rs · CompiledProgram type @47d2e12
// each lowered node IS a closure
type CompiledExpr<S> =
    Box<dyn Fn(&mut Frame<S>) -> Result<Value, EvalError> + Send + Sync>;

pub(crate) struct CompiledProgram<S: RecordStorage + 'static> {
    statements: Vec<CompiledStmt<S>>,
}

// lowered exactly once, then reused across every record (and every thread)
pub(crate) fn compile<S>(typed: &TypedProgram) -> CompiledProgram<S> { /* lower each stmt once */ }

Each AST node becomes a boxed closure that captures its already-compiled children by value. A literal bakes its Value once at lowering instead of re-reading the AST per record; a field reference captures its resolved name. Per-record evaluation is then just calling closures: each node costs one call through its Box<dyn Fn>, with no central dispatch match and no re-indexing of the AST. Because the compiled program is immutable and its closures are Send + Sync, one copy is shared across records and threads (the per-record bookkeeping lives in a separate state value the caller passes in). This is the same generic S: RecordStorage from lesson 18: the compiled closures are monomorphized over the storage, so field reads stay zero-cost.

Parse → an AST, compile-once → closures, evaluate per record. This compiles on std alone and mirrors CompiledExpr exactly — each node becomes a closure capturing its compiled children:
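
The following is a reconstruction sketch of that idea under stated assumptions, not the lesson's original listing. The `Record` alias, the three-variant `Expr`, and the `Compiled` alias are hypothetical simplifications: the real `CompiledExpr<S>` takes a `&mut Frame<S>` and returns `Result<Value, EvalError>`, while this version reads `i64` fields from a `HashMap` and treats a missing field as 0 to stay free of error handling.

```rust
use std::collections::HashMap;

// Hypothetical record type standing in for Frame<S>.
type Record = HashMap<String, i64>;

enum Expr {
    Literal(i64),
    Field(String),
    Add(Box<Expr>, Box<Expr>),
}

/// Same shape as CompiledExpr, minus the generic storage and the error type.
type Compiled = Box<dyn Fn(&Record) -> i64 + Send + Sync>;

/// One lowering pass: each node becomes a closure that owns its compiled children.
fn compile(e: &Expr) -> Compiled {
    match e {
        Expr::Literal(n) => {
            let n = *n; // baked in once, at lowering time
            Box::new(move |_: &Record| n)
        }
        Expr::Field(name) => {
            let name = name.clone(); // captured once, reused for every record
            // Assumption for brevity: a missing field reads as 0.
            Box::new(move |rec: &Record| rec.get(&name).copied().unwrap_or(0))
        }
        Expr::Add(lhs, rhs) => {
            let (l, r) = (compile(lhs), compile(rhs));
            Box::new(move |rec: &Record| l(rec) + r(rec))
        }
    }
}

fn main() {
    // The tree for `price + 10`, built by hand (no parser in this sketch).
    let ast = Expr::Add(
        Box::new(Expr::Field("price".into())),
        Box::new(Expr::Literal(10)),
    );
    let program = compile(&ast); // lowering happens once

    for price in [5, 90, 1_000] {
        let rec = Record::from([("price".to_string(), price)]);
        println!("{}", program(&rec)); // per record: closure calls only
    }
}
```

After `compile` returns, the `Expr` tree is no longer consulted; the loop in `main` touches only the closures.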

compile runs once and returns a closure tree; the per-record loop just calls it. Add a Sub variant or a second field and you’ll feel the shape of the real evaluator: one lowering pass, then cheap repeated calls. (CXL also has language-level closures — it => body for array operations like filter/map — which lower to a host loop over a separately-compiled body. Same mechanism, exposed to the user.)
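
The parenthetical about language-level closures can be pictured the same way. The sketch below is a guess at the general shape only: the `Body` enum, the `i64` element type, and the names `compile_body` and `compile_filter` are all hypothetical, and CXL's actual lowering of `it => body` is not shown in this lesson. It illustrates a lambda body compiled once on its own, then driven by a host-side loop.

```rust
// Hypothetical body AST for lambdas such as `it => it > 18`.
enum Body {
    Gt(i64), // it > n
    Even,    // it % 2 == 0
}

type CompiledBody = Box<dyn Fn(i64) -> bool + Send + Sync>;
type CompiledFilter = Box<dyn Fn(&[i64]) -> Vec<i64> + Send + Sync>;

/// The lambda body is lowered separately, once.
fn compile_body(b: &Body) -> CompiledBody {
    match b {
        Body::Gt(n) => {
            let n = *n;
            Box::new(move |it: i64| it > n)
        }
        Body::Even => Box::new(|it: i64| it % 2 == 0),
    }
}

/// `filter` becomes a host loop that calls the compiled body per element.
fn compile_filter(b: &Body) -> CompiledFilter {
    let body = compile_body(b);
    Box::new(move |items: &[i64]| {
        items.iter().copied().filter(|&it| body(it)).collect::<Vec<i64>>()
    })
}

fn main() {
    let ages: Vec<i64> = vec![12, 19, 40, 18];
    let adults = compile_filter(&Body::Gt(18));
    let even = compile_filter(&Body::Even);
    println!("{:?}", adults(ages.as_slice()));
    println!("{:?}", even(ages.as_slice()));
}
```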

// quick check

Why can a type-incorrect CXL formula never be evaluated at run time?

That’s Phase 3 — Planning & Expressions. You’ve seen the engine’s two dispatch strategies (trait objects for the open IO seam, generics for the hot path), its family of typed proof handles (ValidatedPath, the strict/spanned parse, CompiledPlan), its failure taxonomy (PipelineError, abort versus quarantine), and the staged language that runs over every record. A single thread runs through all of it: push the work and the proof to plan time, and let the type system carry the guarantee into run time. Phase 4 — Execution & Memory — descends into that run time: how the DAG executor dispatches nodes, moves records across threads, and stays inside a memory budget.