CXL: a staged language

A Clinker config can contain expressions: coalesce(email, "n/a"), age > 18, a computed column. Each runs on every record, millions of times. That puts two demands in tension. A typo or a nonsense comparison (a_string > a_date) must be caught before the job starts, so a three-hour run doesn’t die at row nine million. And evaluation must be fast per record. CXL, Clinker’s expression language, meets both by being a staged language: it separates “understand the formula” from “run the formula,” and does the understanding exactly once. This is the whole of Phase 3 in one subsystem, so it’s a fitting place to finish.

You’ll be able to: name CXL’s stages and their boundary types, explain why typechecking happens before any record is processed, and describe what “compile once, evaluate per record” means in terms of closures.

A pipeline of typed stages

A formula moves through CXL as a sequence of stages, each producing a distinct type:

"price + 10"  ──parse──▶  Program  ──typecheck──▶  TypedProgram  ──compile──▶  CompiledProgram
                          (an AST)                  (AST + types)              (closures)

Those aren’t just phases in a function — they’re different Rust types, and the compiler enforces the order. You cannot hand an un-typechecked Program to the evaluator, because the evaluator’s input type is TypedProgram, which only the typechecker produces. It’s the same proof-handle idea as ValidatedPath (lesson 19) and CompiledPlan (lesson 21): each stage’s output type is evidence that the stage ran.
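The ordering guarantee can be sketched in a few lines. This is a minimal illustration, not CXL's code: only the names Program and TypedProgram come from the lesson, while the struct bodies, the stages module, and the toy rule inside type_check are assumptions made for the example.

```rust
// Hypothetical stand-ins: only the names Program and TypedProgram come from CXL.
mod stages {
    pub struct Program {
        pub source: String,
    }

    // The field is private, so code outside this module cannot build a
    // TypedProgram by hand; it can only obtain one from `type_check`.
    pub struct TypedProgram {
        program: Program,
    }

    pub fn parse(source: &str) -> Program {
        Program { source: source.to_string() }
    }

    // Toy rule standing in for real type checking: reject empty formulas.
    pub fn type_check(program: Program) -> Result<TypedProgram, String> {
        if program.source.trim().is_empty() {
            Err("empty formula".to_string())
        } else {
            Ok(TypedProgram { program })
        }
    }

    // The evaluator's input type is TypedProgram, so passing a Program
    // is a compile error rather than a run-time check.
    pub fn evaluate(typed: &TypedProgram) -> usize {
        typed.program.source.len()
    }
}

use stages::{evaluate, parse, type_check};

fn main() {
    let program = parse("price + 10");
    // evaluate(&program); // does not compile: expected &TypedProgram, found &Program
    let typed = type_check(program).expect("formula should typecheck");
    println!("{}", evaluate(&typed));

    // A rejected formula yields no TypedProgram at all.
    assert!(type_check(parse("   ")).is_err());
}
```

The private field is what turns the type into evidence: the only way to hold a TypedProgram is to have gone through type_check.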

Stage 1 — parse to an AST

Parsing turns the formula text into an abstract syntax tree: a tree of Expr nodes. This is the closed-enum story from Phase 2, now describing a language rather than a value:

cxl · ast.rs · Expr type @47d2e12

/// CXL expression — the core of the language. All variants carry a NodeId and Span.
pub enum Expr {
    Binary { op: BinOp, lhs: Box<Expr>, rhs: Box<Expr>, /* … */ },
    Unary { op: UnaryOp, operand: Box<Expr>, /* … */ },
    Literal { value: LiteralValue, /* … */ },
    FieldRef { name: Box<str>, /* … */ },
    MethodCall { receiver: Box<Expr>, method: Box<str>, args: Vec<Expr>, /* … */ },
    Coalesce { lhs: Box<Expr>, rhs: Box<Expr>, /* … */ },
    // ... ~20 variants in all
}

Two Phase-2 ideas show up at once. It’s a closed enum — a fixed set of expression shapes, so every later stage can match exhaustively and the compiler guarantees no shape is forgotten (lesson 10). And the children are Box<Expr> — a tree is recursive, and a recursive type needs the indirection a Box provides, or its size would be infinite (lesson 09’s boxing, lesson 15’s smart pointers). price + 10 parses to Binary { op: Add, lhs: FieldRef("price"), rhs: Literal(10) } — a little tree. Every node also carries a Span, so a later error can point back at the source (lesson 20).
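To make the tree concrete, here is a cut-down sketch with three variants instead of roughly twenty. The Span layout (byte offsets), the simplified fields, and the helper names span_of, render, and price_plus_ten are assumptions for illustration, not CXL's real definitions.

```rust
// Simplified stand-in for CXL's Expr. The Span fields are assumed to be
// byte offsets into the formula text.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Span {
    start: usize,
    end: usize,
}

enum Expr {
    Literal { value: i64, span: Span },
    FieldRef { name: Box<str>, span: Span },
    Add { lhs: Box<Expr>, rhs: Box<Expr>, span: Span },
}

// No wildcard arm: adding a variant to Expr makes this match fail to
// compile until the new shape is handled.
fn span_of(e: &Expr) -> Span {
    match e {
        Expr::Literal { span, .. } => *span,
        Expr::FieldRef { span, .. } => *span,
        Expr::Add { span, .. } => *span,
    }
}

// A recursive walk over the boxed children, rendering the tree back to text.
fn render(e: &Expr) -> String {
    match e {
        Expr::Literal { value, .. } => value.to_string(),
        Expr::FieldRef { name, .. } => name.to_string(),
        Expr::Add { lhs, rhs, .. } => format!("({} + {})", render(lhs), render(rhs)),
    }
}

// The tree a parser might produce for the source text "price + 10".
fn price_plus_ten() -> Expr {
    Expr::Add {
        lhs: Box::new(Expr::FieldRef {
            name: "price".into(),
            span: Span { start: 0, end: 5 },
        }),
        rhs: Box::new(Expr::Literal {
            value: 10,
            span: Span { start: 8, end: 10 },
        }),
        span: Span { start: 0, end: 10 },
    }
}

fn main() {
    let tree = price_plus_ten();
    println!("{} at {:?}", render(&tree), span_of(&tree));
}
```

Both functions match without a catch-all arm, which is what lets the compiler flag every site that needs updating when a variant is added.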

Stage 2 — typecheck, before any record runs

Now the formula has structure but no guarantees. Typechecking walks the AST against the record schema and assigns every node a type, drawn from a closed set that mirrors the nine Value shapes:

cxl · types.rs · Type type @47d2e12

pub enum Type {
    Null, Bool, Int, Float, String, Date, DateTime, Array, Map,
    Numeric,            // the int/float union
    Any,                // unknown
    Nullable(Box<Type>),
}

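The typechecker’s signature is the pedagogical crux. On success it yields a TypedProgram; on failure, a list of diagnostics and nothing runnable. (Its input is a ResolvedProgram rather than a bare Program, which suggests a name-resolution step between parsing and typechecking that the diagram above leaves out.)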

cxl · pass.rs · type_check fn @47d2e12

pub fn type_check(
    resolved: ResolvedProgram,
    schema: &Row,
) -> Result<TypedProgram, Vec<TypeDiagnostic>> { /* … */ }

Read what the Result guarantees. If your formula compares a String to a Date, type_check returns Err(...). No TypedProgram is ever built. And since the evaluator only accepts a TypedProgram, evaluation of a type-incorrect formula is literally unreachable — not “checked again at run time,” but unrepresentable. That’s how the closed Value set you met in lesson 09 pays off: because the shapes are fixed and known, a formula over them can be fully typechecked at plan time, and a type error fails the job before record one — exactly the “validate once at the boundary” discipline of lesson 21, now for expressions.
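A toy checker shows the same contract at small scale. Everything here is a simplified assumption: a four-member Type, a HashMap standing in for the schema, String diagnostics instead of TypeDiagnostic, and a rule that only lets identical types be compared. It is a sketch of the shape of type_check, not its implementation.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq)]
enum Type {
    Int,
    String,
    Date,
    Bool,
}

enum Expr {
    IntLit(i64),
    Field(String),
    Gt(Box<Expr>, Box<Expr>),
}

// Hypothetical stand-in for CXL's TypedProgram; only `type_check` builds it.
#[allow(dead_code)] // `expr` would be consumed by a later compile stage
struct TypedProgram {
    expr: Expr,
    ty: Type,
}

type Schema = HashMap<String, Type>;

// Assign a type to a node, collecting diagnostics instead of stopping early.
fn infer(e: &Expr, schema: &Schema, diags: &mut Vec<String>) -> Option<Type> {
    match e {
        Expr::IntLit(_) => Some(Type::Int),
        Expr::Field(name) => {
            let ty = schema.get(name).copied();
            if ty.is_none() {
                diags.push(format!("unknown field `{name}`"));
            }
            ty
        }
        Expr::Gt(l, r) => {
            // Check both sides so every error is reported, not just the first.
            let lt = infer(l, schema, diags);
            let rt = infer(r, schema, diags);
            match (lt, rt) {
                (Some(a), Some(b)) if a == b => Some(Type::Bool),
                (Some(a), Some(b)) => {
                    diags.push(format!("cannot compare {a:?} with {b:?}"));
                    None
                }
                _ => None,
            }
        }
    }
}

// On success: a TypedProgram. On failure: diagnostics and nothing runnable.
fn type_check(expr: Expr, schema: &Schema) -> Result<TypedProgram, Vec<String>> {
    let mut diags = Vec::new();
    let inferred = infer(&expr, schema, &mut diags);
    match inferred {
        Some(ty) if diags.is_empty() => Ok(TypedProgram { expr, ty }),
        _ => Err(diags),
    }
}

fn sample_schema() -> Schema {
    HashMap::from([
        ("name".to_string(), Type::String),
        ("signup".to_string(), Type::Date),
        ("age".to_string(), Type::Int),
    ])
}

fn main() {
    let schema = sample_schema();
    // name > signup compares a String to a Date.
    let bad = Expr::Gt(
        Box::new(Expr::Field("name".into())),
        Box::new(Expr::Field("signup".into())),
    );
    match type_check(bad, &schema) {
        Ok(typed) => println!("ok: {:?}", typed.ty),
        Err(diags) => println!("rejected before any record: {diags:?}"),
    }
}
```

Note that the check needs only the schema, never a record, which is why it can run at plan time.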

Stage 3 — compile once to closures, evaluate per record

The TypedProgram is correct but still a tree — and walking a tree, re-matching every Expr variant, on every record, would be slow. So CXL does what lesson 21 did for the whole plan, one level down: it lowers the typed AST once into a tree of closures, then runs that per record. The module doc names the strategy outright — “Compile-once-to-closures evaluator.”

cxl · compiled.rs · CompiledProgram type @47d2e12

// each lowered node IS a closure
type CompiledExpr<S> =
    Box<dyn Fn(&mut Frame<S>) -> Result<Value, EvalError> + Send + Sync>;

pub(crate) struct CompiledProgram<S: RecordStorage + 'static> {
    statements: Vec<CompiledStmt<S>>,
}

// lowered exactly once, then reused across every record (and every thread)
pub(crate) fn compile<S>(typed: &TypedProgram) -> CompiledProgram<S> { /* lower each stmt once */ }

Each AST node becomes a boxed closure that captures its already-compiled children by value. A literal bakes its Value once at lowering instead of re-reading the AST per record; a field reference captures its resolved name. Per-record evaluation is then just calling closures: a chain of calls through boxed Fn trait objects, with no central dispatch match and no re-indexing of the AST. Each call is still an indirect call through the Box, but the decision about which code to run was made once, at lowering. Because the compiled program is immutable and its closures are Send + Sync, one copy is shared across records and threads (the per-record bookkeeping lives in a separate state value the caller threads in). This is the same generic S: RecordStorage from lesson 18: the closure bodies are monomorphized over the storage, so field reads inside them are statically dispatched.
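The sharing claim can be illustrated with std alone. In this sketch the closure type keeps the Send + Sync bounds from CompiledExpr but drops Frame and Result; the Record alias and the functions compile_price_plus and run_on_threads are hypothetical names invented for the example.

```rust
use std::collections::HashMap;
use std::sync::Arc;
use std::thread;

type Record = HashMap<String, i64>;

// Like CXL's CompiledExpr, the trait object carries Send + Sync so it can
// be used from several threads. Frame and Result are omitted in this sketch.
type Compiled = Box<dyn Fn(&Record) -> i64 + Send + Sync>;

// Stands in for lowering the formula `price + n`.
fn compile_price_plus(n: i64) -> Compiled {
    // `n` is captured by value once; the closure holds no mutable state.
    Box::new(move |rec| rec.get("price").copied().unwrap_or(0) + n)
}

// Each worker gets a cheap Arc handle to the SAME compiled program and
// returns the sum of the formula over its own batch.
fn run_on_threads(program: Arc<Compiled>, batches: Vec<Vec<Record>>) -> Vec<i64> {
    let handles: Vec<_> = batches
        .into_iter()
        .map(|batch| {
            let program = Arc::clone(&program);
            thread::spawn(move || batch.iter().map(|rec| program(rec)).sum::<i64>())
        })
        .collect();
    handles
        .into_iter()
        .map(|h| h.join().expect("worker panicked"))
        .collect()
}

fn main() {
    // Compile once...
    let program = Arc::new(compile_price_plus(10));
    let batches = vec![
        vec![
            Record::from([("price".to_string(), 100)]),
            Record::from([("price".to_string(), 250)]),
        ],
        vec![Record::from([("price".to_string(), 7)])],
    ];
    // ...then evaluate on two threads without recompiling or cloning closures.
    println!("{:?}", run_on_threads(program, batches));
}
```

Dropping either bound from the type alias makes thread::spawn reject the closure, which is the compiler enforcing the "immutable, shareable" property.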

Build a staged evaluator yourself

Parse to an AST, compile once to closures, evaluate per record. This compiles on std alone and mirrors the shape of CompiledExpr in simplified form (plain i64 results, no Frame, no Send + Sync bounds). Each node becomes a closure capturing its compiled children:

rust // editable

use std::collections::HashMap;

// Stage 1: the AST. A recursive enum needs Box for its children.
enum Expr {
    Lit(i64),
    Field(String),
    Add(Box<Expr>, Box<Expr>),
}

type Record = HashMap<String, i64>;

// Stage 3: compile ONCE into a closure, instead of re-walking Expr per record.
// (Stage 2, typechecking, is skipped in this toy: every value is an i64.)
type Compiled = Box<dyn Fn(&Record) -> i64>;

fn compile(e: Expr) -> Compiled {
    match e {
        // Bake the literal once.
        Expr::Lit(n) => Box::new(move |_rec| n),
        // Capture the field name; a missing field reads as 0 in this toy.
        Expr::Field(name) => Box::new(move |rec| *rec.get(&name).unwrap_or(&0)),
        Expr::Add(l, r) => {
            // Children are compiled once and captured by value.
            let lc = compile(*l);
            let rc = compile(*r);
            Box::new(move |rec| lc(rec) + rc(rec))
        }
    }
}

fn main() {
    // formula: price + 10  ->  Add(Field("price"), Lit(10))
    let expr = Expr::Add(
        Box::new(Expr::Field("price".into())),
        Box::new(Expr::Lit(10)),
    );

    // Compile the expression ONE time...
    let program = compile(expr);

    // ...then evaluate it over many records, with no AST re-walk per record.
    let records = [
        HashMap::from([("price".to_string(), 100)]),
        HashMap::from([("price".to_string(), 250)]),
        HashMap::from([("price".to_string(), 7)]),
    ];
    for rec in &records {
        println!("{}", program(rec));
    }
}

compile runs once and returns a closure tree; the per-record loop just calls it. Add a Sub variant or a second field and you’ll feel the shape of the real evaluator: one lowering pass, then cheap repeated calls. (CXL also has language-level closures — it => body for array operations like filter/map — which lower to a host loop over a separately-compiled body. Same mechanism, exposed to the user.)

// quick check

Why can a type-incorrect CXL formula never be evaluated at run time?

The stages are distinct types. A type error means no TypedProgram is built, and the evaluator's input type is TypedProgram — so the bad formula is caught at plan time and a run over it cannot be expressed, not merely avoided.

Trace the stages

✓ Checkpoint — CXL staging

💡 Hint 1

Read type_check’s return type — what comes back on Err? Then read the first line of eval/compiled.rs. The two together explain why a bad formula can’t reach the per-record path.

What the stages establish

Expr is the parsed AST (a closed, boxed-recursive enum). Type is the closed type set the typechecker assigns. type_check returns Result<TypedProgram, Vec<TypeDiagnostic>> — an error yields no TypedProgram, so eval is unreachable for a bad formula. eval/compiled.rs lowers a TypedProgram once into a CompiledProgram of closures, run per record. Parse, typecheck, eval — distinct typed stages, with the expensive understanding done once.

// quick check

What does 'compile once to closures' save compared with a tree-walking interpreter?

A tree-walker re-matches the AST per record. Compiling once bakes the structure (and literals) into a closure tree captured by value, so per-record work is just calling closures — the same decide-once-run-many idea as CompiledPlan, one level down.
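The contrast in that answer can be put side by side. The sketch below reuses the toy Expr from the exercise (an assumption of this example, not CXL's AST) and adds a tree-walking eval next to a borrowing variant of compile, so the two strategies can be checked against each other.

```rust
use std::collections::HashMap;

enum Expr {
    Lit(i64),
    Field(String),
    Add(Box<Expr>, Box<Expr>),
}

type Record = HashMap<String, i64>;
type Compiled = Box<dyn Fn(&Record) -> i64>;

// Tree-walking interpreter: every call matches on every node again.
fn eval(e: &Expr, rec: &Record) -> i64 {
    match e {
        Expr::Lit(n) => *n,
        Expr::Field(name) => rec.get(name).copied().unwrap_or(0),
        Expr::Add(l, r) => eval(l, rec) + eval(r, rec),
    }
}

// Compile-once: the match runs here, one time per node. The returned
// closures contain only the per-record work.
fn compile(e: &Expr) -> Compiled {
    match e {
        Expr::Lit(n) => {
            let n = *n;
            Box::new(move |_rec| n)
        }
        Expr::Field(name) => {
            let name = name.clone();
            Box::new(move |rec| rec.get(&name).copied().unwrap_or(0))
        }
        Expr::Add(l, r) => {
            let lc = compile(l);
            let rc = compile(r);
            Box::new(move |rec| lc(rec) + rc(rec))
        }
    }
}

fn main() {
    let expr = Expr::Add(
        Box::new(Expr::Field("price".into())),
        Box::new(Expr::Lit(10)),
    );
    let program = compile(&expr); // the only time the AST is matched for `program`

    let records = [
        Record::from([("price".to_string(), 100)]),
        Record::from([("price".to_string(), 7)]),
    ];
    for rec in &records {
        // Same answer either way; only the per-record work differs.
        assert_eq!(eval(&expr, rec), program(rec));
        println!("{}", program(rec));
    }
}
```

Both paths give the same result by construction. What changes is where the match on Expr happens: inside the record loop for eval, before it for compile.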

You should be able to:

You can name CXL's stages and the distinct type each produces
You can explain why typechecking happens before any record and what makes it possible (the closed Value/Type set)
You can explain 'compile once to closures' and how it avoids re-walking the AST per record

Verify in the checkout:

grep -n 'pub enum Expr' crates/cxl/src/ast.rs
grep -n 'pub enum Type' crates/cxl/src/typecheck/types.rs
grep -n 'pub fn type_check' crates/cxl/src/typecheck/pass.rs
grep -n 'Compile-once-to-closures' crates/cxl/src/eval/compiled.rs

That’s Phase 3 — Planning & Expressions. You’ve seen the engine’s two dispatch strategies (trait objects for the open IO seam, generics for the hot path), its family of typed proof handles (ValidatedPath, the strict/spanned parse, CompiledPlan), its failure taxonomy (PipelineError, abort versus quarantine), and the staged language that runs over every record. A single thread runs through all of it: push the work and the proof to plan time, and let the type system carry the guarantee into run time. Phase 4 — Execution & Memory — descends into that run time: how the DAG executor dispatches nodes, moves records across threads, and stays inside a memory budget.