
Enum vs trait dispatch

Welcome to Phase 4 — Execution & Memory, the deepest pass. Phase 3 ended at the plan/runtime boundary; now we cross it and watch records actually move. This first lesson revisits a question from Phase 3 — “how do you call into one of many kinds?” — and finds the engine answering it the opposite way from the IO seam. Back in lesson 3.1 the format layer used Box<dyn FormatReader>: dynamic dispatch, because formats are an open-ended, runtime-chosen plug-in seam. The DAG executor faces the same shape of problem — many kinds of node — and deliberately chooses a closed enum and one exhaustive match instead.

You’ll be able to: explain when closed-enum dispatch beats trait-object dispatch, read the executor’s central match, and say why the engine has no dyn Operator anywhere.

A compiled plan is a DAG of nodes, and a node is one of a fixed, engine-known set of kinds: a source, a transform, a route, a merge, a sort, an aggregation, an output, and a handful more. That set is a closed enum — the same PlanNode you glimpsed in Phase 3:

clinker-plan · mod.rs · PlanNode type @47d2e12
pub enum PlanNode {
    Source { /* ... */ },
    Transform { /* ... */ },
    Route { /* ... */ },
    Merge { /* ... */ },
    Sort { /* ... */ },
    Aggregation { /* ... */ },
    Output { /* ... */ },
    // ... 13 variants in all: the complete vocabulary of pipeline nodes
}

The key word is closed. The kinds of node a pipeline can contain are decided by the engine, not by users, not at run time. Contrast that with formats: anyone can add a new wire format, and which one a job uses is read from the plan at run time. Formats are open; node kinds are closed. That single difference drives the whole dispatch decision.
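To make the open/closed split concrete, here is a minimal sketch under stated assumptions. The `FormatReader` trait below is a hypothetical stand-in (the lesson does not show its real methods), and `CsvReader`, `JsonReader`, `registry`, `open_reader`, and `NodeKind` are illustrative names, not clinker code. The open side picks an implementation by a name that only exists at run time; the closed side names every kind in the source, so one `match` can cover them all.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for the IO seam's trait; `name` is invented
// for illustration, not the real FormatReader API.
trait FormatReader {
    fn name(&self) -> &'static str;
}

struct CsvReader;
struct JsonReader;

impl FormatReader for CsvReader {
    fn name(&self) -> &'static str {
        "csv"
    }
}

impl FormatReader for JsonReader {
    fn name(&self) -> &'static str {
        "json"
    }
}

fn make_csv() -> Box<dyn FormatReader> {
    Box::new(CsvReader)
}

fn make_json() -> Box<dyn FormatReader> {
    Box::new(JsonReader)
}

// Open set: implementations are looked up by a string known only at run
// time. Registering another format touches no existing match.
type Factory = fn() -> Box<dyn FormatReader>;

fn registry() -> HashMap<&'static str, Factory> {
    let mut m: HashMap<&'static str, Factory> = HashMap::new();
    m.insert("csv", make_csv);
    m.insert("json", make_json);
    m
}

fn open_reader(name: &str) -> Option<Box<dyn FormatReader>> {
    let reg = registry();
    reg.get(name).map(|make| make())
}

// Closed set: every kind is named here, at compile time, by the engine.
#[derive(Debug, Clone, Copy)]
enum NodeKind {
    Source,
    Transform,
    Output,
}

fn describe(kind: NodeKind) -> &'static str {
    // No `_ =>` arm: a new NodeKind variant breaks this match at compile time.
    match kind {
        NodeKind::Source => "reads records",
        NodeKind::Transform => "rewrites records",
        NodeKind::Output => "writes records",
    }
}

fn main() {
    // The format name would come from the plan at run time.
    for name in ["csv", "json", "parquet"] {
        match open_reader(name) {
            Some(r) => println!("{name}: opened the {} reader", r.name()),
            None => println!("{name}: no reader registered"),
        }
    }
    for kind in [NodeKind::Source, NodeKind::Transform, NodeKind::Output] {
        println!("{kind:?} {}", describe(kind));
    }
}
```

Adding a format to this sketch means adding one registry entry, and no match anywhere has to change. Adding a `NodeKind` variant is the opposite: `describe` stops compiling until it gains an arm.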

Because the set is closed, the executor dispatches with a single match over the enum. Each arm hands the node to its operator module:

clinker-exec · dispatch.rs · dispatch_plan_node fn @47d2e12
pub(crate) fn dispatch_plan_node(
    ctx: &mut ExecutorContext<'_>,
    current_dag: &ExecutionPlanDag,
    node_idx: NodeIndex,
) -> Result<(), PipelineError> {
    let node = current_dag.graph[node_idx].clone();
    match node {
        PlanNode::Source { .. } => dispatch_source(ctx, current_dag, node_idx, &node)?,
        PlanNode::Transform { .. } => dispatch_transform(ctx, current_dag, node_idx, &node)?,
        PlanNode::Route { .. } => dispatch_route(ctx, current_dag, node_idx, &node)?,
        PlanNode::Merge { .. } => dispatch_merge(ctx, current_dag, node_idx, &node)?,
        // ... one arm per variant, and crucially NO `_ =>` catch-all
    }
    Ok(())
}

There is no trait object here and none anywhere in the engine: searching the whole codebase for dyn Operator or trait Operator finds nothing. Operators are not boxed behind a vtable; they’re arms of a match. And there is no _ => wildcard — the match spells out every variant. That second detail is doing real architectural work, as the next section shows.
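A toy, self-contained version of the same shape may help. `Node`, `Ctx`, `PipelineError`, `run`, and the `dispatch_*` helpers below are hypothetical simplifications, not clinker's types: the plan is assumed to be a `Vec` already in topological order, and records are plain integers. The point is the single `match` with one arm per variant and no wildcard, each arm handing off to its own operator function.

```rust
// Toy model of the executor's shape. Node, Ctx, PipelineError, and the
// dispatch_* helpers are hypothetical, not clinker's real types.
#[derive(Debug, Clone)]
enum Node {
    Source { rows: Vec<i64> },
    Transform { add: i64 },
    Output,
}

#[derive(Default)]
struct Ctx {
    batch: Vec<i64>,   // records currently flowing between nodes
    written: Vec<i64>, // records the Output node has emitted
}

#[derive(Debug, PartialEq)]
enum PipelineError {
    NoInput,
}

fn dispatch_source(ctx: &mut Ctx, rows: &[i64]) -> Result<(), PipelineError> {
    ctx.batch = rows.to_vec();
    Ok(())
}

fn dispatch_transform(ctx: &mut Ctx, add: i64) -> Result<(), PipelineError> {
    // Toy rule: a Transform with no upstream records is a plan error.
    if ctx.batch.is_empty() {
        return Err(PipelineError::NoInput);
    }
    for r in &mut ctx.batch {
        *r += add;
    }
    Ok(())
}

fn dispatch_output(ctx: &mut Ctx) -> Result<(), PipelineError> {
    // Move the batch into the sink, leaving `batch` empty.
    ctx.written.append(&mut ctx.batch);
    Ok(())
}

// One exhaustive match, no `_ =>` arm: adding a Node variant makes this
// function fail to compile until an arm for it is written.
fn dispatch_node(ctx: &mut Ctx, node: &Node) -> Result<(), PipelineError> {
    match node {
        Node::Source { rows } => dispatch_source(ctx, rows),
        Node::Transform { add } => dispatch_transform(ctx, *add),
        Node::Output => dispatch_output(ctx),
    }
}

// The plan is assumed to be in topological order, so a linear walk
// visits every node after its inputs.
fn run(plan: &[Node]) -> Result<Vec<i64>, PipelineError> {
    let mut ctx = Ctx::default();
    for node in plan {
        dispatch_node(&mut ctx, node)?;
    }
    Ok(ctx.written)
}

fn main() {
    let plan = vec![
        Node::Source { rows: vec![1, 2, 3] },
        Node::Transform { add: 10 },
        Node::Output,
    ];
    println!("{:?}", run(&plan));
}
```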

Lesson 3.1 argued dynamic dispatch was right for the IO seam. Here the trade flips, for three concrete reasons:

  • Exhaustiveness is a feature. Because the match has no catch-all, adding a new PlanNode variant turns every match without a catch-all into a compile error until it handles that variant, so the compiler hands you the exact list of sites to update (the same guarantee as in lesson 2.2). With dyn Operator, nothing enumerates the kinds, so a forgotten case would be a run-time surprise, not a build failure.
  • The set is closed and known at compile time. A dyn seam exists to let outside code plug in types the engine never named. Node kinds are all named by the engine itself, so the open-endedness dyn buys is worthless here, and you would pay for it with an indirect call through a vtable on every dispatch.
  • Operators want specialized data. Each arm can pull the variant’s own payload (a sort’s keys, a transform’s compiled program) by pattern-matching, with no downcasting.
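A sketch of the payload point, using hypothetical types (`Node`, `Operator`, `SortOp`, `TransformOp`, and the `as_any` helper are illustrative, not engine code). With an enum, the arm binds `keys` or `program` directly at the right type. Behind a trait object the concrete type is erased, so reaching the same data means downcasting through `std::any::Any`, and a wrong guess only shows up at run time.

```rust
use std::any::Any;

// Closed enum: each variant carries its own payload. (Hypothetical toy
// variants, not clinker's PlanNode.)
enum Node {
    Sort { keys: Vec<String> },
    Transform { program: String },
}

fn describe(node: &Node) -> String {
    // The arm binds the variant's fields directly, already at the right type.
    match node {
        Node::Sort { keys } => format!("sort by {}", keys.join(", ")),
        Node::Transform { program } => format!("run {program}"),
    }
}

// Trait-object alternative: the concrete type is erased behind the vtable.
trait Operator {
    fn as_any(&self) -> &dyn Any;
}

struct SortOp {
    keys: Vec<String>,
}

struct TransformOp {
    program: String,
}

impl Operator for SortOp {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

impl Operator for TransformOp {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

fn describe_dyn(op: &dyn Operator) -> String {
    // To reach the payload we must guess the concrete type and downcast;
    // a wrong or missing guess is only discovered at run time.
    if let Some(s) = op.as_any().downcast_ref::<SortOp>() {
        format!("sort by {}", s.keys.join(", "))
    } else if let Some(t) = op.as_any().downcast_ref::<TransformOp>() {
        format!("run {}", t.program)
    } else {
        String::from("unknown operator")
    }
}

fn main() {
    let nodes = [
        Node::Sort { keys: vec!["id".to_string(), "ts".to_string()] },
        Node::Transform { program: "amount * 2".to_string() },
    ];
    let ops: [Box<dyn Operator>; 2] = [
        Box::new(SortOp { keys: vec!["id".to_string(), "ts".to_string()] }),
        Box::new(TransformOp { program: "amount * 2".to_string() }),
    ];
    for (node, op) in nodes.iter().zip(ops.iter()) {
        println!("{} | {}", describe(node), describe_dyn(op.as_ref()));
    }
}
```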

The mental model: dyn is for open sets chosen by others at run time; a closed enum is for sets the engine owns and wants the compiler to police. The IO seam is the former; the operator set is the latter. Trait objects and enums are both Rust's ways of handling many kinds through one entry point; here they are opposite tools for opposite jobs.

Here are both strategies side by side. The enum version makes the compiler your checklist; the trait-object version quietly accepts a new kind with no nudge:
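A minimal sketch of both strategies follows. All names are hypothetical, and `Node` deliberately has no `Sort` variant yet so you can try adding one. Each side answers the same illustrative question: must a node see all of its input before it emits anything? A sort must; the three existing kinds here do not.

```rust
// --- Strategy 1: closed enum + exhaustive match ---
#[derive(Debug, Clone, Copy)]
enum Node {
    Source,
    Transform,
    Output,
}

// Must this node see all of its input before emitting anything?
fn is_blocking(node: Node) -> bool {
    match node {
        Node::Source | Node::Transform | Node::Output => false,
        // Add `Sort` to `Node` and this match no longer compiles
        // (error E0004, non-exhaustive patterns).
    }
}

// --- Strategy 2: open set behind trait objects ---
trait Operator {
    fn name(&self) -> &'static str;

    // A default body: an implementor that forgets to override it still compiles.
    fn is_blocking(&self) -> bool {
        false
    }
}

struct SourceOp;
struct TransformOp;
struct OutputOp;

impl Operator for SourceOp {
    fn name(&self) -> &'static str {
        "source"
    }
}

impl Operator for TransformOp {
    fn name(&self) -> &'static str {
        "transform"
    }
}

impl Operator for OutputOp {
    fn name(&self) -> &'static str {
        "output"
    }
}
// Add `struct SortOp;` implementing only `name` and the build still succeeds;
// SortOp then wrongly reports is_blocking() == false, a run-time bug.

fn main() {
    let nodes = [Node::Source, Node::Transform, Node::Output];
    let ops: [Box<dyn Operator>; 3] = [Box::new(SourceOp), Box::new(TransformOp), Box::new(OutputOp)];
    for (node, op) in nodes.iter().zip(ops.iter()) {
        println!(
            "enum {node:?}: blocking={} | dyn {}: blocking={}",
            is_blocking(*node),
            op.name(),
            op.is_blocking()
        );
    }
}
```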


Add a Sort variant and the build breaks at dispatch, pointing at the exact code that doesn’t yet handle it. That is the compiler acting as a complete, always-current checklist of “every place a new operator must be wired in.” A Box<dyn Operator> design would have compiled fine and failed later, at run time, when an unhandled node showed up.

// quick check

Why does the executor dispatch DAG nodes through a closed enum + exhaustive match instead of Box<dyn Operator>?

You’ve seen how the executor decides which operator to run. Next: how it runs them at once — the threads that drive each source and the bounded channels that connect them.