Skip to content

Through the DAG

--explain told you customer_etl is a graph of four nodes — DAG nodes: 4. The plan describes the graph; the executor walks it. This lesson is how a record actually travels from one node to the next. Shallow now; Phase 4 is the deep version.

You’ll be able to: describe the path a record takes through the DAG, and explain how the engine picks the right handler for each node.

For customer_etl, the DAG is a straight line — each node feeds the next:

source transform transform output
customers ──▶ active_only ──▶ final_flag ──▶ results
(read CSV) (add is_active) (add tier) (write CSV)

Every node is one variant of a single closed enum, PlanNode — source, transform, output, and a handful of others (aggregate, route, combine…):

clinker-plan ·mod.rs ·PlanNode type @47d2e12
pub enum PlanNode {
Source { /* ... */ },
Transform { /* ... */ },
Route { /* ... */ },
Merge { /* ... */ },
Sort { /* ... */ },
Aggregation { /* ... */ },
Output { /* ... */ },
// ... one variant per kind of node a pipeline can contain
}

The executor walks the nodes and, for each, has to run the right logic — read for a source, evaluate CXL for a transform, write for an output. It does that with a single big match over the PlanNode enum:

clinker-exec ·dispatch.rs ·dispatch_plan_node fn @47d2e12
// the shape of it (real arms call into per-operator modules)
match node {
PlanNode::Source { .. } => /* read records */,
PlanNode::Transform { .. } => /* run the CXL */,
PlanNode::Output { .. } => /* write records */,
// ... one arm per node kind
}

Because PlanNode is a closed set known at compile time, this match must handle every kind — the compiler refuses to build if a node type is left unhandled. There’s no plugin registry and no runtime lookup of “which handler?”; the set of operations is fixed and exhaustive. Why Clinker chose a closed enum over open, pluggable operators — and what that trades away — is one of the central decisions you’ll examine in Phase 4.

Records don’t teleport to the end; each node hands its outputs to the next. The source produces records and passes them downstream; active_only receives each one, runs its CXL, and passes the (now slightly larger) record on; final_flag does the same; the output writes whatever reaches it. One record at a time, in a streaming flow — which is exactly why --explain reported Mode: Streaming.

// quick check

How does the executor choose what to do for each node in the DAG?

You’ve seen the parts: records, the plan, the graph, the dispatch. Time to put one record through all of them at once.