Trace one record end-to-end

You have all the pieces now — Value, Record, Schema, the CompiledPlan, the DAG, the dispatch. This lesson threads them together by following one customer, Alice, from a line in a CSV file to a line in the output, naming every part of the engine she touches.

You’ll be able to: narrate a record’s full journey through customer_etl using the right names for each stage — the skill the whole rest of the curriculum builds on.

clinker · customer_etl.yaml example @47d2e12

In the source CSV she’s one line:

customer_id,first_name,last_name,email,status,lifetime_value,zip_code
1001,Alice,Chen,alice.chen@acme.com,active,15200,94103

The source node reads the CSV and produces a Record: a Vec<Value> bound to the schema declared in the YAML. Every field is a string at this point:

clinker-record · mod.rs · Record type @47d2e12
schema: [customer_id, first_name, last_name, email, status, lifetime_value, zip_code]
values: [ "1001", "Alice", "Chen", ..., "active", "15200", "94103" ]
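To make the shape concrete, here is a minimal sketch of a record as a vector of values bound to a schema. The `Value`, `Schema`, and `Record` definitions below are simplified, hypothetical stand-ins rather than the real clinker-record types, and `read_line` is an illustrative helper that splits naively on commas with no quoting support.

```rust
// Hypothetical, simplified stand-ins; the real clinker-record types differ.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Str(String),
    Int(i64),
    Bool(bool),
}

#[derive(Debug, Clone)]
struct Schema {
    fields: Vec<String>,
}

impl Schema {
    // Position of a field name in the schema, if it exists.
    fn index_of(&self, name: &str) -> Option<usize> {
        self.fields.iter().position(|f| f == name)
    }
}

#[derive(Debug, Clone)]
struct Record {
    schema: Schema,
    values: Vec<Value>,
}

impl Record {
    // Look a value up by field name through the schema.
    fn get(&self, name: &str) -> Option<&Value> {
        self.schema.index_of(name).and_then(|i| self.values.get(i))
    }
}

// Illustrative source step: every field arrives as a string.
// Naive comma split, no quoting or escaping.
fn read_line(schema: &Schema, line: &str) -> Record {
    let values = line.split(',').map(|s| Value::Str(s.to_string())).collect();
    Record { schema: schema.clone(), values }
}

fn main() {
    let header = "customer_id,first_name,last_name,email,status,lifetime_value,zip_code";
    let schema = Schema { fields: header.split(',').map(String::from).collect() };
    let alice = read_line(&schema, "1001,Alice,Chen,alice.chen@acme.com,active,15200,94103");
    // lifetime_value is still a string at this stage.
    println!("{:?}", alice.get("lifetime_value"));
}
```

The point of the sketch is the lookup path: a field name resolves to an index through the schema, and the index picks a value out of the vector.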

The first transform runs this CXL over every record:

emit is_active = status == "active"

Alice’s status is "active", so the comparison is true. The transform emits a new field — is_active = Bool(true) — and passes the enlarged record downstream. Her row now carries an eighth value.
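As a rough Rust picture of what this transform does to one record, the sketch below compares the status string and appends the result as a new field. The `Record` layout, the `emit` method, and the `active_only` function are assumptions made for illustration; they are not the engine's actual API, and the real executor evaluates compiled CXL rather than hand-written Rust.

```rust
// Hypothetical, simplified types for illustration only.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Str(String),
    Bool(bool),
}

#[derive(Debug, Clone)]
struct Record {
    fields: Vec<String>,
    values: Vec<Value>,
}

impl Record {
    // Look a value up by field name.
    fn get(&self, name: &str) -> Option<&Value> {
        let i = self.fields.iter().position(|f| f == name)?;
        self.values.get(i)
    }

    // Append a new field and its value (assumed behaviour of `emit`).
    fn emit(&mut self, name: &str, value: Value) {
        self.fields.push(name.to_string());
        self.values.push(value);
    }
}

// Mirrors `emit is_active = status == "active"` for a single record.
fn active_only(mut rec: Record) -> Record {
    let is_active = rec.get("status") == Some(&Value::Str("active".to_string()));
    rec.emit("is_active", Value::Bool(is_active));
    rec
}

fn main() {
    let fields = ["customer_id", "first_name", "last_name", "email", "status", "lifetime_value", "zip_code"];
    let raw = ["1001", "Alice", "Chen", "alice.chen@acme.com", "active", "15200", "94103"];
    let alice = Record {
        fields: fields.iter().map(|f| f.to_string()).collect(),
        values: raw.iter().map(|v| Value::Str(v.to_string())).collect(),
    };
    let alice = active_only(alice);
    println!("{} values, is_active = {:?}", alice.values.len(), alice.get("is_active"));
}
```

Note that the original seven values are untouched; the transform only adds to the record.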

The second transform:

emit tier = if lifetime_value.to_int() > $vars.gold_threshold then "gold" else "standard"

Here lifetime_value ("15200") is finally turned into a number — .to_int() — and compared against $vars.gold_threshold (default 10000). 15200 > 10000 is true, so tier = "gold". This is the coercion we promised back in lesson 1.1: the string becomes an integer exactly when a transform needs it to, not before.
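The same rule can be sketched in plain Rust to show where the string-to-integer step sits. The `to_int` and `tier` functions here are hypothetical helpers, not the engine's implementation, and returning `None` on a failed parse is an assumption made to keep the sketch small; how the engine treats a value that does not parse is not covered in this lesson.

```rust
// Hypothetical stand-in for CXL's `.to_int()`: parse on demand.
fn to_int(raw: &str) -> Option<i64> {
    raw.trim().parse().ok()
}

// Mirrors the tier rule: convert, then compare against the threshold.
// Returns None if the value is not an integer (an assumption of this sketch).
fn tier(lifetime_value: &str, gold_threshold: i64) -> Option<&'static str> {
    let n = to_int(lifetime_value)?;
    Some(if n > gold_threshold { "gold" } else { "standard" })
}

fn main() {
    let gold_threshold = 10000; // default from the lesson
    // Alice's lifetime_value is the string "15200" until this call.
    println!("{:?}", tier("15200", gold_threshold));
}
```

The comparison is strict, so a customer sitting exactly on the threshold would land in "standard" under this rule.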

The output node writes the record — now with is_active and tier added — to ./output/customers.csv. Alice leaves the engine as:

...,active,15200,94103,true,gold
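One way to picture the write step is a function that renders each value back to text and joins the results with commas. This is only a sketch: `render` and `to_csv_line` are hypothetical names, the `Value` enum is a simplified stand-in, and the naive join ignores quoting and escaping that a real CSV writer has to handle.

```rust
// Hypothetical, simplified value type for illustration only.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Str(String),
    Bool(bool),
}

// Turn one value back into its text form.
fn render(v: &Value) -> String {
    match v {
        Value::Str(s) => s.clone(),
        Value::Bool(b) => b.to_string(),
    }
}

// Join rendered values with commas (no quoting or escaping).
fn to_csv_line(values: &[Value]) -> String {
    values.iter().map(render).collect::<Vec<_>>().join(",")
}

fn main() {
    let mut alice: Vec<Value> = ["1001", "Alice", "Chen", "alice.chen@acme.com", "active", "15200", "94103"]
        .iter()
        .map(|s| Value::Str(s.to_string()))
        .collect();
    // The two fields added by the transforms.
    alice.push(Value::Bool(true));
    alice.push(Value::Str("gold".to_string()));
    println!("{}", to_csv_line(&alice));
}
```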

The --dry-run flag runs the real pipeline and writes the results to your terminal:

cd examples/pipelines
cargo run -p clinker -- run customer_etl.yaml --dry-run -n 5

You saw the summary in Phase 0: 5 total, 5 ok, 5 written, 0 dlq. Alice is one of those five — ok and written. (Carol, who is inactive, still flows through; active_only just flags her is_active = false. Nothing is dropped here — filtering is a later topic.)

💡 Hint 1

Apply the same two CXL rules. Is Bob’s status "active"? Is his lifetime_value (as an integer) greater than the gold_threshold of 10000?

Show solution

Bob’s status is "active", so is_active = true. His lifetime_value is 8400, and 8400 > 10000 is false, so tier = "standard". He leaves as ...,active,8400,10001,true,standard.

// quick check

At which stage does Alice's lifetime_value stop being a string and become a number?

You can now follow a record from source to output and name every stage: a Record of Values bound to a Schema, produced by a source, transformed node-by-node as the executor walks the CompiledPlan’s DAG, and written by an output. That mental map is the spine of everything ahead.

Phase 2 — Data & Representation revisits the first stage in depth: what a Value really costs, how records borrow instead of copy, and the ownership and lifetime rules that make the engine fast. Same journey, deeper each pass.