Build & run Clinker
The fastest way to start understanding a system is to run it once, end to end, and watch it do something real. Before we open a single source file, let’s get Clinker compiling and move some actual data through it.
You’ll be able to: build the engine, run a real pipeline, and read its result — and you’ll have met the shape of every Clinker job before we explain any of it.
What Clinker is (the one-paragraph version)
Clinker is a bounded-memory, single-process batch executor for finite ETL jobs.
You describe a job as a pipeline in YAML — a set of nodes (a source, some
transforms, an output) wired into a graph. Per-record logic is written in a small
expression language called CXL. You hand the pipeline to the clinker command;
it reads records from the source, pushes them through the graph, writes the output,
and exits. Finite and batch: the sources end, the job drains, the process stops.
That’s the whole mental model for now. We’ll earn every word of it over the coming phases.
The example we’ll run
Clinker ships runnable example pipelines. We’ll use the canonical one:
clinker ·customer_etl.yaml example @47d2e12
It’s a customer ETL job — a CSV of customers in, a CSV of flagged customers out, with two transforms in between:
nodes:
  - type: source            # read customers.csv
    name: customers
    config: { type: csv, path: ./data/customers.csv, ... }

  - type: transform         # add an is_active flag
    name: active_only
    input: customers
    config:
      cxl: |
        emit is_active = status == "active"

  - type: transform         # classify into a gold/standard tier
    name: final_flag
    input: active_only
    config:
      cxl: |
        emit tier = if lifetime_value.to_int() > $vars.gold_threshold then "gold" else "standard"

  - type: output            # write the result
    name: results
    input: final_flag
    config: { type: csv, path: ./output/customers.csv }

Four nodes (source → transform → transform → output): a tiny pipeline that is nonetheless a complete Clinker job. The input is small and human-readable:
customer_id,first_name,last_name,email,status,lifetime_value,zip_code
1001,Alice,Chen,alice.chen@acme.com,active,15200,94103
1002,Bob,Martinez,bob.m@globex.com,active,8400,10001
1003,Carol,Johnson,carol.j@example.com,inactive,3200,60601

Alice is active with a lifetime value above the gold_threshold (default 10000),
so she’ll be flagged gold; Bob is active but below it, so standard; Carol is
inactive, so her is_active flag comes out false.
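To make those outcomes concrete, here is a minimal Python sketch that mimics what the two CXL transforms compute for the three sample rows. It is not Clinker code and says nothing about how the engine evaluates CXL; the function name `flag_customer` is hypothetical, and the threshold of 10000 is simply the default quoted above.

```python
import csv
import io

# Sample rows copied from the example input above.
SAMPLE_CSV = """customer_id,first_name,last_name,email,status,lifetime_value,zip_code
1001,Alice,Chen,alice.chen@acme.com,active,15200,94103
1002,Bob,Martinez,bob.m@globex.com,active,8400,10001
1003,Carol,Johnson,carol.j@example.com,inactive,3200,60601
"""

GOLD_THRESHOLD = 10000  # default value of $vars.gold_threshold quoted in the text


def flag_customer(row, gold_threshold=GOLD_THRESHOLD):
    """Hypothetical stand-in for the two transforms; returns a new record."""
    out = dict(row)
    # active_only: emit is_active = status == "active"
    out["is_active"] = row["status"] == "active"
    # final_flag: emit tier = "gold" if lifetime_value exceeds the threshold
    out["tier"] = "gold" if int(row["lifetime_value"]) > gold_threshold else "standard"
    return out


flagged = [flag_customer(r) for r in csv.DictReader(io.StringIO(SAMPLE_CSV))]
for rec in flagged:
    print(rec["first_name"], rec["is_active"], rec["tier"])
```

Note that the tier expression, as written in the pipeline excerpt, looks only at lifetime_value, so in this sketch Carol is tiered standard even though she is inactive.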
Build it
Clinker pins its toolchain (a rust-toolchain.toml selects the exact Rust version),
so rustup installs the right compiler automatically the first time. From your
clinker checkout:
cargo build -p clinker

The first build compiles the whole workspace and takes a few minutes; after that, builds are incremental and fast. (Phase 0.2 is all about that fast inner loop.)
Run it
Two ways to run a pipeline. Start with the one that doesn’t touch any data.
1. See the plan, without executing — --explain. Run the example from the
examples/pipelines/ directory (so the pipeline’s ./data/... paths resolve):
cd examples/pipelines
cargo run -p clinker -- run customer_etl.yaml --explain

Clinker compiles the pipeline into an execution plan and prints it, but runs nothing:
=== Execution Plan ===

Mode: Streaming
Transforms: 2
Output projections: 1
DAG nodes: 4
arbitration: BackPressurePreferred -> Priority

Source DAG:
  Tier 0: customers

Four DAG nodes, two transforms, “Streaming” mode. You’re looking at the plan, the proof that the job is well-formed, before any record moves. We’ll come back to this view in lesson 0.4, and to why planning is separate from running much later.
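The node and transform counts in that plan can be read straight off the pipeline file. As a rough illustration (not Clinker’s planner, whose internals this lesson has not covered), the sketch below mirrors the four nodes as plain Python dicts, counts them by type, and follows the `input` links to recover the source-to-output order. The `NODES` list and the `linear_order` helper are hypothetical names.

```python
from collections import Counter

# The four nodes of customer_etl.yaml, reduced to name, type, and input.
NODES = [
    {"name": "customers", "type": "source", "input": None},
    {"name": "active_only", "type": "transform", "input": "customers"},
    {"name": "final_flag", "type": "transform", "input": "active_only"},
    {"name": "results", "type": "output", "input": "final_flag"},
]


def linear_order(nodes):
    """Walk from the source along `input` links (assumes a single chain)."""
    downstream = {n["input"]: n["name"] for n in nodes if n["input"] is not None}
    current = next(n["name"] for n in nodes if n["input"] is None)
    order = [current]
    while current in downstream:
        current = downstream[current]
        order.append(current)
    return order


counts = Counter(n["type"] for n in NODES)
order = linear_order(NODES)
print("DAG nodes:", len(NODES))
print("Transforms:", counts["transform"])
print(" -> ".join(order))
```

The sketch only handles a single chain like this example; a real pipeline graph can branch, which is one reason the engine builds a proper plan.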
2. Actually move data — --dry-run. A dry run processes records and writes the
result to your terminal instead of to the output file:
cargo run -p clinker -- run customer_etl.yaml --dry-run -n 5

INFO clinker: Pipeline complete: 5 total, 5 ok, 5 written, 0 dlq

Five records in, five processed, five written, zero rejected, and the process
exits 0. That summary line (total / ok / written / dlq) is Clinker telling you
the finite job ran clean. (dlq is the dead-letter queue — rejected records;
we’ll meet it soon.)
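If you want to check that summary line from a script, for example in CI, a small parser is enough. The sketch below assumes the line always has the shape shown above; the regular expression and the `parse_summary` name are illustrative, not part of Clinker.

```python
import re

# Assumed shape of the summary line, taken from the sample output above.
SUMMARY_RE = re.compile(
    r"Pipeline complete: (?P<total>\d+) total, (?P<ok>\d+) ok, "
    r"(?P<written>\d+) written, (?P<dlq>\d+) dlq"
)


def parse_summary(line):
    """Return the four counters as ints, or None if the line does not match."""
    match = SUMMARY_RE.search(line)
    if match is None:
        return None
    return {key: int(value) for key, value in match.groupdict().items()}


line = "INFO clinker: Pipeline complete: 5 total, 5 ok, 5 written, 0 dlq"
summary = parse_summary(line)
print(summary)

# "Clean" here means every record was ok and nothing went to the dead-letter queue.
ran_clean = summary is not None and summary["dlq"] == 0 and summary["ok"] == summary["total"]
print("clean run:", ran_clean)
```

Pairing a check like this with the process exit code gives a script two independent signals that the job finished cleanly.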
Checkpoint
You just ran a four-node DAG end to end. Each of those nodes (the source, the two CXL transforms, the output) is a door we’ll open in later lessons. Next: the fast edit-and-check loop you’ll live in while working on the engine.