Build & run Clinker

The fastest way to start understanding a system is to run it once, end to end, and watch it do something real. Before we open a single source file, let’s get Clinker compiling and move some actual data through it.

By the end you’ll be able to build the engine, run a real pipeline, and read its result. You’ll also have met the shape of every Clinker job before we explain any of it.

What Clinker is (the one-paragraph version)

Clinker is a bounded-memory, single-process batch executor for finite ETL jobs. You describe a job as a pipeline in YAML — a set of nodes (a source, some transforms, an output) wired into a graph. Per-record logic is written in a small expression language called CXL. You hand the pipeline to the clinker command; it reads records from the source, pushes them through the graph, writes the output, and exits. Finite and batch: the sources end, the job drains, the process stops.
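To make "finite and batch" concrete, here is a minimal Python sketch of the read, transform, write, exit loop. It is not Clinker's code. The names `run_batch`, `read_rows`, and `add_flag` are hypothetical, the chain is linear rather than a general graph, and holding one record at a time is only one possible way to keep memory bounded.

```python
from typing import Callable, Iterable, Iterator

Record = dict[str, str]
Transform = Callable[[Record], Record]


def run_batch(source: Iterable[Record],
              transforms: list[Transform],
              sink: Callable[[Record], None]) -> int:
    """Stream every record through the transforms, then return.

    Only one record is held at a time, so memory stays bounded no
    matter how long the source is. The loop ends when the source does.
    """
    written = 0
    for record in source:              # a finite source eventually stops
        for transform in transforms:   # per-record logic, in order
            record = transform(record)
        sink(record)                   # hand the finished record to the output
        written += 1
    return written                     # drained: nothing left to do but exit


def read_rows() -> Iterator[Record]:
    # Stand-in for a CSV source: yields rows lazily, then ends.
    yield {"customer_id": "1001", "status": "active"}
    yield {"customer_id": "1003", "status": "inactive"}


def add_flag(record: Record) -> Record:
    # Illustrative transform: derive a new field from an existing one.
    return {**record, "is_active": str(record["status"] == "active").lower()}


out: list[Record] = []
count = run_batch(read_rows(), [add_flag], out.append)
print(count, out[0]["is_active"], out[1]["is_active"])
```

The property to notice is that `run_batch` returns on its own: there is no server loop and no waiting for more input once the source is exhausted.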

That’s the whole mental model for now. We’ll earn every word of it over the coming phases.

The example we’ll run

Clinker ships runnable example pipelines. We’ll use the canonical one:

clinker · customer_etl.yaml example @47d2e12

It’s a customer ETL job — a CSV of customers in, a CSV of flagged customers out, with two transforms in between:

nodes:
  - type: source        # read customers.csv
    name: customers
    config: { type: csv, path: ./data/customers.csv, ... }

  - type: transform     # add an is_active flag
    name: active_only
    input: customers
    config:
      cxl: |
        emit is_active = status == "active"

  - type: transform     # classify into a gold/standard tier
    name: final_flag
    input: active_only
    config:
      cxl: |
        emit tier = if lifetime_value.to_int() > $vars.gold_threshold then "gold" else "standard"

  - type: output        # write the result
    name: results
    input: final_flag
    config: { type: csv, path: ./output/customers.csv }

Four nodes (source → transform → transform → output): a tiny pipeline that is nonetheless a complete Clinker job. The input is small and human-readable. Here are its header and first three rows:

customer_id,first_name,last_name,email,status,lifetime_value,zip_code
1001,Alice,Chen,alice.chen@acme.com,active,15200,94103
1002,Bob,Martinez,bob.m@globex.com,active,8400,10001
1003,Carol,Johnson,carol.j@example.com,inactive,3200,60601

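Alice is active with a lifetime value above the gold_threshold (default 10000), so she’ll be flagged gold; Bob is active but below it, so standard. Carol is inactive, so her is_active flag is false, and her 3200 falls below the threshold, so she is standard too. Despite its name, active_only only adds the flag; it does not drop inactive rows.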
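If you want to check that reasoning by hand, the two transforms can be mirrored in a few lines of Python over the three sample rows. This is a sketch of the logic only. It does not run CXL, the function bodies are a plain-Python reading of the two `emit` expressions, and how Clinker itself writes booleans into the output CSV is not shown here.

```python
import csv
import io

SAMPLE = """\
customer_id,first_name,last_name,email,status,lifetime_value,zip_code
1001,Alice,Chen,alice.chen@acme.com,active,15200,94103
1002,Bob,Martinez,bob.m@globex.com,active,8400,10001
1003,Carol,Johnson,carol.j@example.com,inactive,3200,60601
"""

GOLD_THRESHOLD = 10000  # the default quoted above for gold_threshold


def active_only(row: dict) -> dict:
    # Mirrors: emit is_active = status == "active"
    return {**row, "is_active": row["status"] == "active"}


def final_flag(row: dict) -> dict:
    # Mirrors: emit tier = if lifetime_value.to_int() > $vars.gold_threshold ...
    tier = "gold" if int(row["lifetime_value"]) > GOLD_THRESHOLD else "standard"
    return {**row, "tier": tier}


# Same order as the pipeline: active_only first, then final_flag.
results = [final_flag(active_only(row))
           for row in csv.DictReader(io.StringIO(SAMPLE))]

for row in results:
    print(row["first_name"], row["is_active"], row["tier"])
# Alice True gold
# Bob True standard
# Carol False standard
```

Each transform returns a new row with one extra field, which matches the way each `emit` adds a column without touching the existing ones.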

Build it

Clinker pins its toolchain (a rust-toolchain.toml selects the exact Rust version), so rustup installs the right compiler automatically the first time. From your clinker checkout:

cargo build -p clinker

The first build compiles the whole workspace and takes a few minutes; after that, builds are incremental and fast. (Phase 0.2 is all about that fast inner loop.)

Run it

Two ways to run a pipeline — start with the one that doesn’t touch any data.

1. See the plan, without executing — --explain. Run the example from the examples/pipelines/ directory (so the pipeline’s ./data/... paths resolve):

cd examples/pipelines
cargo run -p clinker -- run customer_etl.yaml --explain

Clinker compiles the pipeline into an execution plan and prints it — but runs nothing:

=== Execution Plan ===

Mode: Streaming
Transforms: 2
Output projections: 1
DAG nodes: 4
arbitration: BackPressurePreferred -> Priority

Source DAG:
  Tier 0: customers

Four DAG nodes, two transforms, “Streaming” mode. You’re looking at the plan — the proof that the job is well-formed — before any record moves. We’ll come back to this view in lesson 0.4, and to why planning is separate from running much later.
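The counts in that plan follow directly from the node list. As a rough illustration of what "well-formed" can mean, the Python sketch below checks that every `input` names a real node, orders the nodes so each comes after the one it reads from, and tallies the same two numbers. It is an assumption-laden toy: the `plan` function and the dictionary layout are hypothetical, and Clinker's actual planner also decides things this sketch ignores, such as mode, tiers, and arbitration.

```python
# Hypothetical in-memory form of the four nodes from the YAML above.
NODES = [
    {"type": "source", "name": "customers"},
    {"type": "transform", "name": "active_only", "input": "customers"},
    {"type": "transform", "name": "final_flag", "input": "active_only"},
    {"type": "output", "name": "results", "input": "final_flag"},
]


def plan(nodes: list[dict]) -> dict:
    """Validate the wiring and return a few summary facts."""
    names = {n["name"] for n in nodes}
    if len(names) != len(nodes):
        raise ValueError("duplicate node name")
    for n in nodes:
        if "input" in n and n["input"] not in names:
            raise ValueError(f"{n['name']}: unknown input {n['input']!r}")

    # Order nodes so each one comes after the node it reads from.
    order, placed = [], set()
    pending = list(nodes)
    while pending:
        ready = [n for n in pending
                 if n.get("input") is None or n["input"] in placed]
        if not ready:
            raise ValueError("cycle in node graph")
        for n in ready:
            order.append(n["name"])
            placed.add(n["name"])
        pending = [n for n in pending if n["name"] not in placed]

    return {
        "dag_nodes": len(nodes),
        "transforms": sum(n["type"] == "transform" for n in nodes),
        "sources": [n["name"] for n in nodes if n["type"] == "source"],
        "order": order,
    }


summary = plan(NODES)
print(summary["dag_nodes"], summary["transforms"], summary["order"])
```

All of this happens without reading a single record, which is the point of a plan-only run: wiring mistakes surface before any data moves.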

2. Actually move data — --dry-run. A dry run processes records and writes the result to your terminal instead of to the output file:

cargo run -p clinker -- run customer_etl.yaml --dry-run -n 5

INFO clinker: Pipeline complete: 5 total, 5 ok, 5 written, 0 dlq

Five records in, five processed, five written, zero rejected — and the process exits 0. That summary line (total / ok / written / dlq) is Clinker telling you the finite job ran clean. (dlq is the dead-letter queue — rejected records; we’ll meet it soon.)
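If you later want a script to read that summary line, a small parser is enough. The sketch below assumes the line keeps exactly the shape shown above. The helper names `parse_summary` and `ran_clean` are hypothetical, and the definition of "clean" used here (nothing in the dlq and every record ok) is an illustrative choice rather than a rule taken from Clinker.

```python
import re

# Pattern for the summary line as it appears in the run above.
SUMMARY = re.compile(
    r"Pipeline complete: (?P<total>\d+) total, (?P<ok>\d+) ok, "
    r"(?P<written>\d+) written, (?P<dlq>\d+) dlq"
)


def parse_summary(line: str) -> dict[str, int]:
    """Extract the four counters from a summary line as integers."""
    match = SUMMARY.search(line)
    if match is None:
        raise ValueError(f"no pipeline summary in: {line!r}")
    return {key: int(value) for key, value in match.groupdict().items()}


def ran_clean(counts: dict[str, int]) -> bool:
    # Assumed meaning of "clean": nothing rejected, every record processed ok.
    return counts["dlq"] == 0 and counts["ok"] == counts["total"]


counts = parse_summary(
    "INFO clinker: Pipeline complete: 5 total, 5 ok, 5 written, 0 dlq"
)
print(counts, ran_clean(counts))
# {'total': 5, 'ok': 5, 'written': 5, 'dlq': 0} True
```

For a pass/fail check the process exit code is the simpler signal; parsing the line is mainly useful when you want the individual counts.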

Checkpoint

Quick check: what does it mean that Clinker is a *finite batch* executor?

You should be able to confirm that:

Clinker builds (`cargo build -p clinker` succeeds)
`--explain` prints an execution plan with 4 DAG nodes
`--dry-run -n 5` reports `5 ok, 5 written, 0 dlq` and exits 0

Verify in the checkout:

cd examples/pipelines
cargo run -p clinker -- run customer_etl.yaml --explain
cargo run -p clinker -- run customer_etl.yaml --dry-run -n 5

You just ran a four-node DAG end to end. Each of those nodes — the source, the two CXL transforms, the output — is a door we’ll open in later lessons. Next: the fast edit-and-check loop you’ll live in while working on the engine.