Traits: the IO seam
Welcome to Phase 3 — Planning & Expressions. Phase 2 took the data layer apart cell by cell. Now we move up a layer to the machinery that turns a YAML file into the validated plan you’ve been running — and the seams where new behaviour plugs in. This first lesson is about the most important seam in the engine: how it reads and writes file formats it was never specifically written for.
You’ll be able to: read the FormatReader trait, explain how every format reaches the
rest of the engine through it, and say why the engine stores readers as trait objects
(Box<dyn FormatReader>) rather than concrete types.
The motivating problem: one engine, eight formats
Clinker ships readers and writers for CSV, JSON, XML, fixed-width, EDIFACT, X12, HL7, and SWIFT. That’s eight wildly different ways to lay bytes on disk. Yet the executor — the part that pushes records through the DAG — contains zero format-specific code. It never asks “is this a CSV?” How can the same execution loop drive all eight?
The answer is a trait: a contract that says what a reader can do without fixing which reader it is. Every format implements the same two methods, and the engine talks only to the contract.
clinker-format ·traits.rs ·FormatReader trait @47d2e12
    /// Streaming record reader. Yields records one at a time.
    pub trait FormatReader: Send {
        fn schema(&mut self) -> Result<Arc<Schema>, FormatError>;
        fn next_record(&mut self) -> Result<Option<Record>, FormatError>;
        // ... plus default-bodied methods for multi-file / envelope handling
    }

Two required methods carry the whole seam: ask for the schema, then pull records until
next_record returns Ok(None). A FormatReader is exactly “a thing the engine can ask
for a schema and then drain, one Record at a time.” The write side is its mirror image —
consume records one at a time and flush:
clinker-format ·traits.rs ·FormatWriter trait @47d2e12
    /// Streaming record writer. Consumes records one at a time.
    pub trait FormatWriter: Send {
        fn write_record(&mut self, record: &Record) -> Result<(), FormatError>;
        fn flush(&mut self) -> Result<(), FormatError>;
        // ... plus default-bodied document-framing methods
    }

Notice the supertrait bound : Send. The doc comment is explicit about why: a reader is
moved onto the executor’s per-source ingest thread, so it must be Send. It is
deliberately not Sync — a single reader is driven by one thread, streaming, never
shared. That bound is a small architectural decision encoded in the type.
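To see what the Send bound buys, here is a minimal sketch with simplified stand-in types. The one-method trait, the VecReader struct, and the ingest_on_thread function are assumptions made for illustration, not Clinker's real definitions. It shows a boxed reader being moved onto a worker thread, which the compiler accepts only because Send is a supertrait:

```rust
use std::thread;

// Simplified stand-in: the real trait returns Result<Option<Record>, FormatError>
// and also has a schema() method.
trait FormatReader: Send {
    fn next_record(&mut self) -> Option<String>;
}

// Hypothetical in-memory reader used only for this sketch.
struct VecReader {
    rows: std::vec::IntoIter<String>,
}

impl FormatReader for VecReader {
    fn next_record(&mut self) -> Option<String> {
        self.rows.next()
    }
}

// Moves the boxed reader onto a worker thread and drains it there.
// thread::spawn requires its closure to be Send; the trait object qualifies
// because `FormatReader: Send` is part of the trait definition.
fn ingest_on_thread(mut reader: Box<dyn FormatReader>) -> usize {
    let handle = thread::spawn(move || {
        let mut count = 0;
        while reader.next_record().is_some() {
            count += 1;
        }
        count
    });
    handle.join().expect("ingest thread panicked")
}

fn main() {
    let rows = vec!["a".to_string(), "b".to_string(), "c".to_string()];
    let reader = Box::new(VecReader { rows: rows.into_iter() });
    println!("ingested {} records", ingest_on_thread(reader));
}
```

Dropping the Send supertrait from this sketch makes the thread::spawn call fail to compile, which is the sense in which the bound encodes the threading decision in the type. Nothing here needs Sync, because the reader is owned by one thread at a time and never shared by reference.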
A minimal model: the trait is the seam
Strip the engine away and the pattern is small. A trait with one method, two implementors, and a function that works on any implementor:
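A minimal sketch of that pattern, under stated assumptions: the one-method FormatReader, the toy CsvReader and JsonReader, and the use of String as the record type are all simplifications for illustration and do not match Clinker's real types.

```rust
// One-method stand-in for the real trait; a record is just a String here.
trait FormatReader {
    fn next_record(&mut self) -> Option<String>;
}

// Toy CSV reader: each line of the input text is one record.
struct CsvReader {
    lines: std::str::Lines<'static>,
}

// Toy JSON reader: each pre-split object is one record.
struct JsonReader {
    objects: std::vec::IntoIter<String>,
}

impl FormatReader for CsvReader {
    fn next_record(&mut self) -> Option<String> {
        self.lines.next().map(|line| line.to_string())
    }
}

impl FormatReader for JsonReader {
    fn next_record(&mut self) -> Option<String> {
        self.objects.next()
    }
}

// Works on any implementor: it only knows the contract, not the format.
fn drain(reader: &mut dyn FormatReader) -> Vec<String> {
    let mut out = Vec::new();
    while let Some(record) = reader.next_record() {
        out.push(record);
    }
    out
}

fn main() {
    let mut csv = CsvReader { lines: "1,alice\n2,bob".lines() };
    let mut json = JsonReader {
        objects: vec![r#"{"id":3}"#.to_string()].into_iter(),
    };
    println!("{:?}", drain(&mut csv));
    println!("{:?}", drain(&mut json));
}
```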
drain is the executor in miniature: it takes &mut dyn FormatReader and never learns
whether it’s draining CSV or JSON. Add a third format tomorrow and drain doesn’t change
by a character. That is the seam doing its job.
Why a trait object, and what dyn costs
The format a job uses isn’t known when the engine is compiled — it’s chosen at run time,
from the plan. So the executor needs a single variable that can hold any reader. That’s a
trait object: Box<dyn FormatReader> — a heap-allocated value plus a hidden pointer to a
vtable (the table of “which next_record does this actual reader use?”). Calling through
it is dynamic dispatch: the concrete method is looked up at run time.
In real clinker, the concrete readers are themselves generic structs — CsvReader<R>,
FixedWidthReader<R>, and so on, generic over the byte source R. They’re built
type-specifically, then immediately boxed into a trait object at one factory function so
that everything downstream is uniform:
clinker-exec ·mod.rs ·RecordSource trait @47d2e12
    // crates/clinker-exec/src/executor/ingest.rs — the dispatch boundary
    fn build_format_reader(/* … */) -> Box<dyn FormatReader> {
        match &input.format {
            InputFormat::Csv(opts) => Box::new(CsvReader::new(/* … */)),
            InputFormat::Json(opts) => Box::new(JsonReader::new(/* … */)),
            // ... one arm per format, each boxed into the same trait-object type
        }
    }

This is worth pausing on, because it’s the shape of every plug-in seam in clinker: a typed
enum (InputFormat) is matched once at the boundary, each arm constructs the right concrete
reader, and all of them collapse into one Box<dyn FormatReader>. There is no string-keyed
“format registry” — dispatch is over the closed enum you met in Phase 2, so an unknown
format is a plan-time error, not a runtime lookup miss.
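The boundary can be sketched end to end in a self-contained form. Everything below is a simplified assumption for illustration: the describe method, the two-variant InputFormat, and the parse_format helper are hypothetical and stand in for the real plan validation and reader constructors.

```rust
// Stand-in trait; the real one reads records rather than describing itself.
trait FormatReader {
    fn describe(&self) -> String;
}

struct CsvReader {
    delimiter: char,
}

struct JsonReader;

impl FormatReader for CsvReader {
    fn describe(&self) -> String {
        format!("csv(delimiter={:?})", self.delimiter)
    }
}

impl FormatReader for JsonReader {
    fn describe(&self) -> String {
        "json".to_string()
    }
}

// Closed set of formats, as it might look once the plan has been validated.
enum InputFormat {
    Csv { delimiter: char },
    Json,
}

// Hypothetical plan-time step: an unknown name is rejected here,
// before any reader is ever built.
fn parse_format(name: &str) -> Result<InputFormat, String> {
    match name {
        "csv" => Ok(InputFormat::Csv { delimiter: ',' }),
        "json" => Ok(InputFormat::Json),
        other => Err(format!("unknown format: {}", other)),
    }
}

// The dispatch boundary: match once, box every arm into the same type.
fn build_format_reader(format: &InputFormat) -> Box<dyn FormatReader> {
    match format {
        InputFormat::Csv { delimiter } => Box::new(CsvReader { delimiter: *delimiter }),
        InputFormat::Json => Box::new(JsonReader),
    }
}

fn main() {
    for name in ["csv", "json", "parquet"] {
        match parse_format(name) {
            Ok(format) => println!("{}: {}", name, build_format_reader(&format).describe()),
            Err(err) => println!("{}: rejected at plan time ({})", name, err),
        }
    }
}
```

Because the match in build_format_reader is exhaustive over the enum, adding a variant without adding an arm is a compile error, so the factory cannot silently miss a format.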
What does dyn cost? One extra pointer indirection per call, and the compiler usually cannot
inline through it. Here that’s invisible: a vtable call is dwarfed by the file IO and parsing
that produce each record. Dynamic dispatch is the right trade exactly where the set of types
is open-ended and chosen at run time, and the per-call cost is noise against the work being
dispatched. The next lesson shows the opposite choice, generics, for the field-read hot path,
where that same indirection would not be noise.
One step further: the transport generalization
FormatReader assumes bytes — it decodes a stream. But a SQL SELECT cursor yields rows with
no byte body at all. So one crate up, the executor defines a broader contract,
RecordSource, and bridges the file case to it with a single impl on the boxed trait object:
    impl RecordSource for Box<dyn FormatReader> {
        // a file transport reaches the row-oriented seam by wrapping its byte reader
    }

You don’t need the details yet — Phase 4 returns to it. The point is the layering: a
narrow, byte-oriented trait (FormatReader) nested inside a broader, transport-agnostic one
(RecordSource), each a clean seam. Traits compose into layers the same way types do.
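The layering can be sketched as follows. The lesson does not show RecordSource's real signature, so the next_row method is hypothetical, as are the LineReader, SqlCursor, and pump names. Only the shape of the impl on Box<dyn FormatReader> is taken from the text.

```rust
// Narrow, byte-oriented seam (simplified: no schema, no error type).
trait FormatReader {
    fn next_record(&mut self) -> Option<String>;
}

// Broader, transport-agnostic seam. `next_row` is a hypothetical method name.
trait RecordSource {
    fn next_row(&mut self) -> Option<String>;
}

// Bridge: any boxed format reader is also a record source.
impl RecordSource for Box<dyn FormatReader> {
    fn next_row(&mut self) -> Option<String> {
        // Deref through the Box to reach the reader's own method.
        (**self).next_record()
    }
}

// Hypothetical file-backed reader: one line of text per record.
struct LineReader {
    lines: std::str::Lines<'static>,
}

impl FormatReader for LineReader {
    fn next_record(&mut self) -> Option<String> {
        self.lines.next().map(|line| line.to_string())
    }
}

// Hypothetical row-oriented source with no byte stream behind it.
// It implements the broader trait directly and skips FormatReader.
struct SqlCursor {
    rows: std::vec::IntoIter<String>,
}

impl RecordSource for SqlCursor {
    fn next_row(&mut self) -> Option<String> {
        self.rows.next()
    }
}

// Downstream code depends only on the broader seam.
fn pump(source: &mut dyn RecordSource) -> usize {
    let mut count = 0;
    while source.next_row().is_some() {
        count += 1;
    }
    count
}

fn main() {
    let mut file: Box<dyn FormatReader> = Box::new(LineReader { lines: "a\nb".lines() });
    let mut sql = SqlCursor { rows: vec!["row1".to_string()].into_iter() };
    println!("file rows: {}", pump(&mut file));
    println!("sql rows: {}", pump(&mut sql));
}
```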
// quick check
Why does the engine store readers as Box<dyn FormatReader> instead of a concrete reader type?
The format comes from the plan at run time. Box<dyn FormatReader> is one type that can hold any implementor; the concrete method is resolved through a vtable. The per-call cost is negligible against file IO.
Read the real seam
You’ve seen the engine’s open seam: a trait, implemented per format, boxed into a trait object at one boundary so the rest of the engine stays format-blind. Next: the other dispatch strategy — generics — and why the field-read hot path makes the opposite choice.