Structs & provenance

An enum says “one of these shapes.” A struct says the opposite: “all of these fields, together.” It’s how you bundle related data into one named thing. The engine uses structs everywhere; one of the most useful is the little record of where a row came from — its provenance.

You’ll be able to: define a struct with methods, and explain why every record carries provenance and how that origin is shared cheaply across a whole file.

struct Point { x: i64, y: i64 } // a Point has an x AND a y

A struct groups fields that belong together; an impl block hangs methods off it. Where an enum value is one of its variants, a struct value is all of its fields at once. Records, schemas, plans — the engine’s larger types are structs.
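To make the struct-plus-impl pairing concrete, here is a minimal sketch that extends the Point above. The constructor `new` and the method `manhattan_distance` are hypothetical names chosen for illustration; they are not taken from the engine.

```rust
// Illustrative type only; not part of the engine's source.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Point {
    x: i64,
    y: i64,
}

impl Point {
    // Associated function (no `self`): a conventional constructor.
    fn new(x: i64, y: i64) -> Self {
        Point { x, y }
    }

    // Method: borrows the value and reads both of its fields.
    fn manhattan_distance(&self, other: &Point) -> i64 {
        (self.x - other.x).abs() + (self.y - other.y).abs()
    }
}

fn main() {
    let origin = Point::new(0, 0);
    let p = Point::new(3, -4);
    println!("{:?} is {} steps from {:?}", p, p.manhattan_distance(&origin), origin);
}
```

The fields live in the struct definition and the behavior lives in the impl block; a method reaches the fields through `self`.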

Provenance: every row remembers its origin

When a record fails, “row 42 of customers.csv” is far more useful than “a record failed.” So every record carries a small struct describing where it came from:

clinker-record · provenance.rs · RecordProvenance type @47d2e12
pub struct RecordProvenance {
    pub source_file: Arc<str>, // the file this row came from (shared)
    pub source_row: u64,       // its position in that file
    pub source_batch: Arc<str>,
    pub ingestion_timestamp: NaiveDateTime,
}

This is what lets a dead-letter entry point at the exact source row, and what makes error messages specific. It’s a struct precisely because origin is several facts bundled together (a file, a row position, a batch, an ingestion timestamp) that always travel as a unit.
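As a rough sketch of how provenance turns into a specific error message, the example below uses a cut-down stand-in for the real struct. `Origin`, `describe`, and `failure_message` are hypothetical names invented here, and only two of the four fields are modeled; the engine's actual dead-letter code may look quite different.

```rust
use std::sync::Arc;

// Simplified stand-in for RecordProvenance: only two of its fields.
struct Origin {
    source_file: Arc<str>,
    source_row: u64,
}

impl Origin {
    // Render the origin the way a person would say it.
    fn describe(&self) -> String {
        format!("row {} of {}", self.source_row, self.source_file)
    }
}

// Hypothetical helper: prefix a failure reason with the record's origin.
fn failure_message(origin: &Origin, reason: &str) -> String {
    format!("{}: {}", origin.describe(), reason)
}

fn main() {
    let origin = Origin {
        source_file: Arc::from("customers.csv"),
        source_row: 42,
    };
    println!("{}", failure_message(&origin, "missing email"));
}
```

Because the file and the row travel together in one value, any code that holds the record can produce the message without looking anything up elsewhere.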

Here’s the engineering subtlety. A million rows from customers.csv all share the same filename. Storing the string "customers.csv" a million times would be wasteful, so the filename is held behind an Arc<str>, a shared, reference-counted handle, and every record of that file points at the same allocation. Cloning that handle for a new row just bumps a reference count; it doesn’t copy the string. (That Arc sharing pattern is the whole of the next-but-one lesson, 2.7.)
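The sharing can be observed directly with the standard library. This is a minimal sketch, again with a hypothetical two-field `Origin` and a made-up helper `origins_for`; it is not how the engine builds its records, only a demonstration that many handles can point at one string.

```rust
use std::sync::Arc;

// Simplified stand-in for RecordProvenance: only two of its fields.
struct Origin {
    source_file: Arc<str>,
    source_row: u64,
}

// Build provenance for `rows` rows that all point at one shared filename.
fn origins_for(file: &str, rows: u64) -> Vec<Origin> {
    let shared: Arc<str> = Arc::from(file); // the only copy of the string
    (1..=rows)
        .map(|row| Origin {
            source_file: Arc::clone(&shared), // bumps the count, no string copy
            source_row: row,
        })
        .collect()
}

fn main() {
    let origins = origins_for("customers.csv", 1_000);
    let first = &origins[0];
    let last = &origins[origins.len() - 1];

    // Both handles point at the very same allocation.
    assert!(Arc::ptr_eq(&first.source_file, &last.source_file));
    println!(
        "rows {}..={} share one filename through {} handles",
        first.source_row,
        last.source_row,
        Arc::strong_count(&first.source_file)
    );
}
```

`Arc::ptr_eq` confirms the handles share an allocation, and `Arc::strong_count` reports how many handles are currently alive.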

// quick check

Why is a record's filename stored behind an Arc<str> rather than a plain String per record?

You can bundle data into structs and you’ve met the Arc sharing trick. Next: the collections that make up a record, and the iterators that stream them one at a time.