Structs & provenance
An enum says “one of these shapes.” A struct says the opposite: “all of these
fields, together.” It’s how you bundle related data into one named thing. The engine uses
structs everywhere; one of the most useful is the little record of where a row came
from — its provenance.
You’ll be able to: define a struct with methods, and explain why every record
carries provenance and how that origin is shared cheaply across a whole file.
struct: AND, where enum is OR
struct Point { x: i64, y: i64 } // a Point has an x AND a y

A struct groups fields that belong together; an impl block hangs methods off it.
Where an enum value is one of its variants, a struct value is all of its fields
at once. Records, schemas, plans — the engine’s larger types are structs.
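To make the AND/OR contrast concrete, here is a minimal sketch. The names `Shape`, `describe`, and `sum` are made up for illustration and are not taken from the engine.

```rust
// An enum value is exactly ONE of its variants (OR).
#[derive(Debug)]
enum Shape {
    Circle,
    Square,
}

// A struct value holds ALL of its fields at once (AND).
#[derive(Debug)]
struct Point {
    x: i64,
    y: i64,
}

// With an enum you ask "which variant is it?" and handle each case.
fn describe(shape: &Shape) -> &'static str {
    match shape {
        Shape::Circle => "a circle",
        Shape::Square => "a square",
    }
}

// With a struct you can read every field, because all of them are present.
fn sum(p: &Point) -> i64 {
    p.x + p.y
}

fn main() {
    let p = Point { x: 3, y: 4 };
    println!("{:?} sums to {}", p, sum(&p));
    println!("{:?} is {}", Shape::Circle, describe(&Shape::Circle));
    println!("{:?} is {}", Shape::Square, describe(&Shape::Square));
}
```

Note that `match` has to cover both variants, while `sum` never has to check whether `x` or `y` exists.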
Provenance: every row remembers its origin
When a record fails, “row 42 of customers.csv” is infinitely more useful than “a record
failed.” So every record carries a small struct describing where it came from:
clinker-record · provenance.rs · RecordProvenance type @47d2e12
pub struct RecordProvenance {
    pub source_file: Arc<str>,   // the file this row came from (shared)
    pub source_row: u64,         // its position in that file
    pub source_batch: Arc<str>,
    pub ingestion_timestamp: NaiveDateTime,
}

This is what lets a dead-letter entry point at the exact source row, and what makes error messages specific. It’s a struct precisely because origin is several facts bundled together (a file, a position, a batch, a timestamp) that always travel as a unit.
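As a rough illustration of how provenance makes an error message specific, the sketch below uses a simplified stand-in called `Provenance` that keeps only two of the fields, so it needs nothing beyond the standard library. The `Display` impl and the `dead_letter_message` helper are assumptions made for this example, not the engine's actual API.

```rust
use std::fmt;
use std::sync::Arc;

// Simplified, hypothetical stand-in for RecordProvenance: only two of its
// fields, so the sketch compiles with the standard library alone.
struct Provenance {
    source_file: Arc<str>,
    source_row: u64,
}

// Display renders the origin the way an error message would want it.
impl fmt::Display for Provenance {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "row {} of {}", self.source_row, self.source_file)
    }
}

// Hypothetical helper: build the text of a dead-letter entry.
fn dead_letter_message(origin: &Provenance, reason: &str) -> String {
    format!("{}: {}", origin, reason)
}

fn main() {
    let origin = Provenance {
        source_file: Arc::from("customers.csv"),
        source_row: 42,
    };
    // prints "row 42 of customers.csv: missing email"
    println!("{}", dead_letter_message(&origin, "missing email"));
}
```

Because the file and the row travel together in one value, the function that reports the failure needs only a single `&Provenance` argument.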
Sharing the origin cheaply
Here’s the engineering subtlety. A million rows from customers.csv all share the same
filename. Storing the string "customers.csv" a million times would be wasteful, so the
filename is held behind an Arc<str> — a shared, reference-counted handle — and every
record of that file points at the same one. Cloning the provenance for a new row just
bumps a counter; it doesn’t copy the string. (That Arc sharing pattern is the whole of
the next-but-one lesson, 2.7.)
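The sharing can be observed directly with the standard library. In this sketch the `Provenance` struct is again a simplified stand-in and `provenance_for_rows` is a hypothetical helper; `Arc::strong_count` and `Arc::ptr_eq` are real `std::sync::Arc` functions.

```rust
use std::sync::Arc;

// Simplified, hypothetical stand-in for RecordProvenance.
struct Provenance {
    source_file: Arc<str>,
    source_row: u64,
}

// Hypothetical helper: one provenance value per row, all sharing one filename.
fn provenance_for_rows(file: &Arc<str>, rows: u64) -> Vec<Provenance> {
    (1..=rows)
        .map(|n| Provenance {
            // Arc::clone bumps the reference count; the string is not copied.
            source_file: Arc::clone(file),
            source_row: n,
        })
        .collect()
}

fn main() {
    let file: Arc<str> = Arc::from("customers.csv");
    let rows = provenance_for_rows(&file, 3);

    // One original handle plus three clones.
    println!("handles: {}", Arc::strong_count(&file)); // prints "handles: 4"

    // Every row points at the very same allocation.
    let same = Arc::ptr_eq(&rows[0].source_file, &rows[2].source_file);
    println!("row {} shares with row {}: {}", rows[0].source_row, rows[2].source_row, same);
}
```

With a plain `String` field, each of those rows would own its own copy of the filename instead.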
Build a struct with a method
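One possible shape for this exercise, as a sketch rather than the lesson's own solution: a `Point` struct with a constructor and two methods. The method names `manhattan_distance` and `translate` are chosen for illustration.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
struct Point {
    x: i64,
    y: i64,
}

impl Point {
    // Associated function (no self): a conventional constructor.
    fn new(x: i64, y: i64) -> Self {
        Point { x, y }
    }

    // Method borrowing self immutably: reads fields, changes nothing.
    fn manhattan_distance(&self, other: &Point) -> i64 {
        (self.x - other.x).abs() + (self.y - other.y).abs()
    }

    // Method borrowing self mutably: moves the point in place.
    fn translate(&mut self, dx: i64, dy: i64) {
        self.x += dx;
        self.y += dy;
    }
}

fn main() {
    let origin = Point::new(0, 0);
    let mut p = Point::new(3, 4);
    println!("distance: {}", p.manhattan_distance(&origin)); // prints "distance: 7"
    p.translate(1, -1);
    println!("{:?}", p); // prints "Point { x: 4, y: 3 }"
}
```

The receiver type tells the reader what a method may do: `&self` can only read the fields, while `&mut self` can change them.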
// quick check
Why is a record's filename stored behind an Arc<str> rather than a plain String per record?
Every row of a file has the same filename. An Arc lets them all share a single allocation; cloning provenance for a new row just bumps the reference count.
Checkpoint
You can bundle data into structs and you’ve met the Arc sharing trick. Next: the
collections that make up a record, and the iterators that stream them one at a time.