What is a record?

In Phase 0 you ran customer_etl and Clinker reported “5 records.” Phase 1 follows one of those records all the way through the engine. But first: what is a record? It’s the unit of data that flows through every pipeline, and it’s built from three vocabulary types you’ll see everywhere.

You’ll be able to: name the three types that make up a record (Value, Record, Schema) and explain how a CSV row becomes one. (We meet them shallowly here; Phase 2 opens each one up.)

Every single cell of data in Clinker is one Value — a closed set of nine shapes (null, bool, integer, float, string, date, datetime, array, map):

clinker-record · value.rs · Value type @47d2e12

    pub enum Value {
        Null,
        Bool(bool),
        Integer(i64),
        Float(f64),
        String(FieldStr),
        Date(NaiveDate),
        DateTime(NaiveDateTime),
        Array(Vec<Value>),
        Map(Box<IndexMap<Box<str>, Value>>),
    }

When the CSV source reads Alice’s row, it doesn’t guess types — it reads every field as a string (Value::String). Turning "15200" into a number is a later step (a transform’s job), not the reader’s. So right after reading, Alice is a row of strings.
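The split between reading and typing can be sketched in a few lines. This is a simplified model, not Clinker's reader: the two-variant `Value`, the helpers `read_row` and `to_integer`, and the two-field input line are all assumptions made for illustration, and the comma split ignores quoting.

```rust
// Simplified stand-in for Clinker's Value: only the variants this sketch needs.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Integer(i64),
    String(String),
}

// Hypothetical reader step: split one CSV line and wrap every field as a string.
// (Naive comma split; quoting and escaping are not handled.)
fn read_row(line: &str) -> Vec<Value> {
    line.split(',')
        .map(|field| Value::String(field.to_string()))
        .collect()
}

// Hypothetical transform step: try to turn a string cell into an integer.
// Cells that are not strings, or that do not parse, come back unchanged.
fn to_integer(cell: &Value) -> Value {
    match cell {
        Value::String(s) => match s.trim().parse::<i64>() {
            Ok(n) => Value::Integer(n),
            Err(_) => cell.clone(),
        },
        other => other.clone(),
    }
}

fn main() {
    // Illustrative two-field line, not the real customer_etl input.
    let row = read_row("Alice,15200");
    println!("{:?}", row); // every cell is a Value::String

    let converted = to_integer(&row[1]);
    println!("{:?}", converted); // the numeric cell is now a Value::Integer
}
```

The reader never looks inside a field; only the later step decides that "15200" is a number, and a cell like "Alice" simply stays a string.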

A Record is one row: a list of Values lined up against a Schema that names the columns.

clinker-record · mod.rs · Record type @47d2e12

The real definition, trimmed to the part that matters now:

    pub struct Record {
        schema: Arc<Schema>, // the column names + order, shared by every row of a source
        values: Vec<Value>,  // the cells, positional: values[i] belongs to column i
        // ... plus per-record scoped vars and document context, for later
    }

Two ideas to take away: the cells are a plain Vec<Value> indexed by position, and the schema is held behind an Arc — a shared handle, so a million rows from one source all point at the same schema instead of each carrying their own copy. (Why Arc, and what that costs, is a Phase 2 question.)
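To see the sharing concretely, here is a minimal sketch using simplified stand-ins for `Schema` and `Record` (string cells only; the helper `make_records` is hypothetical). It relies only on the standard library's `Arc::clone`, `Arc::strong_count`, and `Arc::ptr_eq`.

```rust
use std::sync::Arc;

// Simplified stand-ins; field names follow the excerpt, the rest is assumed.
struct Schema {
    columns: Vec<String>,
}

struct Record {
    schema: Arc<Schema>,
    values: Vec<String>, // strings only, to keep the sketch short
}

// Hypothetical helper: build `n` records that all share one schema allocation.
fn make_records(schema: &Arc<Schema>, n: usize) -> Vec<Record> {
    (0..n)
        .map(|i| Record {
            schema: Arc::clone(schema), // bumps a reference count; no deep copy
            values: vec![format!("row-{}", i)],
        })
        .collect()
}

fn main() {
    let schema = Arc::new(Schema {
        columns: vec!["customer_id".to_string()],
    });
    let records = make_records(&schema, 3);

    // One handle held by `schema`, plus one per record.
    println!("handles: {}", Arc::strong_count(&schema));
    // Both records point at the same Schema in memory.
    println!(
        "same allocation: {}",
        Arc::ptr_eq(&records[0].schema, &records[2].schema)
    );
    println!("{} = {}", records[1].schema.columns[0], records[1].values[0]);
}
```

Cloning the `Arc` copies a pointer and increments a counter, so the column names exist once no matter how many records are created.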

The Schema is the list of column names and their order — it’s what lets values[5] mean “lifetime_value”:

clinker-record · schema.rs · Schema type @47d2e12

    pub struct Schema {
        columns: Vec<Box<str>>,          // column names, in order
        field_metadata: Vec<Option<FieldMetadata>>,
        index: HashMap<Box<str>, usize>, // name -> position, for O(1) lookup
    }

In customer_etl, the source declared the schema right in the YAML: customer_id, first_name, last_name, email, status, lifetime_value, zip_code. That’s the schema every customer record is bound to.
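The name-to-position lookup can be sketched as follows. This is a simplified model that drops `field_metadata`; the constructor `new` and the method `position` are hypothetical names chosen for the example, not necessarily Clinker's API.

```rust
use std::collections::HashMap;

// Simplified stand-in for Schema: names in order plus a name -> position index.
struct Schema {
    columns: Vec<Box<str>>,
    index: HashMap<Box<str>, usize>,
}

impl Schema {
    // Hypothetical constructor: derive the index from the column order.
    fn new(names: &[&str]) -> Schema {
        let columns: Vec<Box<str>> = names.iter().map(|n| Box::from(*n)).collect();
        let index = columns
            .iter()
            .enumerate()
            .map(|(i, name)| (name.clone(), i))
            .collect();
        Schema { columns, index }
    }

    // Hypothetical lookup: the position of a column, or None if it is unknown.
    fn position(&self, name: &str) -> Option<usize> {
        self.index.get(name).copied()
    }
}

fn main() {
    // The seven columns declared by the customer_etl source.
    let schema = Schema::new(&[
        "customer_id",
        "first_name",
        "last_name",
        "email",
        "status",
        "lifetime_value",
        "zip_code",
    ]);

    println!("{:?}", schema.position("lifetime_value")); // Some(5)
    println!("{:?}", schema.position("phone")); // None: not a declared column
    println!("{}", schema.columns[5]); // back from position to name
}
```

Keeping both the ordered `columns` list and the `index` map means either direction, name to position or position to name, is a single lookup.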

Here’s a record modeled in miniature — column names plus a Vec of values, exactly Alice’s row as the reader first sees it. Run it, then add the zip_code column and its value:
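A minimal sketch of such a miniature record, under stated assumptions: `Value` is cut down to its string variant, `describe` is a hypothetical helper, and the cell contents are placeholders rather than the real customer_etl data (only "15200" is borrowed from the text above, on the assumption that it is the lifetime_value field).

```rust
// Simplified stand-in for Value: right after reading, every cell is a string.
#[derive(Debug)]
enum Value {
    String(String),
}

// Hypothetical helper: pair each column name with the cell at the same position.
fn describe(columns: &[&str], values: &[Value]) -> Vec<String> {
    columns
        .iter()
        .zip(values)
        .map(|(name, value)| format!("{} = {:?}", name, value))
        .collect()
}

fn main() {
    // Six of the seven declared columns; zip_code is left for the exercise.
    let columns = vec![
        "customer_id",
        "first_name",
        "last_name",
        "email",
        "status",
        "lifetime_value",
    ];

    // Placeholder cells, all strings, standing in for Alice's row.
    let values: Vec<Value> = ["1", "Alice", "Example", "alice@example.com", "active", "15200"]
        .iter()
        .map(|s| Value::String(s.to_string()))
        .collect();

    for line in describe(&columns, &values) {
        println!("{}", line);
    }
}
```

To complete the exercise, push "zip_code" onto `columns` and one more string cell onto `values`; the two lists must stay the same length and in the same order, because position is the only link between them.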

// quick check

A Record holds its cells as a Vec<Value>. How does it know which cell is which column?

You have the vocabulary of a record. Next: how the YAML pipeline you wrote becomes something the engine can actually run.