What is a record?

In Phase 0 you ran customer_etl and Clinker reported “5 records.” Phase 1 follows one of those records all the way through the engine. But first: what is a record? It’s the unit of data that flows through every pipeline, and it’s built from three vocabulary types you’ll see everywhere.

You’ll be able to: name the three types that make up a record (Value, Record, Schema) and explain how a CSV row becomes one. (We meet them shallowly here; Phase 2 opens each one up.)

A cell is a `Value`

Every cell of data in Clinker is one Value, an enum with a closed set of nine variants (null, bool, integer, float, string, date, datetime, array, map):

clinker-record · value.rs · Value type @47d2e12

pub enum Value {
    Null,
    Bool(bool),
    Integer(i64),
    Float(f64),
    String(FieldStr),
    Date(NaiveDate),
    DateTime(NaiveDateTime),
    Array(Vec<Value>),
    Map(Box<IndexMap<Box<str>, Value>>),
}

When the CSV source reads Alice’s row, it doesn’t guess types — it reads every field as a string (Value::String). Turning "15200" into a number is a later step (a transform’s job), not the reader’s. So right after reading, Alice is a row of strings.
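
To make the two-step split concrete, here is a minimal sketch that reads a row as all strings and then coerces one cell. It is not Clinker's reader or transform code. `read_row` and `to_integer` are hypothetical names, the `Value` enum is cut down to two variants, and the comma split ignores quoting, which a real CSV parser has to handle.

```rust
// Simplified stand-in for Clinker's Value; only the variants this sketch needs.
#[derive(Debug, PartialEq)]
enum Value {
    Integer(i64),
    Str(String),
}

// Hypothetical reader step: every field comes out as a string, no type guessing.
// (Naive comma split; real CSV parsing also handles quoting and escapes.)
fn read_row(line: &str) -> Vec<Value> {
    line.split(',').map(|field| Value::Str(field.to_string())).collect()
}

// Hypothetical transform step: coerce one cell to an integer, leaving it
// unchanged if it is not a string that parses as an i64.
fn to_integer(value: Value) -> Value {
    match value {
        Value::Str(s) => match s.parse::<i64>() {
            Ok(n) => Value::Integer(n),
            Err(_) => Value::Str(s),
        },
        other => other,
    }
}

fn main() {
    let mut row = read_row("1001,Alice,active,15200");
    println!("after read:      {row:?}");

    // Coercion is a separate, later step applied to one column.
    let cell = row.pop().expect("row has at least one field");
    row.push(to_integer(cell));
    println!("after transform: {row:?}");
}
```

Keeping the reader type-blind means a malformed number never fails at read time. The decision about what counts as a number lives in one place, the transform.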

A row is a `Record`

A Record is one row: a list of Values lined up against a Schema that names the columns.

clinker-record · mod.rs · Record type @47d2e12

The real definition, trimmed to the part that matters now:

pub struct Record {
    schema: Arc<Schema>,   // the column names + order, shared by every row of a source
    values: Vec<Value>,    // the cells, positional — values[i] belongs to column i
    // ... plus per-record scoped vars and document context, for later
}

Two ideas to take away: the cells are a plain Vec<Value> indexed by position, and the schema is held behind an Arc — a shared handle, so a million rows from one source all point at the same schema instead of each carrying their own copy. (Why Arc, and what that costs, is a Phase 2 question.)
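
The sharing can be observed directly with the standard library. This sketch uses simplified stand-ins for Schema and Record (the real types carry more fields), and `make_records` is a hypothetical helper, not an engine function. It shows that cloning an Arc hands out another handle to the same allocation rather than copying the column names.

```rust
use std::sync::Arc;

// Simplified stand-ins: the real Schema and Record carry more than this.
struct Schema {
    columns: Vec<String>,
}

struct Record {
    schema: Arc<Schema>,
    values: Vec<String>,
}

// Hypothetical helper: build `n` records that all share one schema allocation.
fn make_records(schema: &Arc<Schema>, n: usize) -> Vec<Record> {
    (0..n)
        .map(|i| Record {
            // Arc::clone copies the handle and bumps a counter; the column
            // names themselves are not copied.
            schema: Arc::clone(schema),
            values: vec![i.to_string()],
        })
        .collect()
}

fn main() {
    let schema = Arc::new(Schema { columns: vec!["customer_id".to_string()] });
    let records = make_records(&schema, 3);

    // One handle held by `schema` plus one per record.
    println!("handles to the schema: {}", Arc::strong_count(&schema));
    // Both records point at the same allocation.
    println!("same schema: {}", Arc::ptr_eq(&records[0].schema, &records[1].schema));
    println!("columns: {:?}, first row: {:?}", schema.columns, records[0].values);
}
```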

The columns are a `Schema`

The Schema is the list of column names and their order — it’s what lets values[5] mean “lifetime_value”:

clinker-record · schema.rs · Schema type @47d2e12

pub struct Schema {
    columns: Vec<Box<str>>,           // column names, in order
    field_metadata: Vec<Option<FieldMetadata>>,
    index: HashMap<Box<str>, usize>,  // name -> position, for O(1) lookup
}

In customer_etl, the source declared the schema right in the YAML: customer_id, first_name, last_name, email, status, lifetime_value, zip_code. That’s the schema every customer record is bound to.
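
As a rough illustration of how a name-to-position index could work, the sketch below builds one from the customer_etl column list and looks up lifetime_value. It is a simplified model, not the real Schema: it uses String instead of Box<str>, leaves out field_metadata, and `new` and `position` are hypothetical method names.

```rust
use std::collections::HashMap;

// Simplified stand-in for Schema: names in order plus a name -> position index.
struct Schema {
    columns: Vec<String>,
    index: HashMap<String, usize>,
}

impl Schema {
    // Hypothetical constructor: derive the index from the ordered names.
    fn new(names: &[&str]) -> Self {
        let columns: Vec<String> = names.iter().map(|n| n.to_string()).collect();
        let index = columns
            .iter()
            .enumerate()
            .map(|(i, name)| (name.clone(), i))
            .collect();
        Schema { columns, index }
    }

    // Hypothetical lookup: which position does this column name occupy?
    fn position(&self, name: &str) -> Option<usize> {
        self.index.get(name).copied()
    }
}

fn main() {
    let schema = Schema::new(&[
        "customer_id", "first_name", "last_name", "email",
        "status", "lifetime_value", "zip_code",
    ]);
    println!("{} columns", schema.columns.len());
    println!("lifetime_value is at {:?}", schema.position("lifetime_value"));
    println!("missing is at {:?}", schema.position("missing"));
}
```

The ordered Vec answers "what is column i called?" and the map answers "where is this name?". Storing both costs some memory but avoids a linear scan on every name lookup.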

Build one yourself

Here's a record modeled in miniature: column names plus a Vec of values, holding four of Alice's columns as the reader first sees them. Run it, then add the zip_code column and its value:

rust // editable

#[derive(Debug)]
enum Value {
    Null,
    Integer(i64),
    Str(String), // the real engine uses an optimized string type (Phase 4)
}

fn main() {
    // A schema: column names, in order.
    let schema = ["customer_id", "first_name", "status", "lifetime_value"];

    // A record: positional values, one per column. The CSV reader produces
    // strings; coercion to numbers happens later, in a transform.
    let alice: Vec<Value> = vec![
        Value::Str("1001".to_string()),
        Value::Str("Alice".to_string()),
        Value::Str("active".to_string()),
        Value::Str("15200".to_string()),
    ];

    // Pair each column name with the value at the same position.
    for (column, value) in schema.iter().zip(&alice) {
        println!("{column:>15} = {value:?}");
    }
}

// quick check

A Record holds its cells as a Vec<Value>. How does it know which cell is which column?

Cells are positional: values[i] belongs to the i-th column of the Schema that the record holds behind an Arc. The schema is stored once and shared by every record of the source, not copied per row.

Checkpoint

// quick check

Right after the CSV source reads Alice's row, what is the type of her `lifetime_value` cell?

The reader makes every field a Value::String. Converting the string "15200" to a number is a transform's job (you'll watch it happen in lesson 1.4), not the reader's.

You should be able to:

You can name the three types: Value (a cell), Record (a row), Schema (the columns)
You can explain why a freshly-read CSV record is all Value::String
You read the real Record struct and found the Vec<Value> + Arc<Schema>

Verify in the checkout:

grep -nA4 'pub struct Record' crates/clinker-record/src/record/mod.rs
grep -n 'pub enum Value' crates/clinker-record/src/value.rs

You have the vocabulary of a record. Next: how the YAML pipeline you wrote becomes something the engine can actually run.