Span-preserving parse & strictness
You write colmn: when you meant column:. What should happen? A lenient parser shrugs,
ignores the key it doesn’t recognise, and runs your pipeline with a silently-missing setting —
a bug you discover hours later in the output. Clinker takes the opposite stance: that typo is
an error, raised at plan time, pointing at the exact line. Three design choices make
that possible, and all three are worth stealing.
You’ll be able to: explain why all YAML flows through one from_str chokepoint, what
deny_unknown_fields buys, and how Spanned<T> lets an error point back at the source line.
Choice one: every value remembers where it came from
When serde turns YAML into a Rust struct, it normally throws away where each value sat in the
file. That’s fine until you need to say “the problem is here.” So clinker parses into
Spanned<T> — a wrapper that keeps the value and its source location:
clinker-plan · yaml.rs · Spanned type @47d2e12

    // re-exported by clinker from the serde-saphyr crate
    pub struct Spanned<T> {
        pub value: T,
        pub referenced: Location, // where this value is used in the source
        pub defined: Location,    // where it was defined (e.g. a YAML anchor)
    }

A Spanned<PipelineNode> is a pipeline node that still knows its line and column. The engine
holds the whole pipeline as Vec<Spanned<PipelineNode>> precisely so that any later
complaint — a bad reference, a security rejection, a type error — can be rendered as a
diagnostic that points at the source, the way a good compiler underlines the offending token
rather than saying “error somewhere in your file.”
This is the same philosophy as last lesson’s ValidatedPath: carry the extra fact in the
type rather than recomputing or guessing it later. There the fact was “screened”; here it’s
“came from line N.”
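To make the payoff concrete, here is a minimal sketch of turning a span into a compiler-style diagnostic. It assumes local stand-in types: this Location (with 1-based line and column fields) and the render_diagnostic helper are hypothetical names for illustration, not the serde-saphyr or clinker API, and the sample YAML is invented.

```rust
// Hypothetical stand-ins for this sketch; the real types come from serde-saphyr.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Location {
    pub line: usize,   // 1-based line number (assumed for this sketch)
    pub column: usize, // 1-based column number (assumed for this sketch)
}

#[derive(Debug)]
pub struct Spanned<T> {
    pub value: T,
    pub referenced: Location, // where the value is used
    pub defined: Location,    // where it was defined
}

/// Render a message, the offending source line, and a caret under the column.
pub fn render_diagnostic<T>(source: &str, node: &Spanned<T>, message: &str) -> String {
    let loc = node.referenced;
    // Look up the source line the span points at; fall back to "" if out of range.
    let text = source.lines().nth(loc.line.saturating_sub(1)).unwrap_or("");
    // Pad with spaces so the caret sits under the 1-based column.
    let caret = format!("{}^", " ".repeat(loc.column.saturating_sub(1)));
    format!(
        "error: {}\n --> line {}, column {}\n  | {}\n  | {}",
        message, loc.line, loc.column, text, caret
    )
}

fn main() {
    let source = "nodes:\n  - name: load\n    input: missing_node\n";
    // Pretend a span-preserving parser produced this value and its position.
    let here = Location { line: 3, column: 12 };
    let input = Spanned {
        value: String::from("missing_node"),
        referenced: here,
        defined: here,
    };
    let message = format!("unknown node `{}`", input.value);
    println!("{}", render_diagnostic(source, &input, &message));
}
```

The point of the sketch is that render_diagnostic needs nothing beyond the value it is handed: the location travels with the data, so a check that runs long after parsing can still underline the right token.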
Choice two: unknown fields are rejected, not ignored
The strictness lives on the config structs themselves, via a serde attribute:
clinker-plan · source.rs · deny_unknown_fields doc @47d2e12

    #[derive(Deserialize)]
    #[serde(deny_unknown_fields)]
    pub struct WatermarkConfig {
        pub column: String,
        // ... a stray `colmn:` key here is a PARSE ERROR, not a silent no-op
    }

#[serde(deny_unknown_fields)] flips serde from “ignore keys I don’t recognise” to “refuse any
key I don’t recognise.” That single attribute is what converts your colmn: typo from a silent
misconfiguration into a loud, located failure before the pipeline runs. The attribute appears
on config structs across the crate; it’s a house rule, not a one-off.
Choice three: one chokepoint for all of it
Strictness and span-tracking only hold if nothing sneaks around them. So clinker funnels all YAML parsing through a single function. No other code is permitted to call the underlying parser directly:
clinker-plan · yaml.rs · from_str fn @47d2e12

    pub fn from_str<'de, T>(yaml: &'de str) -> Result<T, YamlError>
    where
        T: Deserialize<'de>,
    {
        if yaml.len() > MAX_INPUT_BYTES {
            // pre-parse rejection: cheap, before the parser sees a huge input
            return Err(YamlError(make_oversize_error(yaml.len())));
        }
        serde_saphyr::from_str_with_options(yaml, budget_options()).map_err(YamlError)
    }

The module doc states the rule plainly: “This module is the single entry point for YAML
parsing in clinker. No other code path is permitted to call serde_saphyr::from_str*.” Why
insist on one door? Two reasons, both architectural:
- Defence in depth. budget_options() caps input size (32 MB), nesting depth (256), and node count (100 000), disables !include, and enforces an alias/anchor ratio against “billion laughs” expansion attacks. A single chokepoint means those limits apply to every parse: there’s no forgotten code path that parses unbounded input.
- Bus-factor containment. serde-saphyr is a pre-1.0, single-maintainer dependency. Routing every call through one wrapper means that if it ever needs replacing, there’s exactly one file to change, not a hundred call sites scattered across the crate.
A chokepoint is the parser-side cousin of the proof token: instead of trusting every caller to remember the limits, you make the one gate enforce them for everyone.
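The shape of that gate can be sketched with nothing but std and module privacy. Everything here is an assumption made for illustration: the tiny limits, the indentation-based stand-in parser parse_unchecked, and the string payload of YamlError are not clinker's actual values or code, and a real budget would be enforced while parsing rather than after it.

```rust
mod yaml {
    // Hypothetical limits for this sketch; the real budget lives in budget_options().
    pub const MAX_INPUT_BYTES: usize = 1024;
    pub const MAX_DEPTH: usize = 4;

    #[derive(Debug, PartialEq)]
    pub struct YamlError(pub String);

    /// The only public entry point, so every caller gets the same limits.
    pub fn from_str(input: &str) -> Result<Vec<(usize, String)>, YamlError> {
        if input.len() > MAX_INPUT_BYTES {
            // Cheap rejection before any parsing work happens.
            return Err(YamlError(format!(
                "input is {} bytes, limit is {}",
                input.len(),
                MAX_INPUT_BYTES
            )));
        }
        let parsed = parse_unchecked(input);
        // Find the deepest nesting level seen and compare it to the budget.
        let deepest = parsed.iter().map(|(depth, _)| *depth).max().unwrap_or(0);
        if deepest > MAX_DEPTH {
            return Err(YamlError(format!(
                "nesting depth {} exceeds limit {}",
                deepest, MAX_DEPTH
            )));
        }
        Ok(parsed)
    }

    // Private on purpose: code outside this module cannot call it and skip the limits.
    // Stand-in "parser": records each non-blank line with its depth (two spaces per level).
    fn parse_unchecked(input: &str) -> Vec<(usize, String)> {
        input
            .lines()
            .filter(|line| !line.trim().is_empty())
            .map(|line| {
                let indent = line.len() - line.trim_start().len();
                (indent / 2, line.trim().to_string())
            })
            .collect()
    }
}

fn main() {
    println!("{:?}", yaml::from_str("a:\n  b: 1\n"));
    println!("{:?}", yaml::from_str(&"x".repeat(2000)));
    // yaml::parse_unchecked("a: 1"); // would not compile: the function is private
}
```

In this sketch the compiler enforces the rule, because parse_unchecked is invisible outside the module. With an external dependency such as serde-saphyr the underlying functions stay callable, which is presumably why the module doc has to state the rule as a written policy.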
Strict + located, in miniature
No serde needed to feel it. Here’s a tiny config reader that is both strict (rejects
unknown keys) and span-aware (knows the line), using only std:
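A minimal sketch of such a reader, assuming a flat key: value format. The function name parse_strict, the allowed list, and the sample configs are illustrative choices for this sketch, not clinker code.

```rust
/// Parse flat `key: value` lines, rejecting any key not in `allowed`.
/// Returns (key, value, line number) triples, or an error naming the line.
fn parse_strict<'a>(
    src: &'a str,
    allowed: &[&str],
) -> Result<Vec<(&'a str, &'a str, usize)>, String> {
    let mut fields = Vec::new();
    for (idx, raw) in src.lines().enumerate() {
        let line_no = idx + 1; // 1-based, like editor line numbers
        let line = raw.trim();
        if line.is_empty() || line.starts_with('#') {
            continue; // blank lines and comments carry no keys
        }
        let (key, value) = line
            .split_once(':')
            .ok_or_else(|| format!("line {}: expected `key: value`", line_no))?;
        let (key, value) = (key.trim(), value.trim());
        if !allowed.contains(&key) {
            // The strict branch: an unknown key is an error, with its line attached.
            return Err(format!("line {}: unknown field `{}`", line_no, key));
        }
        // Span-aware: keep the line number next to the value.
        fields.push((key, value, line_no));
    }
    Ok(fields)
}

fn main() {
    let allowed = ["name", "column"];
    let good = "name: events\ncolumn: updated_at\n";
    let typo = "name: events\ncolmn: updated_at\n";
    println!("{:?}", parse_strict(good, &allowed));
    println!("{:?}", parse_strict(typo, &allowed));
}
```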
The good config parses; the typo is rejected with line 2: unknown field `colmn`. Swap the if !allowed.contains(…) check for a continue and you’ve built the lenient parser: it
accepts the typo and quietly drops the setting. That one branch is the whole difference between
“fails loudly at plan time, here” and “fails mysteriously at run time, somewhere.”
// quick check
What does #[serde(deny_unknown_fields)] change about parsing a config?
deny_unknown_fields is the strictness lever: an unrecognised key becomes an error at parse time. (Remembering the source line is the separate job of Spanned.)
Trace the chokepoint
Two lessons, two flavours of “make the type carry the guarantee”: a proof token for security,
spans-and-strictness for config. Both feed the same destination — the validated plan the
executor runs. Next we meet that plan itself, and the typed handle that says “this has been
fully compiled” the same unforgeable way ValidatedPath said “this was screened.”