Arc — shared ownership across the engine
The Rust question for this lesson: Box (lesson 6) gave one owner a value on the
heap. But ownership has been strictly single-owner this whole track — and real programs
constantly need the same value owned by many places at once. Every record in a batch
needs the same schema. Every row from a file shares that file’s name. None of those owners
naturally outlives the others, and copying the value per owner would be wasteful. So how do
you give a value several owners and free it exactly when the last one is done? With
Arc — atomically reference-counted shared ownership.
Arc is the smart pointer Clinker reaches for everywhere a single value has to be shared
widely and cheaply — one schema behind a million records.
One value, many owners
Arc::new(value) puts value on the heap with a reference count attached. Every
Arc::clone hands out another owning handle and bumps that count by one; every handle that
drops lowers it by one. When the count hits zero — the last owner gone — the value is freed,
exactly once. Crucially, Arc::clone does not copy the value: it copies a pointer
and increments the counter. All handles point at the same heap allocation.
Arc::ptr_eq returning true is the proof: a is not a second Shape, it’s a second
owner of the one Shape. Three handles, one value, one eventual free.
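A minimal sketch of the kind of program this section describes, assuming a hypothetical Shape struct (its fields are invented for illustration) and handle names original, a, and b. It records the strong count after each clone and drop and checks that two handles alias one allocation:

```rust
use std::sync::Arc;

// Hypothetical stand-in for the Shape type; the fields are assumptions.
struct Shape {
    name: String,
    sides: u32,
}

/// Returns the strong count observed at each step, plus whether two
/// handles point at the same heap allocation.
fn count_walkthrough() -> (Vec<usize>, bool) {
    let mut counts = Vec::new();

    let original = Arc::new(Shape {
        name: "triangle".to_string(),
        sides: 3,
    });
    counts.push(Arc::strong_count(&original)); // one owner

    let a = Arc::clone(&original); // pointer copy + refcount bump
    let b = Arc::clone(&original);
    counts.push(Arc::strong_count(&original)); // three owners

    // Reading through a clone reaches the very same Shape.
    println!("via a: {} with {} sides", a.name, a.sides);
    let same = Arc::ptr_eq(&original, &a);

    drop(b);
    counts.push(Arc::strong_count(&original)); // two owners

    drop(a);
    counts.push(Arc::strong_count(&original)); // back to one owner

    (counts, same)
}

fn main() {
    let (counts, same) = count_walkthrough();
    println!("counts = {counts:?}, same allocation = {same}");
}
```

The counts come out as 1, 3, 2, 1, and the Shape itself is freed only when original goes out of scope at the end of count_walkthrough.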
// quick check
What does Arc::clone actually duplicate?
Arc::clone produces another owning handle to the SAME heap value and increments the reference count. The value itself is never duplicated — Arc::ptr_eq proves the handles alias one allocation.
The engine shares its schema this way
A Record doesn’t own a private Schema; it holds an Arc<Schema>, the same schema
shared by every record in the batch:
clinker-record · mod.rs · Record type @47d2e12

    pub struct Record {
        schema: Arc<Schema>, // shared: one Schema behind every record in the batch
        values: Vec<Value>,  // owned: this row's own cells
        // …
    }

The doc comment on Record spells out the payoff for the document context it carries the
same way — records from one source file “share that file’s Arc (one allocation per
document, refcount-bump per record).” That is the whole idea in one line: allocate the big
shared thing once, then every record gets a cheap owning handle to it instead of a copy.
The same pattern carries a record’s provenance — where it came from:
clinker-record · provenance.rs · RecordProvenance type @47d2e12

    /// Tracks where a record came from in the source data.
    /// Arc<str> fields are shared across all records from the same file/run.
    pub struct RecordProvenance {
        pub source_file: Arc<str>, // the file name: one string, every row points to it
        pub source_row: u64,       // this row's own number
        pub source_batch: Arc<str>,
        pub ingestion_timestamp: NaiveDateTime,
    }

Arc<str> is a shared, immutable string: the file name "data/input.csv" is allocated
once and every row from that file holds a refcounted handle to it, not its own copy. The
factory that stamps each row makes this explicit:

    pub fn factory(source_file: Arc<str>, source_batch: Arc<str>, /* … */) -> impl FnMut(u64) -> Self {
        move |source_row| Self {
            source_file: Arc::clone(&source_file), // refcount bump, not a string copy
            source_row,
            source_batch: Arc::clone(&source_batch),
            // …
        }
    }

Across a million-row file that’s one "data/input.csv" allocation and a million cheap
refcount bumps — versus a million duplicated strings if these were plain Strings.
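To see the factory pattern run outside the engine, here is a self-contained sketch. It assumes a simplified Provenance struct (the timestamp field is left out so no external crate is needed), a made-up batch id, and a helper named stamp_rows; none of these are Clinker's actual definitions.

```rust
use std::sync::Arc;

// Simplified stand-in for RecordProvenance; the timestamp field is omitted
// so the sketch compiles with the standard library alone.
struct Provenance {
    source_file: Arc<str>,
    source_row: u64,
    source_batch: Arc<str>,
}

impl Provenance {
    /// Captures the shared strings once; each call stamps one row.
    fn factory(source_file: Arc<str>, source_batch: Arc<str>) -> impl FnMut(u64) -> Self {
        move |source_row| Self {
            source_file: Arc::clone(&source_file), // refcount bump, not a string copy
            source_row,
            source_batch: Arc::clone(&source_batch),
        }
    }
}

/// Hypothetical helper: stamps `n` rows that all share one file name.
fn stamp_rows(n: u64) -> Vec<Provenance> {
    let file: Arc<str> = Arc::from("data/input.csv"); // the only string allocation
    let batch: Arc<str> = Arc::from("batch-001"); // made-up batch id
    let mut make = Provenance::factory(file, batch);
    (0..n).map(|row| make(row)).collect()
    // `make` drops here, releasing the handles the closure itself held.
}

fn main() {
    let rows = stamp_rows(1_000);
    let first = &rows[0];
    let last = &rows[rows.len() - 1];
    println!("owners of the file name: {}", Arc::strong_count(&first.source_file));
    println!("same allocation: {}", Arc::ptr_eq(&first.source_file, &last.source_file));
    println!("last row: #{} from batch {}", last.source_row, last.source_batch);
}
```

Once the factory closure has been dropped, the strong count on the file name equals the number of rows, which is the refcount-bump-per-record behaviour described above.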
Clinker context
A batch is millions of Records drawn from the same source, all describing themselves with
the same Schema and the same source-file string. Two non-Arc options both hurt: give each
record its own Schema/String and you copy big shared data millions of times; thread a
borrow (&Schema) through everything and you tie every record’s lifetime to one original
owner that must outlive them all — unworkable when records flow through a pipeline and get
buffered, reordered, and handed off. Arc is the third way: each record is a genuine,
independent owner of the shared value, and the value lives precisely until the last record
referencing it is gone.
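The lifetime claim can be observed directly. The sketch below assumes hypothetical Schema and Record stand-ins and a static counter (SCHEMAS_FREED, invented here) that a Drop impl bumps, purely to make the free visible. The handle that created the schema is gone before the records are even used, yet the schema survives reordering and hand-off and is freed only with the last record.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

// Bumped each time a Schema is freed; exists only to make the free observable.
static SCHEMAS_FREED: AtomicUsize = AtomicUsize::new(0);

// Hypothetical stand-ins for the engine's types.
struct Schema {
    columns: Vec<String>,
}

impl Drop for Schema {
    fn drop(&mut self) {
        SCHEMAS_FREED.fetch_add(1, Ordering::SeqCst);
    }
}

struct Record {
    schema: Arc<Schema>,
    values: Vec<String>,
}

/// Builds three records; the handle created here is dropped on return.
fn build_batch() -> Vec<Record> {
    let schema = Arc::new(Schema {
        columns: vec!["id".to_string(), "name".to_string()],
    });
    (1..=3)
        .map(|id| Record {
            schema: Arc::clone(&schema),
            values: vec![id.to_string(), format!("row-{id}")],
        })
        .collect()
}

/// Returns how many Schemas were freed (a) while one record was still alive
/// and (b) after that last record was dropped.
fn lifetime_walkthrough() -> (usize, usize) {
    let before = SCHEMAS_FREED.load(Ordering::SeqCst);

    let mut batch = build_batch(); // the creating handle is already gone
    batch.reverse(); // reorder
    let survivor = batch.pop().expect("batch is non-empty"); // hand one record off
    drop(batch); // the other two records drop

    let while_alive = SCHEMAS_FREED.load(Ordering::SeqCst) - before;
    println!(
        "survivor {:?} still reads columns {:?}",
        survivor.values, survivor.schema.columns
    );

    drop(survivor); // last owner gone, so the Schema is freed now
    let after_last = SCHEMAS_FREED.load(Ordering::SeqCst) - before;
    (while_alive, after_last)
}

fn main() {
    let (while_alive, after_last) = lifetime_walkthrough();
    println!("freed while a record was alive: {while_alive}, after the last drop: {after_last}");
}
```

With a plain &Schema in Record, build_batch would not compile, because the records would borrow from a local that is dropped when the function returns.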
A picture of what’s shared versus owned:

    ┌─ Record #0 ─┐   ┌─ Record #1 ─┐   ┌─ Record #2 ─┐
    │ values: […] │   │ values: […] │   │ values: […] │   ← each owns its own cells
    │ schema: ●───┼─┐ │ schema: ●───┼─┐ │ schema: ●───┼─┐
    └─────────────┘ │ └─────────────┘ │ └─────────────┘ │
                    └─────────┬───────┴─────────────────┘
                              ▼
            heap: Arc<Schema> { refcount: 3, … }   ← allocated ONCE, freed when the last record drops

Your turn
Two halves; use the playground.
(a) In the Shape playground above, before each println! of strong_count, write down what you expect the count to be, then run and check. Add a third Arc::clone and a drop and predict the count after each.
(b) Paste the thread version below and run it (it works). Then change every Arc to Rc (use std::rc::Rc;) and recompile. Read the error, and explain, in Send terms, why the engine’s worker-thread model forces Arc.
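A stand-in sketch of such a thread version, assuming a hypothetical Schema with a single columns field and a helper name, count_columns_on_workers, invented here; the lesson's own listing may differ:

```rust
use std::sync::Arc;
use std::thread;

// Hypothetical stand-in for the engine's Schema.
struct Schema {
    columns: Vec<String>,
}

/// Spawns `workers` threads; each gets its own Arc handle to the one Schema
/// and reports how many columns it sees.
fn count_columns_on_workers(workers: usize) -> Vec<usize> {
    let schema = Arc::new(Schema {
        columns: vec!["id".to_string(), "name".to_string()],
    });

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let schema = Arc::clone(&schema); // one handle per worker
            // `move` sends the handle to the new thread, which requires Send.
            thread::spawn(move || schema.columns.len())
        })
        .collect();

    handles
        .into_iter()
        .map(|h| h.join().expect("worker thread panicked"))
        .collect()
}

fn main() {
    let seen = count_columns_on_workers(3);
    println!("columns seen by each worker: {seen:?}");
}
```

Swapping Arc for Rc in this sketch moves an Rc<Schema> into the spawned closure, which is exactly where the compiler should object.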
💡 Hint 1
thread::spawn requires its closure (and everything it captures) to be Send, so the data
can legally move onto another thread. Arc<Schema> is Send; Rc<Schema> is not. The error
reads roughly: `Rc<Schema>` cannot be sent between threads safely.
Show solution
(a) Counts go 1 → 3 (after two clones) → back down as handles drop. Each Arc::clone is
+1, each drop is −1; the Shape is freed only when the count reaches 0.
(b) With Rc, the program fails to compile at thread::spawn with an error like
`Rc<Schema>` cannot be sent between threads safely, because Rc is !Send. spawn
demands Send so the captured value can move to the worker thread. Arc<T> is Send + Sync
whenever T is (its refcount updates atomically), so it crosses the boundary safely. That’s the whole reason
Clinker uses Arc: a Record holding an Arc<Schema> rides onto the executor’s worker
thread, and Rc simply isn’t allowed there. Arc::clone copies no schema data either way —
just a pointer and a refcount bump — so sharing stays cheap.
Common misconceptions
- “Arc::clone deep-copies the value.” No: it copies a pointer and atomically bumps a reference count; every handle aliases the one heap value (provable with Arc::ptr_eq). Cloning an Arc<Schema> does not duplicate the schema; that’s the point.
- “Arc and Rc are interchangeable.” Only when you never leave one thread. Rc uses a non-atomic count and is !Send, so it’s faster but single-threaded; Arc is Send + Sync and pays a small atomic cost to be shared across threads. Clinker moves records across a worker-thread boundary, so it must be Arc.
- “An Arc lets every owner mutate the shared value.” No: Arc<T> gives shared read access; you can’t get an &mut through it while it’s shared. Mutating shared data needs interior mutability (a Mutex/RwLock, or atomics), a topic still ahead.
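The third point can be checked with the standard library's Arc::get_mut, which hands out an &mut only when the handle is the sole owner. This is a small sketch with a hypothetical helper name, not something taken from the Clinker source:

```rust
use std::sync::Arc;

/// Returns whether `Arc::get_mut` succeeded while shared, whether it
/// succeeded once unique again, and the final contents.
fn get_mut_walkthrough() -> (bool, bool, Vec<u32>) {
    let mut first = Arc::new(vec![1u32, 2, 3]);
    let second = Arc::clone(&first);

    // Two owners exist, so no exclusive reference is handed out.
    let while_shared = Arc::get_mut(&mut first).is_some();

    drop(second);

    // Sole owner again: exclusive access is allowed.
    let while_unique = match Arc::get_mut(&mut first) {
        Some(values) => {
            values.push(4);
            true
        }
        None => false,
    };

    (while_shared, while_unique, first.to_vec())
}

fn main() {
    let (while_shared, while_unique, values) = get_mut_walkthrough();
    println!("mutable while shared: {while_shared}");
    println!("mutable when unique:  {while_unique}");
    println!("final contents: {values:?}");
}
```

Mutation that has to happen while several owners exist is the job of the interior-mutability types named above.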
Verify it for real
The provenance test proves the sharing claim directly: clones of an Arc<str> point at one
allocation:

    cargo test -p clinker-record test_provenance_arc_sharing

It builds two RecordProvenance rows from the same Arc<str> file name and asserts
Arc::ptr_eq(&p1.source_file, &p2.source_file) — same pointer, one string, two owners. For
the counting side, drop a dbg!(Arc::strong_count(&x)) into a scratch crate (never edit
clinker source) and watch it move as you clone and drop.
Where this leads
That completes the ownership arc of the track: a value has one owner (lesson 4), you can
borrow it (5), put it on the heap (6), or share it among many owners with Arc (7). Next the
track turns from who holds a value to what a value can be: modeling success or failure
in the type system with Result and the ? operator — how Clinker distinguishes a bad row
it can route to a dead-letter queue from a fatal error that must stop the run.