Box & the heap — keeping a value small
The Rust question for this lesson: so far every value we’ve built has lived right where
we declared it — on the stack, inline. That’s fast, but it has a catch for enums: an
enum is always as wide as its largest variant, so one big variant makes every value
big, even the empty ones. How do you keep a rarely-used giant variant from taxing all the
small ones? You move it to the heap and leave a pointer behind — with Box, the first
and simplest smart pointer.
Box<T> is how a type like Clinker’s Value stays small even though one of its variants
holds an entire nested map.
Stack, heap, and the size of an enum
Most values live on the stack: a fixed-size slot the compiler reserves when the value comes into scope and reclaims when it leaves. To live on the stack, a type must have a size the compiler knows up front. The heap is the other place memory comes from: a pool you allocate from at run time for things whose size or lifetime doesn’t fit the stack’s strict in-and-out discipline.
Here’s the rule that makes this matter for enums: an enum occupies as much space as its
largest variant (plus a small tag saying which variant is active). It has to — any one
value of the type might be holding the biggest variant, so the slot must always be big
enough for it. A Value::Null takes exactly as many bytes as a Value::String.
So a single fat variant inflates the whole type. Watch it happen with our Shape:
use std::mem::size_of;

enum Shape {
    Point,               // no payload
    Circle(f64),         // 8 bytes
    Rectangle(f64, f64), // 16 bytes
    Polygon([f64; 32]),  // 256 bytes: 16 (x, y) corners, stored INLINE
}

fn main() {
    // 264: every Shape is sized for Polygon, even a Point
    println!("{}", size_of::<Shape>());
}

Adding Polygon made Point 264 bytes too. If you stored a million Points, you’d pay
for a million 256-byte holes you never fill.
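One way to confirm that the width belongs to the type rather than to the active variant is to measure individual values. The sketch below redeclares the same Shape and uses std::mem::size_of_val; the helper bytes_for is an illustrative name for this example, not part of any library.

```rust
use std::mem::{size_of, size_of_val};

#[allow(dead_code)]
enum Shape {
    Point,
    Circle(f64),
    Rectangle(f64, f64),
    Polygon([f64; 32]),
}

/// Bytes occupied by `count` Shapes stored contiguously, e.g. in a Vec<Shape>.
/// (Hypothetical helper for this example.)
fn bytes_for(count: usize) -> usize {
    count * size_of::<Shape>()
}

fn main() {
    let point = Shape::Point;
    let polygon = Shape::Polygon([0.0; 32]);

    // Both values report the same size: the slot is sized for the type.
    println!("Point   = {} bytes", size_of_val(&point));
    println!("Polygon = {} bytes", size_of_val(&polygon));

    // A million Points cost a million full-width slots.
    println!("1M shapes = {} bytes", bytes_for(1_000_000));
}
```

The two size_of_val lines print the same number, which is the whole point: a Point is stored in a Polygon-sized slot.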
Box puts the big part on the heap
Box<T> is an owning pointer: it allocates room for a T on the heap, stores the value
there, and the Box itself is just a pointer, 8 bytes on a 64-bit target for any sized T. Box the
fat variant and the enum shrinks back down.
Same data, same variants — but boxing the one fat arm drops the type from 264 bytes to
24, an 11× cut. The big array still exists; it just lives on the heap now, allocated only
when you actually build a Polygon. A Point or Circle carries no heap allocation and no
256-byte hole.
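A minimal sketch of that comparison, with both layouts side by side. The names ShapeInline and ShapeBoxed are illustrative, and the sizes quoted in this lesson assume a 64-bit target.

```rust
use std::mem::size_of;

#[allow(dead_code)]
enum ShapeInline {
    Point,
    Circle(f64),
    Rectangle(f64, f64),
    Polygon([f64; 32]), // 256 bytes stored inline
}

#[allow(dead_code)]
enum ShapeBoxed {
    Point,
    Circle(f64),
    Rectangle(f64, f64),
    Polygon(Box<[f64; 32]>), // pointer to 256 bytes on the heap
}

fn main() {
    println!("inline = {} bytes", size_of::<ShapeInline>());
    println!("boxed  = {} bytes", size_of::<ShapeBoxed>());

    // Building a Point involves no Box; only the Polygon arm calls Box::new.
    let _point = ShapeBoxed::Point;
    let _polygon = ShapeBoxed::Polygon(Box::new([0.0; 32]));
}
```

Only the last line calls Box::new, so that is the only value in this program that requests heap memory for its payload.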
A picture of the two layouts:
INLINE ── every Shape is 264 bytes, wherever it sits ──
  Shape::Point    [ tag | ........ 256 bytes of unused Polygon room ........ ]
  Shape::Polygon  [ tag | x0 y0 x1 y1 ........ all 32 floats, inline ........ ]

BOXED ── every Shape is 24 bytes; the big payload is off on the heap ──
  Shape::Point    [ tag | (small) ]
  Shape::Polygon  [ tag | ptr ]───────▶ heap: [ x0 y0 x1 y1 … 32 floats ]

The compiler now needs only a pointer’s worth of room for Polygon, because its payload lives elsewhere. The widest variant is now Rectangle, at 16 bytes.
// quick check
Why does boxing just the Polygon variant make Shape::Point smaller too?
The enum's size is set by its biggest variant. Boxing Polygon turns its 256-byte payload into an 8-byte pointer, so the largest variant — and therefore every value of the type — shrinks.
The same move, in the engine: Value::Map
This is exactly the decision Clinker makes for its universal cell type. Value is the enum
that every field in every record becomes, and one of its variants is a whole nested map:
clinker-record · value.rs · Value type @47d2e12

pub enum Value {
    Null,
    Bool(bool),
    Integer(i64),
    Float(f64),
    String(FieldStr), // 24-byte payload, sets the width
    Date(NaiveDate),
    DateTime(NaiveDateTime),
    Array(Vec<Value>), // 24-byte payload (ptr, len, cap)
    /// Nested key-value map (ordered, insertion-preserving).
    /// `Box<IndexMap>` is 8 bytes; the enum width is set by `String(...)`
    /// and `Array(Vec<Value>)`, not by this variant.
    Map(Box<IndexMap<Box<str>, Value>>), // boxed → 8 bytes, not the width-setter
}

The doc comment states the design outright: Box<IndexMap> is 8 bytes; the enum width is
set by String and Array, not by this variant. An IndexMap is several pointers wide,
far bigger than the 24-byte payloads of String and Array. If Map held one inline, it
would become the largest variant and drag every Value up to its size. Boxed, the map arm is
just an 8-byte pointer, so it never sets the width; the heap allocation happens only for the
(comparatively rare) cells that are actually maps.
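The effect can be reproduced in a scratch file with stand-in types. This is a sketch under loud assumptions: String plays the role of FieldStr, std's HashMap plays the role of IndexMap, and MiniInline and MiniBoxed are made-up names, so the printed numbers will not match Clinker's. Only the direction of the change carries over.

```rust
use std::collections::HashMap;
use std::mem::size_of;

// Stand-ins (assumptions): String for FieldStr, std HashMap for IndexMap.
#[allow(dead_code)]
enum MiniInline {
    Null,
    Text(String),
    List(Vec<MiniInline>),
    Map(HashMap<Box<str>, MiniInline>), // whole map header stored inline
}

#[allow(dead_code)]
enum MiniBoxed {
    Null,
    Text(String),
    List(Vec<MiniBoxed>),
    Map(Box<HashMap<Box<str>, MiniBoxed>>), // one pointer; header lives on the heap
}

fn main() {
    println!("map header = {} bytes", size_of::<HashMap<Box<str>, MiniBoxed>>());
    println!("inline     = {} bytes", size_of::<MiniInline>());
    println!("boxed      = {} bytes", size_of::<MiniBoxed>());
}
```

With the map inline, the map header becomes the widest payload; with it boxed, the text and list arms set the width instead.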
The payoff is a fixed, small Value. The source pins it with a test:
clinker-record · value.rs · test_value_enum_size test @47d2e12

#[test]
fn test_value_enum_size() {
    // … FieldStr is 24 bytes and exposes no spare niche, so the enum is 32 bytes …
    assert_eq!(std::mem::size_of::<Value>(), 32);
}

Value is 32 bytes: a 24-byte payload (set by String’s FieldStr, with Array the
same size) plus the variant tag, rounded to alignment. The Map variant rides along inside
that budget because it’s boxed.
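The "payload plus tag, rounded to alignment" arithmetic can be checked with a stand-in payload. In the sketch below, [u64; 3] is a hypothetical substitute for FieldStr: it is 24 bytes, 8-byte aligned on a 64-bit target, and every bit pattern is valid, so the tag has nowhere to hide. round_up and Cell are illustrative names.

```rust
use std::mem::{align_of, size_of};

// Hypothetical stand-in for a 24-byte payload with no spare niche.
type Payload = [u64; 3];

#[allow(dead_code)]
enum Cell {
    Empty,
    Full(Payload),
}

/// Round `n` up to the next multiple of `align` (which must be a power of two).
fn round_up(n: usize, align: usize) -> usize {
    (n + align - 1) & !(align - 1)
}

fn main() {
    let payload = size_of::<Payload>();
    let with_tag = payload + 1; // at least one byte to record the active variant
    let predicted = round_up(with_tag, align_of::<Cell>());

    println!("payload   = {payload} bytes");
    println!("predicted = {predicted} bytes");
    println!("actual    = {} bytes", size_of::<Cell>());
}
```

One tag byte pushes 24 to 25, and 8-byte alignment rounds 25 up to 32, which is the same shape of arithmetic the Value test pins.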
Notice Box appears twice in that one line: Box<IndexMap<Box<str>, Value>>. The inner
Box<str> is the map’s key — a minimal owned string (just pointer + length, no spare
capacity) rather than a full String. That’s a different reason to reach for Box — trimming
a string down to its smallest owned form — and it gets its own treatment in the later strings
lesson. Here the headline is the outer one: the box that keeps Value small.
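As a brief illustration of the inner box only, this sketch compares the handle sizes of String and Box<str> using standard-library calls. The helper make_key and the sample key text are hypothetical.

```rust
use std::mem::size_of;

/// Build a minimal owned key from a string slice.
/// (Hypothetical helper for this example.)
fn make_key(name: &str) -> Box<str> {
    Box::from(name)
}

fn main() {
    let mut key = String::with_capacity(64);
    key.push_str("customer_id");

    // String tracks pointer, length, and capacity.
    println!("String   = {} bytes, capacity {}", size_of::<String>(), key.capacity());

    // into_boxed_str discards the excess capacity and the capacity field.
    let boxed: Box<str> = key.into_boxed_str();
    println!("Box<str> = {} bytes, length {}", size_of::<Box<str>>(), boxed.len());

    println!("key = {}", make_key("order_id"));
}
```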
Your turn
A read-only exercise: use the playground (or a scratch cargo project).
Model a miniature Value: an enum Cell with a couple of small variants (Int(i64), Text(String)) and one “map-like” big variant Map([(i64, i64); 16]) holding 16 pairs inline. Print size_of::<Cell>(). Then change only the big variant to Map(Box<[(i64, i64); 16]>) and print the size again. Predict both numbers before you run, then explain, in terms of “largest variant sets the width”, why the second is so much smaller, and connect it to why Value::Map is boxed.
💡 Hint 1
[(i64, i64); 16] is 16 × 16 = 256 bytes inline, so the unboxed Cell is sized for that
(264 bytes) — even an Int. Boxing the big variant turns its payload into an 8-byte
pointer, so the largest variant becomes Text(String) and the whole enum drops to 24 bytes.
Same reasoning the engine applies to Map.
Show solution
use std::mem::size_of;

enum CellInline {
    Int(i64),
    Text(String),
    Map([(i64, i64); 16]), // 256 bytes inline → sets the width
}

enum CellBoxed {
    Int(i64),
    Text(String),
    Map(Box<[(i64, i64); 16]>), // 8-byte pointer → no longer the width-setter
}

fn main() {
    println!("inline = {} bytes", size_of::<CellInline>()); // 264
    println!("boxed = {} bytes", size_of::<CellBoxed>()); // 24
}

Unboxed, every Cell, including a plain Int, is sized for the 256-byte Map arm.
Boxing that arm shrinks it to a pointer, so the largest remaining variant is Text(String)
and the enum collapses to 24 bytes. (The real Value lands at 32 rather than 24 because its
FieldStr payload has no spare niche for the tag to hide in, where String here does, but
the mechanism is identical.) That’s precisely the trade Value makes:
the nested map is real but uncommon, so it pays a one-time heap allocation when used
rather than making every one of the billions of cells carry its bulk.
Common misconceptions
- “An enum only takes as much room as the variant it’s currently holding.” No: it’s always sized for its largest possible variant, because any value might be holding it. A Value::Null occupies the same 32 bytes as a Value::String. That’s the whole reason boxing the biggest variant shrinks every value of the type.
- “Box is a performance cost to avoid.” For a large, infrequently-used variant it’s a win: it removes a big inline payload from every value, at the price of a single heap allocation and one pointer hop only when that variant is actually used. Avoiding the Box here would force every cell to carry the full map inline, which is far more expensive across a billion-row pipeline than the occasional allocation.
Verify it for real
Run the test that pins the layout, in the clinker checkout:

cargo test -p clinker-record test_value_enum_size

It asserts size_of::<Value>() == 32. Try the counterfactual in a scratch crate (never
edit clinker source): write your own size_of test on a mini-enum, box and un-box the big
variant, and watch the asserted number move. It is the same experiment the playground runs, now as
a real #[test].
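A minimal sketch of such a scratch-crate test follows. Cell, CELL_BUDGET, and cell_within_budget are hypothetical names for this experiment. Unlike the Clinker test, it asserts an upper bound rather than an exact figure, because the exact layout of a plain Rust enum is not guaranteed across compiler versions.

```rust
use std::mem::size_of;

#[allow(dead_code)]
enum Cell {
    Int(i64),
    Text(String),
    // Swap in `Map([(i64, i64); 16])` to watch the check below fail.
    Map(Box<[(i64, i64); 16]>),
}

/// Hypothetical per-cell budget for this scratch experiment.
const CELL_BUDGET: usize = 32;

/// True while Cell stays within the budget.
fn cell_within_budget() -> bool {
    size_of::<Cell>() <= CELL_BUDGET
}

#[cfg(test)]
mod size_tests {
    use super::*;

    #[test]
    fn cell_size_is_pinned() {
        assert!(cell_within_budget(), "Cell grew to {} bytes", size_of::<Cell>());
    }
}

fn main() {
    println!("Cell = {} bytes (budget {})", size_of::<Cell>(), CELL_BUDGET);
}
```

Run it with cargo test in the scratch crate; un-boxing the Map arm pushes Cell past the budget and the test fails with the new size in the message.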
Where this leads
Box gives one owner a value on the heap. But Clinker constantly needs many owners to
share the same data — one Schema referenced by every record in a batch, across threads —
without copying it and without any single owner having to outlive the rest. That’s the next
smart pointer: Arc, shared ownership by reference count, and why the engine reaches for
Arc rather than Rc.