
Box & the heap — keeping a value small

The Rust question for this lesson: so far every value we’ve built has lived right where we declared it — on the stack, inline. That’s fast, but it has a catch for enums: an enum is always as wide as its largest variant, so one big variant makes every value big, even the empty ones. How do you keep a rarely-used giant variant from taxing all the small ones? You move it to the heap and leave a pointer behind — with Box, the first and simplest smart pointer.

Box<T> is how a type like Clinker’s Value stays small even though one of its variants holds an entire nested map.

Most values live on the stack: a fixed-size slot the compiler reserves when the value comes into scope and reclaims when it leaves. To live on the stack, a type must have a size the compiler knows up front. The heap is the other place memory comes from — a pool you allocate from at run time for things whose size or lifetime doesn’t fit the stack’s strict in-and-out discipline.

Here’s the rule that makes this matter for enums: an enum occupies as much space as its largest variant (plus a small tag saying which variant is active). It has to — any one value of the type might be holding the biggest variant, so the slot must always be big enough for it. A Value::Null takes exactly as many bytes as a Value::String.

So a single fat variant inflates the whole type. Watch it happen with our Shape:

use std::mem::size_of;

enum Shape {
    Point,               // no payload
    Circle(f64),         // 8 bytes
    Rectangle(f64, f64), // 16 bytes
    Polygon([f64; 32]),  // 256 bytes: 16 (x, y) corners, stored INLINE
}

fn main() {
    // 264 on a 64-bit target: every Shape is sized for Polygon, even a Point
    println!("{}", size_of::<Shape>());
}

Adding Polygon made Point 264 bytes too. If you stored a million Points, you’d pay for a million 256-byte holes you never fill.
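To put a rough number on that waste, here is a back-of-the-envelope sketch. It reuses the Shape from above; the helper name `vec_payload_bytes` is made up for this illustration, and it counts only the element storage of a collection, not its header or spare capacity.

```rust
use std::mem::size_of;

#[allow(dead_code)]
enum Shape {
    Point,
    Circle(f64),
    Rectangle(f64, f64),
    Polygon([f64; 32]), // 256 bytes, stored inline
}

/// Hypothetical helper: bytes of element storage for `n` values of type `T`
/// laid out back to back, as in a `Vec<T>` (header and spare capacity ignored).
fn vec_payload_bytes<T>(n: usize) -> usize {
    n * size_of::<T>()
}

fn main() {
    let n = 1_000_000;
    let total = vec_payload_bytes::<Shape>(n);
    // Room reserved for a Polygon payload that a Point never uses.
    let unused = n * size_of::<[f64; 32]>();
    println!("{n} Points occupy {total} bytes, {unused} of them unused Polygon room");
}
```

With the 264-byte Shape from the text, that is roughly 264 MB of storage, of which about 256 MB is room no Point ever fills.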

Box<T> is an owning pointer: it allocates room for a T on the heap, stores the value there, and the Box itself is just a pointer. For any sized T that pointer is 8 bytes on a 64-bit target, however large T is. Box the fat variant and the enum shrinks back down:

rust // editable
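A minimal sketch of what the boxed version might look like, reusing the Shape from above with only the Polygon arm changed. The `corner_count` helper is hypothetical, added only to show that the boxed array is read just like the inline one.

```rust
use std::mem::size_of;

#[allow(dead_code)]
enum Shape {
    Point,
    Circle(f64),
    Rectangle(f64, f64),
    Polygon(Box<[f64; 32]>), // the array lives on the heap; the enum holds a pointer
}

/// Number of (x, y) corners a shape stores (hypothetical helper for this sketch).
fn corner_count(shape: &Shape) -> usize {
    match shape {
        // Method calls auto-dereference the Box, so this reads the heap array.
        Shape::Polygon(coords) => coords.len() / 2, // 32 floats = 16 (x, y) pairs
        _ => 0,
    }
}

fn main() {
    // Only this line allocates: Box::new moves the array to the heap.
    let poly = Shape::Polygon(Box::new([0.0; 32]));
    let point = Shape::Point; // no heap allocation

    println!("Box<[f64; 32]> = {} bytes", size_of::<Box<[f64; 32]>>());
    println!("Shape          = {} bytes", size_of::<Shape>());
    println!("corners: {} and {}", corner_count(&poly), corner_count(&point));
}
```

On a typical 64-bit target this should report 8 bytes for the Box and 24 for Shape.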

Same data, same variants — but boxing the one fat arm drops the type from 264 bytes to 24, an 11× cut. The big array still exists; it just lives on the heap now, allocated only when you actually build a Polygon. A Point or Circle carries no heap allocation and no 256-byte hole.

A picture of the two layouts:

INLINE ── every Shape is 264 bytes, wherever it sits ──
Shape::Point [ tag | ........ 256 bytes of unused Polygon room ........ ]
Shape::Polygon [ tag | x0 y0 x1 y1 ........ all 32 floats, inline ........ ]
BOXED ── every Shape is 24 bytes; the big payload is off on the heap ──
Shape::Point [ tag | (small) ]
Shape::Polygon [ tag | ptr ]───────▶ heap: [ x0 y0 x1 y1 … 32 floats ]

The compiler now needs only a pointer’s worth of room for the worst case, because the worst case lives elsewhere.

// quick check

Why does boxing just the Polygon variant make Shape::Point smaller too?

This is exactly the decision Clinker makes for its universal cell type. Value is the enum every field in every record becomes — and one of its variants is a whole nested map:

clinker-record · value.rs · Value type @47d2e12
pub enum Value {
    Null,
    Bool(bool),
    Integer(i64),
    Float(f64),
    String(FieldStr), // 24-byte payload, sets the width
    Date(NaiveDate),
    DateTime(NaiveDateTime),
    Array(Vec<Value>), // 24-byte payload (ptr, len, cap)
    /// Nested key-value map (ordered, insertion-preserving).
    /// `Box<IndexMap>` is 8 bytes; the enum width is set by `String(...)`
    /// and `Array(Vec<Value>)`, not by this variant.
    Map(Box<IndexMap<Box<str>, Value>>), // boxed → 8 bytes, not the width-setter
}

The doc comment states the design outright: Box<IndexMap> is 8 bytes; the enum width is set by String and Array, not by this variant. An IndexMap is several pointers wide — far bigger than the 24-byte payloads of String and Array. If Map held one inline, it would become the largest variant and drag every Value up to its size. Boxed, the map arm is just an 8-byte pointer, so it never sets the width; the heap allocation happens only for the (comparatively rare) cells that are actually maps.
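The same effect can be reproduced with standard-library types only. The sketch below is not Clinker's code: `String` stands in for `FieldStr`, std's `HashMap` stands in for `IndexMap`, and both enum names are invented for the comparison.

```rust
use std::collections::HashMap;
use std::mem::size_of;

// Stand-ins, NOT Clinker's real types: `String` plays the role of `FieldStr`
// and std's `HashMap` plays the role of `IndexMap`.
#[allow(dead_code)]
enum InlineMapValue {
    Null,
    Integer(i64),
    String(String),
    Array(Vec<InlineMapValue>),
    Map(HashMap<Box<str>, InlineMapValue>), // the whole map header sits inline
}

#[allow(dead_code)]
enum BoxedMapValue {
    Null,
    Integer(i64),
    String(String),
    Array(Vec<BoxedMapValue>),
    Map(Box<HashMap<Box<str>, BoxedMapValue>>), // a single pointer
}

fn main() {
    println!("HashMap header  = {} bytes", size_of::<HashMap<Box<str>, i64>>());
    println!("inline-map enum = {} bytes", size_of::<InlineMapValue>());
    println!("boxed-map enum  = {} bytes", size_of::<BoxedMapValue>());
}
```

The exact figures depend on the target and compiler version. The relationship is what matters: the map header alone is wider than a String, so holding it inline makes it the width-setter, while boxing it hands that role back to the String and Array arms.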

The payoff is a fixed, small Value. The source pins it with a test:

clinker-record · value.rs · test_value_enum_size test @47d2e12
#[test]
fn test_value_enum_size() {
    // … FieldStr is 24 bytes and exposes no spare niche, so the enum is 32 bytes …
    assert_eq!(std::mem::size_of::<Value>(), 32);
}

Value is 32 bytes: a 24-byte payload (set by String’s FieldStr, with Array the same size) plus the variant tag, rounded to alignment. The Map variant rides along inside that budget because it’s boxed.

Notice Box appears twice in that one line: Box<IndexMap<Box<str>, Value>>. The inner Box<str> is the map’s key — a minimal owned string (just pointer + length, no spare capacity) rather than a full String. That’s a different reason to reach for Box — trimming a string down to its smallest owned form — and it gets its own treatment in the later strings lesson. Here the headline is the outer one: the box that keeps Value small.
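As a small preview of that inner box, the sketch below compares the two owned string forms using only the standard library. The key text is an arbitrary example, not something taken from Clinker.

```rust
use std::mem::size_of;

fn main() {
    // A String may hold spare capacity so it can grow in place.
    let mut key = String::with_capacity(64);
    key.push_str("customer_id");

    // into_boxed_str gives up the spare capacity, and the capacity field with it.
    let boxed: Box<str> = key.into_boxed_str();

    println!("String   = {} bytes (pointer, length, capacity)", size_of::<String>());
    println!("Box<str> = {} bytes (pointer, length)", size_of::<Box<str>>());
    println!("key = {boxed}, {} bytes of text", boxed.len());
}
```

Note that Box<str> is two machine words rather than one: a str has no compile-time size, so its box carries the length next to the pointer.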

A read-only exercise — use the playground above (or a scratch cargo project).

Model a miniature Value: an enum Cell with a couple of small variants (Int(i64), Text(String)) and one “map-like” big variant Map([(i64, i64); 16]) holding 16 pairs inline. Print size_of::<Cell>(). Then change only the big variant to Map(Box<[(i64, i64); 16]>) and print the size again. Predict both numbers before you run, then explain — in terms of “largest variant sets the width” — why the second is so much smaller, and connect it to why Value::Map is boxed.

💡 Hint 1

[(i64, i64); 16] is 16 × 16 = 256 bytes inline, so the unboxed Cell is sized for that (264 bytes) — even an Int. Boxing the big variant turns its payload into an 8-byte pointer, so the largest variant becomes Text(String) and the whole enum drops to 24 bytes. Same reasoning the engine applies to Map.

Show solution
use std::mem::size_of;

enum CellInline {
    Int(i64),
    Text(String),
    Map([(i64, i64); 16]), // 256 bytes inline → sets the width
}

enum CellBoxed {
    Int(i64),
    Text(String),
    Map(Box<[(i64, i64); 16]>), // 8-byte pointer → no longer the width-setter
}

fn main() {
    println!("inline = {} bytes", size_of::<CellInline>()); // 264
    println!("boxed = {} bytes", size_of::<CellBoxed>()); // 24
}

Unboxed, every Cell — including a plain Int — is sized for the 256-byte Map arm. Boxing that arm shrinks it to a pointer, so the largest remaining variant is Text(String) and the enum collapses to 24 bytes. (The real Value lands at 32 rather than 24 because its FieldStr payload has no spare niche for the tag to hide in, where String here does — but the mechanism is identical.) That’s precisely the trade Value makes: the nested map is real but uncommon, so it pays a one-time heap allocation when used rather than making every one of the billions of cells carry its bulk.

  • “An enum only takes as much room as the variant it’s currently holding.” No — it’s always sized for its largest possible variant, because any value might be holding it. A Value::Null occupies the same 32 bytes as a Value::String. That’s the whole reason boxing the biggest variant shrinks every value of the type.
  • “Box is a performance cost to avoid.” For a large, infrequently-used variant it’s a win: it removes a big inline payload from every value, at the price of a single heap allocation and one pointer hop only when that variant is actually used. Avoiding the Box here would force every cell to reserve inline room for the map, which costs far more across a billion-row pipeline than the occasional allocation.
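The first misconception is easy to check directly. The sketch below uses a made-up three-variant Cell (not Clinker's Value) and `std::mem::size_of_val`, which reports the inline footprint of a particular value.

```rust
use std::mem::{size_of, size_of_val};

#[allow(dead_code)]
enum Cell {
    Null,
    Int(i64),
    Text(String),
}

fn main() {
    let null = Cell::Null;
    let text = Cell::Text(String::from("a much longer piece of text"));

    // The inline footprint is the same for every variant; the characters of
    // the text sit on the heap and are not counted here.
    println!("Cell::Null = {} bytes", size_of_val(&null));
    println!("Cell::Text = {} bytes", size_of_val(&text));
    println!("Cell type  = {} bytes", size_of::<Cell>());
}
```

All three lines should print the same number, whichever variant the value holds.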

Run the test that pins the layout, in the clinker checkout:

cargo test -p clinker-record test_value_enum_size

It asserts size_of::<Value>() == 32. Try the counterfactual in a scratch crate (never edit clinker source): write your own size_of test on a mini-enum, box and un-box the big variant, and watch the asserted number move — the same experiment the playground runs, now as a real #[test].
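One possible shape for that scratch-crate test is sketched below. The enum names and the 128-byte payload are arbitrary choices for illustration. The test asserts only the relationship between the two sizes, so you can add an exact `assert_eq!` once you have measured the number on your own toolchain.

```rust
use std::mem::size_of;

#[allow(dead_code)]
enum MiniInline {
    Small(i64),
    Big([u8; 128]), // payload stored inline
}

#[allow(dead_code)]
enum MiniBoxed {
    Small(i64),
    Big(Box<[u8; 128]>), // pointer to a heap payload
}

#[test]
fn boxing_the_big_variant_shrinks_the_enum() {
    // Un-box `Big` in MiniBoxed and this assertion fails.
    assert!(size_of::<MiniBoxed>() < size_of::<MiniInline>());
}

fn main() {
    // Print the sizes so you know which numbers to pin in an exact assertion.
    println!("inline = {} bytes", size_of::<MiniInline>());
    println!("boxed  = {} bytes", size_of::<MiniBoxed>());
}
```

Run it with cargo test in the scratch crate; cargo run prints the two sizes.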

Box gives one owner a value on the heap. But Clinker constantly needs many owners to share the same data — one Schema referenced by every record in a batch, across threads — without copying it and without any single owner having to outlive the rest. That’s the next smart pointer: Arc, shared ownership by reference count, and why the engine reaches for Arc rather than Rc.