>_
EngineeringNotes
← Back to Important Concepts

Why B+ Trees for Databases?

From naive implementations to optimized B+ Trees. Understanding how storage evolution solved the physical constraints of disk I/O.

01

The Naive Implementation

The most simplistic way to store data is in a single file, where every entity – whether it's a Row (SQL) or a Document (NoSQL) – is placed sequentially one after another.

0x00Row 1|ID: 1, Data: "..."
0x64Row 2|ID: 2, Data: "..."
... next row will be appended here ...
02

The Performance Wall

In a transactional database, all basic operations become O(n) in a naive implementation due to the physical nature of Disk I/O.

Insert & Delete

You cannot "insert" or "delete" in the middle of a file. You must create a new file, copy rows up to that point, add/skip the row, and copy the rest.

Complexity: O(n)

Find & Update

Finding a row requires a linear scan. Updating a row is restricted by its width; if it grows, it infringes on the next row's space.

Complexity: O(n)

What about Range Queries?

Range queries are efficient once you find the first row, as you can just iterate one after another. However, finding the first row itself is still O(n), dragging down overall performance.

03

What is a B+ Tree?

A B+ Tree is a self-balancing tree data structure that keeps data sorted and allows searches, sequential access, insertions, and deletions in logarithmic time. Unlike a standard Binary Search Tree, it is highly branched (many children per node), which makes it ideal for storage systems.

Key Structural Rules

Internal Nodes

Only store Keys and Pointers. They act as a roadmap to guide the search.

Leaf Nodes

Store the Actual Data. All leaves are at the exact same depth (balanced).

Leaf Linkage

All leaf nodes are linked together (linked list) for fast range scans.

04

How Operations Work

1. Insertion (Splitting)

To insert a key, we find the correct leaf node. If the node has space, we insert. If it's full, we split it into two and move the middle key to the parent.

10 | 20 | 30
+ 25
20 (Parent)
10 | 20
25 | 30

2. Update

Update is a search operation followed by an in-place modification. Since row widths in databases are often fixed (schema-defined), we simply find the leaf and overwrite the data.

3. Deletion (Merging/Borrowing)

Removing a key might make a node "underflow" (too empty). In this case, the node either borrows a key from its sibling or merges with it to maintain balance.

10 (Underflow)
20 | 30 | 40
Borrow from sibling
10 | 20
30 | 40
05

Live B+ Tree Visualizer

Experience how the tree grows and balances itself. Try inserting keys like 10, 20, 30, 25 to see the root split, or delete keys to observe how it maintains structure.

Interactive Visualizer

Order: 3 (Max 2 keys per node)

Empty
Insertion Order
No entries yet
06

Serialization & Disk Blocks

To solve these problems, B+ Trees leverage the way Operating Systems handle hardware.

The 4KB Disk Block

The most granular width for Disk I/O is a Disk Block (typically 4KB). Even to read 1 byte, the OS reads the entire 4KB block.

B+ Tree Node (4KB)
Row
Row
Row
Row
Row
Row
Row
Row
Row
Row
Row
Row
Row
Row
Row
...
~100 Rows per Node

By matching the B+ Tree Node size to the Disk Block size, every disk read effectively brings in roughly 100 rows into memory at once.

07

Hierarchy & Routing

B+ Trees aren't just flat; they use Non-Leaf Nodesto hold "Routing Information."

ROOT (Routing Info)
ID: 1 - 200
ID: 201 - 400
Disk Offsets (Pointers)

A "Pointer" in a B+ Tree isn't a memory address – it's a disk offset in the database file. Routing nodes store these offsets to know exactly where the next block is physically located.

08

Predictable Performance

Because B+ Trees force all actual row data to stay in the Leaf Nodes, every "Find One" operation takes exactly the same number of disk reads (usually 3).

The Range Query Power

All leaf nodes are interlinked linearly. To find rows 100 to 600, you traverse down to the first leaf and then simply follow the pointers horizontally.

WiredTiger & MongoDB

NoSQL databases like MongoDB use this same logic. Their storage engine, WiredTiger, serializes collection data into B+ Trees to ensure high-performance persistence.