EngineeringNotes

Caching

Mastering the art of storage-speed tradeoffs to build blazing fast distributed systems.

What is Caching?

"Caching is a fundamental concept of computer science. In fact, any system that you pick up—any large-scale distributed system—has some form of caching in multiple places and often in critical sections."

The core idea

The whole idea behind caching is reducing repeatable work through storage. Instead of doing the same computation again and again, you store the result in local memory and return it as the response.

Fast Storage > Slow Disk

Caches sit closer to your application than databases do. Querying a cache is much faster because you return an already stored result instead of querying the database again.
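As a rough illustration of that idea, here is a minimal Python sketch. The `query_database` function and the `stats` counter are hypothetical stand-ins, not part of any real system; the point is only that a repeated request for the same user never reaches the database.

```python
# Counts how often the (simulated) database is actually queried.
stats = {"db_calls": 0}

def query_database(user_id: int) -> list[str]:
    """Hypothetical stand-in for a slow database query."""
    stats["db_calls"] += 1
    return [f"post-{user_id}-{i}" for i in range(3)]

# The cache: a plain dictionary held in local memory.
_cache: dict[int, list[str]] = {}

def get_posts(user_id: int) -> list[str]:
    """Return posts for a user, querying the database only on a cache miss."""
    if user_id in _cache:             # cache hit: no database work
        return _cache[user_id]
    posts = query_database(user_id)   # cache miss: do the work once
    _cache[user_id] = posts           # store the result for next time
    return posts
```

Calling `get_posts(7)` twice should leave `stats["db_calls"]` at 1, since the second call is answered from the dictionary.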

Latency Analysis

Imagine a user on Instagram asking for their news feed. The message reaches the server, which queries the database: SELECT * FROM posts WHERE user_id = ?.

Optimization Part 1: Client to Server Communication
Optimization Part 2: Server to Database Communication (Our Focus)

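Scenario A: Without Cache

Client → Server: 100ms
Server → Database: 10ms
Database → Server: 10ms
Server → Client: 100ms
Total Latency: 220ms

Scenario B: With Cache

Client → Server: 100ms
Cache hit: 1ms
Server → Client: 100ms
Total Latency: 201ms

🚀 ~9% lower total latency (backend time drops from 20ms to 1ms)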

The Cohort Optimization

Similar users ask for similar feeds. For example, a young software engineer who likes football and is in India will get a news feed very similar to another user in that same cohort.

When one user from this cohort asks for a feed, you generate it from the database the first time and store it in local memory. The next time a similar user comes along, instead of querying the database, you return the already stored result.
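One way to picture this is a cache keyed by cohort rather than by user. The sketch below is only an assumption about how that might look: the `Cohort` fields, `build_feed_from_db`, and the `generated` log are hypothetical names chosen to mirror the example in the text.

```python
from dataclasses import dataclass

@dataclass(frozen=True)  # frozen makes instances hashable, so they can be dict keys
class Cohort:
    # Hypothetical cohort attributes, taken from the example in the text.
    age_band: str
    profession: str
    interest: str
    country: str

# Records every time a feed is actually built from the "database".
generated: list[Cohort] = []

def build_feed_from_db(cohort: Cohort) -> list[str]:
    """Stand-in for the expensive feed query."""
    generated.append(cohort)
    return [f"{cohort.interest} news in {cohort.country}",
            f"{cohort.profession} posts"]

feed_cache: dict[Cohort, list[str]] = {}

def get_feed(cohort: Cohort) -> list[str]:
    """Build the feed once per cohort, then reuse it for similar users."""
    if cohort not in feed_cache:
        feed_cache[cohort] = build_feed_from_db(cohort)
    return feed_cache[cohort]
```

Two users with identical cohort attributes map to the same key, so only the first of them pays for the database query.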

Client-Side Magic

"Instead of making a 200 millisecond call, store it in your local device. You'll see the response time of the app go down from 200ms to 2ms."

Caching reduces repeatable work through storage.

Limitations & Strategy

Why don't we just put the entire database in the cache?

1. Physical & Cost Barriers

For small systems, the data fits in a few GBs of memory. But for large databases, fitting terabytes or petabytes of data into memory is impractical. Even where you can fit a TB in memory, it will be extremely expensive.

⚠️ "Actually, you start to optimize on the things that you store. You have to take a part of the database which is most frequently used."

2. Hit Rate Optimization

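Not every request is served from the cache: some share of requests hit the cache and the rest fall through to the database. Let's say 90% go to the cache and 10% to the DB. With the latencies above, the average backend time works out to roughly 2 to 4 milliseconds, but because network time dominates, the end-to-end saving is still less than 10%.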
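The arithmetic can be sketched in a few lines, using the numbers from the two scenarios earlier (200ms of client-server network time, a 20ms database round trip, a 1ms cache lookup). One assumption here that the text does not state: every request checks the cache first, so a miss pays for the cache lookup and the database call.

```python
def expected_backend_ms(hit_rate: float, cache_ms: float, db_ms: float) -> float:
    """Average backend time per request.

    Assumes every request pays for the cache lookup, and a miss
    additionally pays for the database round trip.
    """
    return cache_ms + (1.0 - hit_rate) * db_ms

NETWORK_MS = 200.0  # client to server and back (100ms each way)
CACHE_MS = 1.0      # cache lookup
DB_MS = 20.0        # server to database and back (10ms each way)

backend = expected_backend_ms(0.9, CACHE_MS, DB_MS)   # roughly 3ms
no_cache_total = NETWORK_MS + DB_MS                   # 220ms
with_cache_total = NETWORK_MS + backend               # roughly 203ms
saving = (no_cache_total - with_cache_total) / no_cache_total  # under 10%
```

Under these assumptions a 90% hit rate cuts the backend from 20ms to about 3ms, yet the total only improves by about 8%, which is why the hit rate and the choice of what to cache matter so much.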

"Your job as an engineer is to do a prediction and then store that in cache."

The Two Critical Questions

1. Write Policies

The cache is a copy of the data. When there is an update, do you update the database and the cache together, or update one of them later?

"We talk about all of the write policies in the upcoming lessons."

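To make the "together, or later" question concrete, here is a small sketch of the two options. The class names use the common terms write-through and write-back, which the text above does not name, and the dictionaries standing in for the database and cache are hypothetical.

```python
class WriteThroughStore:
    """Updates the database and the cache together on every write."""

    def __init__(self) -> None:
        self.db: dict = {}
        self.cache: dict = {}

    def write(self, key, value) -> None:
        self.db[key] = value      # the database is updated immediately
        self.cache[key] = value   # and the cache copy stays in sync


class WriteBackStore:
    """Updates the cache now and the database later, when flush() runs."""

    def __init__(self) -> None:
        self.db: dict = {}
        self.cache: dict = {}
        self.dirty: set = set()   # keys changed in the cache but not yet in the db

    def write(self, key, value) -> None:
        self.cache[key] = value
        self.dirty.add(key)

    def flush(self) -> None:
        for key in self.dirty:
            self.db[key] = self.cache[key]
        self.dirty.clear()
```

The first option keeps both copies consistent at the cost of a slower write; the second makes writes fast but leaves the database behind until the flush happens.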
2. Eviction Policy

Your memory is full. A new viral video arrived. What data do you kick out?

LRU: Least Recently Used (Gold Standard)
LFU: Least Frequently Used
ML Based: Machine Learning Predictions
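As an example of the first policy, here is a minimal LRU cache sketch built on Python's `collections.OrderedDict`. The `LRUCache` class is illustrative only, not a production implementation.

```python
from collections import OrderedDict

class LRUCache:
    """Keeps at most `capacity` items and evicts the least recently used one."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._items: OrderedDict = OrderedDict()  # oldest first, newest last

    def get(self, key):
        if key not in self._items:
            return None                      # cache miss
        self._items.move_to_end(key)         # mark as most recently used
        return self._items[key]

    def put(self, key, value) -> None:
        if key in self._items:
            self._items.move_to_end(key)     # refresh recency on overwrite
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used
```

With a capacity of 2, putting `a` and `b`, reading `a`, and then putting `c` evicts `b`, because `a` was touched more recently.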

Critical Drawbacks

"I always find drawbacks interesting because when it comes to such a fundamental component it's almost inevitable... but we try to mitigate it."

Cache Thrashing

Occurs when you iterate through data [1, 2, 3, 4] but your cache only holds 3 items. At 4, you evict 1. Then the client asks for 1 again, which is now a miss, so you evict 2 to make room. Every request ends up missing, and this useless work increases latency and wastes memory.

Poor hit rate = Wasteful Additional Computation
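The thrashing pattern is easy to reproduce. The sketch below assumes an LRU eviction policy (the text does not say which policy is in use) and measures the hit rate for a client that keeps looping over four keys.

```python
from collections import OrderedDict

def lru_hit_rate(requests: list, capacity: int) -> float:
    """Replay `requests` against an LRU cache and return the fraction of hits."""
    cache: OrderedDict = OrderedDict()
    hits = 0
    for key in requests:
        if key in cache:
            hits += 1
            cache.move_to_end(key)         # mark as most recently used
        else:
            cache[key] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used
    return hits / len(requests) if requests else 0.0

# 100 requests cycling through the keys 1, 2, 3, 4.
looping = [1, 2, 3, 4] * 25

thrashing_rate = lru_hit_rate(looping, capacity=3)  # every request misses
healthy_rate = lru_hit_rate(looping, capacity=4)    # only the first 4 miss
```

With room for 3 items the hit rate is 0, so the cache only adds overhead; with room for 4 it jumps to 0.96.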

Eventual Consistency

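The cache can serve a stale copy of, say, the like count on a YouTube video. If the cache is refreshed only once an hour, the count it returns in between may not match the database. How stale the data can get is determined by your caching policy.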

Fatal for Financial Transactions
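Here is a small sketch of how that staleness arises with a time-based refresh. The `TTLCache` class, the `likes_db` dictionary, and the manual clock are all hypothetical; the one-hour TTL mirrors the example above.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple

class TTLCache:
    """Serves a cached value until it is at least `ttl_seconds` old, then reloads it."""

    def __init__(self, ttl_seconds: float,
                 loader: Callable[[Hashable], Any],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._loader = loader   # fetches the fresh value from the source of truth
        self._clock = clock     # injectable so the example needs no sleeping
        self._entries: Dict[Hashable, Tuple[Any, float]] = {}

    def get(self, key: Hashable) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[1] < self._ttl:
            return entry[0]            # possibly stale, but still within the TTL
        value = self._loader(key)      # missing or expired: reload
        self._entries[key] = (value, now)
        return value

# Hypothetical data: like counts held in the "database".
likes_db = {"video-1": 100}
fake_now = [0.0]  # manual clock, in seconds
likes_cache = TTLCache(3600, likes_db.__getitem__, lambda: fake_now[0])
```

If the database count changes to 250 half an hour after the first read, `likes_cache.get("video-1")` keeps returning 100 until the hour is up. That is tolerable for likes, but not for an account balance.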

Where to Place the Cache?

In-Memory Cache

A map (or similar structure) that lives inside your application process, right next to your application logic. Extremely fast access.
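In Python, one readily available in-process cache is the standard library's `functools.lru_cache` decorator. The `profile_for` function below is a hypothetical stand-in for an expensive lookup.

```python
from functools import lru_cache

@lru_cache(maxsize=1024)  # keep up to 1024 results in this process's memory
def profile_for(user_id: int) -> str:
    """Hypothetical expensive lookup; the body runs only on a cache miss."""
    return f"profile-{user_id}"
```

Repeated calls with the same `user_id` are answered from memory, and `profile_for.cache_info()` reports the hit and miss counts. The cache lives and dies with the process, so each application server holds its own separate copy.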

DB-Level Cache

Queries cached by the database server itself in its internal memory.

Distributed Cache

An external global system shared by the rest of the services. Scalable and independent.

Best for Large Distributed Systems

Why Software Engineers Choose Distributed Cache:

  • Scales independently of your application servers.
  • Changes to the caching algorithm don't require redeploying application servers.
  • Multiple services can share the same logic and storage.
  • Independent deployments for better isolation.

Core Achievement: Saves Time

The Differentiator: Policy (LRU/LFU) Matters

Critical Decider: Placement Matters

Depending on your system, you need to make the right choices.