Mastering the art of storage-speed tradeoffs to build blazing fast distributed systems.
"Caching is a fundamental concept of computer science. In fact, any system that you pick up—any large-scale distributed system—has some form of caching in multiple places and often in critical sections."
The whole idea behind caching is reducing repeatable work through storage. Instead of doing the same computation again and again, you store the result in local memory and return it as the response.
Caches are closer to your system than databases. Querying a cache is much faster because you return an already stored result instead of querying the database again.
Imagine a user on Instagram asking for their news feed. The message reaches the server, which queries the database: SELECT * FROM posts WHERE user_id = ?.
Optimization Part 1: Client to Server Communication
Optimization Part 2: Server to Database Communication (Our Focus)
Similar users ask for similar feeds. For example, a young software engineer who likes football and is in India will get a news feed very similar to another user in that same cohort.
When one user from this cohort asks for a feed, you generate it from the database the first time and store it in local memory. The next time a similar user comes, instead of querying the database, you just take your already stored result and give it back.
"Instead of making a 200 millisecond call, store it in your local device. You'll see the response time of the app go down from 200ms to 2ms."
Why don't we just put the entire database in the cache?
For small systems with a few GBs of data, it fits. But for large databases, fitting terabytes or petabytes of data into memory is just impossible. While you might fit a TB in memory, it will be extremely expensive.
There is a distribution of how many requests go to the cache vs the database. Let's say 90% go to the cache and 10% go to the DB. In the end, caching everything else would help less than 10% of requests (a saving of around 2 or 4 milliseconds, effectively).
The cache is a copy. When there is an update, do you update the database and cache together, or later?
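The two options can be sketched as follows. Updating both together is commonly called write-through, and updating the database later is commonly called write-back; those names, the classes, and the plain dict standing in for the database are assumptions for illustration and do not come from the talk.

```python
class WriteThroughCache:
    """Every update goes to the database and the cache together."""

    def __init__(self, db: dict):
        self.db = db
        self.cache = {}

    def put(self, key, value):
        self.db[key] = value      # database is updated immediately
        self.cache[key] = value   # cache copy stays in sync


class WriteBackCache:
    """Updates land in the cache first; the database is updated later."""

    def __init__(self, db: dict):
        self.db = db
        self.cache = {}
        self.dirty = set()        # keys changed in cache but not yet in the db

    def put(self, key, value):
        self.cache[key] = value
        self.dirty.add(key)

    def flush(self):
        """Push all pending updates to the database."""
        for key in self.dirty:
            self.db[key] = self.cache[key]
        self.dirty.clear()


db_a, db_b = {}, {}
through = WriteThroughCache(db_a)
back = WriteBackCache(db_b)
through.put("likes:video42", 100)
back.put("likes:video42", 100)
print(db_a, db_b)   # db_b is still empty until flush() runs
back.flush()
```

Writing together keeps the copy consistent at the cost of a slower write; writing later makes writes fast but leaves a window in which the database and cache disagree.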
Your memory is full. A new viral video arrived. What data do you kick out?
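One common answer is least recently used (LRU): kick out whatever has gone untouched the longest. A minimal sketch, assuming a fixed capacity counted in entries; the `LRUCache` class and the video keys are hypothetical.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-size cache that evicts the least recently used entry."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.items = OrderedDict()  # oldest entry first, newest last

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)     # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the least recently used


cache = LRUCache(capacity=2)
cache.put("cat_video", "...")
cache.put("news_clip", "...")
cache.get("cat_video")            # cat_video is now the most recently used
cache.put("viral_video", "...")   # memory is full: news_clip gets kicked out
print(list(cache.items))
```

LFU (least frequently used) would instead track a hit count per key and evict the entry with the lowest count.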
"I always find drawbacks interesting because when it comes to such a fundamental component it's almost inevitable... but we try to mitigate it."
Thrashing occurs when you iterate through data [1, 2, 3, 4] but your cache is only size 3. At 4, you evict 1. Then the client asks for 1, and you evict 2. This useless work increases latency and wastes memory.
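This access pattern can be reproduced with Python's built-in `functools.lru_cache`, assuming an LRU eviction policy as in the example above. The `fetch` function is a hypothetical stand-in for an expensive lookup.

```python
from functools import lru_cache


@lru_cache(maxsize=3)          # cache holds only 3 entries
def fetch(item: int) -> int:
    """Hypothetical stand-in for an expensive lookup."""
    return item * 10


# Loop over 4 items several times with a cache that can hold only 3.
for _ in range(3):
    for item in [1, 2, 3, 4]:
        fetch(item)

info = fetch.cache_info()
print(info.hits, info.misses)  # every request was a miss
```

Each item is evicted just before it is requested again, so the cache never produces a hit while still paying for the bookkeeping and the memory.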
The cache stores a stale copy of your YouTube likes. If it is updated only every hour, the data is not always accurate. This is determined by policy.
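One way to express such a policy is a time-to-live (TTL) on each entry, sketched below. The `TTLCache` class, the one-hour TTL, and the injectable `clock` parameter are assumptions made so the example can run without actually waiting.

```python
import time


class TTLCache:
    """Cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.entries = {}  # key -> (value, time the value was stored)

    def put(self, key, value):
        self.entries[key] = (value, self.clock())

    def get(self, key):
        entry = self.entries.get(key)
        if entry is None:
            return None
        value, stored_at = entry
        if self.clock() - stored_at >= self.ttl:
            del self.entries[key]   # too old: drop it so the caller refetches
            return None
        return value                # may be stale, but within the allowed window


now = [0.0]                                   # fake clock, in seconds
cache = TTLCache(ttl_seconds=3600, clock=lambda: now[0])
cache.put("likes:video42", 100)
now[0] = 1800
half_hour = cache.get("likes:video42")        # still served, possibly stale
now[0] = 3600
one_hour = cache.get("likes:video42")         # expired after an hour
print(half_hour, one_hour)
```

A shorter TTL gives fresher data at the cost of more database calls; a longer TTL does the opposite.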
A map or similar logic running alongside your application. Extremely fast access.
Queries cached by the database server itself in its internal memory.
An external global system shared by the rest of the services. Scalable and independent.
Saves Time
Policy (LRU/LFU) Matters
Placement Matters
Depending on your system, you need to make the right choices.