When your application blows up and thousands of users flood in, how do you prevent your server from melting down? Let's talk about Scaling.
Scaling is the ability of a system to handle increasing load - whether that's users, requests, or data volume.
The two fundamental ways to expand your infrastructure.
Increase the power of a single machine by upgrading its core components.
Add more machines to your network instead of making a single one bigger.
| Feature | Vertical (Scale UP) | Horizontal (Scale OUT) |
|---|---|---|
| Scaling Method | Bigger machine | More machines |
| Growth Limit | Hardware ceiling | Almost infinite |
| Speed | Very Fast (IPC) | Slower (Network calls) |
| Failure Handling | Single Point of Failure ❌ | Fault Tolerant ✅ |
| Complexity | Simple | Complex |
The traffic cop of your architecture. Crucial for interviews.
A Load Balancer sits between the clients (users) and your backend servers. It accepts all incoming network and application traffic and acts as a traffic cop, routing requests across all healthy servers capable of fulfilling them.
Routing based purely on network-level information (IP addresses and TCP/UDP ports).
Routing based on application-level contents (HTTP headers, URLs, cookies, payload).
Requests are distributed sequentially (Server 1, then 2, then 3, repeat). Best when servers are identical and tasks are equal length.
Sends traffic to the server with the fewest active connections. Best when tasks have varying processing times.
Hashes the client's IP to assign them to a specific server. Ensures the same user always hits the same server (Sticky Sessions).
The underlying fundamentals of distributed architectures you must understand.
Distributes incoming requests intelligently across multiple servers. It prevents overload on any single server and routes around broken nodes.
A component whose failure brings down the entire system.
Why distributed systems are fundamentally hard.
In horizontal scaling, data is spread across machines. If "User updates profile", which server holds the latest data?
Vertical scaling hits a physical wall. You cannot buy a processor with infinite speed, nor unlimited RAM. Eventually, cost explodes exponentially.
(VERY IMPORTANT)
Companies don't pick just one. They use a HYBRID approach.
👉 Each machine is vertically strong + horizontally copied.
Can the system seamlessly handle an influx of users?
Does it survive node crashes or catastrophic failures?
Is the requested data correct and synced everywhere?
System Design = Trade-Offs
You absolutely CANNOT have it all. You cannot achieve: