
Scaling & Load Balancer

When your application takes off and thousands of users flood in, how do you keep your servers from melting down? Let's talk about scaling.

What is Scaling?

Scaling is the ability of a system to handle increasing load - whether that's users, requests, or data volume.

Example: You built an API

  • 10 users: works fine
  • 10,000 users: slowing down
  • 1,000,000 users: crashes!
👉 To fix this, you must scale the system.

Types of Scaling

The two fundamental ways to expand your infrastructure.

Vertical Scaling (Scale UP)

Increase the power of a single machine by upgrading its core components.

Small Server → Big Server 💪
4 GB RAM → 32 GB RAM
2 CPU cores → 16 CPU cores

Pros

  • Simple to implement
  • No distributed complexity
  • Fast (no network latency)

Cons

  • Strict hardware limits 🚫
  • Cost climbs steeply at the high end 💰
  • Single Point of Failure (SPOF) 💥
📌 When to use: Early stage startups, small-scale systems, and prototyping.

Horizontal Scaling (Scale OUT)

Add more machines to your network instead of making a single one bigger.

Architecture Flow
Users → Load Balancer → Server 1 | Server 2 | Server 3

Pros

  • Highly scalable 📈
  • Fault tolerant (No SPOF)
  • Virtually no limits

Cons

  • Complex implementation 😵
  • Network latency overhead
  • Data consistency hurdles
📌 When to use: Real-world systems (Netflix, Instagram), and high-traffic applications.

Feature          | Vertical (Scale UP)        | Horizontal (Scale OUT)
Scaling Method   | Bigger machine             | More machines
Growth Limit     | Hardware ceiling           | Almost infinite
Speed            | Very Fast (IPC)            | Slower (Network calls)
Failure Handling | Single Point of Failure ❌ | Fault Tolerant ✅
Complexity       | Simple                     | Complex

Load Balancers (LB) Deep Dive

The traffic cop of your architecture. Crucial for interviews.

What is it?

A Load Balancer sits between the clients (users) and your backend servers. It accepts all incoming network and application traffic and acts as a traffic cop, routing requests across all healthy servers capable of fulfilling them.

Why do we need it?

  • Scalability: Add/remove servers seamlessly.
  • Redundancy: If a server dies, LB stops sending traffic to it.
  • Performance: Prevents overload of any single server.
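
The redundancy point can be sketched as a pool that only hands out servers currently marked healthy. This is a minimal illustration, not how any particular load balancer is implemented; the `Backend` and `Pool` names and the `mark` method are hypothetical, and a real system would set the health flag from periodic health checks.

```python
from dataclasses import dataclass, field


@dataclass
class Backend:
    name: str
    healthy: bool = True  # assumed to be updated by some health checker


@dataclass
class Pool:
    backends: list = field(default_factory=list)

    def add(self, name: str) -> None:
        """Scalability: register a new server at runtime."""
        self.backends.append(Backend(name))

    def mark(self, name: str, healthy: bool) -> None:
        """Record the latest health status for one server."""
        for backend in self.backends:
            if backend.name == name:
                backend.healthy = healthy

    def eligible(self) -> list:
        """Redundancy: only healthy servers may receive traffic."""
        return [b.name for b in self.backends if b.healthy]
```

Marking a server unhealthy removes it from `eligible()` without touching the other servers, and marking it healthy again puts it back.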

Layer 4 vs Layer 7 Load Balancing

Layer 4 (Transport)

Routing based purely on network-level information (IP addresses and TCP/UDP ports).

  • Fast & Efficient: No payload inspection required.
  • Dumb routing: Cannot route based on URL or cookie.

Layer 7 (Application)

Routing based on application-level contents (HTTP headers, URLs, cookies, payload).

  • Smart routing: Route `/api` to backend, `/images` to CDN.
  • Slower: Must decrypt and inspect the payload.
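
To make the difference concrete, here is a rough sketch of what each layer can base its decision on. The pool names, the `ROUTES` table, and both function names are assumptions for illustration; the `/api` and `/images` prefixes come from the example above.

```python
import zlib

BACKENDS = ["s1", "s2", "s3"]                  # hypothetical server pool
ROUTES = {"/api": "backend", "/images": "cdn"}  # prefixes from the example above


def route_l4(src_ip: str, src_port: int, dst_port: int) -> str:
    """Layer 4: only addresses and ports are visible, so hash them."""
    key = f"{src_ip}:{src_port}:{dst_port}".encode()
    return BACKENDS[zlib.crc32(key) % len(BACKENDS)]


def route_l7(path: str, default: str = "web") -> str:
    """Layer 7: the HTTP path is visible, so match on its prefix."""
    for prefix, target in ROUTES.items():
        if path == prefix or path.startswith(prefix + "/"):
            return target
    return default  # hypothetical fallback for unmatched paths
```

`route_l4` never sees the URL, so it cannot tell an API call from an image request; `route_l7` can, at the cost of parsing the request first.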

Common Routing Algorithms

Round Robin

Requests are distributed sequentially (Server 1, then 2, then 3, repeat). Best when servers are identical and tasks are equal length.
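
A minimal round-robin sketch, assuming a fixed list of servers (the `round_robin` helper name is hypothetical):

```python
from itertools import cycle


def round_robin(servers):
    """Return an iterator that yields servers in order, forever."""
    return cycle(servers)


picker = round_robin(["Server 1", "Server 2", "Server 3"])
first_five = [next(picker) for _ in range(5)]
# first_five == ["Server 1", "Server 2", "Server 3", "Server 1", "Server 2"]
```

The picker keeps no information about how busy each server is, which is why it works best when servers and requests are roughly uniform.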

Least Connections

Sends traffic to the server with the fewest active connections. Best when tasks have varying processing times.
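
One way to sketch this is to keep a counter of open connections per server and pick the smallest. The `LeastConnections` class and its `acquire`/`release` methods are hypothetical names; ties are broken by the order in which servers were listed.

```python
class LeastConnections:
    def __init__(self, servers):
        # Active connection count per server, all starting at zero
        self.active = {server: 0 for server in servers}

    def acquire(self) -> str:
        """Pick the server with the fewest active connections."""
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server: str) -> None:
        """Call when a request finishes so the count stays accurate."""
        self.active[server] -= 1
```

If one server is stuck on a slow request while another has finished its work, the next `acquire()` goes to the idle one rather than simply the next in line.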

IP Hash (Sticky)

Hashes the client's IP address to pick a server. As long as the server pool and the client's IP stay the same, that user keeps hitting the same server (sticky sessions).
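
A small sketch of the idea, assuming a simple hash-modulo scheme (the `pick_server` name is hypothetical). It uses `hashlib` rather than Python's built-in `hash()`, because the built-in is randomized per process for strings and would not give stable assignments across restarts.

```python
import hashlib


def pick_server(client_ip: str, servers: list) -> str:
    """Map a client IP to one server, deterministically."""
    digest = hashlib.sha256(client_ip.encode()).digest()
    # Use the first 8 bytes of the digest as an integer, then wrap
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]
```

With modulo hashing, adding or removing a server changes `len(servers)` and can move many clients to a different server, so stickiness only holds while the pool is stable.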

5 Critical Concepts

The underlying fundamentals of distributed architectures you must understand.

1. Load Balancer

Distributes incoming requests intelligently across multiple servers. It prevents overload on any single server and routes around broken nodes.

User → [LB] → (S1 | S2 | S3)

2. Single Point of Failure

A component whose failure brings down the entire system.

  • Vertical scaling has a SPOF: the single machine
  • Horizontal scaling survives the loss of any one server (the load balancer itself must also be redundant)
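
A back-of-the-envelope way to see why more servers help: if each server is up with some probability and failures are independent (a strong assumption), the service is up as long as at least one server is. The 0.99 figure in the usage note is illustrative, not a measurement.

```python
def availability(per_server: float, servers: int) -> float:
    """Chance that at least one of `servers` independent servers is up."""
    return 1 - (1 - per_server) ** servers
```

With one server at 0.99 the system is at 0.99; with three such servers the estimate rises to roughly 0.999999. Shared dependencies such as a single load balancer or database break the independence assumption and lower the real number.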

3. Network vs IPC

Why distributed systems are fundamentally hard.

  • Same machine: Interprocess Communication (IPC) via Memory/CPU is blazingly fast.
  • Different machines: Network calls (RPC, remote procedure calls) are far slower and can fail or time out.

4. Data Consistency Problem

In horizontal scaling, data is spread across machines. If "User updates profile", which server holds the latest data?

  • Race conditions
  • Dirty reads/writes
  • Sync delays
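
The "which server holds the latest data?" question can be shown with a tiny simulation of two copies of a profile. This is only a sketch of a sync delay; the `Replica` class, the version-number rule, and `simulate_profile_update` are hypothetical and not a description of any real database.

```python
class Replica:
    def __init__(self):
        self.value = None
        self.version = 0

    def write(self, value, version: int) -> None:
        # Accept a write only if it is newer than what we already hold
        if version > self.version:
            self.value, self.version = value, version


def simulate_profile_update():
    server_a, server_b = Replica(), Replica()
    server_a.write("old bio", 1)
    server_b.write("old bio", 1)

    server_a.write("new bio", 2)       # user updates profile via server A
    stale_read = server_b.value        # server B has not synced yet
    server_b.write(server_a.value, server_a.version)  # sync catches up
    fresh_read = server_b.value
    return stale_read, fresh_read
```

Here `simulate_profile_update()` returns `("old bio", "new bio")`: a reader routed to server B before the sync sees the old profile even though the update already succeeded on server A.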

5. Hardware Limits

Vertical scaling hits a physical wall. You cannot buy a processor with infinite speed or a machine with unlimited RAM, and near the top of the range each extra unit of capacity costs disproportionately more.

🔥 The Real-World Approach

(VERY IMPORTANT)

Companies don't pick just one. They use a HYBRID approach.

Strategy: Start with Vertical Scaling → Then pivot to Horizontal

Production Architecture

Load Balancer → Big Server 💪 | Big Server 💪 | Big Server 💪

👉 Each machine is vertically strong + horizontally copied.
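
A rough sizing sketch shows how the two dimensions interact: stronger machines mean fewer copies are needed for the same peak load. The `servers_needed` function, the one-spare default, and all request rates below are assumptions for illustration only.

```python
import math


def servers_needed(peak_rps: float, rps_per_server: float, spare: int = 1) -> int:
    """Servers required to carry the peak, plus spares for failures."""
    return math.ceil(peak_rps / rps_per_server) + spare
```

For a hypothetical peak of 10,000 requests per second, machines that each handle 1,500 give `servers_needed(10_000, 1_500) == 8`, while bigger machines that each handle 6,000 give `servers_needed(10_000, 6_000) == 3`.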

🎯 Key Design Goals

1. Scalability

Can the system seamlessly handle an influx of users?

2. Reliability / Resilience

Does it survive node crashes or catastrophic failures?

3. Consistency

Is the requested data correct and synced everywhere?

Most Important Insight

System Design = Trade-Offs

You absolutely CANNOT have it all. You cannot achieve all of these at once:

  • Perfect Scalability
  • Perfect Consistency
  • Perfect Performance
👉 Your job is to balance them based on business needs.