System design is the process of planning and defining how a software system will work internally.
It involves defining a system's architecture, components, interfaces, and data flows to meet specific functional and non-functional requirements.
The result is a robust blueprint for a software system that ensures scalability, performance, reliability, and efficiency.
The design should handle anything from 1 user to 10 crore (100 million) users efficiently.
System Design is broadly categorized into two phases: High-Level Design (HLD) and Low-Level Design (LLD).
High-Level Design (HLD): "Bird's-eye view of the system"
Users send requests → Load Balancer → Backend → DB & Object Storage + CDN.
👉 No class, no code — just flow.
Handle millions of users • Ensure scalability & reliability
Low-Level Design (LLD): "Zoomed-in internal view"
Classes: User, Post, FeedService
DB tables: users, posts, followers
APIs: POST /createPost, GET /feed
👉 Close to actual coding.
Make system implementable • Ensure clean, maintainable code
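To make the LLD example concrete, here is a minimal sketch of how the classes named above might look. The fields, method names, and the in-memory storage are assumptions for illustration; a real design would back them with the users, posts, and followers tables.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    user_id: int
    name: str
    following: set = field(default_factory=set)  # ids of users this user follows


@dataclass
class Post:
    post_id: int
    author_id: int
    text: str


class FeedService:
    """In-memory stand-in for the users, posts, and followers tables."""

    def __init__(self) -> None:
        self.users = {}   # user_id -> User
        self.posts = []   # all posts, oldest first

    def add_user(self, user: User) -> None:
        self.users[user.user_id] = user

    def create_post(self, author_id: int, text: str) -> Post:
        # Roughly what a POST /createPost handler would call.
        post = Post(post_id=len(self.posts) + 1, author_id=author_id, text=text)
        self.posts.append(post)
        return post

    def get_feed(self, user_id: int) -> list:
        # Roughly what a GET /feed handler would call:
        # posts from followed users, newest first.
        following = self.users[user_id].following
        return [p for p in reversed(self.posts) if p.author_id in following]
```

HLD never needs this level of detail; LLD is where such class and method boundaries get decided.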
| Feature | High-Level Design (HLD) | Low-Level Design (LLD) |
|---|---|---|
| Focus | System architecture | Code-level design |
| Level | Abstract | Detailed |
| Concern | Scalability | Implementation |
| Includes | Services, DB, APIs | Classes, methods |
| Used by | Architects | Developers |
Applying the following principles makes a system reliable, scalable, and maintainable.
Vertical scaling: increase the capacity of a single resource.
Instead of hiring another chef, you give your current chef better tools, a faster oven, or caffeine to handle more orders.
Upgrading a single server with a more powerful CPU, more RAM, or a faster SSD to handle increased traffic.
Preprocessing: perform tasks during off-peak hours (for example via cron jobs) to save resources for busy times.
Pre-making pizza bases or chopping veggies during off-peak hours so you're ready when the dinner rush hits.
Running background tasks (like generating daily reports or updating caches) during off-peak hours to reduce latency during busy periods.
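A minimal sketch of the idea: an expensive aggregation runs in an off-peak job, and the request path only reads the stored result. The function names and the in-memory cache are hypothetical; in practice the job would be triggered by a scheduler such as cron and the cache might live in Redis or a database table.

```python
from collections import Counter
from typing import Optional

# Hypothetical in-memory cache keyed by day.
report_cache = {}


def build_daily_report(orders: list) -> dict:
    """Expensive aggregation: total quantity sold per item."""
    totals = Counter()
    for order in orders:
        totals[order["item"]] += order["qty"]
    return dict(totals)


def nightly_job(day: str, orders: list) -> None:
    """Run off-peak (for example from a cron entry) to fill the cache."""
    report_cache[day] = build_daily_report(orders)


def get_report(day: str) -> Optional[dict]:
    """Request path during peak hours: a cheap lookup, no aggregation."""
    return report_cache.get(day)
```

The busy-hour request pays only for a dictionary lookup, which is the pizza base already being made.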
Backups: introduce redundancy to eliminate single points of failure.
Having a spare, identical oven ready to be fired up immediately if the primary oven breaks down.
If your primary database or server crashes, a standby replica (secondary unit) automatically takes over, so no single failure takes the system down.
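The failover behaviour can be sketched as below. `Node` and `FailoverClient` are hypothetical names, and the `healthy` flag stands in for a real health check; production systems usually delegate this to the database or an orchestration layer.

```python
class Node:
    """Hypothetical database node; `healthy` stands in for a real health check."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.healthy = True

    def query(self, sql: str) -> str:
        if not self.healthy:
            raise ConnectionError(f"{self.name} is down")
        return f"{self.name} ran: {sql}"


class FailoverClient:
    """Sends queries to the active node and falls back to the standby on failure."""

    def __init__(self, primary: Node, standby: Node) -> None:
        self.active = primary
        self.standby = standby

    def query(self, sql: str) -> str:
        try:
            return self.active.query(sql)
        except ConnectionError:
            # Promote the standby and retry once.
            # If the standby is also down, the error propagates to the caller.
            self.active, self.standby = self.standby, self.active
            return self.active.query(sql)
```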
Horizontal scaling: add more machines or resources to handle increased demand.
Instead of one super-fast chef, you hire 10 regular chefs. You can easily add more chefs as demand increases.
Instead of one powerful server, you deploy multiple smaller servers. This allows you to scale by simply adding more nodes to the network.
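A rough way to size such a fleet is to divide peak load by the usable capacity of one node. The sketch below assumes identical nodes and a hypothetical `headroom` parameter for spare capacity; the numbers in the usage note are illustrative, not measurements.

```python
import math


def nodes_needed(peak_rps: float, per_node_rps: float, headroom: float = 0.3) -> int:
    """Number of identical servers needed to serve peak_rps.

    headroom is the fraction of each node's capacity kept free for spikes.
    """
    usable = per_node_rps * (1 - headroom)
    return math.ceil(peak_rps / usable)
```

For example, `nodes_needed(10_000, 500)` asks how many 500-requests-per-second servers cover a 10,000-requests-per-second peak while keeping 30% of each server free. When demand grows, the answer grows by adding nodes rather than by buying a bigger machine.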
Microservices: divide the system into specialized, manageable units where each component has a specific responsibility.
Breaking down the kitchen into specialized stations: one chef just prepares dough, one does toppings, and one bakes. Each station operates independently.
Breaking a monolithic application into smaller, specialized services (e.g., Auth, Payment, Profile). Each can be scaled or updated independently.
Distributed system: spread the system across different locations to improve fault tolerance and response time.
Opening multiple pizza branches across the city so a customer gets their pizza hot from the nearest branch.
Spreading your service across geographical locations (e.g., AWS regions) ensures that a user in Tokyo accesses data from a server near them, reducing latency and increasing fault tolerance.
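A minimal sketch of nearest-region selection with a fallback when a region is down. The function name and the latency figures in the usage note are assumptions; real deployments typically rely on DNS-based or anycast routing rather than application code.

```python
def pick_region(latencies_ms: dict, down: frozenset = frozenset()) -> str:
    """Return the reachable region with the lowest measured latency."""
    candidates = {r: ms for r, ms in latencies_ms.items() if r not in down}
    if not candidates:
        raise RuntimeError("no region available")
    return min(candidates, key=candidates.get)
```

With illustrative latencies such as `{"ap-northeast-1": 12.0, "us-east-1": 160.0}` for a user in Tokyo, the Tokyo region is chosen; if it is listed in `down`, the next closest region serves the request, which is the fault-tolerance half of the benefit.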
Load balancing: use a central authority to route requests efficiently between resources based on real-time data.
A skilled manager at the front taking orders and intelligently assigning them to the cook who currently has the smallest backlog.
A 'traffic cop' (like Nginx or AWS ELB) sits in front of your servers, routing incoming user requests to the least busy server to optimize response times.
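The "least busy server" rule is commonly called least-connections balancing. Below is a minimal in-process sketch of it; the class and method names are hypothetical and this is not how Nginx or ELB are implemented internally.

```python
class LeastConnectionsBalancer:
    """Routes each request to the server with the fewest active requests."""

    def __init__(self, servers: list) -> None:
        if not servers:
            raise ValueError("at least one server is required")
        self.active = {server: 0 for server in servers}

    def acquire(self) -> str:
        # Pick the server with the smallest backlog; ties go to the earliest listed.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server: str) -> None:
        # Call when the request finishes so the count reflects current load.
        self.active[server] -= 1
```

This mirrors the manager assigning each order to the cook with the smallest backlog: the decision uses current load, not a fixed rotation.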
Decoupling: separate different parts of a system so they can operate and evolve independently.
The cashier writes the order on a ticket and puts it on a rail. They don't need to yell at or wait for the chef; they just fire the order and take the next customer.
Using message queues (Kafka, RabbitMQ) to ensure the 'Ordering' service doesn't need to know how the 'Shipping' service works. They communicate via events, keeping the system flexible.
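The ticket-rail pattern can be sketched with Python's standard `queue.Queue` as an in-process stand-in for a broker such as Kafka or RabbitMQ. The event shape and function names are assumptions for illustration.

```python
import queue

# In-process stand-in for a message broker.
order_events = queue.Queue()


def place_order(order_id: int, item: str) -> None:
    """Ordering side: publish an event and return without waiting for shipping."""
    order_events.put({"type": "order_placed", "order_id": order_id, "item": item})


def shipping_worker(shipped: list) -> None:
    """Shipping side: consume whatever events are waiting, at its own pace."""
    while True:
        try:
            event = order_events.get_nowait()
        except queue.Empty:
            return
        if event["type"] == "order_placed":
            shipped.append(event["order_id"])
```

Neither function calls the other. They agree only on the event format, so either side can be rewritten or scaled without touching the other.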
Logging and metrics: record system events and condense them into metrics to identify performance bottlenecks.
Keeping a detailed ledger of what time orders were placed, how long they took, and tracking who was working when the oven malfunctioned.
Implementing tools like Prometheus (metrics) or the ELK stack (logs) to monitor system health. If a server's 'oven' (database) is slow, the logs and metrics let you pinpoint the bottleneck quickly.
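A minimal sketch of the ledger idea using only the standard library: a decorator logs how long each call took and keeps the durations so the slowest component can be found. The `timed` decorator, the `durations_ms` store, and `query_database` are hypothetical; tools like Prometheus or ELK would replace the in-memory dictionary.

```python
import logging
import time
from functools import wraps

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("kitchen")

# Hypothetical in-memory metric store: function name -> list of durations.
durations_ms = {}


def timed(func):
    """Log each call's duration and keep it for later aggregation."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            elapsed = (time.perf_counter() - start) * 1000
            durations_ms.setdefault(func.__name__, []).append(elapsed)
            log.info("%s took %.1f ms", func.__name__, elapsed)
    return wrapper


@timed
def query_database() -> str:
    time.sleep(0.01)  # simulated slow query
    return "rows"


def slowest() -> str:
    """Name of the function with the highest average duration so far."""
    return max(durations_ms, key=lambda n: sum(durations_ms[n]) / len(durations_ms[n]))
```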
Extensibility: design the system to be flexible so it can adapt to new business requirements without needing a total rewrite.
Designing your kitchen layout so that you can easily add a new deep fryer for wings later without having to rebuild the entire building.
Writing modular code so that adding a new feature (like supporting a new payment method) doesn't require rewriting the codebase, ensuring the system grows with business needs.
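One common way to get this property is a registry of handlers, sketched below for the payment example. The decorator, handler names, and return strings are assumptions for illustration, not a real payment API.

```python
# Registry of payment handlers: method name -> function taking an amount.
_handlers = {}


def payment_method(name: str):
    """Decorator that registers a handler under the given method name."""
    def register(func):
        _handlers[name] = func
        return func
    return register


@payment_method("card")
def pay_by_card(amount: float) -> str:
    return f"charged {amount:.2f} to card"


def checkout(method: str, amount: float) -> str:
    """Dispatch to the registered handler; unchanged when new methods are added."""
    try:
        handler = _handlers[method]
    except KeyError:
        raise ValueError(f"unsupported payment method: {method}") from None
    return handler(amount)


# Adding a new method later is one new function; checkout() is untouched.
@payment_method("upi")
def pay_by_upi(amount: float) -> str:
    return f"requested {amount:.2f} via UPI"
```

Like leaving room for a deep fryer, the extension point is decided up front so new features plug in instead of forcing a rebuild.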