Load Balancing & ASG
Distributing traffic and automating server management.
Load Balancing Basics
Context
As your app grows, you need to scale it automatically as traffic increases or decreases.
This requires two things:
- Increasing/Decreasing Compute (Servers)
- Distributing Traffic (Load Balancer)
When you scale horizontally, you need a "Traffic Cop" to distribute requests evenly across your servers. This is the Load Balancer.
Common Algorithms
- Round Robin: go down the list one by one (A, then B, then C, then A...).
- Least Connections: send to the server with the fewest active users.
- IP Hash: always send the same user to the same server (Sticky Sessions).
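Round Robin is just modular arithmetic over the server list. A minimal sketch in plain bash (the IPs are made-up placeholders, not real servers):

```shell
#!/bin/bash
# Hypothetical backend pool -- the IPs are illustrative.
servers=("10.0.0.1" "10.0.0.2" "10.0.0.3")

# Round Robin: request N goes to server (N mod pool-size),
# so traffic cycles A, B, C, A, B, C...
for request in 0 1 2 3 4 5; do
  index=$(( request % ${#servers[@]} ))
  echo "request $request -> ${servers[$index]}"
done
```

Least Connections and IP Hash swap out the `request % pool-size` step: the first picks the server with the fewest open connections, the second hashes the client IP so the same user always lands on the same server.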
Manual vs. Auto Scaling
Manual Load Balancer
You configure Nginx with a static list of IPs.
```nginx
upstream backend {
    server 10.0.0.1;
    server 10.0.0.2;
}
```
- ✗ If traffic spikes, you must manually launch a server and edit Nginx conf.
- ✗ If a server dies, Nginx keeps trying to send traffic until you remove it.
AWS Auto Scaling Group (ASG)
A "Group" that manages the list of servers for you.
- ✔ Health Checks: Automatically kills dead servers and spawns new ones.
- ✔ Dynamic Registry: The Load Balancer talks to the ASG to know which IPs are valid right now. No config edits needed.
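You can inspect this dynamic registry yourself. A sketch with the AWS CLI (not runnable without AWS credentials; the Target Group ARN below is a placeholder):

```shell
# Ask the Target Group which instances it currently considers healthy.
aws elbv2 describe-target-health \
  --target-group-arn arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/my-web-app-tg/abc123
```

The response lists each registered instance with a `TargetHealth.State` such as `healthy`, `unhealthy`, or `draining`.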
Essentials: Cloud Terminology
Before we build an autoscaling group, we need to understand the AWS components that make it possible.
- EC2: Elastic Compute Cloud. Just a virtual computer (server).
- AMI: Amazon Machine Image. A "snapshot" of your server. It contains the OS (Linux) + your code (Node.js app). The ASG uses this to create identical copies.
- Launch Template: A configuration file that tells AWS: "When you launch a new server, use this AMI and this Security Group."
- Load Balancer: The entry point for all traffic. It listens on ports 80/443 and forwards traffic to the Target Group.
- Target Group: A dynamic list of IP addresses (servers) that acts as the destination for the Load Balancer.
- Auto Scaling Group (ASG): The manager. It monitors health and metrics. If a server dies, it launches a new one using the Launch Template and registers it to the Target Group.
* Fully Managed: The Load Balancer is elastic and completely managed by AWS, so you can assume it won't fail. If any issue occurs, it will be on your side of the architecture (your computing instances).
Deep Dive: Auto Scaling Groups
What is an ASG?
An Auto Scaling Group (ASG) is a service that automatically adjusts the number of EC2 instances in a specified group to meet the demand for your application. It ensures you have the right amount of compute capacity by automatically scaling up or down.
Automatic Scaling
Scales your EC2 instances up or down based on metrics like CPU Utilization (or custom metrics such as memory usage) to maintain optimal performance.
Health Checks
ASG performs regular health checks. If an instance becomes unhealthy, it is automatically replaced with a new one.
Example: Process Crash
Scaling Policies
Define how scaling happens. Can be schedule-based (e.g., peak hours) or dynamic (real-time metrics).
Example: Black Friday
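The Black Friday case maps to a scheduled scaling action. A hedged sketch with the AWS CLI (group name, times, and sizes are examples; requires AWS credentials, so not runnable as-is):

```shell
# Pre-warm extra capacity ahead of a known traffic spike.
aws autoscaling put-scheduled-update-group-action \
  --auto-scaling-group-name my-web-app-asg \
  --scheduled-action-name black-friday-scale-out \
  --start-time "2025-11-28T06:00:00Z" \
  --min-size 3 --max-size 10 --desired-capacity 6
```

At the scheduled time, the ASG raises its bounds and launches instances before the traffic arrives, instead of reacting after CPU is already pinned.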
Launch Templates
Specifies the configuration of instances: AMI (OS + Code), Instance Type (t2.micro), Key Pair, and Security Groups.
Desired Capacity
You define Minimum, Maximum, and Desired counts. ASG ensures the group stays within these bounds.
Example: DDoS Protection
ELB Integration
New instances are automatically registered with the Elastic Load Balancer (ELB), ensuring immediate traffic distribution.
Why use Auto Scaling Groups?
- ✓ Cost Efficiency: ASGs help reduce costs by ensuring you're only running as many instances as needed to handle the load.
- ✓ High Availability: Automatically replacing unhealthy instances keeps your application running smoothly.
- ✓ Flexibility: ASGs support both manual and dynamic scaling policies to cater to a wide range of use cases.
Step 1: Prerequisites (Template)
Create an instance which runs your app
- Start an AWS EC2 instance
- Create an Image (AMI)
- Once your application is running perfectly on the EC2 instance (dependencies installed, PM2 running), we need to take a Blueprint (Image) of it.
- An AMI (Amazon Machine Image) is a pre-configured template used to create a virtual server. It contains the OS, application software (your code + dependencies), and settings required to launch the server.
How to create it:
- Go to the EC2 Instances list.
- Select your running instance (checkbox).
- Click Actions → Image and templates → Create image
- Give it a name (e.g., "my-app-v1") and description.
- Click "Create image".
• Now we have created an AMI of our running instance.
👍 One benefit of creating an AMI is that when we launch a new instance, we can select our custom image instead of a stock OS image.
- Create Launch Template
Go to EC2 → Launch Templates → Create launch template.
1. Name & Description: Give it a clear name (e.g., my-web-server-v1).
2. Application and OS Images (AMI): Select "My AMIs" → "Owned by me" → select the image we created in the previous step.
3. Instance Type: Choose t2.micro (or your preferred type).
4. Key Pair: Optional, but recommended for debugging (select your existing .pem file).
5. Network Settings (Crucial!): Create a security group that allows:
- SSH (22) - for debugging
- TCP (3000 or 4000) - enable the Load Balancer to reach your app
Previously we blocked these ports, but now the LB needs them open to forward traffic!
6. Advanced Details (User Data): Since we are not using Docker/Kubernetes, this part is critical. Even though the AMI has our code, the machine doesn't know how to start it automatically.
In the User Data field, we write a shell script to "bootstrap" the server. It tells the instance:
```bash
#!/bin/bash
cd ~/ASG
export PATH=$PATH:/home/ubuntu/.nvm/versions/node/v22.14.0/bin/
npm install -g pm2
pm2 start --interpreter /home/ubuntu/.nvm/versions/node/v22.14.0/bin/bun /home/ubuntu/ASG/bin.ts
```

* Without this, your new auto-scaled server will sit idle with the code doing nothing.
Architectural Checkpoint
Before we proceed, understanding why we need the next components (Target Group & Load Balancer) is crucial.
Scenario A: Internal Processing
e.g., Training ML models, processing background jobs
- You ONLY need the Auto Scaling Group.
- The ASG spins up instances to handle the workload.
- No one on the internet needs to access these servers directly.
Scenario B: Public Application
e.g., Our Web App, API Server
- We REQUIRE a Load Balancer & Target Group.
- The ASG creates servers, but users need a single URL to visit.
- The Load Balancer provides that URL and distributes traffic to the ASG instances.
Understanding the Architecture
You might wonder: "Why do we need a Target Group? Why doesn't the Load Balancer just talk directly to the instances?"
AWS decouples these for flexibility. The Load Balancer acts as the "Receptionist" (Entry Point), while the Target Group acts as the "Manager" (Distributor).
- Load Balancer (Entry): Users hit this URL. It doesn't know who will do the work, just where to send the request (to the Target Group).
- Target Group (Logic): Keeps track of all healthy instances (M1, M2, etc.). It receives the request and picks the best machine to handle it.
Why is this necessary?
If we have 10,000 users, a single machine can't handle them. We spin up multiple machines (M1, M2, M3...). The Load Balancer + Target Group ensures no single machine is overwhelmed by distributing traffic evenly. (For this we may need to create more than one target group.)
Step 2: Create Target Group
Configure the "Manager" (Target Group)
Go to EC2 → Load Balancing → Target Groups → Create target group.
Once the basic configuration (Protocol/Port/VPC) is created, you cannot change it, so double-check everything before clicking Next.
- Choose target type: Select Instances.
- Target group name: Give it a clear name (e.g., my-web-app-tg).
- Protocol & Port: HTTP : 4000 (or 3000, matching your app's port in the User Data script).
- IP address type / Protocol version: Keep defaults (IPv4, HTTP1).
- Health check (Important!)
  - Path: /api/healthcheck (or just /).
  * The Load Balancer pings this route to check whether the server is "alive". If the check fails, it stops sending traffic to that instance.
- Click Next.
- Register targets (Skip this!)
Do NOT register any instances here.
Why? The Auto Scaling Group is responsible for registering/deregistering instances automatically. We only created our manual instance to make the AMI; we don't want the Load Balancer to send traffic to it permanently.
- Click Create target group.
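Before relying on the health check, it's worth probing the route by hand. A local dry run, using python3's built-in HTTP server as a stand-in for the real app (port 4000 matches the assumption above; on a real instance you would curl your actual `/api/healthcheck` path instead):

```shell
# Stand-in server on the app port (the real instance runs your Node/Bun app).
python3 -m http.server 4000 >/dev/null 2>&1 &
server_pid=$!
sleep 1

# Probe the way the Target Group does: any 2xx status counts as "healthy".
status=$(curl -s -o /dev/null -w "%{http_code}" http://localhost:4000/)
echo "health check returned: $status"

kill "$server_pid"
```

If this route doesn't return a 2xx, the Target Group marks the instance unhealthy and the ASG will terminate and replace it, even though your process may be running.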
Step 3: Configure Auto Scaling Group
Launch the Fleet
Go to EC2 → Auto Scaling Groups → Create Auto Scaling group.
- Step 1: Choose Launch Template
  - Name (Auto Scaling Group name): my-web-app-asg
  - Launch Template: Select the template we created (my-web-server-v1).
  - Click Next.
- Step 2: Network
  - VPC: Default VPC.
  - Availability Zones: Select all available zones (e.g., us-east-1a, us-east-1b...). This ensures high availability.
  - Click Next.
Note:
AWS distributes instances across Availability Zones mainly to ensure high availability, fault tolerance, and resilience. If one Availability Zone fails, traffic and compute capacity automatically shift to other zones, preventing full system outage. (e.g: If we have 10 instances running and 2 AZs are selected, then 5 instances will be in us-east-1a and 5 instances will be in us-east-1b).
This is fine for HTTP servers. However, for long-running or stateful workloads like video transcoding, autoscaling and instance replacement can cause issues because an instance may terminate before its job completes. Such workloads usually require job checkpointing, queues (SQS), or container orchestration instead of relying purely on EC2 autoscaling.
- Step 3: Load Balancing (Magic Step ✨)
Check "Attach to a new load balancer".
AWS provides the functionality to create the Load Balancer right here, so we don't need to create it manually later!
- Load Balancer Type: Application Load Balancer
- Load Balancer Name: my-web-app-lb
- Scheme: Internet-facing
- Listeners and Routing:
  - Protocol: HTTP / Port 80
  - Target group: Select existing load balancer target group → my-web-app-tg (HTTP : 4000)
* Crucial: We map the LB (Port 80) to our Target Group (Port 4000).
- Step 4: Group Size
- Desired capacity: 1 (Start with 1 server)
- Minimum capacity: 1
- Maximum capacity: 3 (Allow expansion up to 3 servers)
- Review & Create
Skip the rest (Notifications/Tags) for now and click Create Auto Scaling group.
➤ Now we can see the instance directly from the EC2 dashboard, or, for a specific ASG, from the ASG dashboard (Instance Management).
Troubleshooting: App Not Starting?
Debugging User Data Scripts
A common issue: Your instance says Running in the AWS Console, but your website is unreachable. This usually means the User Data script failed (e.g., typo in npm install, node version mismatch).
User Data scripts run as root on the very first boot. If they fail, the instance keeps running, but your app never starts. AWS does not show these errors in the standard console view.
How to verify execution:
- SSH into the instance using the Key Pair you selected in the Launch Template.

```bash
ssh -i "my-key.pem" ubuntu@<public-ip>
```
- Check the logs. Cloud-init (the system that runs User Data) logs everything to specific files.
View the output log (stdout/stderr):
```bash
cat /var/log/cloud-init-output.log
```

View the detailed operational log:

```bash
cat /var/log/cloud-init.log
```

Here you can see all the output from your User Data script. (Cloud-init also keeps its configuration under the /etc/cloud folder.)
How to fix it?
Found the error? Great! Now we need to update the User Data script.
- Go to EC2 → Launch Templates.
- Select your template → Actions → Modify template (Create new version).
- Scroll down to Advanced details → User Data.
- Correct your script (e.g., fix the typo) and click Create template version.
Next: Apply the Fix
Creating a new template version doesn't automatically update running instances. You must tell the ASG to use the new version.
- Go to Auto Scaling Groups and select your group.
- Click Edit on the Launch Template section.
- Change the Version to Latest (so it starts using your fix).
- Click Update.
The old instances are still running the broken code. You need to replace them.
Go to Instance Management tab, select the old instance, and click Terminate. The ASG will notice the count dropped and automatically launch a new one (using the new version).
Alternatively, cycle the group through zero:
- Edit ASG → Set Desired Capacity to 0.
- Wait for the instance to disappear.
- Edit ASG → Set Desired Capacity back to 1.
- ASG launches a fresh instance.
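AWS also has a purpose-built way to roll out a new template version: an instance refresh. A sketch with the AWS CLI (group name assumed, requires credentials, so not runnable as-is):

```shell
# Gradually replace every instance with ones built from the latest template
# version, keeping at least half the fleet in service during the rollout.
aws autoscaling start-instance-refresh \
  --auto-scaling-group-name my-web-app-asg \
  --preferences '{"MinHealthyPercentage": 50}'
```

This avoids the manual terminate-and-wait dance and is the usual choice once the group runs more than one instance.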
Achieving True Auto Scaling
From Manual to Automatic
Right now, our ASG has a static Desired Capacity. If traffic spikes, we have to manually login and increase it. That's not "Auto" Scaling. Let's automate this based on CPU Usage.
Step 4: Configure Dynamic Scaling Policies
- Go to Auto Scaling Groups → Select your ASG.
- Click on the Automatic scaling tab.
- Scroll down to Dynamic scaling policies.
- Click Create dynamic scaling policy.
AWS recommends Target Tracking Scaling. It works like a thermostat: you set a target (e.g., 50% CPU), and AWS adjusts the capacity to maintain that level.
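The same thermostat-style policy can be expressed with the AWS CLI; the JSON is the standard target-tracking configuration for average CPU (group and policy names are assumptions, and the command needs AWS credentials):

```shell
# Keep the group's average CPU around 50%; the ASG adds or removes
# instances to hold that target.
aws autoscaling put-scaling-policy \
  --auto-scaling-group-name my-web-app-asg \
  --policy-name keep-cpu-at-50 \
  --policy-type TargetTrackingScaling \
  --target-tracking-configuration '{
    "PredefinedMetricSpecification": { "PredefinedMetricType": "ASGAverageCPUUtilization" },
    "TargetValue": 50.0
  }'
```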
To see the magic happen, we need to artificially increase CPU load on our instance.
- SSH into your running instance.
- Install the `stress` tool:

```bash
sudo apt-get install stress -y
```

- Run a stress test (simulate high load):

```bash
stress --cpu 4 --timeout 300
```

* This forces 4 CPU cores to work at 100% for 5 minutes.
- Watch the Console:
- Go to the Monitoring tab in ASG.
- You will see CPU Utilization spike.
- After a few minutes, check Activity History. You will see a new entry: "Launching a new EC2 instance to increase capacity."