Module 09 - Chapter 02

Load Balancing & ASG

Distributing traffic and automating server management.

01

Load Balancing Basics

Context

As your app grows, you need to scale it automatically as traffic increases or decreases.

This requires two things:

  1. Increasing/Decreasing Compute (Servers)
  2. Distributing Traffic (Load Balancer)

When you scale horizontally, you need a "Traffic Cop" to distribute requests evenly across your servers. This is the Load Balancer.

[Diagram] Client Requests → Load Balancer (e.g., Nginx, AWS ALB) → Server A / Server B / Server C (all Active)

Common Algorithms

  • Round Robin: go down the list one by one (A, then B, then C, then A...).
  • Least Connections: send to the server with the fewest active users.
  • IP Hash: always send the same user to the same server (Sticky Sessions).
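These algorithms map directly onto Nginx `upstream` directives. A minimal config sketch (the IPs are placeholders; Round Robin is Nginx's default):

```nginx
# Round Robin (default): requests cycle through the list one by one.
upstream backend_rr {
  server 10.0.0.1;
  server 10.0.0.2;
}

# Least Connections: pick the server with the fewest active connections.
upstream backend_least {
  least_conn;
  server 10.0.0.1;
  server 10.0.0.2;
}

# IP Hash: the same client IP always reaches the same server (sticky sessions).
upstream backend_hash {
  ip_hash;
  server 10.0.0.1;
  server 10.0.0.2;
}
```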
02

Manual vs. Auto Scaling

Manual Load Balancer

You configure Nginx with a static list of IPs.

upstream backend {
  server 10.0.0.1;
  server 10.0.0.2;
}
  • ✗ If traffic spikes, you must manually launch a server and edit Nginx conf.
  • ✗ If a server dies, Nginx keeps sending it traffic until you remove it from the config.

AWS Auto Scaling Group (ASG)

A "Group" that manages the list of servers for you.

  • Health Checks: Automatically kills dead servers and spawns new ones.
  • Dynamic Registry: The Load Balancer talks to the ASG to know which IPs are valid right now. No config edits needed.
[Diagram: AWS ASG Architecture] User (U1) → Load Balancer → Auto Scaling Group containing instances m1–m4 (22.11.22.23 through 22.11.22.26), each serving on port :3000
03

Essentials: Cloud Terminology

Before we build an autoscaling group, we need to understand the AWS components that make it possible.

EC2

Elastic Compute Cloud. Just a virtual computer (Server).

Image (AMI)

Amazon Machine Image. A "snapshot" of your server. It contains the OS (Linux) + Your Code (Node.js app). The ASG uses this to create identical copies.

Launch Template

A configuration file that tells AWS: "When you launch a new server, use this AMI and this Security Group."

Load Balancer

The entry point for all traffic. It listens on port 80/443 and forwards traffic to the Target Group.

Target Group

A dynamic list of IP addresses (servers) that act as the destination for the Load Balancer.

Auto Scaling Group

The manager. It monitors health and metrics. If a server dies, it launches a new one using the Launch Template and registers it to the Target Group.

* Fully Managed: The Load Balancer is elastic and completely managed by AWS. You can assume the load-balancing side of the diagram will not fail; if an issue occurs, it will be on your side (the compute instances).

04

Deep Dive: Auto Scaling Groups

What is an ASG?

An Auto Scaling Group (ASG) is a service that automatically adjusts the number of EC2 instances in a specified group to meet the demand for your application. It ensures you have the right amount of compute capacity by automatically scaling up or down.

1

Automatic Scaling

Scales your EC2 instances up or down based on predefined metrics like CPU Utilization or memory usage to maintain optimal performance.

2

Health Checks

ASG performs regular health checks. If an instance becomes unhealthy, it is automatically replaced with a new one.

Example: Process Crash
If your Node.js app crashes on Server A, the health check fails. The ASG notices, terminates Server A, and automatically launches a fresh replacement.
3

Scaling Policies

Define how scaling happens. Can be schedule-based (e.g., peak hours) or dynamic (real-time metrics).

Example: Black Friday
Schedule Policy: "On Nov 29th at 8:00 AM, increase Desired Capacity to 50 servers." This prepares your infra before the traffic hits.
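A schedule like this can also be created with the AWS CLI; a sketch, assuming an ASG named my-web-app-asg (the action name, date, and sizes are placeholders):

```shell
# Scheduled action: scale out before a known traffic spike.
aws autoscaling put-scheduled-update-group-action \
  --auto-scaling-group-name my-web-app-asg \
  --scheduled-action-name black-friday-prep \
  --start-time "2025-11-29T08:00:00Z" \
  --min-size 10 \
  --max-size 60 \
  --desired-capacity 50
```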
4

Launch Templates

Specifies the configuration of instances: AMI (OS + Code), Instance Type (t2.micro), Key Pair, and Security Groups.

5

Desired Capacity

You define Minimum, Maximum, and Desired counts. ASG ensures the group stays within these bounds.

Example: DDoS Protection
You MUST set a Maximum Capacity. If you get DDoS'd, the ASG may keep adding servers to absorb the "traffic". Setting Max=10 ensures your bill doesn't skyrocket.
6

ELB Integration

New instances are automatically registered with the Elastic Load Balancer (ELB), ensuring immediate traffic distribution.

Why use Auto Scaling Groups?

  • Cost Efficiency: ASGs help reduce costs by ensuring you're only running as many instances as needed to handle the load.
  • High Availability: Automatically replacing unhealthy instances keeps your application running smoothly.
  • Flexibility: ASGs support both manual and dynamic scaling policies to cater to a wide range of use cases.
05

Step 1: Prerequisites (Template)

Create an instance which runs your app

  1. Start an AWS EC2 instance
  2. Create an Image (AMI)

    - Once your application is running perfectly on the EC2 instance (dependencies installed, PM2 running), we need to take a Blueprint (Image) of it.

    - An AMI (Amazon Machine Image) is a pre-configured template used to create a virtual server. It contains the OS, application software (your code + dependencies), and settings required to launch the server.

    How to create it:
    1. Go to EC2 Instances list.
    2. Select your running instance (checkbox).
    3. Click Actions → Image and templates → Create image.
    4. Give it a name (e.g., "my-app-v1") and description.
    5. Click "Create image".

    We now have an AMI of our running instance.

    👍 One benefit of creating an AMI is that when launching a new instance, we can select our custom Image rather than a stock OS image.
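    The same image can be created with the AWS CLI; a sketch (the instance ID is a placeholder for your running instance):

```shell
# Create an AMI from the running, fully configured instance.
aws ec2 create-image \
  --instance-id i-0abc123def456 \
  --name "my-app-v1" \
  --description "Node app with dependencies and PM2 preinstalled" \
  --no-reboot
```

    `--no-reboot` avoids restarting the instance while imaging, at the cost of filesystem-consistency guarantees.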

  3. Create Launch Template

    Go to EC2 → Launch Templates → Create launch template.

    1. Name & Description: Give it a clear name (e.g., my-web-server-v1).
    2. Application and OS Images (AMI): Select "My AMIs" → "Owned by me" → select the image we created in the previous step.
    3. Instance Type: Choose t2.micro (or your preferred type).
    4. Key Pair: Optional, but recommended for debugging (select your existing .pem file).
    5. Network Settings (Crucial!)

    Create a security group that allows:

    • SSH (22) - for debugging
    • TCP (3000 or 4000) - enable the Load Balancer to reach your app

    Previously we blocked these ports, but now the LB needs them open to forward traffic!

    6. Advanced Details (User Data)

    Since we are not using Docker/Kubernetes, this part is critical. Even though the AMI has our code, the machine doesn't know how to start it automatically.

    In the User Data field, we write a shell script to "bootstrap" the server. It tells the instance:

    #!/bin/bash
    # User Data runs as root on first boot, so use absolute paths (~ would resolve to /root).
    cd /home/ubuntu/ASG
    export PATH=$PATH:/home/ubuntu/.nvm/versions/node/v22.14.0/bin/
    npm install -g pm2
    pm2 start --interpreter /home/ubuntu/.nvm/versions/node/v22.14.0/bin/bun /home/ubuntu/ASG/bin.ts

    * Without this, your new auto-scaled server will boot with the code present but nothing running.

Architectural Checkpoint

Before we proceed, understanding why we need the next components (Target Group & Load Balancer) is crucial.

Scenario A: Internal Processing

e.g., Training ML models, processing background jobs

  • You ONLY need the Auto Scaling Group.
  • The ASG spins up instances to handle the workload.
  • No one on the internet needs to access these servers directly.
Scenario B: Public Application

e.g., Our Web App, API Server

  • We REQUIRE a Load Balancer & Target Group.
  • The ASG creates servers, but users need a single URL to visit.
  • The Load Balancer provides that URL and distributes traffic to the ASG instances.
Since we are building a Public Web App, we will proceed to create the Target Group and Load Balancer next.

Understanding the Architecture

You might wonder: "Why do we need a Target Group? Why doesn't the Load Balancer just talk directly to the instances?"

AWS decouples these for flexibility. The Load Balancer acts as the "Receptionist" (Entry Point), while the Target Group acts as the "Manager" (Distributor).

  • Load Balancer (Entry): Users hit this URL. It doesn't know who will do the work, just where to send the request (to the Target Group).
  • Target Group (Logic): Keeps track of all healthy instances (M1, M2, etc.). It receives the request and picks the best machine to handle it.
[Diagram] User Request → Load Balancer (forward rule) → Target Group → M1 (:4000) / M2 (:4000)

Why is this necessary?
If we have 10,000 users, a single machine can't handle them. We spin up multiple machines (M1, M2, M3...). The Load Balancer + Target Group ensures no single machine is overwhelmed by distributing traffic evenly. (At larger scale we may need more than one target group.)

06

Step 2: Create Target Group

Configure the "Manager" (Target Group)

Go to EC2 → Load Balancing → Target Groups → Create target group.

CRITICAL WARNING

Once the basic configuration (Protocol/Port/VPC) is created, you cannot change it, so double-check everything before clicking Next.

  1. Choose target type: Select Instances.
  2. Target group name: Give it a clear name (e.g., my-web-app-tg).
  3. Protocol & Port: HTTP : 4000 (or 3000, matching your app's port in the User Data script)
  4. IP address type / Protocol version: Keep defaults (IPv4, HTTP1).
  5. Health check (Important!)

    Path: /api/healthcheck (or just /).

    * The Load Balancer pings this route to check if the server is "alive". If it fails, it stops sending traffic to that instance.

  6. Click Next.
  7. Register targets (Skip this!)

    Do NOT register any instances here.

    Why? The Auto Scaling Group is responsible for registering/deregistering instances automatically. We only created our manual instance to make the AMI; we don't want the Load Balancer to send traffic to it permanently.

  8. Click Create target group.
✓ Target Group Created. Currently, it has 0 healthy targets (because we skipped registration), which is expected!
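For reference, the equivalent AWS CLI call; a sketch, assuming the names used above (the VPC ID is a placeholder):

```shell
# Create the target group; instances are registered later by the ASG.
aws elbv2 create-target-group \
  --name my-web-app-tg \
  --protocol HTTP \
  --port 4000 \
  --target-type instance \
  --vpc-id vpc-0abc123def456 \
  --health-check-path /api/healthcheck
```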
07

Step 3: Configure Auto Scaling Group

Launch the Fleet

Go to EC2 → Auto Scaling Groups → Create Auto Scaling group.

  1. Step 1: Choose Launch Template
    • Auto Scaling group name: my-web-app-asg
    • Launch Template: Select the template we created (my-web-server-v1).
    • Click Next.
  2. Step 2: Network
    • VPC: Default VPC.
    • Availability Zones: Select all available zones (e.g., us-east-1a, us-east-1b...). This ensures high availability.
    • Click Next.

    Note:

    AWS distributes instances across Availability Zones mainly to ensure high availability, fault tolerance, and resilience. If one Availability Zone fails, traffic and compute capacity automatically shift to the other zones, preventing a full system outage. (E.g., with 10 instances running across 2 selected AZs, 5 will run in us-east-1a and 5 in us-east-1b.)
    This is fine for stateless HTTP servers. However, for long-running or stateful workloads like video transcoding, autoscaling and instance replacement can cause issues because instances may terminate before a job completes. Such workloads usually require job checkpointing, queues (SQS), or container orchestration instead of relying purely on EC2 autoscaling.

  3. Step 3: Load Balancing (Magic Step ✨)

    Check "Attach to a new load balancer".

    AWS provides the functionality to create the Load Balancer right here, so we don't need to create it manually later!

    • Load Balancer Type: Application Load Balancer
    • Load Balancer Name: my-web-app-lb
    • Scheme: Internet-facing
    • Listeners and Routing:
      • Protocol: HTTP / Port 80
      • Target group: Select the existing target group my-web-app-tg (HTTP: 4000)
    * Crucial: We map the LB (Port 80) to our Target Group (Port 4000).
  4. Step 4: Group Size
    • Desired capacity: 1 (Start with 1 server)
    • Minimum capacity: 1
    • Maximum capacity: 3 (Allow expansion up to 3 servers)
  5. Review & Create

    Skip the rest (Notifications/Tags) for now and click Create Auto Scaling group.

🎉 Done! AWS is now spinning up your first instance. The Load Balancer is active at its DNS URL, forwarding traffic to port 4000 on your instances.

➤ You can now see the instance directly in the EC2 dashboard, or per ASG under Instance Management in the ASG dashboard.
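To verify end to end, you can look up the Load Balancer's public DNS name and hit it directly; a sketch, assuming the LB name from the step above:

```shell
# Look up the LB's public DNS name.
aws elbv2 describe-load-balancers \
  --names my-web-app-lb \
  --query "LoadBalancers[0].DNSName" \
  --output text

# Then request the app through the LB (port 80 forwards to 4000).
curl http://<lb-dns-name>/api/healthcheck
```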

08

Troubleshooting: App Not Starting?

Debugging User Data Scripts

A common issue: Your instance says Running in the AWS Console, but your website is unreachable. This usually means the User Data script failed (e.g., typo in npm install, node version mismatch).

User Data scripts run as root on the very first boot. If they fail, the instance keeps running, but your app never starts. AWS does not show these errors in the standard console view.

How to verify execution:

  1. SSH into the instance using the Key Pair you selected in the Launch Template.
    ssh -i "my-key.pem" ubuntu@<public-ip>
  2. Check the logs. Cloud-init (the system that runs User Data) logs everything to specific files.

    View the output log (stdout/stderr):

    cat /var/log/cloud-init-output.log

    View the detailed operational log:

    cat /var/log/cloud-init.log

These files contain the full output of your User Data script. (Cloud-init's own configuration lives under /etc/cloud.)

How to fix it?

Found the error? Great! Now we need to update the User Data script.

  1. Go to EC2 → Launch Templates.
  2. Select your template → Actions → Modify template (Create new version).
  3. Scroll down to Advanced details → User Data.
  4. Correct your script (e.g., fix the typo) and click Create template version.

Next: Apply the Fix

Creating a new template version doesn't automatically update running instances. You must tell the ASG to use the new version.

1. Update ASG Configuration
  1. Go to Auto Scaling Groups and select your group.
  2. Click Edit on the Launch Template section.
  3. Change the Version to Latest (so it starts using your fix).
  4. Click Update.
2. Refresh Instances

The old instances are still running the broken code. You need to replace them.

Option A: Manual Termination

Go to Instance Management tab, select the old instance, and click Terminate. The ASG will notice the count dropped and automatically launch a new one (using the new version).

Option B: Capacity Toggle (Easiest)
  1. Edit ASG → Set Desired Capacity to 0.
  2. Wait for the instance to disappear.
  3. Edit ASG → Set Desired Capacity back to 1.
  4. ASG launches a fresh instance.
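A third option, for reference: AWS can roll the fleet for you with an Instance Refresh, replacing instances in batches while keeping part of the capacity in service. A CLI sketch, assuming our ASG name:

```shell
# Option C: rolling replacement using the latest launch template version.
aws autoscaling start-instance-refresh \
  --auto-scaling-group-name my-web-app-asg \
  --preferences '{"MinHealthyPercentage": 50}'
```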
09

Achieving True Auto Scaling

From Manual to Automatic

Right now, our ASG has a static Desired Capacity. If traffic spikes, we have to log in and increase it manually. That's not "Auto" Scaling. Let's automate this based on CPU usage.

Step 4: Configure Dynamic Scaling Policies

1. Access Policy Settings
  1. Go to Auto Scaling Groups → Select your ASG.
  2. Click on the Automatic scaling tab.
  3. Scroll down to Dynamic scaling policies.
  4. Click Create dynamic scaling policy.
2. Configure Target Tracking

AWS recommends Target Tracking Scaling. It works like a thermostat: you set a target (e.g., 50% CPU), and AWS adjusts the capacity to maintain that level.

  • Policy Type: Target tracking scaling
  • Metric Type: Average CPU utilization
  • Target Value: 50 (keep CPU at 50%)
  • If CPU > 50%: ASG adds instances (Scale Out).
  • If CPU < 50%: ASG removes instances (Scale In).
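The same thermostat policy can be attached via the AWS CLI; a sketch, assuming our ASG name:

```shell
# Target tracking: ASG adds/removes instances to hold average CPU near 50%.
aws autoscaling put-scaling-policy \
  --auto-scaling-group-name my-web-app-asg \
  --policy-name keep-cpu-at-50 \
  --policy-type TargetTrackingScaling \
  --target-tracking-configuration '{
    "PredefinedMetricSpecification": {
      "PredefinedMetricType": "ASGAverageCPUUtilization"
    },
    "TargetValue": 50.0
  }'
```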
3. Verify It Works (Stress Test)

To see the magic happen, we need to artificially increase CPU load on our instance.

  1. SSH into your running instance.
  2. Install the `stress` tool:
    sudo apt-get install stress -y
  3. Run a stress test (simulate high load):
    stress --cpu 4 --timeout 300

    * This spawns 4 workers that spin the CPU at 100% for 300 seconds (5 minutes).

  4. Watch the Console:
    • Go to the Monitoring tab in ASG.
    • You will see CPU Utilization spike.
    • After a few minutes, check Activity History. You will see a new entry: "Launching a new EC2 instance to increase capacity."