Mastering PM2 & CI/CD
The bridge between "it works on my laptop" and real production.
What is PM2?
PM2 is a production process manager. Think of it as a smart supervisor sitting between your operating system and your application.
It keeps your app running in the background, restarts it automatically if it crashes, handles logs, and supports zero-downtime reloads.
The Real Problem
Without PM2
- • Terminal closed → App dies
- • SSH disconnected → App dies
- • App crashes → Dead forever
- • Server restarts → Never comes back
node app.js
With PM2
- • Runs in background
- • Survives SSH close
- • Auto-restarts on crash
- • Auto-starts on OS reboot
pm2 start app.js
How It Works Internally
This is the key concept. PM2 itself is a daemon (background service). When you start an app, PM2 forks a child process and keeps its PID (Process ID).
pm2 daemon
├── app-1 (node / uvicorn)
├── app-2
└── log manager
PM2 continuously monitors (supervises) these processes. If a process exits unexpectedly, PM2 restarts it immediately.
💡 Insight: When you close your terminal, the PM2 daemon keeps running, and because your app is a child of the daemon (not your shell), your app keeps running too.
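To make the supervision idea concrete, here is a minimal supervisor sketch in Node.js. It is not PM2's implementation: it only spawns a child process, records its PID, and starts it again whenever it exits. `supervise` and its `maxRestarts` demo limit are hypothetical names; PM2's real restart policy is configurable (it can, for example, delay restarts).

```javascript
const { spawn } = require("node:child_process");

// Minimal supervisor sketch (not PM2's code): start a child process,
// keep its PID, and start it again whenever it exits.
// `maxRestarts` is a demo limit so this example terminates.
function supervise(command, args, { maxRestarts = 3 } = {}) {
  return new Promise((resolve) => {
    let starts = 0;

    function start() {
      starts += 1;
      const child = spawn(command, args, { stdio: "ignore" });
      console.log(`start #${starts}: pid ${child.pid}`);

      // "exit" fires whether the child crashed, returned an error code,
      // or was killed by a signal (for example `kill -9`).
      child.on("exit", (code, signal) => {
        console.log(`pid ${child.pid} exited (code=${code}, signal=${signal})`);
        if (starts <= maxRestarts) {
          start(); // unexpected exit -> restart immediately
        } else {
          resolve(starts); // demo limit reached, stop supervising
        }
      });
    }

    start();
  });
}

// Demo: an "app" that crashes right after starting.
supervise(process.execPath, ["-e", "process.exit(1)"], { maxRestarts: 2 }).then(
  (starts) => console.log(`gave up after ${starts} starts`)
);
```

To supervise a real app, the call would be `supervise(process.execPath, ["app.js"])`; the demo uses an inline script that exits with code 1 so the restarts are visible.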
Process Management Logic
Why does the app restart if you kill it manually?
kill -9 <pid>
PM2 sees that the process exited unexpectedly. Its logic is simple:
→ RESTART IT
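Why `kill -9` triggers a restart while `pm2 stop` does not can be shown with a simplified model (an illustration, not PM2 source code): the supervisor remembers the state the user wants for each app, and any exit that happens while the app is still supposed to be online counts as a crash. `createApp`, `stop` and `onExit` are hypothetical names.

```javascript
// Simplified model of the restart decision (illustration, not PM2 source code).
// Each app stores what the user *wants* (desired) next to what is happening (status).
function createApp(name) {
  return { name, desired: "online", status: "online", restarts: 0 };
}

// A stop command changes the desired state first, then terminates the process.
function stop(app) {
  return { ...app, desired: "stopped" };
}

// Called whenever the process exits, however it died (crash, kill -9, ...).
function onExit(app) {
  if (app.desired === "stopped") {
    return { ...app, status: "stopped" }; // expected exit: leave it down
  }
  // Nobody asked for this exit, so treat it as a crash and restart.
  return { ...app, status: "online", restarts: app.restarts + 1 };
}

const killed = onExit(createApp("app-name")); // kill -9 <pid>
console.log(killed.status, killed.restarts); // online 1

const stopped = onExit(stop(createApp("app-name"))); // pm2 stop app-name
console.log(stopped.status, stopped.restarts); // stopped 0
```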
To actually stop an app, you must tell PM2:
pm2 stop app-name
Node.js & Python Support
Node.js (Express/Nest/Next)
Standard way to start a Node application:
pm2 start app.js --name node-api
Python (FastAPI/Django)
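For FastAPI you don't start main.py itself; you start the ASGI server (uvicorn) that imports your app. PM2 runs commands, not just Node files, so it simply wraps the uvicorn command.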
pm2 start "uvicorn main:app --host 0.0.0.0 --port 8000" \
--name fastapi-app \
--interpreter bash
Internally: PM2 → Bash → Uvicorn → Python App
Surviving Server Reboots
This is where PM2 becomes production-grade. We need to tell the OS (systemd) to launch PM2 on boot.
Save running apps
pm2 save
Dumps the current process list to ~/.pm2/dump.pm2.
Generate startup script
pm2 startup
Run the command it prints (it usually starts with sudo) to register PM2 with systemd.
Scaling with Cluster Mode
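Node.js runs your JavaScript on a single thread, so one process uses roughly one CPU core. PM2's cluster mode runs several instances of the same app to use all cores without changing your code, as long as the app doesn't rely on in-memory state (sessions, caches) that must be shared between instances.
pm2 start app.js -i max
`-i max` starts one instance per CPU core, and incoming connections are balanced across the instances.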
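Conceptually, that balancing works like round-robin: PM2's cluster mode builds on Node's `cluster` module, whose default policy on platforms other than Windows hands each new incoming connection to the next worker in turn. The model below only simulates that rotation with four hypothetical instance names; it does not start real workers.

```javascript
// Simplified model of round-robin dispatch (no real workers are started).
function createRoundRobin(instances) {
  let next = 0;
  return function dispatch() {
    const instance = instances[next];
    next = (next + 1) % instances.length; // wrap around after the last instance
    return instance;
  };
}

// `-i max` would start one instance per CPU core; 4 is used here so the output is predictable.
const dispatch = createRoundRobin(["app-0", "app-1", "app-2", "app-3"]);
const handled = {};
for (let connection = 0; connection < 10; connection++) {
  // Each iteration stands for one new incoming connection.
  const instance = dispatch();
  handled[instance] = (handled[instance] || 0) + 1;
}
console.log(handled); // { 'app-0': 3, 'app-1': 3, 'app-2': 2, 'app-3': 2 }
```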
Continuous Integration (CI)
CI is the habit of automatically checking your code every time you push it. It is a safety net that saves developers hours of debugging locally.
What CI Actually Does:
- 1. Install Dependencies: It runs `npm install` in a clean environment to make sure you didn't forget to save a package.
- 2. Build Project: It runs `npm run build`. If you have a syntax or type error, it fails here.
- 3. Run Linters: Checks code style and common mistakes. "Fail fast" if code is messy.
- 4. Run Tests: Executes unit tests. If logic is broken, it stops the pipeline.
* The flow only reaches CD when every CI step passes.
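The four CI steps and the fail-fast rule can be sketched as a small Node script. In practice these steps usually live in your CI service's own config (a GitHub Actions workflow, for example); the script just shows the control flow. It assumes a Linux runner and a `package.json` with `build`, `lint` and `test` scripts; `CI_STEPS` and `runPipeline` are hypothetical names.

```javascript
const { spawnSync } = require("node:child_process");

// Hypothetical CI step list; assumes package.json defines "build", "lint" and "test".
const CI_STEPS = [
  { name: "install", cmd: "npm", args: ["install"] },
  { name: "build", cmd: "npm", args: ["run", "build"] },
  { name: "lint", cmd: "npm", args: ["run", "lint"] },
  { name: "test", cmd: "npm", args: ["test"] },
];

// Runs steps in order and stops at the first failure ("fail fast").
function runPipeline(steps) {
  for (const step of steps) {
    console.log(`step: ${step.name}`);
    const result = spawnSync(step.cmd, step.args, { stdio: "inherit" });
    // status is null if the command could not start or was killed by a signal.
    if (result.status !== 0) {
      return { ok: false, failedStep: step.name };
    }
  }
  return { ok: true, failedStep: null };
}

module.exports = { CI_STEPS, runPipeline };
```

A CI job would run `runPipeline(CI_STEPS)` and exit non-zero when `ok` is false; that non-zero exit is what keeps the flow from reaching CD.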
Continuous Deployment (CD)
If CI says "Code is Safe", CD says "Ship it". It automates the boring manual work of SSH-ing into servers.
Typical CD Steps
- SSH into EC2 Instance
- Pull latest code (git pull)
- Install dependencies (npm i)
- Build Project (npm run build)
- Restart App (pm2 reload)
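The CD steps can be chained into a single SSH call. A sketch, assuming key-based SSH access and a server that already has the repo cloned; the host, `appDir` and `appName` values are hypothetical, as are the function names. Joining the remote steps with `&&` means a failed install or build stops the chain before `pm2 reload` runs, so the currently running version stays in place.

```javascript
const { spawnSync } = require("node:child_process");

// Builds the ssh arguments for the CD steps. All option values are hypothetical.
// "&&" stops the remote chain at the first failing step.
function buildDeployArgs({ host, appDir, appName }) {
  const remoteSteps = [
    `cd ${appDir}`,
    "git pull",
    "npm install",
    "npm run build",
    `pm2 reload ${appName}`,
  ];
  return [host, remoteSteps.join(" && ")];
}

function deploy(options) {
  // Arguments go straight to ssh (no local shell); the remote shell runs the joined command.
  const result = spawnSync("ssh", buildDeployArgs(options), { stdio: "inherit" });
  return result.status === 0;
}

module.exports = { buildDeployArgs, deploy };
```

For example, `deploy({ host: "ubuntu@<ec2-ip>", appDir: "/srv/node-api", appName: "node-api" })`. This assumes `appDir` contains no spaces or shell metacharacters, because the remote shell interprets the joined command.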
Delivery vs. Deployment
Continuous Deployment: Automatic. Every push to main goes live immediately. Good for startups.
Continuous Delivery: Manual trigger. Code is *ready* to deploy, but a human clicks the button. Common in larger organizations.
Why 'Reload' Matters
Why is `pm2 reload` superior to `pm2 restart`?
Scenario: The "Crash Loop"
Imagine your CD pipeline passes (the build succeeds), but your new code has a runtime error:
Error: Database Connection Failed (Wrong .env)
With `pm2 restart`:
- Kills old app.
- Tries to start new app.
- New app crashes.
- Result: Website down.
With `pm2 reload` (cluster mode):
- Starts new app in the background.
- New app crashes?
- PM2 keeps the old app alive.
- Result: Zero downtime.
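The reload scenario depends on PM2 knowing when the new instance is actually ready. With `wait_ready: true` in the ecosystem file, PM2 waits for the app to call `process.send("ready")` (up to `listen_timeout`) instead of only waiting for it to start listening. The sketch below signals ready only after the database connection succeeds, so a wrong .env fails before the new instance ever reports ready. `connectToDatabase`, `startApp`, `main` and `DATABASE_URL` are hypothetical; what PM2 does with the old instance when the new one never reports ready is worth confirming in the PM2 graceful-start documentation for your version.

```javascript
const http = require("node:http");

// Hypothetical stand-in for a real database client: it rejects when the URL is
// missing, which mimics the "wrong .env" failure from the scenario above.
async function connectToDatabase(url) {
  if (!url) throw new Error("Database Connection Failed (DATABASE_URL is not set)");
  return { url };
}

// Boots the app and calls onReady only after every dependency is available.
async function startApp({ databaseUrl, port = 3000, onReady = () => {} }) {
  await connectToDatabase(databaseUrl); // a bad config throws here, before listening
  const server = http.createServer((req, res) => res.end("ok"));
  await new Promise((resolve, reject) => {
    server.once("error", reject); // e.g. port already in use
    server.listen(port, resolve);
  });
  onReady();
  return server;
}

// Entry point when running under PM2 with `wait_ready: true` (hypothetical wiring).
function main() {
  startApp({
    databaseUrl: process.env.DATABASE_URL,
    port: Number(process.env.PORT) || 3000,
    // process.send only exists when the parent opened an IPC channel, as PM2 does.
    onReady: () => process.send && process.send("ready"),
  }).catch((err) => {
    console.error(err.message);
    process.exit(1); // a non-zero exit tells PM2 this boot failed
  });
}

module.exports = { startApp, main };
```

In a real `app.js` the file would end with `main();`, and the matching ecosystem entry would set `wait_ready: true` (plus a larger `listen_timeout` if startup is slow).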
Handling Logic Errors & Rollback
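If a logic error gets through, PM2 can't help: the process stays up and simply produces wrong results. Logic errors don't announce themselves.
That's where monitoring, logs, health checks, and rollback come in.
Rollback is simple but powerful: redeploy the last version that worked.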
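A health check plus automatic rollback can be sketched as a post-deploy gate: deploy, poll a health check a few times, and roll back if it never passes. `deploy`, `check` and `rollback` are hypothetical functions you would supply (for instance, `rollback` could check out the previous release and run `pm2 reload` again), and the `/health` route is an assumption. A health check only catches what it tests, so a route that also exercises the database is more useful than one that always answers 200.

```javascript
// Hypothetical post-deploy gate: poll a health check, roll back if it never passes.
async function waitForHealthy(check, { attempts = 5, delayMs = 2000 } = {}) {
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      if (await check()) return true;
    } catch {
      // Connection refused, timeout, etc.: treat as "not healthy yet".
    }
    if (attempt < attempts) await new Promise((r) => setTimeout(r, delayMs));
  }
  return false;
}

async function deployWithRollback({ deploy, check, rollback, attempts, delayMs }) {
  await deploy();
  if (await waitForHealthy(check, { attempts, delayMs })) return "deployed";
  await rollback(); // e.g. check out the previous release and pm2 reload it
  return "rolled-back";
}

// Example check for a hypothetical /health route (global fetch needs Node 18+).
const httpHealthCheck = (url) => async () => (await fetch(url)).ok;

module.exports = { waitForHealthy, deployWithRollback, httpHealthCheck };
```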
- ▪️Production is the real ground. Everything may work perfectly on dev/stage, but real-world scenarios only show up in production.
- ▪️That's why all of these steps matter.
- ▪️High stakes: trust, money, and responsibilities depend on it.
- 💡"Dev testing is about correctness. Production testing is about damage control."
Dev / Stage
- • 2-5 users (you)
- • Local DB (~0 ms latency)
- • No firewalls
- • Clean/mock data
Production
- • 10k users (race conditions)
- • Cloud DB (network latency)
- • Real security rules
- • Messy real data