Shipping code to production should be boring. Not scary, not manual, not error-prone - just boring. After breaking production one too many times with manual deployments, I built a CI/CD pipeline that deploys safely and automatically.
This is the complete guide to building a production-ready pipeline that I wish I had when starting out.
The Goal: What We're Building
Our pipeline will:
- ✅ Run automated tests on every pull request
- ✅ Build and push Docker images
- ✅ Deploy to staging environment automatically
- ✅ Deploy to production with manual approval
- ✅ Perform health checks and rollback if needed
- ✅ Send notifications to Slack
- ✅ Complete in under 10 minutes
Architecture Overview
GitHub Push → GitHub Actions
      ↓
Run Tests (Jest, E2E)
      ↓
Build Docker Image → Push to Container Registry
      ↓
Deploy to Staging
      ↓
Manual Approval
      ↓
Deploy to Production (Rolling update)
      ↓
Health Checks → Rollback if failed
Project Structure
.
├── .github/
│   └── workflows/
│       ├── ci.yml                 # Pull request checks
│       ├── deploy-staging.yml     # Auto-deploy to staging
│       └── deploy-prod.yml        # Manual production deploy
├── Dockerfile
├── docker-compose.yml
├── k8s/
│   ├── deployment.yml
│   ├── service.yml
│   └── ingress.yml
└── scripts/
    ├── health-check.sh
    └── rollback.sh
Step 1: Dockerizing the Application
Multi-stage Dockerfile for Optimal Size
# Build stage
FROM node:20-alpine AS builder
WORKDIR /app
# Copy package files
COPY package*.json ./
COPY pnpm-lock.yaml ./
# Install dependencies
RUN npm install -g pnpm && pnpm install --frozen-lockfile
# Copy source code
COPY . .
# Build application
RUN pnpm run build
# Production stage
FROM node:20-alpine AS runner
WORKDIR /app
# Set environment to production
ENV NODE_ENV=production
# Create non-root user for security
RUN addgroup --system --gid 1001 nodejs && \
adduser --system --uid 1001 nextjs
# Copy built application
COPY --from=builder --chown=nextjs:nodejs /app/.next/standalone ./
COPY --from=builder --chown=nextjs:nodejs /app/.next/static ./.next/static
COPY --from=builder --chown=nextjs:nodejs /app/public ./public
USER nextjs
EXPOSE 3000
ENV PORT=3000
ENV HOSTNAME="0.0.0.0"
CMD ["node", "server.js"]
Key optimizations:
- Multi-stage build reduces image from 1.2GB → 180MB
- Non-root user improves security
- Frozen lockfile ensures reproducible builds
- Standalone output for Next.js minimizes dependencies
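The standalone COPY steps in the Dockerfile only work if Next.js is told to emit a standalone server bundle. A minimal sketch of the required setting, assuming an otherwise default Next.js config:

```js
// next.config.js - enables the .next/standalone output the Dockerfile copies
/** @type {import('next').NextConfig} */
const nextConfig = {
  output: 'standalone', // bundle server.js with only the node_modules it actually needs
};

module.exports = nextConfig;
```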
Docker Compose for Local Testing
version: '3.8'
services:
app:
build:
context: .
dockerfile: Dockerfile
ports:
- "3000:3000"
environment:
- DATABASE_URL=postgresql://user:pass@db:5432/myapp
- REDIS_URL=redis://redis:6379
depends_on:
db:
condition: service_healthy
redis:
condition: service_started
db:
image: postgres:16-alpine
environment:
POSTGRES_USER: user
POSTGRES_PASSWORD: pass
POSTGRES_DB: myapp
volumes:
- postgres-data:/var/lib/postgresql/data
healthcheck:
test: ["CMD-SHELL", "pg_isready -U user"]
interval: 10s
timeout: 5s
retries: 5
redis:
image: redis:7-alpine
volumes:
- redis-data:/data
volumes:
postgres-data:
redis-data:
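Before wiring up CI, it's worth a local smoke test that the image builds and the health endpoint answers:

```bash
# Build images and start the full stack locally
docker compose up --build -d

# Hit the same endpoint the Kubernetes probes will use later
curl -i http://localhost:3000/api/health

# Tear down (add -v to also drop the postgres/redis volumes)
docker compose down
```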
Step 2: CI Pipeline for Pull Requests
.github/workflows/ci.yml
name: CI
on:
pull_request:
branches: [main, develop]
push:
branches: [main, develop]
env:
NODE_VERSION: '20'
jobs:
test:
name: Test & Lint
runs-on: ubuntu-latest
services:
postgres:
image: postgres:16-alpine
env:
POSTGRES_USER: test_user
POSTGRES_PASSWORD: test_pass
POSTGRES_DB: test_db
options: >-
--health-cmd pg_isready
--health-interval 10s
--health-timeout 5s
--health-retries 5
ports:
- 5432:5432
redis:
image: redis:7-alpine
options: >-
--health-cmd "redis-cli ping"
--health-interval 10s
--health-timeout 5s
--health-retries 5
ports:
- 6379:6379
steps:
- name: Checkout code
uses: actions/checkout@v4
- name: Install pnpm
  uses: pnpm/action-setup@v2
  # pnpm must be on PATH before setup-node runs, or its 'pnpm' cache resolution fails;
  # the pnpm version is read from the packageManager field in package.json
- name: Setup Node.js
  uses: actions/setup-node@v4
  with:
    node-version: ${{ env.NODE_VERSION }}
    cache: 'pnpm'
- name: Install dependencies
run: pnpm install --frozen-lockfile
- name: Run linter
run: pnpm run lint
- name: Run type checking
run: pnpm run type-check
- name: Run unit tests
run: pnpm run test:unit
env:
DATABASE_URL: postgresql://test_user:test_pass@localhost:5432/test_db
REDIS_URL: redis://localhost:6379
- name: Run integration tests
run: pnpm run test:integration
env:
DATABASE_URL: postgresql://test_user:test_pass@localhost:5432/test_db
REDIS_URL: redis://localhost:6379
- name: Upload coverage
uses: codecov/codecov-action@v3
with:
files: ./coverage/coverage-final.json
flags: unittests
e2e:
name: E2E Tests
runs-on: ubuntu-latest
timeout-minutes: 15
steps:
- name: Checkout code
uses: actions/checkout@v4
- name: Install pnpm
  uses: pnpm/action-setup@v2
- name: Setup Node.js
  uses: actions/setup-node@v4
  with:
    node-version: ${{ env.NODE_VERSION }}
    cache: 'pnpm'
- name: Install dependencies
run: pnpm install --frozen-lockfile
- name: Install Playwright browsers
run: pnpm exec playwright install --with-deps
- name: Build application
run: pnpm run build
- name: Run E2E tests
run: pnpm run test:e2e
- name: Upload test results
if: always()
uses: actions/upload-artifact@v4
with:
name: playwright-report
path: playwright-report/
security:
name: Security Scan
runs-on: ubuntu-latest
steps:
- name: Checkout code
uses: actions/checkout@v4
- name: Run Trivy vulnerability scanner
uses: aquasecurity/trivy-action@master
with:
scan-type: 'fs'
scan-ref: '.'
format: 'sarif'
output: 'trivy-results.sarif'
- name: Upload Trivy results to GitHub Security
uses: github/codeql-action/upload-sarif@v3
with:
sarif_file: 'trivy-results.sarif'
- name: Install pnpm
  uses: pnpm/action-setup@v2
- name: Check for dependency vulnerabilities
  # npm audit needs a package-lock.json, which a pnpm project doesn't have
  run: pnpm audit --audit-level high
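The workflow assumes the matching script entries exist in package.json. A hedged sketch of what they might look like (the exact runners and flags are assumptions; substitute your own):

```json
{
  "scripts": {
    "lint": "eslint .",
    "type-check": "tsc --noEmit",
    "test:unit": "jest --coverage --selectProjects unit",
    "test:integration": "jest --selectProjects integration",
    "test:e2e": "playwright test",
    "build": "next build"
  }
}
```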
Step 3: Staging Deployment
.github/workflows/deploy-staging.yml
name: Deploy to Staging
on:
push:
branches: [develop]
workflow_dispatch:
env:
REGISTRY: ghcr.io
IMAGE_NAME: ${{ github.repository }}
jobs:
build-and-push:
name: Build & Push Docker Image
runs-on: ubuntu-latest
permissions:
contents: read
packages: write
outputs:
  # Deploy by immutable digest: the metadata action's `tags` output is a
  # newline-separated list and can't be passed to `kubectl set image` as-is
  image-tag: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}@${{ steps.build.outputs.digest }}
steps:
- name: Checkout code
uses: actions/checkout@v4
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
- name: Log in to Container Registry
uses: docker/login-action@v3
with:
registry: ${{ env.REGISTRY }}
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
- name: Extract metadata
id: meta
uses: docker/metadata-action@v5
with:
images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
tags: |
type=ref,event=branch
type=sha,prefix={{branch}}-
type=raw,value=staging-latest
- name: Build and push Docker image
  id: build # the job output above reads this step's digest
  uses: docker/build-push-action@v5
with:
context: .
push: true
tags: ${{ steps.meta.outputs.tags }}
labels: ${{ steps.meta.outputs.labels }}
cache-from: type=gha
cache-to: type=gha,mode=max
build-args: |
BUILD_DATE=${{ github.event.head_commit.timestamp }}
VCS_REF=${{ github.sha }}
deploy:
name: Deploy to Staging Cluster
needs: build-and-push
runs-on: ubuntu-latest
environment:
name: staging
url: https://staging.myapp.com
steps:
- name: Checkout code
uses: actions/checkout@v4
- name: Install kubectl
uses: azure/setup-kubectl@v3
- name: Configure kubectl
run: |
echo "${{ secrets.KUBE_CONFIG_STAGING }}" | base64 -d > kubeconfig
echo "KUBECONFIG=$(pwd)/kubeconfig" >> $GITHUB_ENV
- name: Update deployment image
run: |
kubectl set image deployment/myapp-staging \
myapp=${{ needs.build-and-push.outputs.image-tag }} \
-n staging
- name: Wait for rollout
run: |
kubectl rollout status deployment/myapp-staging \
-n staging \
--timeout=5m
- name: Run health checks
run: |
chmod +x ./scripts/health-check.sh
./scripts/health-check.sh https://staging.myapp.com
- name: Notify Slack
if: always()
uses: slackapi/slack-github-action@v1
env:
  # v1 of this action reads the webhook from the environment, not a with: input
  SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK }}
  SLACK_WEBHOOK_TYPE: INCOMING_WEBHOOK
with:
payload: |
{
"text": "Staging Deployment ${{ job.status }}",
"blocks": [
{
"type": "section",
"text": {
"type": "mrkdwn",
"text": "*Staging Deployment*\n*Status:* ${{ job.status }}\n*Commit:* ${{ github.sha }}\n*URL:* https://staging.myapp.com"
}
}
]
}
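The KUBE_CONFIG_STAGING secret is nothing magic: it's a base64-encoded kubeconfig scoped to the staging cluster. One way to create it, assuming the gh CLI and a kubeconfig at ~/.kube/staging-config (the path is an assumption):

```bash
# -w0 disables line wrapping so the workflow's `base64 -d` decodes it cleanly
base64 -w0 ~/.kube/staging-config | gh secret set KUBE_CONFIG_STAGING
```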
Step 4: Production Deployment with Manual Approval
.github/workflows/deploy-prod.yml
name: Deploy to Production
on:
push:
branches: [main]
workflow_dispatch:
env:
REGISTRY: ghcr.io
IMAGE_NAME: ${{ github.repository }}
jobs:
build-and-push:
name: Build & Push Production Image
runs-on: ubuntu-latest
permissions:
contents: read
packages: write
outputs:
  # Deploy by immutable digest; `steps.meta.outputs.tags` is a newline-separated list
  image-tag: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}@${{ steps.build.outputs.digest }}
  image-digest: ${{ steps.build.outputs.digest }}
steps:
- name: Checkout code
uses: actions/checkout@v4
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
- name: Log in to Container Registry
uses: docker/login-action@v3
with:
registry: ${{ env.REGISTRY }}
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
- name: Extract metadata
id: meta
uses: docker/metadata-action@v5
with:
images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
tags: |
type=semver,pattern={{version}}
type=semver,pattern={{major}}.{{minor}}
type=sha,prefix=prod-
type=raw,value=prod-latest
- name: Build and push
id: build
uses: docker/build-push-action@v5
with:
context: .
push: true
tags: ${{ steps.meta.outputs.tags }}
labels: ${{ steps.meta.outputs.labels }}
cache-from: type=gha
cache-to: type=gha,mode=max
deploy:
name: Deploy to Production
needs: build-and-push
runs-on: ubuntu-latest
environment:
name: production
url: https://myapp.com
steps:
- name: Checkout code
uses: actions/checkout@v4
- name: Install kubectl
uses: azure/setup-kubectl@v3
- name: Configure kubectl
run: |
echo "${{ secrets.KUBE_CONFIG_PROD }}" | base64 -d > kubeconfig
echo "KUBECONFIG=$(pwd)/kubeconfig" >> $GITHUB_ENV
- name: Create backup of current deployment
run: |
kubectl get deployment myapp-prod -n production -o yaml > backup-deployment.yaml
- name: Deploy with rolling update
run: |
kubectl set image deployment/myapp-prod \
myapp=${{ needs.build-and-push.outputs.image-tag }} \
-n production
- name: Wait for rollout
id: rollout
run: |
kubectl rollout status deployment/myapp-prod \
-n production \
--timeout=10m
- name: Run health checks
id: health
run: |
chmod +x ./scripts/health-check.sh
./scripts/health-check.sh https://myapp.com
- name: Rollback on failure
if: failure()
run: |
echo "Deployment failed, rolling back..."
kubectl rollout undo deployment/myapp-prod -n production
kubectl rollout status deployment/myapp-prod -n production
- name: Notify Slack - Success
if: success()
uses: slackapi/slack-github-action@v1
env:
  SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK }}
  SLACK_WEBHOOK_TYPE: INCOMING_WEBHOOK
with:
payload: |
{
"text": "✅ Production Deployment Successful",
"blocks": [
{
"type": "section",
"text": {
"type": "mrkdwn",
"text": "*Production Deployment Successful* ✅\n*Commit:* `${{ github.sha }}`\n*Image Digest:* `${{ needs.build-and-push.outputs.image-digest }}`\n*URL:* https://myapp.com"
}
}
]
}
- name: Notify Slack - Failure
if: failure()
uses: slackapi/slack-github-action@v1
env:
  SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK }}
  SLACK_WEBHOOK_TYPE: INCOMING_WEBHOOK
with:
payload: |
{
"text": "❌ Production Deployment Failed - Rolled Back",
"blocks": [
{
"type": "section",
"text": {
"type": "mrkdwn",
"text": "*Production Deployment Failed* ❌\n*Status:* Rolled back to previous version\n*Commit:* `${{ github.sha }}`\n*Logs:* https://github.com/${{ github.repository }}/actions/runs/${{ github.run_id }}"
}
}
]
}
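Note that the manual approval gate never appears in the YAML itself: it comes from `environment: production`, which applies whatever protection rules that environment defines. Required reviewers are easiest to configure in the repository's Settings → Environments UI; here's a hedged sketch of the same thing via the REST API (OWNER, REPO, and the reviewer ID are placeholders):

```bash
# Require a reviewer to approve any job targeting the production environment.
# Look up a user's numeric ID with: gh api /users/<login> --jq .id
gh api --method PUT "repos/OWNER/REPO/environments/production" --input - <<'EOF'
{
  "reviewers": [{ "type": "User", "id": 12345 }]
}
EOF
```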
Step 5: Kubernetes Deployment Configuration
k8s/deployment.yml
apiVersion: apps/v1
kind: Deployment
metadata:
name: myapp-prod
namespace: production
labels:
app: myapp
environment: production
spec:
replicas: 3
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 1
maxUnavailable: 0 # Zero-downtime deployment
selector:
matchLabels:
app: myapp
template:
metadata:
labels:
app: myapp
environment: production
spec:
containers:
- name: myapp
image: ghcr.io/username/myapp:prod-latest
ports:
- containerPort: 3000
name: http
env:
- name: NODE_ENV
value: "production"
- name: DATABASE_URL
valueFrom:
secretKeyRef:
name: myapp-secrets
key: database-url
- name: REDIS_URL
valueFrom:
secretKeyRef:
name: myapp-secrets
key: redis-url
resources:
requests:
memory: "256Mi"
cpu: "250m"
limits:
memory: "512Mi"
cpu: "500m"
livenessProbe:
httpGet:
path: /api/health
port: 3000
initialDelaySeconds: 30
periodSeconds: 10
timeoutSeconds: 5
failureThreshold: 3
readinessProbe:
httpGet:
path: /api/ready
port: 3000
initialDelaySeconds: 10
periodSeconds: 5
timeoutSeconds: 3
failureThreshold: 3
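These probes assume the app actually serves /api/health and /api/ready. A minimal sketch as Next.js App Router route handlers; the `db` client import is an assumption, so wire in whatever your app uses:

```ts
// app/api/health/route.ts - liveness: the process is up and serving requests
export async function GET() {
  return Response.json({ status: 'ok' });
}
```

```ts
// app/api/ready/route.ts - readiness: critical dependencies are reachable
import { db } from '@/lib/db'; // placeholder for your database client

export async function GET() {
  try {
    await db.query('SELECT 1'); // cheap connectivity check
    return Response.json({ status: 'ready' });
  } catch {
    return new Response('not ready', { status: 503 });
  }
}
```

Keeping liveness trivial matters: if /api/health also checked the database, a database outage would make Kubernetes restart perfectly healthy pods.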
Step 6: Health Check Script
scripts/health-check.sh
#!/bin/bash
set -u

URL="${1:?Usage: health-check.sh <base-url>}"
MAX_RETRIES=10
RETRY_DELAY=5
echo "Running health checks on $URL"
for i in $(seq 1 $MAX_RETRIES); do
HTTP_STATUS=$(curl -s -o /dev/null -w "%{http_code}" "$URL/api/health")
if [ "$HTTP_STATUS" -eq 200 ]; then
echo "✅ Health check passed (attempt $i/$MAX_RETRIES)"
# Additional checks
RESPONSE_TIME=$(curl -s -o /dev/null -w "%{time_total}" "$URL")
echo "Response time: ${RESPONSE_TIME}s"
if (( $(echo "$RESPONSE_TIME < 2.0" | bc -l) )); then
echo "✅ Response time is acceptable"
exit 0
else
echo "⚠️ Response time is slow but acceptable"
exit 0
fi
else
echo "❌ Health check failed with status $HTTP_STATUS (attempt $i/$MAX_RETRIES)"
if [ "$i" -lt "$MAX_RETRIES" ]; then
echo "Retrying in ${RETRY_DELAY}s..."
sleep $RETRY_DELAY
fi
fi
done
echo "❌ Health checks failed after $MAX_RETRIES attempts"
exit 1
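The rollback.sh from the project tree is the same `kubectl rollout undo` the workflows use, packaged for humans. A minimal sketch, assuming the deployment and namespace names used throughout this post:

```bash
#!/bin/bash
# scripts/rollback.sh - revert a deployment to its previous ReplicaSet
# Usage: ./scripts/rollback.sh <deployment> <namespace>
set -euo pipefail

DEPLOYMENT="${1:?Usage: rollback.sh <deployment> <namespace>}"
NAMESPACE="${2:?Usage: rollback.sh <deployment> <namespace>}"

echo "Rolling back $DEPLOYMENT in $NAMESPACE..."
kubectl rollout undo "deployment/$DEPLOYMENT" -n "$NAMESPACE"

# Block until the rollback finishes, or fail loudly
kubectl rollout status "deployment/$DEPLOYMENT" -n "$NAMESPACE" --timeout=5m
```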
Advanced Features
1. Canary Deployments
# k8s/canary-deployment.yml
apiVersion: apps/v1
kind: Deployment
metadata:
name: myapp-canary
spec:
replicas: 1 # 1 canary alongside the 3 stable replicas ≈ 25% of traffic; scale stable to 9 for a true ~10% split
template:
metadata:
labels:
app: myapp
version: canary
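The traffic split works only because the Service selects on a label both deployments share: with a selector of just `app: myapp`, traffic distributes across stable and canary pods in proportion to replica counts. A sketch of such a Service (names assumed to match the manifests above):

```yaml
# k8s/service.yml - selects pods from both the stable and canary deployments
apiVersion: v1
kind: Service
metadata:
  name: myapp
  namespace: production
spec:
  selector:
    app: myapp # no `version` key, so both stable and canary pods stay in rotation
  ports:
    - name: http
      port: 80
      targetPort: 3000
```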
2. Database Migrations
# Add to deploy job
- name: Run database migrations
  run: |
    # Use a Job rather than a bare Pod so `kubectl wait --for=condition=complete` works
    kubectl create job migration-${{ github.sha }} \
      --image=${{ needs.build-and-push.outputs.image-tag }} \
      --namespace=production \
      -- npm run migrate
    kubectl wait --for=condition=complete job/migration-${{ github.sha }} \
      --namespace=production \
      --timeout=5m
    kubectl delete job migration-${{ github.sha }} --namespace=production
3. Automated Rollback on Error Rate Spike
- name: Monitor error rate
run: |
ERROR_RATE=$(curl -s "https://api.sentry.io/..." | jq '.rate')
if (( $(echo "$ERROR_RATE > 5.0" | bc -l) )); then
echo "Error rate spike detected! Rolling back..."
kubectl rollout undo deployment/myapp-prod -n production
exit 1
fi
Results
After implementing this pipeline:
- ✅ Zero production incidents in 6 months
- ✅ 15+ deployments per week (up from 2)
- ✅ 8-minute average deployment time
- ✅ 100% deployment success rate (rollbacks work!)
- ✅ Rollbacks trigger automatically within seconds of a failed health check
Key Takeaways
- Automate everything - Manual steps = opportunities for errors
- Test before deploying - Catch issues in CI, not production
- Always have a rollback plan - Things will go wrong
- Monitor deployments - Health checks + error tracking
- Use staging environments - Production-like testing
- Immutable infrastructure - Containers ensure consistency
- Zero-downtime deployments - Rolling updates with health checks
Common Pitfalls to Avoid
- ❌ Skipping staging environments - Always test before production
- ❌ No rollback strategy - Hope is not a strategy
- ❌ Manual approval fatigue - Only require approval for prod
- ❌ Long deployment times - Optimize Docker builds with caching
- ❌ Ignoring failed health checks - Automated rollbacks save the day
This pipeline has saved me countless hours and prevented numerous production incidents. The initial setup takes time, but the confidence and speed it provides are worth every minute.
Full working example with all configs: GitHub Repository
Tech Stack: GitHub Actions, Docker, Kubernetes, PostgreSQL, Redis, Slack, Sentry