Deployment Troubleshooting Guide
Quick Diagnosis Table
Use this table to quickly identify the most likely cause based on symptoms:
| Symptom | Most Likely Cause | Quick Check | Section |
| -------------------------- | ---------------------------------- | -------------------------------- | ------------------ |
| Container won't start | Environment vars, port conflict | docker logs blueline-alpha-api | Container Issues |
| Health check returns 502 | nginx down, container crashed | systemctl status nginx | Network Issues |
| Health check returns 503 | API starting/unhealthy | docker ps (check STATUS) | Container Issues |
| Database connection failed | Wrong DB host, password, port | Check env vars | Database Issues |
| Slow API responses | Memory leak, unoptimized queries | docker stats | Performance Issues |
| Build fails on VPS | Wrong architecture (ARM vs AMD64) | Check build platform | Build Issues |
| Cache miss on every build | Layer order changed, .dockerignore | Run cache analytics | Build Issues |
Troubleshooting Decision Tree
DEPLOYMENT ISSUE
├─ What's Broken?
│  ├─ Container Won't Start
│  ├─ API/Web Responds Slowly
│  └─ Database Issue
└─ When Did It Break?
   ├─ During Build
   └─ After Deploy
┌───────────────────────────────────────────────────────────────────────────┐
│ CONTAINER WON'T START │
├───────────────────────────────────────────────────────────────────────────┤
│ │
│ 1. Check Logs: docker logs blueline-alpha-api --tail 50 │
│ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ Error Contains │ Solution │ │
│ ├───────────────────────────────────┼─────────────────────────┤ │
│ │ "JWT_SECRET" │ → Env Vars (Section A) │ │
│ │ "DATABASE_URL" │ → Env Vars (Section A) │ │
│ │ "EADDRINUSE" / "port already" │ → Port Conflict (B) │ │
│ │ "Migration failed" / "database" │ → Database Issues │ │
│ │ "ECONNREFUSED" │ → DB Not Running │ │
│ └───────────────────────────────────┴─────────────────────────┘ │
│ │
│ A. Env Vars Missing │
│ → docker exec blueline-alpha-api env | grep -E 'JWT|DATABASE' │
│ → Fix: Update .env.blueline-alpha, rebuild, redeploy │
│ │
│ B. Port Conflict │
│ → lsof -i :3003 (check what's using the port) │
│ → Fix: Stop conflicting service or change port in compose │
│ │
└───────────────────────────────────────────────────────────────────────────┘
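The "Error Contains" table above can be applied mechanically to a log excerpt. The following is a minimal sketch, not part of the project scripts: classify_startup_error is a hypothetical helper name, and the match order (ECONNREFUSED before the generic "database" match) is an assumption made so the more specific pattern wins.

```shell
# classify_startup_error (hypothetical helper): read log text on stdin and
# print the matching section name from the "Error Contains" table.
classify_startup_error() {
  local logs
  logs=$(cat)
  case "$logs" in
    *JWT_SECRET*|*DATABASE_URL*)     echo "Env Vars (Section A)" ;;
    *EADDRINUSE*|*"port already"*)   echo "Port Conflict (B)" ;;
    # Assumption: check ECONNREFUSED before the broader "database" pattern.
    *ECONNREFUSED*)                  echo "DB Not Running" ;;
    *"Migration failed"*|*database*) echo "Database Issues" ;;
    *)                               echo "Unknown" ;;
  esac
}

# Example (run on the VPS):
#   docker logs blueline-alpha-api --tail 50 2>&1 | classify_startup_error
```

The patterns are case-sensitive, matching the strings exactly as the table lists them.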
┌───────────────────────────────────────────────────────────────────────────┐
│ SLOW RESPONSES / PERFORMANCE │
├───────────────────────────────────────────────────────────────────────────┤
│ │
│ 1. Check Resources: docker stats --no-stream blueline-alpha-api │
│ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ Metric │ Threshold │ Action If Exceeded │ │
│ ├─────────────────┼────────────┼──────────────────────────────┤ │
│ │ CPU Usage │ >80% │ → Check for tight loops │ │
│ │ Memory Usage │ >90% │ → Memory Leak (restart) │ │
│ │ Network I/O │ >100MB/s │ → Check for excessive logs │ │
│ └─────────────────┴────────────┴──────────────────────────────┘ │
│ │
│ Memory >90%? │
│ ├─ Yes → RESTART CONTAINER (memory leak) │
│ │ docker compose restart api │
│ │ Monitor: Watch memory over next hour │
│ │ │
│ └─ No → CHECK DATABASE │
│ ├─ Slow query logs enabled? │
│ ├─ Missing indexes? (check query plans) │
│ └─ Too many connections? (check connection pool) │
│ │
└───────────────────────────────────────────────────────────────────────────┘
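The threshold table above can be checked in a script rather than by eye. This is a sketch under stated assumptions: exceeds_pct is a hypothetical helper name, and it expects a percentage string such as the one docker stats prints with --format "{{.MemPerc}}" or "{{.CPUPerc}}".

```shell
# exceeds_pct VALUE LIMIT (hypothetical helper): succeed when VALUE,
# e.g. "85.31%", is strictly greater than LIMIT.
exceeds_pct() {
  local value=${1%\%}   # strip the trailing percent sign
  awk -v v="$value" -v l="$2" 'BEGIN { exit !(v + 0 > l + 0) }'
}

# Example (run on the VPS):
#   mem=$(docker stats --no-stream --format '{{.MemPerc}}' blueline-alpha-api)
#   exceeds_pct "$mem" 90 && echo "Memory above 90%: restart and monitor"
```

awk does the comparison because shell arithmetic cannot handle the decimal values docker stats reports.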
┌───────────────────────────────────────────────────────────────────────────┐
│ HEALTH CHECK FAILS │
├───────────────────────────────────────────────────────────────────────────┤
│ │
│ Response Code? │
│ │ │
│ ├─ 502 (Bad Gateway) │
│ │ ├─ nginx running? → systemctl status nginx │
│ │ ├─ Container crashed? → docker ps (check STATUS) │
│ │ └─ Wrong upstream? → nginx config (/etc/nginx/sites-enabled/) │
│ │ │
│ ├─ 503 (Service Unavailable) │
│ │ ├─ Container starting? → docker logs (wait 30s) │
│ │ ├─ Health endpoint broken? → curl localhost:3003/health │
│ │ └─ Database down? → Check DB container │
│ │ │
│ ├─ 504 (Gateway Timeout) │
│ │ ├─ Slow startup? → Check migration logs │
│ │ ├─ Database locked? → Check active queries │
│ │ └─ Memory issue? → docker stats │
│ │ │
│ └─ Connection Refused │
│ ├─ Container not running? → docker ps -a │
│ ├─ Wrong port? → Check compose file (3003 for blueline) │
│ └─ Firewall? → iptables -L (check rules) │
│ │
└───────────────────────────────────────────────────────────────────────────┘
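The response-code branches above can be turned into a small lookup. This is an illustrative sketch: health_hint is a hypothetical helper, and the hint texts only restate the checks listed in the box. curl reports 000 from -w '%{http_code}' when it receives no HTTP response at all, which covers the "Connection Refused" branch among other failures.

```shell
# health_hint CODE (hypothetical helper): print the first checks for a
# given HTTP status code, following the decision box above.
health_hint() {
  case "$1" in
    200) echo "healthy" ;;
    502) echo "check nginx status, container status, nginx upstream config" ;;
    503) echo "check container logs, local health endpoint, database container" ;;
    504) echo "check migration logs, active queries, docker stats" ;;
    000) echo "no HTTP response: check docker ps -a, port 3003, firewall rules" ;;
    *)   echo "unmapped status $1" ;;
  esac
}

# Example:
#   code=$(curl -s -o /dev/null -w '%{http_code}' https://alpha.theblueline.com/health)
#   health_hint "$code"
```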
┌───────────────────────────────────────────────────────────────────────────┐
│ BUILD FAILURES │
├───────────────────────────────────────────────────────────────────────────┤
│ │
│ Where Did Build Fail? │
│ │ │
│ ├─ Environment Validation │
│ │ → .env file missing or incomplete │
│ │ → Run: ./scripts/validate-deployment-env.sh │
│ │ │
│ ├─ TypeScript Type-Check │
│ │ → Fix type errors in code │
│ │ → Run: pnpm type-check │
│ │ │
│ ├─ Docker Build │
│ │ ├─ exec format error → Wrong architecture (use --amd64) │
│ │ ├─ COPY failed → Missing files (check .dockerignore) │
│ │ └─ Out of memory → Increase Docker resources │
│ │ │
│ ├─ Container Smoke Tests │
│ │ ├─ Container won't start → Check Dockerfile entrypoint │
│ │ ├─ Health endpoint timeout → Increase startup wait time │
│ │ └─ Port binding failed → Change test port │
│ │ │
│ └─ Image Transfer │
│ ├─ SSH timeout → Check VPS accessibility │
│ ├─ Disk full on VPS → Clean old images │
│ └─ Network error → Retry transfer │
│ │
└───────────────────────────────────────────────────────────────────────────┘
EMERGENCY ROLLBACK (when all else fails):
./scripts/rollback-deployment.sh root@23.235.204.208 previous
↓ Rolls back to last known good version in <5 minutes
↓ Database state preserved (no data loss)
↓ Automatic health verification after rollback
Diagnostic Commands Quick Reference
Check Container Status
# All containers for deployment
ssh root@23.235.204.208 'docker ps -a --filter "name=blueline-alpha"'
# Detailed status (health, ports, uptime)
ssh root@23.235.204.208 'docker ps --filter "name=blueline-alpha" --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}"'
# Resource usage (CPU, memory, network)
ssh root@23.235.204.208 'docker stats --no-stream blueline-alpha-api blueline-alpha-web blueline-alpha-db'
View Logs
# API logs (last 50 lines)
ssh root@23.235.204.208 'docker logs blueline-alpha-api --tail 50'
# Web logs (last 50 lines)
ssh root@23.235.204.208 'docker logs blueline-alpha-web --tail 50'
# Follow logs in real-time
ssh root@23.235.204.208 'docker logs -f blueline-alpha-api'
# Search logs for errors
ssh root@23.235.204.208 'docker logs blueline-alpha-api 2>&1 | grep -i error | tail -20'
Test Connectivity
# Test API health from VPS
ssh root@23.235.204.208 'curl -s http://localhost:3003/health | jq'
# Test Web health from VPS
ssh root@23.235.204.208 'curl -s http://localhost:3006/ | head -20'
# Test public HTTPS
curl -s https://alpha.theblueline.com/health | jq
Container Issues
Container Won't Start
Symptoms:
- docker ps shows container with status Exited (1)
- Container restarts immediately after starting
- No container appears in docker ps output
Diagnostic:
# Check exit code and logs
ssh root@23.235.204.208 'docker ps -a --filter "name=blueline-alpha-api" --format "{{.Status}}"'
ssh root@23.235.204.208 'docker logs blueline-alpha-api --tail 100'
Common Causes:
A. Missing Environment Variables
# Check .env file exists
ssh root@23.235.204.208 'ls -lh /opt/sampo-alpha/.env.blueline-alpha'
# Verify required variables (credentials in URLs and the JWT secret are masked)
ssh root@23.235.204.208 'grep "DATABASE_URL\|JWT_SECRET\|REDIS_URL" /opt/sampo-alpha/.env.blueline-alpha | sed -E "s/(JWT_SECRET=).*/\1***/; s/:.*@/:***@/"'
# Solution: Add the missing variable, then recreate the container
# ("restart" reuses the existing container and does not pick up env file changes)
echo "MISSING_VAR=value" >> .env.blueline-alpha
scp .env.blueline-alpha root@23.235.204.208:/opt/sampo-alpha/
ssh root@23.235.204.208 'cd /opt/sampo-alpha && docker compose -f docker-compose.base.yml -f docker-compose.blueline-alpha.override.yml up -d --force-recreate --no-deps api'
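Before copying the env file to the VPS, it can help to confirm that every required variable has a non-empty value. The sketch below is an assumption-laden example: missing_env_vars is a hypothetical helper, it only understands plain VAR=value lines (no export prefix, no quoting rules), and the variable list is taken from the grep above.

```shell
# missing_env_vars FILE VAR... (hypothetical helper): print each VAR that
# has no non-empty assignment in FILE, one name per line.
missing_env_vars() {
  local file=$1 var
  shift
  for var in "$@"; do
    grep -Eq "^${var}=.+" "$file" || echo "$var"
  done
}

# Example:
#   missing_env_vars .env.blueline-alpha DATABASE_URL JWT_SECRET REDIS_URL
```

Empty output means all listed variables are present; only names are printed, so no secret values reach the terminal.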
B. Database Connection Failed
# Check database is running
ssh root@23.235.204.208 'docker ps --filter "name=blueline-alpha-db"'
# Test connection from API container
ssh root@23.235.204.208 'docker exec blueline-alpha-api nc -zv blueline-alpha-db 5432'
# Solution: Restart database first, then API
ssh root@23.235.204.208 'cd /opt/sampo-alpha && docker compose -f docker-compose.base.yml -f docker-compose.blueline-alpha.override.yml restart db'
sleep 10
ssh root@23.235.204.208 'cd /opt/sampo-alpha && docker compose -f docker-compose.base.yml -f docker-compose.blueline-alpha.override.yml restart api'
C. Port Conflict
# Check what's using port 3003
ssh root@23.235.204.208 'lsof -i :3003 | head -10'
# Solution: Kill conflicting process or change port in docker-compose
Container Crashes After Startup
Symptoms:
- Container starts, runs for seconds/minutes, then exits
- Status shows restart loop: Restarting (1) 3 seconds ago
- Logs show runtime errors
Diagnostic:
# Check restart count
ssh root@23.235.204.208 'docker inspect blueline-alpha-api | jq ".[0].RestartCount"'
# Check for OOM (out of memory)
ssh root@23.235.204.208 'docker inspect blueline-alpha-api | jq ".[0].State.OOMKilled"'
# Check memory usage
ssh root@23.235.204.208 'docker stats --no-stream blueline-alpha-api'
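The number in a status such as Exited (1) or Restarting (1) is the container's exit code, and a few values have conventional meanings (codes above 128 are 128 plus the signal number). A rough lookup, with explain_exit_code as a hypothetical helper name and the wording of each hint as this sketch's own interpretation:

```shell
# explain_exit_code CODE (hypothetical helper): print the conventional
# meaning of a container exit code.
explain_exit_code() {
  case "$1" in
    0)   echo "exited cleanly" ;;
    1)   echo "application error: read the logs" ;;
    126) echo "command found but not executable" ;;
    127) echo "command not found: check the entrypoint" ;;
    137) echo "SIGKILL (128+9): often the OOM killer or docker kill" ;;
    139) echo "SIGSEGV (128+11): segmentation fault" ;;
    143) echo "SIGTERM (128+15): container was asked to stop" ;;
    *)   echo "unrecognized exit code $1" ;;
  esac
}

# Example (run on the VPS):
#   code=$(docker inspect --format '{{.State.ExitCode}}' blueline-alpha-api)
#   explain_exit_code "$code"
```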
Common Causes:
A. Out of Memory (OOM)
# Check system logs for OOM
ssh root@23.235.204.208 'dmesg | grep -i "out of memory\|oom" | tail -10'
# Solution: Increase memory limit in docker-compose.blueline-alpha.override.yml
# deploy:
#   resources:
#     limits:
#       memory: 2G  # Increase from 1G
# Restart with new limits
ssh root@23.235.204.208 'cd /opt/sampo-alpha && docker compose -f docker-compose.base.yml -f docker-compose.blueline-alpha.override.yml up -d api'
B. Unhandled Exception
# Check logs for stack trace
ssh root@23.235.204.208 'docker logs blueline-alpha-api 2>&1 | grep -i "error\|exception\|fatal" | tail -20'
# Solution: Fix code issue, redeploy
Database Issues
Connection Failures
Symptoms:
- API logs: Error: connect ECONNREFUSED
- Health check fails with database error
- Prisma client throws connection timeout
Diagnostic:
# Check database container
ssh root@23.235.204.208 'docker ps --filter "name=blueline-alpha-db"'
# Check database logs
ssh root@23.235.204.208 'docker logs blueline-alpha-db --tail 30'
# Test connection from host
ssh root@23.235.204.208 'psql -h localhost -p 5436 -U postgres -d sampo_blueline_alpha -c "SELECT 1;"'
Solution:
# Restart database container
ssh root@23.235.204.208 'cd /opt/sampo-alpha && docker compose -f docker-compose.base.yml -f docker-compose.blueline-alpha.override.yml up -d db'
# Wait for database to be ready
sleep 10
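A fixed sleep can be too short on a slow start and wasteful on a fast one. One alternative is to poll until the database accepts connections. This is a sketch, not the project's method: retry_until is a hypothetical helper, and the usage line assumes the database image ships PostgreSQL's pg_isready utility.

```shell
# retry_until ATTEMPTS DELAY CMD... (hypothetical helper): run CMD until it
# succeeds or ATTEMPTS tries have been made, sleeping DELAY seconds between tries.
retry_until() {
  local attempts=$1 delay=$2 i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    "$@" && return 0
    [ "$i" -lt "$attempts" ] && sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Example: wait up to about 30 seconds for the database to accept connections.
#   retry_until 30 1 ssh root@23.235.204.208 \
#     'docker exec blueline-alpha-db pg_isready -U postgres'
```

The function returns non-zero when every attempt fails, so a calling script can stop instead of restarting the API against a database that never came up.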
Migration Errors
Symptoms:
- API startup logs show migration failures
- Database schema out of sync with code
- Prisma errors: "Column does not exist"
Diagnostic:
# Check migration status
ssh root@23.235.204.208 'docker exec blueline-alpha-api pnpm prisma migrate status'
# Check applied migrations
ssh root@23.235.204.208 'docker exec blueline-alpha-db psql -U postgres -d sampo_blueline_alpha -c "SELECT migration_name, finished_at FROM _prisma_migrations ORDER BY finished_at DESC LIMIT 10;"'
Solution:
# Apply pending migrations
ssh root@23.235.204.208 'docker exec blueline-alpha-api pnpm prisma migrate deploy'
# If a migration failed partway, mark it as rolled back with Prisma's resolve command
# (replace MIGRATION_NAME with the failed migration reported by "migrate status")
ssh root@23.235.204.208 'docker exec blueline-alpha-api pnpm prisma migrate resolve --rolled-back MIGRATION_NAME'
# Re-apply migrations
ssh root@23.235.204.208 'docker exec blueline-alpha-api pnpm prisma migrate deploy'
Performance Degradation
Symptoms:
- Queries taking >1 second
- API response times increasing
- High database CPU usage
Diagnostic:
# Check active connections
ssh root@23.235.204.208 'docker exec blueline-alpha-db psql -U postgres -d sampo_blueline_alpha -c "SELECT count(*) FROM pg_stat_activity WHERE state = '\''active'\'';"'
# Check slow queries
ssh root@23.235.204.208 'docker exec blueline-alpha-db psql -U postgres -d sampo_blueline_alpha -c "SELECT pid, now() - pg_stat_activity.query_start AS duration, query FROM pg_stat_activity WHERE state = '\''active'\'' ORDER BY duration DESC LIMIT 5;"'
Solution:
# Reduce Prisma connection pool
# In DATABASE_URL: postgresql://...?connection_limit=5
# Vacuum tables
ssh root@23.235.204.208 'docker exec blueline-alpha-db psql -U postgres -d sampo_blueline_alpha -c "VACUUM ANALYZE;"'
Network Issues
nginx Configuration
Symptoms:
- 502 Bad Gateway on public URL
- nginx logs show "upstream" errors
- Health checks work locally but fail via HTTPS
Diagnostic:
# Check nginx status
ssh root@23.235.204.208 'systemctl status nginx --no-pager'
# Check nginx configuration
ssh root@23.235.204.208 'nginx -t'
# Check nginx error logs
ssh root@23.235.204.208 'tail -50 /var/log/nginx/error.log'
Common Causes:
A. nginx Not Running
# Solution
ssh root@23.235.204.208 'systemctl start nginx && systemctl enable nginx'
B. Apache Blocking Ports
# Check what's using ports 80 and 443
ssh root@23.235.204.208 'lsof -i :80 -i :443 | head -20'
# If Apache is listed, stop it (the unit is httpd on RHEL-family systems, apache2 on Debian/Ubuntu)
ssh root@23.235.204.208 'systemctl stop httpd && systemctl disable httpd'
ssh root@23.235.204.208 'systemctl restart nginx'
SSL Certificate Issues
Symptoms:
- Browser shows "Your connection is not private"
- Certificate expired warnings
- curl fails with SSL error
Diagnostic:
# Check certificate status
curl -vI https://alpha.theblueline.com 2>&1 | grep -i "expire\|subject\|issuer"
# Check certificate files exist
ssh root@23.235.204.208 'ls -lh /etc/letsencrypt/live/alpha.theblueline.com/'
Solution:
# Renew certificate
ssh root@23.235.204.208 'certbot renew --nginx --force-renewal'
ssh root@23.235.204.208 'systemctl reload nginx'
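Expiry can also be checked ahead of time so that renewal happens before browsers start warning. The sketch below assumes the openssl CLI is available; cert_expires_within is a hypothetical helper, and the 14-day window in the usage line is only an example value.

```shell
# cert_expires_within DAYS (hypothetical helper): read a PEM certificate on
# stdin and succeed if it expires within DAYS days.
# "openssl x509 -checkend N" exits 0 when the certificate is still valid
# N seconds from now, so the result is inverted here. Unreadable input also
# makes openssl fail and is therefore reported as "expiring".
cert_expires_within() {
  ! openssl x509 -noout -checkend "$(( $1 * 86400 ))" >/dev/null
}

# Example:
#   echo | openssl s_client -connect alpha.theblueline.com:443 \
#       -servername alpha.theblueline.com 2>/dev/null \
#     | cert_expires_within 14 && echo "certificate expires within 14 days"
```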
Performance Issues
High Memory Usage
Symptoms:
- Docker stats show >80% memory usage
- Container killed by OOM
- API responses slow/timeout
Diagnostic:
# Check current memory usage
ssh root@23.235.204.208 'docker stats --no-stream blueline-alpha-api blueline-alpha-web'
# Check OOM kills
ssh root@23.235.204.208 'dmesg | grep -i "out of memory" | tail -10'
Solution:
# Temporary: Restart container
ssh root@23.235.204.208 'cd /opt/sampo-alpha && docker compose -f docker-compose.base.yml -f docker-compose.blueline-alpha.override.yml restart api'
# Permanent: Increase memory limit in docker-compose
# deploy:
#   resources:
#     limits:
#       memory: 2G
Slow Responses
Symptoms:
- API requests taking >3 seconds
- Frontend loading slowly
- Timeout errors in logs
Diagnostic:
# Check API response times
curl -o /dev/null -s -w "Time: %{time_total}s\n" https://alpha.theblueline.com/health
# Check CPU usage
ssh root@23.235.204.208 'docker stats --no-stream --format "{{.CPUPerc}}" blueline-alpha-api'
Solution:
# Check for slow database queries
ssh root@23.235.204.208 'docker logs blueline-alpha-api 2>&1 | grep "Query took"'
# Implement caching, add indexes, optimize queries
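A single curl timing can be an outlier, so averaging several samples gives a steadier number to compare against the 3-second symptom threshold. A minimal sketch, with avg_seconds as a hypothetical helper and five samples as an arbitrary choice:

```shell
# avg_seconds (hypothetical helper): read one timing per line on stdin and
# print the mean with three decimals. Prints nothing for empty input.
avg_seconds() {
  awk '{ sum += $1; n++ } END { if (n) printf "%.3f\n", sum / n }'
}

# Example: average five health-check timings.
#   for _ in 1 2 3 4 5; do
#     curl -o /dev/null -s -w '%{time_total}\n' https://alpha.theblueline.com/health
#   done | avg_seconds
```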
Cache Degradation
Symptoms:
- Build times increasing over time
- Cache hit rate dropping below 50%
- Every build takes 10+ minutes
Diagnostic:
# Check cache performance
./scripts/analyze-build-cache.sh
# Check detailed build stats
./scripts/analyze-build-cache.sh --detailed | tail -50
# Check trends
./scripts/analyze-build-cache.sh --trends
Solution:
See the Build Performance Optimization article for detailed troubleshooting.
Build Issues
Build Fails Locally
Symptoms:
- docker build fails with compilation error
- TypeScript errors during build
- Out of memory during build
Diagnostic:
# Check build logs
./scripts/docker-audit/build-cross-platform.sh --api --local 2>&1 | tee build-error.log
# Check disk space
df -h
# Check Docker resources
docker system df
Common Causes:
A. TypeScript Errors
# Run local validation
pnpm type-check
# Fix type errors before building
B. Insufficient Docker Resources
# Solution: Increase Docker Desktop memory allocation
# Docker Desktop → Settings → Resources → Memory → 8GB
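The disk space check above can be filtered so that only nearly full filesystems are shown. This sketch reads df -P output (the POSIX format, which has stable columns); disks_over is a hypothetical helper, and it assumes mount points contain no spaces.

```shell
# disks_over LIMIT (hypothetical helper): read "df -P" output on stdin and
# print "mountpoint capacity" for each filesystem above LIMIT percent.
disks_over() {
  awk -v limit="$1" 'NR > 1 {
    use = $5
    sub(/%/, "", use)
    if (use + 0 > limit) print $6, $5
  }'
}

# Example:
#   df -P | disks_over 80
#   ssh root@23.235.204.208 'df -P' | disks_over 80
```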
Build Succeeds But Container Fails
Symptoms:
- Image builds successfully
- Container exits immediately with code 1
- Logs show "exec format error"
Diagnostic:
# Check image architecture
docker inspect sampo-blueline-alpha-api:latest | jq '.[0].Architecture'
# Check host architecture
uname -m
# Prints x86_64 (the same as amd64) on the VPS, arm64 on an Apple Silicon Mac
Solution:
# Rebuild for correct architecture
./scripts/docker-audit/build-cross-platform.sh --api --amd64 # For VPS
./scripts/docker-audit/build-cross-platform.sh --api --local # For local Mac
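Docker and uname name the same architectures differently (amd64 versus x86_64, arm64 versus aarch64), so a direct string comparison of the two diagnostic outputs can mislead. A small sketch that normalizes both sides first; normalize_arch and arch_matches are hypothetical helper names:

```shell
# normalize_arch NAME (hypothetical helper): map uname-style names to the
# names Docker uses. Unknown names pass through unchanged.
normalize_arch() {
  case "$1" in
    x86_64|amd64)  echo amd64 ;;
    aarch64|arm64) echo arm64 ;;
    *)             echo "$1" ;;
  esac
}

# arch_matches A B (hypothetical helper): succeed when both names refer to
# the same architecture.
arch_matches() {
  [ "$(normalize_arch "$1")" = "$(normalize_arch "$2")" ]
}

# Example:
#   image=$(docker inspect --format '{{.Architecture}}' sampo-blueline-alpha-api:latest)
#   host=$(ssh root@23.235.204.208 uname -m)
#   arch_matches "$image" "$host" || echo "rebuild with --amd64"
```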
Recovery Procedures
Safe Restart
Use when: Minor issues, no data loss risk
# Restart in order: db → api → web
ssh root@23.235.204.208 'cd /opt/sampo-alpha && docker compose -f docker-compose.base.yml -f docker-compose.blueline-alpha.override.yml restart db'
sleep 10
ssh root@23.235.204.208 'cd /opt/sampo-alpha && docker compose -f docker-compose.base.yml -f docker-compose.blueline-alpha.override.yml restart api'
sleep 5
ssh root@23.235.204.208 'cd /opt/sampo-alpha && docker compose -f docker-compose.base.yml -f docker-compose.blueline-alpha.override.yml restart web'
# Verify health
./scripts/deployment-smoke-test.sh root@23.235.204.208 https://alpha.theblueline.com
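The same db → api → web sequence can be wrapped in a function that stops at the first failed restart instead of continuing. This is a sketch rather than an existing project script: restart_in_order, HOST, COMPOSE, and DRY_RUN are names introduced here, and the pauses mirror the sleeps above.

```shell
# Assumed defaults, taken from the commands in this guide.
HOST=${HOST:-root@23.235.204.208}
COMPOSE='docker compose -f docker-compose.base.yml -f docker-compose.blueline-alpha.override.yml'

# restart_in_order (hypothetical helper): restart db, api, web in sequence,
# pausing after db (10s) and api (5s). Stops at the first failure.
# With DRY_RUN=1 it only prints the commands it would run.
restart_in_order() {
  local step svc pause
  for step in db:10 api:5 web:0; do
    svc=${step%%:*}
    pause=${step##*:}
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "ssh $HOST 'cd /opt/sampo-alpha && $COMPOSE restart $svc'"
    else
      ssh "$HOST" "cd /opt/sampo-alpha && $COMPOSE restart $svc" || return 1
      sleep "$pause"
    fi
  done
}

# Example:
#   DRY_RUN=1 restart_in_order   # preview the commands
#   restart_in_order && ./scripts/deployment-smoke-test.sh "$HOST" https://alpha.theblueline.com
```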
Emergency Rollback
Use when: New deployment broke production, need immediate revert
# Rollback to previous version
./scripts/rollback-deployment.sh root@23.235.204.208 previous
# Rollback to specific version
./scripts/rollback-deployment.sh root@23.235.204.208 20260208-143022-a4c6788
# Verify rollback
./scripts/deployment-smoke-test.sh root@23.235.204.208 https://alpha.theblueline.com
Escalation Guidelines
Severity Levels
| Level | Response Time | Impact |
| ------------- | ------------- | --------------------------------------- |
| P0 - Critical | Immediate | Production down, all users affected |
| P1 - High | <15 minutes | Significant feature broken |
| P2 - Medium | <1 hour | Minor feature broken, workaround exists |
| P3 - Low | <1 day | Cosmetic issue, no user impact |
When to Escalate
P0 - Critical:
- API health check fails for >2 minutes
- Database unreachable
- All users unable to access system
P1 - High:
- Response times >5 seconds consistently
- Memory usage >90%
- Key feature completely broken
P2 - Medium:
- Cache hit rate <50% for multiple builds
- Disk usage >80%
- Non-critical feature degraded
P3 - Low:
- Slow build times (but completing)
- Minor performance issues
- Documentation updates needed
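The measurable criteria above (health-check downtime, memory, response time) can be encoded so that a monitoring script suggests a level. This sketch covers only the numeric P0 and P1 rules and reports "none" otherwise; escalation_level is a hypothetical helper, and the judgment-based criteria still need a person.

```shell
# escalation_level DOWN_SECONDS MEM_PERCENT RESPONSE_SECONDS (hypothetical helper)
# P0: health check failing for more than 2 minutes.
# P1: response time above 5 seconds or memory above 90%.
escalation_level() {
  awk -v down="$1" -v mem="$2" -v resp="$3" 'BEGIN {
    if (down > 120)                { print "P0" }
    else if (resp > 5 || mem > 90) { print "P1" }
    else                           { print "none" }
  }'
}

# Example:
#   escalation_level 0 93.5 1.2
```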
What to Provide
When escalating:
- Severity level (P0-P3)
- Symptoms observed
- Diagnostic commands run
- Error messages/logs
- What was tried
- Current system state
Related Articles
Core Documentation
- 📘 Deployment Overview - System features and standard workflow
- 🎓 Deployment Quick Start - Tutorial for new team members
- ⚡ Build Optimization - Cache performance and build tuning
Advanced Resources
- 📄 Full troubleshooting guide: docs/operations/deployment-troubleshooting.md (800+ lines)
- 🔧 Deployment runbook: docs/operations/deployment-runbook.md
- 📊 Quick reference: docs/operations/docker-deployment-quick-reference.md
Emergency Contacts
For critical issues (P0):
- DevOps Team: [escalation process]
- On-Call Engineer: [contact info]
For questions (P1-P3):
- Check troubleshooting guide first
- DevOps Team: [support channel]