Deployment Troubleshooting Guide

Last updated: February 8, 2026

Quick Diagnosis Table

Use this table to quickly identify the most likely cause based on symptoms:

| Symptom                    | Most Likely Cause                  | Quick Check                      | Section            |
| -------------------------- | ---------------------------------- | -------------------------------- | ------------------ |
| Container won't start      | Environment vars, port conflict    | docker logs blueline-alpha-api   | Container Issues   |
| Health check returns 502   | nginx down, container crashed      | systemctl status nginx           | Network Issues     |
| Health check returns 503   | API starting/unhealthy             | docker ps (check STATUS)         | Container Issues   |
| Database connection failed | Wrong DB host, password, port      | Check env vars                   | Database Issues    |
| Slow API responses         | Memory leak, unoptimized queries   | docker stats                     | Performance Issues |
| Build fails on VPS         | Wrong architecture (ARM vs AMD64)  | Check build platform             | Build Issues       |
| Cache miss on every build  | Layer order changed, .dockerignore | Run cache analytics              | Build Issues       |

Troubleshooting Decision Tree

                          DEPLOYMENT ISSUE
                                 │
                    ┌────────────┴────────────┐
                    │                         │
             What's Broken?              When Did It Break?
                    │                         │
        ┌───────────┼───────────┐        ┌───┴───┐
        │           │           │        │       │
    Container     API/Web    Database   During  After
      Won't      Responds     Issue     Build  Deploy
      Start       Slowly                   │       │
        │           │           │          │       │
        ▼           ▼           ▼          ▼       ▼

┌───────────────────────────────────────────────────────────────────────────┐
│                     CONTAINER WON'T START                                 │
├───────────────────────────────────────────────────────────────────────────┤
│                                                                           │
│  1. Check Logs: docker logs blueline-alpha-api --tail 50                 │
│                                                                           │
│  ┌─────────────────────────────────────────────────────────────┐         │
│  │ Error Contains                    │ Solution                │         │
│  ├───────────────────────────────────┼─────────────────────────┤         │
│  │ "JWT_SECRET"                      │ → Env Vars (Section A)  │         │
│  │ "DATABASE_URL"                    │ → Env Vars (Section A)  │         │
│  │ "EADDRINUSE" / "port already"     │ → Port Conflict (B)     │         │
│  │ "Migration failed" / "database"   │ → Database Issues       │         │
│  │ "ECONNREFUSED"                    │ → DB Not Running        │         │
│  └───────────────────────────────────┴─────────────────────────┘         │
│                                                                           │
│  A. Env Vars Missing                                                     │
│     → docker exec blueline-alpha-api env | grep -E 'JWT|DATABASE'       │
│     → Fix: Update .env.blueline-alpha, rebuild, redeploy                │
│                                                                           │
│  B. Port Conflict                                                        │
│     → lsof -i :3003 (check what's using the port)                       │
│     → Fix: Stop conflicting service or change port in compose           │
│                                                                           │
└───────────────────────────────────────────────────────────────────────────┘
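The error-pattern table in the box above can be applied mechanically. The following is a minimal sketch, assuming the log text arrives on stdin; `classify_start_error` and its output labels are hypothetical names, not part of the existing scripts. ECONNREFUSED is checked before the broad "database" pattern so that a refused connection is not reported as a generic database issue.

```shell
# classify_start_error (hypothetical helper): reads container logs on stdin
# and prints which branch of the "Container Won't Start" box applies.
# The first matching pattern wins.
classify_start_error() (
  logs=$(cat)
  case "$logs" in
    *JWT_SECRET*|*DATABASE_URL*)     echo "env-vars" ;;
    *EADDRINUSE*|*"port already"*)   echo "port-conflict" ;;
    *ECONNREFUSED*)                  echo "db-not-running" ;;
    *"Migration failed"*|*database*) echo "database-issues" ;;
    *)                               echo "unknown" ;;
  esac
)

# Usage (on the VPS):
#   docker logs blueline-alpha-api --tail 50 2>&1 | classify_start_error
```

Patterns are case-sensitive, so DATABASE_URL does not collide with the lowercase "database" pattern.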

┌───────────────────────────────────────────────────────────────────────────┐
│                    SLOW RESPONSES / PERFORMANCE                           │
├───────────────────────────────────────────────────────────────────────────┤
│                                                                           │
│  1. Check Resources: docker stats --no-stream blueline-alpha-api         │
│                                                                           │
│  ┌─────────────────────────────────────────────────────────────┐         │
│  │ Metric          │ Threshold  │ Action If Exceeded           │         │
│  ├─────────────────┼────────────┼──────────────────────────────┤         │
│  │ CPU Usage       │ >80%       │ → Check for tight loops      │         │
│  │ Memory Usage    │ >90%       │ → Memory Leak (restart)      │         │
│  │ Network I/O     │ >100MB/s   │ → Check for excessive logs   │         │
│  └─────────────────┴────────────┴──────────────────────────────┘         │
│                                                                           │
│  Memory >90%?                                                            │
│    ├─ Yes → RESTART CONTAINER (memory leak)                             │
│    │         docker compose restart api                                  │
│    │         Monitor: Watch memory over next hour                        │
│    │                                                                      │
│    └─ No → CHECK DATABASE                                               │
│              ├─ Slow query logs enabled?                                 │
│              ├─ Missing indexes? (check query plans)                     │
│              └─ Too many connections? (check connection pool)            │
│                                                                           │
└───────────────────────────────────────────────────────────────────────────┘
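The CPU and memory thresholds in the box above can be checked with a small helper. This is a sketch under stated assumptions: `resource_actions` is a hypothetical name, and it expects plain percentages (no % sign) such as those produced by the `{{.CPUPerc}}` and `{{.MemPerc}}` placeholders of `docker stats --format`.

```shell
# resource_actions CPU_PCT MEM_PCT (hypothetical helper)
# Applies the >80% CPU and >90% memory thresholds from the table above.
resource_actions() {
  awk -v cpu="$1" -v mem="$2" 'BEGIN {
    if (cpu + 0 > 80) { print "cpu: check for tight loops" }
    if (mem + 0 > 90) { print "memory: restart container, then watch for a leak" }
    else              { print "memory ok: check the database next" }
  }'
}

# Usage (on the VPS):
#   docker stats --no-stream --format '{{.CPUPerc}} {{.MemPerc}}' blueline-alpha-api \
#     | tr -d '%' | { read -r cpu mem; resource_actions "$cpu" "$mem"; }
```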

┌───────────────────────────────────────────────────────────────────────────┐
│                       HEALTH CHECK FAILS                                  │
├───────────────────────────────────────────────────────────────────────────┤
│                                                                           │
│  Response Code?                                                          │
│    │                                                                      │
│    ├─ 502 (Bad Gateway)                                                 │
│    │    ├─ nginx running? → systemctl status nginx                      │
│    │    ├─ Container crashed? → docker ps (check STATUS)                │
│    │    └─ Wrong upstream? → nginx config (/etc/nginx/sites-enabled/)   │
│    │                                                                      │
│    ├─ 503 (Service Unavailable)                                         │
│    │    ├─ Container starting? → docker logs (wait 30s)                 │
│    │    ├─ Health endpoint broken? → curl localhost:3003/health         │
│    │    └─ Database down? → Check DB container                          │
│    │                                                                      │
│    ├─ 504 (Gateway Timeout)                                             │
│    │    ├─ Slow startup? → Check migration logs                         │
│    │    ├─ Database locked? → Check active queries                      │
│    │    └─ Memory issue? → docker stats                                 │
│    │                                                                      │
│    └─ Connection Refused                                                 │
│         ├─ Container not running? → docker ps -a                         │
│         ├─ Wrong port? → Check compose file (3003 for blueline)         │
│         └─ Firewall? → iptables -L (check rules)                        │
│                                                                           │
└───────────────────────────────────────────────────────────────────────────┘
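The response-code branches above can be turned into a lookup on the status that curl reports. A minimal sketch follows; `http_code_hint` is a hypothetical helper, and it relies on curl printing 000 for `%{http_code}` when no HTTP response was received (which covers connection refused, but also timeouts and DNS failures).

```shell
# http_code_hint CODE (hypothetical helper)
# Prints the first check from the "Health Check Fails" box for a status code.
http_code_hint() {
  case "$1" in
    200) echo "healthy" ;;
    502) echo "systemctl status nginx" ;;
    503) echo "docker logs blueline-alpha-api --tail 50 (wait 30s if still starting)" ;;
    504) echo "docker logs blueline-alpha-api --tail 50 (look for migrations)" ;;
    000) echo "docker ps -a (no HTTP response received)" ;;
    *)   echo "unmapped status: $1" ;;
  esac
}

# Usage:
#   code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 https://alpha.theblueline.com/health)
#   http_code_hint "$code"
```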

┌───────────────────────────────────────────────────────────────────────────┐
│                         BUILD FAILURES                                    │
├───────────────────────────────────────────────────────────────────────────┤
│                                                                           │
│  Where Did Build Fail?                                                   │
│    │                                                                      │
│    ├─ Environment Validation                                             │
│    │    → .env file missing or incomplete                                │
│    │    → Run: ./scripts/validate-deployment-env.sh                      │
│    │                                                                      │
│    ├─ TypeScript Type-Check                                              │
│    │    → Fix type errors in code                                        │
│    │    → Run: pnpm type-check                                           │
│    │                                                                      │
│    ├─ Docker Build                                                       │
│    │    ├─ exec format error → Wrong architecture (use --amd64)         │
│    │    ├─ COPY failed → Missing files (check .dockerignore)            │
│    │    └─ Out of memory → Increase Docker resources                    │
│    │                                                                      │
│    ├─ Container Smoke Tests                                              │
│    │    ├─ Container won't start → Check Dockerfile entrypoint          │
│    │    ├─ Health endpoint timeout → Increase startup wait time         │
│    │    └─ Port binding failed → Change test port                       │
│    │                                                                      │
│    └─ Image Transfer                                                     │
│         ├─ SSH timeout → Check VPS accessibility                         │
│         ├─ Disk full on VPS → Clean old images                          │
│         └─ Network error → Retry transfer                               │
│                                                                           │
└───────────────────────────────────────────────────────────────────────────┘

EMERGENCY ROLLBACK (when all else fails):

  ./scripts/rollback-deployment.sh root@23.235.204.208 previous

  ↓ Rolls back to last known good version in <5 minutes
  ↓ Database state preserved (no data loss)
  ↓ Automatic health verification after rollback

Diagnostic Commands Quick Reference

Check Container Status

# All containers for deployment
ssh root@23.235.204.208 'docker ps -a --filter "name=blueline-alpha"'

# Detailed status (health, ports, uptime)
ssh root@23.235.204.208 'docker ps --filter "name=blueline-alpha" --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}"'

# Resource usage (CPU, memory, network)
ssh root@23.235.204.208 'docker stats --no-stream blueline-alpha-api blueline-alpha-web blueline-alpha-db'

View Logs

# API logs (last 50 lines)
ssh root@23.235.204.208 'docker logs blueline-alpha-api --tail 50'

# Web logs (last 50 lines)
ssh root@23.235.204.208 'docker logs blueline-alpha-web --tail 50'

# Follow logs in real-time
ssh root@23.235.204.208 'docker logs -f blueline-alpha-api'

# Search logs for errors
ssh root@23.235.204.208 'docker logs blueline-alpha-api 2>&1 | grep -i error | tail -20'

Test Connectivity

# Test API health from VPS
ssh root@23.235.204.208 'curl -s http://localhost:3003/health | jq'

# Test Web health from VPS
ssh root@23.235.204.208 'curl -s http://localhost:3006/ | head -20'

# Test public HTTPS
curl -s https://alpha.theblueline.com/health | jq

Container Issues

Container Won't Start

Symptoms:

  • docker ps shows container with status Exited (1)
  • Container restarts immediately after starting
  • No container appears in docker ps output

Diagnostic:

# Check exit code and logs
ssh root@23.235.204.208 'docker ps -a --filter "name=blueline-alpha-api" --format "{{.Status}}"'
ssh root@23.235.204.208 'docker logs blueline-alpha-api --tail 100'

Common Causes:

A. Missing Environment Variables

# Check .env file exists
ssh root@23.235.204.208 'ls -lh /opt/sampo-alpha/.env.blueline-alpha'

# Verify required variables are present (prints names only, never secret values)
ssh root@23.235.204.208 'grep -oE "^(DATABASE_URL|JWT_SECRET|REDIS_URL)=" /opt/sampo-alpha/.env.blueline-alpha'

# Solution: Add missing variable and recreate the container
# (docker compose restart does not pick up environment changes)
echo "MISSING_VAR=value" >> .env.blueline-alpha
scp .env.blueline-alpha root@23.235.204.208:/opt/sampo-alpha/
ssh root@23.235.204.208 'cd /opt/sampo-alpha && docker compose -f docker-compose.base.yml -f docker-compose.blueline-alpha.override.yml up -d --force-recreate api'
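To list which required variables are absent or empty without echoing any values, a helper along these lines may be useful. It is a sketch that assumes the env file uses plain KEY=value lines (no `export` prefix); `missing_env_vars` is a hypothetical name.

```shell
# missing_env_vars FILE VAR... (hypothetical helper)
# Prints each VAR that has no non-empty KEY=value assignment in FILE.
missing_env_vars() (
  file=$1
  shift
  for name in "$@"; do
    grep -Eq "^${name}=.+" "$file" || echo "$name"
  done
)

# Usage (on the VPS):
#   missing_env_vars /opt/sampo-alpha/.env.blueline-alpha DATABASE_URL JWT_SECRET REDIS_URL
```

No output means every listed variable is set.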

B. Database Connection Failed

# Check database is running
ssh root@23.235.204.208 'docker ps --filter "name=blueline-alpha-db"'

# Test connection from API container
ssh root@23.235.204.208 'docker exec blueline-alpha-api nc -zv blueline-alpha-db 5432'

# Solution: Restart database first, then API
ssh root@23.235.204.208 'cd /opt/sampo-alpha && docker compose -f docker-compose.base.yml -f docker-compose.blueline-alpha.override.yml restart db'
sleep 10
ssh root@23.235.204.208 'cd /opt/sampo-alpha && docker compose -f docker-compose.base.yml -f docker-compose.blueline-alpha.override.yml restart api'

C. Port Conflict

# Check what's using port 3003
ssh root@23.235.204.208 'lsof -i :3003 | head -10'

# Solution: Kill conflicting process or change port in docker-compose

Container Crashes After Startup

Symptoms:

  • Container starts, runs for seconds/minutes, then exits
  • Status shows restart loop: Restarting (1) 3 seconds ago
  • Logs show runtime errors

Diagnostic:

# Check restart count
ssh root@23.235.204.208 'docker inspect blueline-alpha-api | jq ".[0].RestartCount"'

# Check for OOM (out of memory)
ssh root@23.235.204.208 'docker inspect blueline-alpha-api | jq ".[0].State.OOMKilled"'

# Check memory usage
ssh root@23.235.204.208 'docker stats --no-stream blueline-alpha-api'

Common Causes:

A. Out of Memory (OOM)

# Check system logs for OOM
ssh root@23.235.204.208 'dmesg | grep -i "out of memory\|oom" | tail -10'

# Solution: Increase memory limit in docker-compose.blueline-alpha.override.yml
# deploy:
#   resources:
#     limits:
#       memory: 2G  # Increase from 1G

# Restart with new limits
ssh root@23.235.204.208 'cd /opt/sampo-alpha && docker compose -f docker-compose.base.yml -f docker-compose.blueline-alpha.override.yml up -d api'

B. Unhandled Exception

# Check logs for stack trace
ssh root@23.235.204.208 'docker logs blueline-alpha-api 2>&1 | grep -i "error\|exception\|fatal" | tail -20'

# Solution: Fix code issue, redeploy
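The OOM flag and exit code together usually point to one of the causes above. The sketch below interprets them; `explain_exit` is a hypothetical helper, and the only fixed fact it relies on is that exit code 137 means the process received SIGKILL (128 + 9).

```shell
# explain_exit OOM_KILLED EXIT_CODE (hypothetical helper)
# OOM_KILLED is "true" or "false", as printed by docker inspect.
explain_exit() {
  if [ "$1" = "true" ]; then
    echo "oom: raise the memory limit or fix the leak"
  elif [ "$2" = "137" ]; then
    echo "sigkill: check dmesg for a host-level OOM kill"
  elif [ "$2" = "0" ]; then
    echo "clean-exit: check the entrypoint and restart policy"
  else
    echo "error-exit: check logs for an unhandled exception"
  fi
}

# Usage (on the VPS):
#   set -- $(docker inspect --format '{{.State.OOMKilled}} {{.State.ExitCode}}' blueline-alpha-api)
#   explain_exit "$1" "$2"
```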

Database Issues

Connection Failures

Symptoms:

  • API logs: Error: connect ECONNREFUSED
  • Health check fails with database error
  • Prisma client throws connection timeout

Diagnostic:

# Check database container
ssh root@23.235.204.208 'docker ps --filter "name=blueline-alpha-db"'

# Check database logs
ssh root@23.235.204.208 'docker logs blueline-alpha-db --tail 30'

# Test connection from host
ssh root@23.235.204.208 'psql -h localhost -p 5436 -U postgres -d sampo_blueline_alpha -c "SELECT 1;"'

Solution:

# Start the database container (up -d also recreates it if its configuration changed)
ssh root@23.235.204.208 'cd /opt/sampo-alpha && docker compose -f docker-compose.base.yml -f docker-compose.blueline-alpha.override.yml up -d db'

# Wait for database to be ready
sleep 10
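A fixed sleep can be too short on a slow start and needlessly long otherwise. One alternative is to poll until the database accepts connections. The following is a sketch: `retry_until` is a hypothetical helper, and the usage line assumes the db container is based on the official PostgreSQL image, which ships `pg_isready`.

```shell
# retry_until ATTEMPTS DELAY_SECONDS CMD... (hypothetical helper)
# Runs CMD until it succeeds; gives up with status 1 after ATTEMPTS tries.
retry_until() (
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    "$@" && exit 0
    # Pause between tries, but not after the final failed attempt
    [ "$i" -lt "$attempts" ] && sleep "$delay"
    i=$((i + 1))
  done
  exit 1
)

# Usage: wait up to about 30 seconds for PostgreSQL
#   retry_until 15 2 ssh root@23.235.204.208 'docker exec blueline-alpha-db pg_isready -U postgres'
```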

Migration Errors

Symptoms:

  • API startup logs show migration failures
  • Database schema out of sync with code
  • Prisma errors: "Column does not exist"

Diagnostic:

# Check migration status
ssh root@23.235.204.208 'docker exec blueline-alpha-api pnpm prisma migrate status'

# Check applied migrations
ssh root@23.235.204.208 'docker exec blueline-alpha-db psql -U postgres -d sampo_blueline_alpha -c "SELECT migration_name, finished_at FROM _prisma_migrations ORDER BY finished_at DESC LIMIT 10;"'

Solution:

# Apply pending migrations
ssh root@23.235.204.208 'docker exec blueline-alpha-api pnpm prisma migrate deploy'

# If migration failed partially, mark as rolled back
ssh root@23.235.204.208 'docker exec blueline-alpha-db psql -U postgres -d sampo_blueline_alpha -c "UPDATE _prisma_migrations SET rolled_back_at = NOW() WHERE finished_at IS NULL;"'

# Re-apply migrations
ssh root@23.235.204.208 'docker exec blueline-alpha-api pnpm prisma migrate deploy'

Performance Degradation

Symptoms:

  • Queries taking >1 second
  • API response times increasing
  • High database CPU usage

Diagnostic:

# Check active connections
ssh root@23.235.204.208 'docker exec blueline-alpha-db psql -U postgres -d sampo_blueline_alpha -c "SELECT count(*) FROM pg_stat_activity WHERE state = '\''active'\'';"'

# Check slow queries
ssh root@23.235.204.208 'docker exec blueline-alpha-db psql -U postgres -d sampo_blueline_alpha -c "SELECT pid, now() - pg_stat_activity.query_start AS duration, query FROM pg_stat_activity WHERE state = '\''active'\'' ORDER BY duration DESC LIMIT 5;"'

Solution:

# Reduce Prisma connection pool
# In DATABASE_URL: postgresql://...?connection_limit=5

# Vacuum tables
ssh root@23.235.204.208 'docker exec blueline-alpha-db psql -U postgres -d sampo_blueline_alpha -c "VACUUM ANALYZE;"'
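Editing the `connection_limit` parameter by hand is easy to get wrong when the URL already has a query string. A minimal sketch of doing it programmatically follows; `with_connection_limit` is a hypothetical helper, and it assumes the password contains no literal `?` or `&`.

```shell
# with_connection_limit URL LIMIT (hypothetical helper)
# Returns URL with Prisma's connection_limit parameter set to LIMIT,
# replacing any existing value.
with_connection_limit() (
  url=$1
  limit=$2
  # Remove an existing connection_limit parameter, then any dangling ? or &
  url=$(printf '%s' "$url" | sed -E 's/([?&])connection_limit=[0-9]+&?/\1/; s/[?&]$//')
  case "$url" in
    *\?*) printf '%s&connection_limit=%s\n' "$url" "$limit" ;;
    *)    printf '%s?connection_limit=%s\n' "$url" "$limit" ;;
  esac
)

# Usage:
#   with_connection_limit "$DATABASE_URL" 5
```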

Network Issues

nginx Configuration

Symptoms:

  • 502 Bad Gateway on public URL
  • nginx logs show "upstream" errors
  • Health checks work locally but fail via HTTPS

Diagnostic:

# Check nginx status
ssh root@23.235.204.208 'systemctl status nginx --no-pager'

# Check nginx configuration
ssh root@23.235.204.208 'nginx -t'

# Check nginx error logs
ssh root@23.235.204.208 'tail -50 /var/log/nginx/error.log'

Common Causes:

A. nginx Not Running

# Solution
ssh root@23.235.204.208 'systemctl start nginx && systemctl enable nginx'

B. Apache Blocking Ports

# Check what's using ports 80 and 443
ssh root@23.235.204.208 'lsof -i :80 -i :443 | head -20'

# If Apache/httpd is listed, stop it
ssh root@23.235.204.208 'systemctl stop httpd && systemctl disable httpd'
ssh root@23.235.204.208 'systemctl restart nginx'

SSL Certificate Issues

Symptoms:

  • Browser shows "Your connection is not private"
  • Certificate expired warnings
  • curl fails with SSL error

Diagnostic:

# Check certificate status
curl -vI https://alpha.theblueline.com 2>&1 | grep -i "expire\|subject\|issuer"

# Check certificate files exist
ssh root@23.235.204.208 'ls -lh /etc/letsencrypt/live/alpha.theblueline.com/'

Solution:

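# Renew certificate (--force-renewal reissues even when the certificate is not near expiry; use sparingly, since it counts toward Let's Encrypt rate limits)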
ssh root@23.235.204.208 'certbot renew --nginx --force-renewal'
ssh root@23.235.204.208 'systemctl reload nginx'
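Before forcing a renewal, it can help to know how many days the current certificate has left. The sketch below computes that from the `notAfter=` line that `openssl x509 -noout -enddate` prints. `cert_days_left` is a hypothetical helper, and it assumes GNU `date` (the `-d` option), as found on most Linux hosts.

```shell
# cert_days_left "notAfter=<date>" [NOW_EPOCH] (hypothetical helper)
# Prints whole days until expiry; negative if already expired.
cert_days_left() (
  end=${1#notAfter=}
  now=${2:-$(date +%s)}
  end_epoch=$(date -d "$end" +%s) || exit 1
  echo $(( (end_epoch - now) / 86400 ))
)

# Usage:
#   enddate=$(echo | openssl s_client -connect alpha.theblueline.com:443 \
#     -servername alpha.theblueline.com 2>/dev/null | openssl x509 -noout -enddate)
#   cert_days_left "$enddate"
```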

Performance Issues

High Memory Usage

Symptoms:

  • Docker stats show >80% memory usage
  • Container killed by OOM
  • API responses slow/timeout

Diagnostic:

# Check current memory usage
ssh root@23.235.204.208 'docker stats --no-stream blueline-alpha-api blueline-alpha-web'

# Check OOM kills
ssh root@23.235.204.208 'dmesg | grep -i "out of memory" | tail -10'

Solution:

# Temporary: Restart container
ssh root@23.235.204.208 'cd /opt/sampo-alpha && docker compose -f docker-compose.base.yml -f docker-compose.blueline-alpha.override.yml restart api'

# Permanent: Increase memory limit in docker-compose
# deploy:
#   resources:
#     limits:
#       memory: 2G

Slow Responses

Symptoms:

  • API requests taking >3 seconds
  • Frontend loading slowly
  • Timeout errors in logs

Diagnostic:

# Check API response times
curl -o /dev/null -s -w "Time: %{time_total}s\n" https://alpha.theblueline.com/health

# Check CPU usage
ssh root@23.235.204.208 'docker stats --no-stream --format "{{.Name}} {{.CPUPerc}}" blueline-alpha-api'

Solution:

# Check for slow database queries
ssh root@23.235.204.208 'docker logs blueline-alpha-api 2>&1 | grep "Query took"'

# Implement caching, add indexes, optimize queries

Cache Degradation

Symptoms:

  • Build times increasing over time
  • Cache hit rate dropping below 50%
  • Every build takes 10+ minutes

Diagnostic:

# Check cache performance
./scripts/analyze-build-cache.sh

# Check detailed build stats
./scripts/analyze-build-cache.sh --detailed | tail -50

# Check trends
./scripts/analyze-build-cache.sh --trends

Solution:

See Build Performance Optimization article for detailed troubleshooting.

Build Issues

Build Fails Locally

Symptoms:

  • docker build fails with compilation error
  • TypeScript errors during build
  • Out of memory during build

Diagnostic:

# Check build logs
./scripts/docker-audit/build-cross-platform.sh --api --local 2>&1 | tee build-error.log

# Check disk space
df -h

# Check Docker resources
docker system df

Common Causes:

A. TypeScript Errors

# Run local validation
pnpm type-check

# Fix type errors before building

B. Insufficient Docker Resources

# Solution: Increase Docker Desktop memory allocation
# Docker Desktop → Settings → Resources → Memory → 8GB

Build Succeeds But Container Fails

Symptoms:

  • Image builds successfully
  • Container exits immediately with code 1
  • Logs show "exec format error"

Diagnostic:

# Check image architecture
docker inspect sampo-blueline-alpha-api:latest | jq '.[0].Architecture'

# Check host architecture
uname -m
# Prints x86_64 on the VPS (Docker calls this amd64) and arm64 on an Apple Silicon (M1) Mac

Solution:

# Rebuild for correct architecture
./scripts/docker-audit/build-cross-platform.sh --api --amd64  # For VPS
./scripts/docker-audit/build-cross-platform.sh --api --local  # For local Mac
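Because `uname -m` and Docker name the same architecture differently (x86_64 vs amd64, aarch64 vs arm64), a direct string comparison of the two diagnostic outputs can report a false mismatch. A small sketch that normalizes the host value first; `docker_arch` is a hypothetical helper name.

```shell
# docker_arch MACHINE (hypothetical helper)
# Maps a `uname -m` value to the name Docker uses for that architecture.
docker_arch() {
  case "$1" in
    x86_64|amd64)  echo amd64 ;;
    aarch64|arm64) echo arm64 ;;
    *)             echo "$1" ;;
  esac
}

# Usage (on the host that will run the image):
#   image_arch=$(docker inspect --format '{{.Architecture}}' sampo-blueline-alpha-api:latest)
#   host_arch=$(docker_arch "$(uname -m)")
#   [ "$image_arch" = "$host_arch" ] || echo "mismatch: image=$image_arch host=$host_arch"
```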

Recovery Procedures

Safe Restart

Use when: Minor issues, no data loss risk

# Restart in order: db → api → web
ssh root@23.235.204.208 'cd /opt/sampo-alpha && docker compose -f docker-compose.base.yml -f docker-compose.blueline-alpha.override.yml restart db'
sleep 10
ssh root@23.235.204.208 'cd /opt/sampo-alpha && docker compose -f docker-compose.base.yml -f docker-compose.blueline-alpha.override.yml restart api'
sleep 5
ssh root@23.235.204.208 'cd /opt/sampo-alpha && docker compose -f docker-compose.base.yml -f docker-compose.blueline-alpha.override.yml restart web'

# Verify health
./scripts/deployment-smoke-test.sh root@23.235.204.208 https://alpha.theblueline.com
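The ordered restart above repeats the same long compose command three times. It could be wrapped as follows. This is a sketch, not an existing script: `safe_restart`, `VPS`, and `COMPOSE` are hypothetical names, with values copied from the commands above.

```shell
# Values copied from the commands in this guide
VPS=root@23.235.204.208
COMPOSE='docker compose -f docker-compose.base.yml -f docker-compose.blueline-alpha.override.yml'

# safe_restart (hypothetical helper)
# Restarts db, api, web in order; each entry is service:pause-seconds.
# Stops at the first restart that fails.
safe_restart() (
  for step in db:10 api:5 web:0; do
    svc=${step%%:*}
    pause=${step##*:}
    ssh "$VPS" "cd /opt/sampo-alpha && $COMPOSE restart $svc" || exit 1
    sleep "$pause"
  done
)

# Usage:
#   safe_restart && ./scripts/deployment-smoke-test.sh root@23.235.204.208 https://alpha.theblueline.com
```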

Emergency Rollback

Use when: New deployment broke production, need immediate revert

# Rollback to previous version
./scripts/rollback-deployment.sh root@23.235.204.208 previous

# Rollback to specific version
./scripts/rollback-deployment.sh root@23.235.204.208 20260208-143022-a4c6788

# Verify rollback
./scripts/deployment-smoke-test.sh root@23.235.204.208 https://alpha.theblueline.com

Escalation Guidelines

Severity Levels

| Level         | Response Time | Description                             |
| ------------- | ------------- | --------------------------------------- |
| P0 - Critical | Immediate     | Production down, all users affected     |
| P1 - High     | <15 minutes   | Significant feature broken              |
| P2 - Medium   | <1 hour       | Minor feature broken, workaround exists |
| P3 - Low      | <1 day        | Cosmetic issue, no user impact          |

When to Escalate

P0 - Critical:

  • API health check fails for >2 minutes
  • Database unreachable
  • All users unable to access system

P1 - High:

  • Response times >5 seconds consistently
  • Memory usage >90%
  • Key feature completely broken

P2 - Medium:

  • Cache hit rate <50% for multiple builds
  • Disk usage >80%
  • Non-critical feature degraded

P3 - Low:

  • Slow build times (but completing)
  • Minor performance issues
  • Documentation updates needed
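The numeric thresholds in these lists can be encoded as a first-pass classifier. The sketch below covers only the measurable criteria (health-check downtime, response time, memory, disk) and falls back to P3; the non-numeric criteria, such as a broken key feature, still need human judgment. `severity` is a hypothetical helper name.

```shell
# severity DOWN_SECONDS RESPONSE_SECONDS MEM_PCT DISK_PCT (hypothetical helper)
# Applies the numeric escalation thresholds listed above, highest severity first.
severity() {
  awk -v down="$1" -v resp="$2" -v mem="$3" -v disk="$4" 'BEGIN {
    if (down + 0 > 120)                    { print "P0" }
    else if (resp + 0 > 5 || mem + 0 > 90) { print "P1" }
    else if (disk + 0 > 80)                { print "P2" }
    else                                   { print "P3" }
  }'
}

# Usage: health check down 0 s, 6.2 s responses, 70% memory, 40% disk
#   severity 0 6.2 70 40
```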

What to Provide

When escalating:

  1. Severity level (P0-P3)
  2. Symptoms observed
  3. Diagnostic commands run
  4. Error messages/logs
  5. What was tried
  6. Current system state

Related Articles

Advanced Resources

  • 📄 Full troubleshooting guide: docs/operations/deployment-troubleshooting.md (800+ lines)
  • 🔧 Deployment runbook: docs/operations/deployment-runbook.md
  • 📊 Quick reference: docs/operations/docker-deployment-quick-reference.md

Emergency Contacts

For critical issues (P0):

  • DevOps Team: [escalation process]
  • On-Call Engineer: [contact info]

For questions (P1-P3):

  • Check troubleshooting guide first
  • DevOps Team: [support channel]
