Deployment Troubleshooting Guide

Last updated: February 8, 2026

Quick Diagnosis Table

Use this table to quickly identify the most likely cause based on symptoms:

Symptom                    | Most Likely Cause                  | Quick Check                    | Section
---------------------------+------------------------------------+--------------------------------+-------------------
Container won't start      | Environment vars, port conflict    | docker logs blueline-alpha-api | Container Issues
Health check returns 502   | nginx down, container crashed      | systemctl status nginx         | Network Issues
Health check returns 503   | API starting/unhealthy             | docker ps (check STATUS)       | Container Issues
Database connection failed | Wrong DB host, password, port      | Check env vars                 | Database Issues
Slow API responses         | Memory leak, unoptimized queries   | docker stats                   | Performance Issues
Build fails on VPS         | Wrong architecture (ARM vs AMD64)  | Check build platform           | Build Issues
Cache miss on every build  | Layer order changed, .dockerignore | Run cache analytics            | Build Issues

Troubleshooting Decision Tree

                          DEPLOYMENT ISSUE
                                 │
                    ┌────────────┴────────────┐
                    │                         │
             What's Broken?              When Did It Break?
                    │                         │
        ┌───────────┼───────────┐        ┌───┴───┐
        │           │           │        │       │
    Container     API/Web    Database   During  After
      Won't      Responds     Issue     Build  Deploy
      Start       Slowly                   │       │
        │           │           │          │       │
        ▼           ▼           ▼          ▼       ▼

┌───────────────────────────────────────────────────────────────────────────┐
│                     CONTAINER WON'T START                                 │
├───────────────────────────────────────────────────────────────────────────┤
│                                                                           │
│  1. Check Logs: docker logs blueline-alpha-api --tail 50                 │
│                                                                           │
│  ┌─────────────────────────────────────────────────────────────┐         │
│  │ Error Contains                    │ Solution                │         │
│  ├───────────────────────────────────┼─────────────────────────┤         │
│  │ "JWT_SECRET"                      │ → Env Vars (Section A)  │         │
│  │ "DATABASE_URL"                    │ → Env Vars (Section A)  │         │
│  │ "EADDRINUSE" / "port already"     │ → Port Conflict (B)     │         │
│  │ "Migration failed" / "database"   │ → Database Issues       │         │
│  │ "ECONNREFUSED"                    │ → DB Not Running        │         │
│  └───────────────────────────────────┴─────────────────────────┘         │
│                                                                           │
│  A. Env Vars Missing                                                     │
│     → docker exec blueline-alpha-api env | grep -E 'JWT|DATABASE'       │
│     → Fix: Update .env.blueline-alpha, rebuild, redeploy                │
│                                                                           │
│  B. Port Conflict                                                        │
│     → lsof -i :3003 (check what's using the port)                       │
│     → Fix: Stop conflicting service or change port in compose           │
│                                                                           │
└───────────────────────────────────────────────────────────────────────────┘

┌───────────────────────────────────────────────────────────────────────────┐
│                    SLOW RESPONSES / PERFORMANCE                           │
├───────────────────────────────────────────────────────────────────────────┤
│                                                                           │
│  1. Check Resources: docker stats --no-stream blueline-alpha-api         │
│                                                                           │
│  ┌─────────────────────────────────────────────────────────────┐         │
│  │ Metric          │ Threshold  │ Action If Exceeded           │         │
│  ├─────────────────┼────────────┼──────────────────────────────┤         │
│  │ CPU Usage       │ >80%       │ → Check for tight loops      │         │
│  │ Memory Usage    │ >90%       │ → Memory Leak (restart)      │         │
│  │ Network I/O     │ >100MB/s   │ → Check for excessive logs   │         │
│  └─────────────────┴────────────┴──────────────────────────────┘         │
│                                                                           │
│  Memory >90%?                                                            │
│    ├─ Yes → RESTART CONTAINER (memory leak)                             │
│    │         docker compose restart api                                  │
│    │         Monitor: Watch memory over next hour                        │
│    │                                                                      │
│    └─ No → CHECK DATABASE                                               │
│              ├─ Slow query logs enabled?                                 │
│              ├─ Missing indexes? (check query plans)                     │
│              └─ Too many connections? (check connection pool)            │
│                                                                           │
└───────────────────────────────────────────────────────────────────────────┘

┌───────────────────────────────────────────────────────────────────────────┐
│                       HEALTH CHECK FAILS                                  │
├───────────────────────────────────────────────────────────────────────────┤
│                                                                           │
│  Response Code?                                                          │
│    │                                                                      │
│    ├─ 502 (Bad Gateway)                                                 │
│    │    ├─ nginx running? → systemctl status nginx                      │
│    │    ├─ Container crashed? → docker ps (check STATUS)                │
│    │    └─ Wrong upstream? → nginx config (/etc/nginx/sites-enabled/)   │
│    │                                                                      │
│    ├─ 503 (Service Unavailable)                                         │
│    │    ├─ Container starting? → docker logs (wait 30s)                 │
│    │    ├─ Health endpoint broken? → curl localhost:3003/health         │
│    │    └─ Database down? → Check DB container                          │
│    │                                                                      │
│    ├─ 504 (Gateway Timeout)                                             │
│    │    ├─ Slow startup? → Check migration logs                         │
│    │    ├─ Database locked? → Check active queries                      │
│    │    └─ Memory issue? → docker stats                                 │
│    │                                                                      │
│    └─ Connection Refused                                                 │
│         ├─ Container not running? → docker ps -a                         │
│         ├─ Wrong port? → Check compose file (3003 for blueline)         │
│         └─ Firewall? → iptables -L (check rules)                        │
│                                                                           │
└───────────────────────────────────────────────────────────────────────────┘

┌───────────────────────────────────────────────────────────────────────────┐
│                         BUILD FAILURES                                    │
├───────────────────────────────────────────────────────────────────────────┤
│                                                                           │
│  Where Did Build Fail?                                                   │
│    │                                                                      │
│    ├─ Environment Validation                                             │
│    │    → .env file missing or incomplete                                │
│    │    → Run: ./scripts/validate-deployment-env.sh                      │
│    │                                                                      │
│    ├─ TypeScript Type-Check                                              │
│    │    → Fix type errors in code                                        │
│    │    → Run: pnpm type-check                                           │
│    │                                                                      │
│    ├─ Docker Build                                                       │
│    │    ├─ exec format error → Wrong architecture (use --amd64)         │
│    │    ├─ COPY failed → Missing files (check .dockerignore)            │
│    │    └─ Out of memory → Increase Docker resources                    │
│    │                                                                      │
│    ├─ Container Smoke Tests                                              │
│    │    ├─ Container won't start → Check Dockerfile entrypoint          │
│    │    ├─ Health endpoint timeout → Increase startup wait time         │
│    │    └─ Port binding failed → Change test port                       │
│    │                                                                      │
│    └─ Image Transfer                                                     │
│         ├─ SSH timeout → Check VPS accessibility                         │
│         ├─ Disk full on VPS → Clean old images                          │
│         └─ Network error → Retry transfer                               │
│                                                                           │
└───────────────────────────────────────────────────────────────────────────┘

EMERGENCY ROLLBACK (when all else fails):

  ./scripts/rollback-deployment.sh root@23.235.204.208 previous

  ↓ Rolls back to last known good version in <5 minutes
  ↓ Database state preserved (no data loss)
  ↓ Automatic health verification after rollback

Diagnostic Commands Quick Reference

Check Container Status

# All containers for deployment
ssh root@23.235.204.208 'docker ps -a --filter "name=blueline-alpha"'

# Detailed status (health, ports, uptime)
ssh root@23.235.204.208 'docker ps --filter "name=blueline-alpha" --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}"'

# Resource usage (CPU, memory, network)
ssh root@23.235.204.208 'docker stats --no-stream blueline-alpha-api blueline-alpha-web blueline-alpha-db'

View Logs

# API logs (last 50 lines)
ssh root@23.235.204.208 'docker logs blueline-alpha-api --tail 50'

# Web logs (last 50 lines)
ssh root@23.235.204.208 'docker logs blueline-alpha-web --tail 50'

# Follow logs in real-time
ssh root@23.235.204.208 'docker logs -f blueline-alpha-api'

# Search logs for errors
ssh root@23.235.204.208 'docker logs blueline-alpha-api 2>&1 | grep -i error | tail -20'

Test Connectivity

# Test API health from VPS
ssh root@23.235.204.208 'curl -s http://localhost:3003/health | jq'

# Test Web health from VPS
ssh root@23.235.204.208 'curl -s http://localhost:3006/ | head -20'

# Test public HTTPS
curl -s https://alpha.theblueline.com/health | jq
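
The status code returned by the health endpoint decides which branch of the health check decision tree to follow. The following is a minimal sketch of that mapping, not part of the deployment scripts: the function name triage_health_status is hypothetical, and the hints simply restate the first check listed for each code in this guide. curl reports 000 when it receives no HTTP response at all, which covers a refused connection.

```shell
# Map an HTTP status code from the health endpoint to the first check to run.
# Hypothetical helper; the hints mirror the decision tree in this guide.
triage_health_status() {
  case "$1" in
    200) echo "healthy" ;;
    502) echo "check nginx: systemctl status nginx" ;;
    503) echo "check container startup: docker logs blueline-alpha-api" ;;
    504) echo "check migration logs, then docker stats" ;;
    000) echo "no HTTP response: docker ps -a" ;;
    *)   echo "unmapped status $1: check docker logs" ;;
  esac
}

# Example (needs network access, so it is left commented out):
# code=$(curl -s -o /dev/null -w '%{http_code}' https://alpha.theblueline.com/health)
# triage_health_status "$code"
```

The function only suggests a starting point; the full set of checks for each code is in the decision tree.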

Container Issues

Container Won't Start

Symptoms:

  • docker ps shows container with status Exited (1)
  • Container restarts immediately after starting
  • No container appears in docker ps output

Diagnostic:

# Check exit code and logs
ssh root@23.235.204.208 'docker ps -a --filter "name=blueline-alpha-api" --format "{{.Status}}"'
ssh root@23.235.204.208 'docker logs blueline-alpha-api --tail 100'
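
To triage the log output quickly, the error-to-cause mapping from the decision tree can be sketched as a small filter. This is an illustrative helper, not an existing script: the name classify_startup_error and the labels it prints are assumptions, and the patterns are only the ones listed in this guide.

```shell
# Read container logs on stdin and print the first matching cause.
# Hypothetical helper; patterns come from the "Error Contains" table in this guide.
classify_startup_error() {
  _logs=$(cat)
  if printf '%s\n' "$_logs" | grep -qE 'JWT_SECRET|DATABASE_URL'; then
    echo "env-vars"
  elif printf '%s\n' "$_logs" | grep -qiE 'EADDRINUSE|port already'; then
    echo "port-conflict"
  elif printf '%s\n' "$_logs" | grep -q 'ECONNREFUSED'; then
    echo "db-not-running"
  elif printf '%s\n' "$_logs" | grep -qiE 'migration failed|database'; then
    echo "database"
  else
    echo "unknown"
  fi
}

# Example (requires SSH access, so it is left commented out):
# ssh root@23.235.204.208 'docker logs blueline-alpha-api --tail 100 2>&1' | classify_startup_error
```

The environment variable check runs first because DATABASE_URL would otherwise also match the broader "database" pattern.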

Common Causes:

A. Missing Environment Variables

# Check .env file exists
ssh root@23.235.204.208 'ls -lh /opt/sampo-alpha/.env.blueline-alpha'

# Verify required variables are present (values masked so secrets are not printed)
ssh root@23.235.204.208 'grep -E "DATABASE_URL|JWT_SECRET|REDIS_URL" /opt/sampo-alpha/.env.blueline-alpha | sed "s/=.*/=***/"'

# Solution: Add the missing variable, then recreate the container
# (docker compose restart does not pick up environment changes)
echo "MISSING_VAR=value" >> .env.blueline-alpha
scp .env.blueline-alpha root@23.235.204.208:/opt/sampo-alpha/
ssh root@23.235.204.208 'cd /opt/sampo-alpha && docker compose -f docker-compose.base.yml -f docker-compose.blueline-alpha.override.yml up -d --force-recreate api'
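
Before copying the env file to the VPS, it can help to confirm locally that every required variable is present and non-empty. The sketch below is a hypothetical helper (missing_env_vars is not one of the deployment scripts) and assumes plain NAME=value lines; it prints only variable names, never values.

```shell
# Print the names of required variables that are missing or empty in an env file.
# Values are never printed. Returns non-zero if anything is missing.
# Hypothetical helper; assumes plain NAME=value lines.
missing_env_vars() {
  _file=$1
  shift
  _missing=0
  for _name in "$@"; do
    # Match NAME= followed by at least one character at the start of a line.
    if ! grep -qE "^${_name}=.+" "$_file"; then
      echo "$_name"
      _missing=1
    fi
  done
  return "$_missing"
}

# Example:
# missing_env_vars .env.blueline-alpha DATABASE_URL JWT_SECRET REDIS_URL
```

An empty result with exit status 0 means all listed variables are set.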

B. Database Connection Failed

# Check database is running
ssh root@23.235.204.208 'docker ps --filter "name=blueline-alpha-db"'

# Test connection from API container
ssh root@23.235.204.208 'docker exec blueline-alpha-api nc -zv blueline-alpha-db 5432'

# Solution: Restart database first, then API
ssh root@23.235.204.208 'cd /opt/sampo-alpha && docker compose -f docker-compose.base.yml -f docker-compose.blueline-alpha.override.yml restart db'
sleep 10
ssh root@23.235.204.208 'cd /opt/sampo-alpha && docker compose -f docker-compose.base.yml -f docker-compose.blueline-alpha.override.yml restart api'

C. Port Conflict

# Check what's using port 3003
ssh root@23.235.204.208 'lsof -i :3003 | head -10'

# Solution: Kill conflicting process or change port in docker-compose

Container Crashes After Startup

Symptoms:

  • Container starts, runs for seconds/minutes, then exits
  • Status shows restart loop: Restarting (1) 3 seconds ago
  • Logs show runtime errors

Diagnostic:

# Check restart count
ssh root@23.235.204.208 'docker inspect blueline-alpha-api | jq ".[0].RestartCount"'

# Check for OOM (out of memory)
ssh root@23.235.204.208 'docker inspect blueline-alpha-api | jq ".[0].State.OOMKilled"'

# Check memory usage
ssh root@23.235.204.208 'docker stats --no-stream blueline-alpha-api'

Common Causes:

A. Out of Memory (OOM)

# Check system logs for OOM
ssh root@23.235.204.208 'dmesg | grep -i "out of memory\|oom" | tail -10'

# Solution: Increase memory limit in docker-compose.blueline-alpha.override.yml
# deploy:
#   resources:
#     limits:
#       memory: 2G  # Increase from 1G

# Restart with new limits
ssh root@23.235.204.208 'cd /opt/sampo-alpha && docker compose -f docker-compose.base.yml -f docker-compose.blueline-alpha.override.yml up -d api'

B. Unhandled Exception

# Check logs for stack trace
ssh root@23.235.204.208 'docker logs blueline-alpha-api 2>&1 | grep -i "error\|exception\|fatal" | tail -20'

# Solution: Fix code issue, redeploy

Database Issues

Connection Failures

Symptoms:

  • API logs: Error: connect ECONNREFUSED
  • Health check fails with database error
  • Prisma client throws connection timeout

Diagnostic:

# Check database container
ssh root@23.235.204.208 'docker ps --filter "name=blueline-alpha-db"'

# Check database logs
ssh root@23.235.204.208 'docker logs blueline-alpha-db --tail 30'

# Test connection from host
ssh root@23.235.204.208 'psql -h localhost -p 5436 -U postgres -d sampo_blueline_alpha -c "SELECT 1;"'

Solution:

# Restart database container
ssh root@23.235.204.208 'cd /opt/sampo-alpha && docker compose -f docker-compose.base.yml -f docker-compose.blueline-alpha.override.yml up -d db'

# Wait for database to be ready
sleep 10
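
A fixed sleep can be too short on a busy host and needlessly long on an idle one. As a sketch, the wait can be replaced by polling until the database accepts connections. The helper name wait_until is hypothetical, and the commented example assumes the database image ships the standard PostgreSQL pg_isready utility.

```shell
# Retry a command until it succeeds or the attempts run out.
# Usage: wait_until <attempts> <delay_seconds> <command> [args...]
# Hypothetical helper; sleeps <delay_seconds> after each failed attempt.
wait_until() {
  _attempts=$1
  _delay=$2
  shift 2
  _i=1
  while [ "$_i" -le "$_attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    _i=$((_i + 1))
    sleep "$_delay"
  done
  return 1
}

# Example (requires SSH access, so it is left commented out):
# wait_until 15 2 ssh root@23.235.204.208 'docker exec blueline-alpha-db pg_isready -U postgres' \
#   || echo "database did not become ready in time"
```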

Migration Errors

Symptoms:

  • API startup logs show migration failures
  • Database schema out of sync with code
  • Prisma errors: "Column does not exist"

Diagnostic:

# Check migration status
ssh root@23.235.204.208 'docker exec blueline-alpha-api pnpm prisma migrate status'

# Check applied migrations
ssh root@23.235.204.208 'docker exec blueline-alpha-db psql -U postgres -d sampo_blueline_alpha -c "SELECT migration_name, finished_at FROM _prisma_migrations ORDER BY finished_at DESC LIMIT 10;"'

Solution:

# Apply pending migrations
ssh root@23.235.204.208 'docker exec blueline-alpha-api pnpm prisma migrate deploy'

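# If a migration failed partway through, mark it as rolled back so it can be re-applied
# (this flags every unfinished migration; review the rows from the applied-migrations query first)
ssh root@23.235.204.208 'docker exec blueline-alpha-db psql -U postgres -d sampo_blueline_alpha -c "UPDATE _prisma_migrations SET rolled_back_at = NOW() WHERE finished_at IS NULL;"'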

# Re-apply migrations
ssh root@23.235.204.208 'docker exec blueline-alpha-api pnpm prisma migrate deploy'

Performance Degradation

Symptoms:

  • Queries taking >1 second
  • API response times increasing
  • High database CPU usage

Diagnostic:

# Check active connections
ssh root@23.235.204.208 'docker exec blueline-alpha-db psql -U postgres -d sampo_blueline_alpha -c "SELECT count(*) FROM pg_stat_activity WHERE state = '\''active'\'';"'

# Check slow queries
ssh root@23.235.204.208 'docker exec blueline-alpha-db psql -U postgres -d sampo_blueline_alpha -c "SELECT pid, now() - pg_stat_activity.query_start AS duration, query FROM pg_stat_activity WHERE state = '\''active'\'' ORDER BY duration DESC LIMIT 5;"'

Solution:

# Cap the Prisma connection pool if the active connection count is high
# In DATABASE_URL: postgresql://...?connection_limit=5

# Vacuum tables
ssh root@23.235.204.208 'docker exec blueline-alpha-db psql -U postgres -d sampo_blueline_alpha -c "VACUUM ANALYZE;"'
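
Editing the connection_limit parameter by hand is error-prone when the URL already carries other query parameters. The following sketch shows one way to set it; with_connection_limit is a hypothetical helper, and the URLs in the example are placeholders rather than the real connection string.

```shell
# Set or replace the connection_limit query parameter on a database URL.
# Hypothetical helper; prints the updated URL on stdout.
with_connection_limit() {
  _url=$1
  _limit=$2
  case "$_url" in
    *connection_limit=*)
      # Parameter already present: replace its value.
      printf '%s\n' "$_url" | sed -E "s/connection_limit=[0-9]+/connection_limit=${_limit}/" ;;
    *\?*)
      # Other query parameters exist: append with "&".
      printf '%s&connection_limit=%s\n' "$_url" "$_limit" ;;
    *)
      # No query string yet: start one with "?".
      printf '%s?connection_limit=%s\n' "$_url" "$_limit" ;;
  esac
}

# Example with a placeholder URL:
# with_connection_limit 'postgresql://user:pass@db:5432/app?schema=public' 5
```

The API container has to be recreated after the env file changes for the new limit to take effect.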

Network Issues

nginx Configuration

Symptoms:

  • 502 Bad Gateway on public URL
  • nginx logs show "upstream" errors
  • Health checks work locally but fail via HTTPS

Diagnostic:

# Check nginx status
ssh root@23.235.204.208 'systemctl status nginx --no-pager'

# Check nginx configuration
ssh root@23.235.204.208 'nginx -t'

# Check nginx error logs
ssh root@23.235.204.208 'tail -50 /var/log/nginx/error.log'

Common Causes:

A. nginx Not Running

# Solution
ssh root@23.235.204.208 'systemctl start nginx && systemctl enable nginx'

B. Apache Blocking Ports

# Check what's using ports 80 and 443
ssh root@23.235.204.208 'lsof -i :80 -i :443 | head -20'

# If Apache is listed, stop and disable it
# (the service is named httpd on RHEL-family systems and apache2 on Debian/Ubuntu)
ssh root@23.235.204.208 'systemctl stop httpd && systemctl disable httpd'
ssh root@23.235.204.208 'systemctl restart nginx'

SSL Certificate Issues

Symptoms:

  • Browser shows "Your connection is not private"
  • Certificate expired warnings
  • curl fails with SSL error

Diagnostic:

# Check certificate status
curl -vI https://alpha.theblueline.com 2>&1 | grep -i "expire\|subject\|issuer"

# Check certificate files exist
ssh root@23.235.204.208 'ls -lh /etc/letsencrypt/live/alpha.theblueline.com/'

Solution:

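# Renew certificates that are expired or close to expiry
ssh root@23.235.204.208 'certbot renew --nginx'
# Add --force-renewal only if a certificate must be replaced before it is due;
# forced renewals count against Let's Encrypt rate limits
ssh root@23.235.204.208 'systemctl reload nginx'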
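
To decide whether a renewal is actually due, the certificate's expiry date can be turned into a number of days remaining. This is a sketch under stated assumptions: days_until_expiry is a hypothetical helper, and it relies on GNU date (date -d), so it will not work unchanged with BSD/macOS date.

```shell
# Days from a reference time until a certificate's notAfter date.
# Hypothetical helper; assumes GNU date (date -d).
# $1: date string as printed by `openssl x509 -noout -enddate` after "notAfter="
# $2: reference time in epoch seconds (defaults to the current time)
days_until_expiry() {
  _end=$(date -u -d "$1" +%s) || return 1
  _now=${2:-$(date -u +%s)}
  echo $(( (_end - _now) / 86400 ))
}

# Example (needs network access, so it is left commented out):
# end=$(echo | openssl s_client -connect alpha.theblueline.com:443 \
#         -servername alpha.theblueline.com 2>/dev/null \
#       | openssl x509 -noout -enddate | cut -d= -f2)
# days_until_expiry "$end"
```

A negative result means the certificate has already expired.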

Performance Issues

High Memory Usage

Symptoms:

  • Docker stats show >80% memory usage
  • Container killed by OOM
  • API responses slow/timeout

Diagnostic:

# Check current memory usage
ssh root@23.235.204.208 'docker stats --no-stream blueline-alpha-api blueline-alpha-web'

# Check OOM kills
ssh root@23.235.204.208 'dmesg | grep -i "out of memory" | tail -10'
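
The memory figure from docker stats can be compared against a threshold in a script rather than by eye. The sketch below assumes a hypothetical helper named exceeds_threshold; the commented example uses the docker stats --format placeholder {{.MemPerc}} and the 90% restart threshold from the decision tree.

```shell
# Succeed (exit 0) when a percentage such as "92.31%" is strictly above a threshold.
# Hypothetical helper; awk does the comparison so fractional values work.
exceeds_threshold() {
  _value=${1%"%"}   # strip the trailing percent sign
  awk -v v="$_value" -v t="$2" 'BEGIN { exit !(v + 0 > t + 0) }'
}

# Example (requires SSH access, so it is left commented out):
# mem=$(ssh root@23.235.204.208 'docker stats --no-stream --format "{{.MemPerc}}" blueline-alpha-api')
# exceeds_threshold "$mem" 90 && echo "memory above 90%: restart and monitor"
```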

Solution:

# Temporary: Restart container
ssh root@23.235.204.208 'cd /opt/sampo-alpha && docker compose -f docker-compose.base.yml -f docker-compose.blueline-alpha.override.yml restart api'

# Permanent: Increase memory limit in docker-compose
# deploy:
#   resources:
#     limits:
#       memory: 2G

Slow Responses

Symptoms:

  • API requests taking >3 seconds
  • Frontend loading slowly
  • Timeout errors in logs

Diagnostic:

# Check API response times
curl -o /dev/null -s -w "Time: %{time_total}s\n" https://alpha.theblueline.com/health

# Check CPU usage (prints only the CPU percentage)
ssh root@23.235.204.208 'docker stats --no-stream --format "{{.CPUPerc}}" blueline-alpha-api'

Solution:

# Check for slow database queries
ssh root@23.235.204.208 'docker logs blueline-alpha-api 2>&1 | grep "Query took"'

# Implement caching, add indexes, optimize queries

Cache Degradation

Symptoms:

  • Build times increasing over time
  • Cache hit rate dropping below 50%
  • Every build takes 10+ minutes

Diagnostic:

# Check cache performance
./scripts/analyze-build-cache.sh

# Check detailed build stats
./scripts/analyze-build-cache.sh --detailed | tail -50

# Check trends
./scripts/analyze-build-cache.sh --trends

Solution:

See the Build Performance Optimization article for detailed troubleshooting.

Build Issues

Build Fails Locally

Symptoms:

  • docker build fails with compilation error
  • TypeScript errors during build
  • Out of memory during build

Diagnostic:

# Check build logs
./scripts/docker-audit/build-cross-platform.sh --api --local 2>&1 | tee build-error.log

# Check disk space
df -h

# Check Docker resources
docker system df

Common Causes:

A. TypeScript Errors

# Run local validation
pnpm type-check

# Fix type errors before building

B. Insufficient Docker Resources

# Solution: Increase Docker Desktop memory allocation
# Docker Desktop → Settings → Resources → Memory → 8GB

Build Succeeds But Container Fails

Symptoms:

  • Image builds successfully
  • Container exits immediately with code 1
  • Logs show "exec format error"

Diagnostic:

# Check image architecture
docker inspect sampo-blueline-alpha-api:latest | jq '.[0].Architecture'

# Check host architecture
uname -m
# x86_64 (Docker's amd64) on the VPS, arm64 on an M1 (Apple Silicon) Mac

Solution:

# Rebuild for correct architecture
./scripts/docker-audit/build-cross-platform.sh --api --amd64  # For VPS
./scripts/docker-audit/build-cross-platform.sh --api --local  # For local Mac
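
docker inspect and uname -m use different names for the same architecture (amd64 versus x86_64, arm64 versus aarch64), so comparing them directly can give a false mismatch. A minimal sketch of a normalizing comparison follows; docker_arch and arch_matches are hypothetical helper names.

```shell
# Translate `uname -m` output into the architecture names Docker reports.
# Hypothetical helper; unknown values are passed through unchanged.
docker_arch() {
  case "$1" in
    x86_64|amd64)  echo "amd64" ;;
    aarch64|arm64) echo "arm64" ;;
    *)             echo "$1" ;;
  esac
}

# Succeed when an image architecture matches the host it will run on.
arch_matches() {
  [ "$(docker_arch "$1")" = "$(docker_arch "$2")" ]
}

# Example (requires Docker and jq, so it is left commented out):
# image_arch=$(docker inspect sampo-blueline-alpha-api:latest | jq -r '.[0].Architecture')
# arch_matches "$image_arch" "$(uname -m)" || echo "architecture mismatch: rebuild"
```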

Recovery Procedures

Safe Restart

Use when: Minor issues, no data loss risk

# Restart in order: db → api → web
ssh root@23.235.204.208 'cd /opt/sampo-alpha && docker compose -f docker-compose.base.yml -f docker-compose.blueline-alpha.override.yml restart db'
sleep 10
ssh root@23.235.204.208 'cd /opt/sampo-alpha && docker compose -f docker-compose.base.yml -f docker-compose.blueline-alpha.override.yml restart api'
sleep 5
ssh root@23.235.204.208 'cd /opt/sampo-alpha && docker compose -f docker-compose.base.yml -f docker-compose.blueline-alpha.override.yml restart web'

# Verify health
./scripts/deployment-smoke-test.sh root@23.235.204.208 https://alpha.theblueline.com
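
The same long compose invocation is repeated for every service. As a sketch, it can be built in one place so the compose file list is not retyped; compose_cmd is a hypothetical helper, and the directory and file names are the ones used throughout this guide.

```shell
# Build the compose command for this deployment so the file list lives in one place.
# Hypothetical helper; prints the command string instead of running it.
compose_cmd() {
  printf 'cd /opt/sampo-alpha && docker compose -f docker-compose.base.yml -f docker-compose.blueline-alpha.override.yml %s\n' "$*"
}

# Example (requires SSH access, so it is left commented out):
# for svc in db api web; do
#   ssh root@23.235.204.208 "$(compose_cmd restart "$svc")"
#   sleep 10
# done
```

Printing the command rather than executing it keeps the helper easy to inspect before anything is run on the VPS.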

Emergency Rollback

Use when: New deployment broke production, need immediate revert

# Rollback to previous version
./scripts/rollback-deployment.sh root@23.235.204.208 previous

# Rollback to specific version
./scripts/rollback-deployment.sh root@23.235.204.208 20260208-143022-a4c6788

# Verify rollback
./scripts/deployment-smoke-test.sh root@23.235.204.208 https://alpha.theblueline.com

Escalation Guidelines

Severity Levels

Level         | Response Time | Description
--------------+---------------+----------------------------------------
P0 - Critical | Immediate     | Production down, all users affected
P1 - High     | <15 minutes   | Significant feature broken
P2 - Medium   | <1 hour       | Minor feature broken, workaround exists
P3 - Low      | <1 day        | Cosmetic issue, no user impact

When to Escalate

P0 - Critical:

  • API health check fails for >2 minutes
  • Database unreachable
  • All users unable to access system

P1 - High:

  • Response times >5 seconds consistently
  • Memory usage >90%
  • Key feature completely broken

P2 - Medium:

  • Cache hit rate <50% for multiple builds
  • Disk usage >80%
  • Non-critical feature degraded

P3 - Low:

  • Slow build times (but completing)
  • Minor performance issues
  • Documentation updates needed
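
The numeric criteria above can be sketched as a small function that suggests a starting severity. This is an illustration only: suggest_severity is a hypothetical helper, it takes whole-number inputs, and it covers just the measurable thresholds listed here (health check downtime, response time, memory, disk), so judgment-based criteria still need a human decision.

```shell
# Suggest a severity level from a few whole-number measurements.
# Hypothetical helper covering only the numeric thresholds in this guide.
# $1: minutes the API health check has been failing
# $2: typical response time in seconds
# $3: memory usage percent
# $4: disk usage percent
suggest_severity() {
  if [ "$1" -gt 2 ]; then
    echo "P0"
  elif [ "$2" -gt 5 ] || [ "$3" -gt 90 ]; then
    echo "P1"
  elif [ "$4" -gt 80 ]; then
    echo "P2"
  else
    echo "P3"
  fi
}

# Example:
# suggest_severity 0 6 70 40
```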

What to Provide

When escalating:

  1. Severity level (P0-P3)
  2. Symptoms observed
  3. Diagnostic commands run
  4. Error messages/logs
  5. What was tried
  6. Current system state

Related Articles

Advanced Resources

  • 📄 Full troubleshooting guide: docs/operations/deployment-troubleshooting.md (800+ lines)
  • 🔧 Deployment runbook: docs/operations/deployment-runbook.md
  • 📊 Quick reference: docs/operations/docker-deployment-quick-reference.md

Emergency Contacts

For critical issues (P0):

  • DevOps Team: [escalation process]
  • On-Call Engineer: [contact info]

For questions (P1-P3):

  • Check troubleshooting guide first
  • DevOps Team: [support channel]
