DevOps for AI: Docker to Production

You’ve built an amazing AI application locally. Now you need to deploy it. Simple, right?

Except your laptop has 64GB RAM, local model files, cached embeddings, and environment variables scattered across three different files. Production has… none of that.

Here’s how to bridge the gap between “works on my machine” and “running reliably in production.”

The Production Deployment Checklist

Before the specifics, here’s what you need:

Component | Purpose | Example Tools
--- | --- | ---
Containerization | Reproducible environments | Docker, Podman
Orchestration | Manage multiple containers | Docker Compose, Kubernetes
Reverse Proxy | Handle HTTPS, routing | Caddy, Nginx, Traefik
CI/CD | Automated testing & deployment | GitHub Actions, GitLab CI
Secrets Management | Secure API keys, passwords | Vault, AWS Secrets Manager
Monitoring | Know when things break | Grafana, Datadog, Sentry
Logging | Debug production issues | Loki, CloudWatch, Better Stack

Step 1: Dockerize Your Application

Docker ensures your app runs the same everywhere. Here’s a production-ready Dockerfile for a Python AI application:


# Multi-stage build for smaller images
FROM python:3.11-slim AS builder

WORKDIR /app

# Install build dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
    gcc \
    g++ \
    && rm -rf /var/lib/apt/lists/*

# Install Python dependencies into a standalone prefix we can copy later
COPY requirements.txt .
RUN pip install --no-cache-dir --prefix=/install -r requirements.txt

# Final stage
FROM python:3.11-slim

WORKDIR /app

# Copy only the installed packages from builder. Using /install instead of
# /root/.local matters: the non-root user below can't read /root.
COPY --from=builder /install /usr/local
COPY . .

# Don't run as root
RUN useradd -m appuser && chown -R appuser /app
USER appuser

# Health check: stdlib urllib, so it doesn't depend on requests being installed.
# urlopen raises on connection errors and 4xx/5xx, so the check fails correctly.
HEALTHCHECK --interval=30s --timeout=3s \
  CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/health', timeout=2)"

# Run application
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]

Key Docker Best Practices

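Multi-stage builds: Keep compilers and build tools out of the final image, which can shrink it considerably
Don’t run as root: Limits what an attacker can do if the container is compromised
Health checks: Let orchestrators know whether the container is healthy
Specific base images: Use python:3.11-slim, not python:latest
.dockerignore: Exclude unnecessary or sensitive files (.git, .env, caches, node_modules) so COPY . . never puts them in the image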

Step 2: Environment Configuration

Never hardcode API keys or secrets. Use environment variables:


# .env.example (check this into git)
ANTHROPIC_API_KEY=sk-ant-xxx
DATABASE_URL=postgresql://localhost/mydb
REDIS_URL=redis://localhost:6379
LOG_LEVEL=info

# .env (never commit this!)
ANTHROPIC_API_KEY=sk-ant-real-key-here
DATABASE_URL=postgresql://user:pass@prod-db.com/prod

Load them in your app:


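from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    # Field names match env vars case-insensitively (ANTHROPIC_API_KEY -> anthropic_api_key);
    # real environment variables take priority over values in .env
    model_config = SettingsConfigDict(env_file=".env")

    anthropic_api_key: str
    database_url: str
    redis_url: str = "redis://localhost:6379"  # default
    log_level: str = "info"

settings = Settings()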
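To see what a settings loader does under the hood, or to skip the pydantic dependency in a tiny service, here is a minimal standard-library sketch. It assumes a simple KEY=VALUE .env format (no quoting or multi-line values) and, like pydantic-settings, lets real environment variables override the file, which is what lets Compose's environment: block or a CI secret win over a local .env. The names load_settings, REQUIRED, and DEFAULTS are hypothetical.

```python
import os
from pathlib import Path

# Hypothetical: settings the app cannot start without, plus optional defaults
REQUIRED = ("ANTHROPIC_API_KEY", "DATABASE_URL")
DEFAULTS = {"REDIS_URL": "redis://localhost:6379", "LOG_LEVEL": "info"}


def parse_env_file(path: Path) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments (no quote handling)."""
    values: dict[str, str] = {}
    if not path.exists():
        return values
    for raw in path.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def load_settings(env_file: str = ".env", environ=None) -> dict[str, str]:
    """Hypothetical loader: defaults < .env file < process environment."""
    environ = os.environ if environ is None else environ
    settings = dict(DEFAULTS)
    settings.update(parse_env_file(Path(env_file)))
    # Real environment variables win, mirroring pydantic-settings' precedence
    settings.update({k: environ[k] for k in (*REQUIRED, *DEFAULTS) if k in environ})
    missing = [k for k in REQUIRED if not settings.get(k)]
    if missing:
        raise RuntimeError(f"Missing required settings: {', '.join(missing)}")
    return settings
```

Failing fast on a missing key at startup is the point: a container that refuses to start is easier to debug than one that fails on the first LLM call.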

Step 3: Docker Compose for Local Development

Run your entire stack with one command:


services:
  app:
    build: .
    ports:
      - "8000:8000"
    environment:
      - DATABASE_URL=postgresql://postgres:password@db:5432/mydb
      - REDIS_URL=redis://redis:6379
    env_file:
      - .env
    depends_on:
      db:
        condition: service_healthy
      redis:
        condition: service_started
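    volumes:
      - .:/app  # mount the source tree for live edits in dev
    # Hot reload also needs uvicorn's --reload flag
    command: ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000", "--reload"]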
    restart: unless-stopped

  db:
    image: postgres:15-alpine
    environment:
      - POSTGRES_PASSWORD=password
      - POSTGRES_DB=mydb
    volumes:
      - postgres_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      timeout: 5s
      retries: 5

  redis:
    image: redis:7-alpine
    volumes:
      - redis_data:/data

  # Vector database for RAG
  qdrant:
    image: qdrant/qdrant:latest
    ports:
      - "6333:6333"
    volumes:
      - qdrant_data:/qdrant/storage

volumes:
  postgres_data:
  redis_data:
  qdrant_data:

Run everything:


docker compose up -d

Your app, database, Redis, and vector database are now running.

Step 4: Caddy for HTTPS and Reverse Proxy

Caddy automatically provisions and renews TLS certificates (from Let’s Encrypt or ZeroSSL). Configuration is beautifully simple:


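# Caddyfile

# rate_limit is not built into Caddy: it comes from the third-party
# caddy-ratelimit plugin (xcaddy build --with github.com/mholt/caddy-ratelimit),
# and plugin directives need an explicit order.
{
    order rate_limit before basic_auth
}

ai.yourdomain.com {
    # Automatic HTTPS!
    reverse_proxy app:8000

    # Rate limiting: 100 requests per minute per client IP
    rate_limit {
        zone app_zone {
            key {remote_host}
            events 100
            window 1m
        }
    }

    # Logging
    log {
        output file /var/log/caddy/access.log
        format json
    }
}

# Separate domain for admin panel
admin.yourdomain.com {
    reverse_proxy app:8000

    # Basic auth (generate the bcrypt hash with: caddy hash-password)
    basic_auth {
        admin $2a$14$hashed_password_here
    }
}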

Add Caddy to docker-compose.yml, and declare caddy_data and caddy_config under the top-level volumes: key:


  caddy:
    image: caddy:2-alpine  # stock image lacks the rate_limit plugin; swap in an xcaddy build
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./Caddyfile:/etc/caddy/Caddyfile
      - caddy_data:/data
      - caddy_config:/config
    restart: unless-stopped

Boom. HTTPS, per-IP rate limiting, JSON access logs, and basic auth in one short config file.
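The rate_limit block above allows each client IP 100 requests per rolling one-minute window. If you would rather not maintain a custom Caddy build, a similar per-client limit can live in the app. Below is a minimal in-memory sliding-window sketch, assuming a single process: multiple workers or replicas would each keep their own counts, so there you would need a shared store such as Redis. SlidingWindowLimiter is a hypothetical class, not part of any library.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Hypothetical helper: allow at most `events` requests per `window` seconds per key."""

    def __init__(self, events: int = 100, window: float = 60.0, clock=time.monotonic):
        self.events = events
        self.window = window
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._hits: dict[str, deque] = defaultdict(deque)

    def allow(self, key: str) -> bool:
        now = self.clock()
        hits = self._hits[key]
        # Drop timestamps that have slid out of the window
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.events:
            return False  # caller should respond with HTTP 429
        hits.append(now)
        return True
```

Behind Caddy, the client address the app sees is Caddy's unless uvicorn is told to trust Caddy's forwarded headers (its --forwarded-allow-ips option), so key on the forwarded client IP, not the proxy's.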

Step 5: CI/CD Pipeline

Automate testing and deployment with GitHub Actions:


# .github/workflows/deploy.yml

name: Deploy to Production

on:
  push:
    branches: [main]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'

      - name: Install dependencies
        run: |
          pip install -r requirements.txt
          pip install pytest pytest-cov

      - name: Run tests
        run: pytest --cov=app tests/

      - name: Run LLM evals
        run: python scripts/run_evals.py
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}

  deploy:
    needs: test
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main'

    steps:
      - uses: actions/checkout@v4

      # Pushing requires registry credentials (these secret names are placeholders)
      - name: Log in to registry
        uses: docker/login-action@v3
        with:
          registry: ${{ secrets.REGISTRY_URL }}
          username: ${{ secrets.REGISTRY_USER }}
          password: ${{ secrets.REGISTRY_TOKEN }}

      - name: Build and push Docker image
        uses: docker/build-push-action@v6
        with:
          push: true
          # The commit-SHA tag gives you a known image to roll back to
          tags: |
            your-registry/ai-app:latest
            your-registry/ai-app:${{ github.sha }}

      - name: Deploy to server
        uses: appleboy/ssh-action@master  # pin to a release tag or commit SHA
        with:
          host: ${{ secrets.SERVER_HOST }}
          username: ${{ secrets.SERVER_USER }}
          key: ${{ secrets.SSH_PRIVATE_KEY }}
          script: |
            cd /opt/ai-app
            # The server's compose file must use image: your-registry/ai-app:latest
            docker compose pull
            docker compose up -d
            docker compose exec app python scripts/migrate.py

Now every push to main:

Runs unit tests
Runs LLM evaluations
Builds Docker image
Deploys to production
Runs database migrations

All automatically.
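The workflow calls scripts/run_evals.py but its contents aren't shown. Here is a minimal sketch of one shape it could take. The case format, the substring check, and the 0.9 pass threshold are all assumptions; the detail that matters is returning a non-zero exit code on failure, so GitHub Actions fails the test job and deploy never runs.

```python
from dataclasses import dataclass
from typing import Callable

PASS_THRESHOLD = 0.9  # assumed quality bar; tune for your app


@dataclass
class EvalCase:
    prompt: str
    must_contain: str  # simplest possible check; real evals often use graders


# Hypothetical golden cases; keep them in version control next to the code
CASES = [
    EvalCase("What is the capital of France? Answer in one word.", "Paris"),
    EvalCase("What is 12 * 12? Reply with just the number.", "144"),
]


def run_evals(cases: list[EvalCase], generate: Callable[[str], str]) -> tuple[float, list[str]]:
    """Return (pass_rate, failed_prompts); `generate` maps a prompt to model output text."""
    failures = [c.prompt for c in cases
                if c.must_contain.lower() not in generate(c.prompt).lower()]
    return 1 - len(failures) / len(cases), failures


def anthropic_generate(prompt: str) -> str:
    import anthropic  # imported here so the harness itself can be tested offline

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
    message = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=256,
        messages=[{"role": "user", "content": prompt}],
    )
    return message.content[0].text


def main() -> int:
    pass_rate, failed = run_evals(CASES, anthropic_generate)
    print(f"pass rate: {pass_rate:.0%}")
    for prompt in failed:
        print(f"FAILED: {prompt}")
    # A non-zero exit code fails the CI step, so the deploy job never starts
    return 0 if pass_rate >= PASS_THRESHOLD else 1
```

To run it from CI, end the file with `import sys` and the usual `if __name__ == "__main__": sys.exit(main())` entry point. Injecting `generate` keeps the scoring logic testable with a fake model.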

Step 6: Monitoring and Observability

You need to know when things break. Set up Grafana + Prometheus:


# docker-compose.yml additions (also declare prometheus_data and grafana_data under top-level volumes:)

  prometheus:
    image: prom/prometheus:latest
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus_data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
    ports:
      - "9090:9090"

  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    volumes:
      - grafana_data:/var/lib/grafana
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=${GRAFANA_ADMIN_PASSWORD}  # read from .env or the shell, not hardcoded

Instrument your app and expose a /metrics endpoint; prometheus.yml then needs a scrape job that targets app:8000:


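from anthropic import AsyncAnthropic
from fastapi import FastAPI
from prometheus_client import Counter, Histogram, make_asgi_app

app = FastAPI()  # assumes the FastAPI app that uvicorn serves as main:app
app.mount("/metrics", make_asgi_app())  # endpoint Prometheus scrapes

client = AsyncAnthropic()  # reads ANTHROPIC_API_KEY from the environment

llm_requests = Counter('llm_requests_total', 'Total LLM requests')
llm_errors = Counter('llm_errors_total', 'Failed LLM requests')
llm_latency = Histogram('llm_request_duration_seconds', 'LLM request latency')
llm_cost = Counter('llm_cost_dollars', 'Total LLM cost in dollars')

def calculate_cost(usage) -> float:
    # Example per-million-token rates (input $3, output $15); check current pricing
    return (usage.input_tokens * 3.00 + usage.output_tokens * 15.00) / 1_000_000

async def call_llm(prompt: str):
    llm_requests.inc()

    # Time the awaited call inside the coroutine and count any exception it raises
    with llm_latency.time(), llm_errors.count_exceptions():
        response = await client.messages.create(
            model="claude-3-5-sonnet-20241022",
            max_tokens=1024,  # required by the Messages API
            messages=[{"role": "user", "content": prompt}],
        )

    # Track cost
    llm_cost.inc(calculate_cost(response.usage))

    return response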

From these metrics, plus counters for errors and cache hits, you can build Grafana dashboards showing:

Request volume
Latency (p50, p95, p99)
Error rates
Cost per hour
Cache hit rates
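The p50/p95/p99 panels are estimates, not stored values: a Prometheus Histogram records only cumulative bucket counts, and Grafana asks Prometheus to interpolate quantiles from them, typically with a query like `histogram_quantile(0.95, sum by (le) (rate(llm_request_duration_seconds_bucket[5m])))`. The sketch below re-creates that interpolation in Python to show what it does; it is an illustration, not Prometheus' code, and it raises on inputs where Prometheus would return NaN or infinity.

```python
import math


def histogram_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Estimate quantile q from cumulative (upper_bound, count) buckets ending in +Inf.

    Linear interpolation inside the bucket that contains the target rank,
    the same approach as PromQL's histogram_quantile().
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    if len(buckets) < 2 or buckets[-1][0] != math.inf:
        raise ValueError("need at least one finite bucket plus the +Inf bucket")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    # First finite bucket whose cumulative count reaches the target rank
    b = next((i for i, (_, count) in enumerate(buckets[:-1]) if count >= rank), len(buckets) - 1)
    if b == len(buckets) - 1:
        return buckets[-2][0]  # rank lands in +Inf: report the highest finite bound
    if b == 0 and buckets[0][0] <= 0:
        return buckets[0][0]
    start = 0.0 if b == 0 else buckets[b - 1][0]
    end = buckets[b][0]
    below = 0.0 if b == 0 else buckets[b - 1][1]
    in_bucket = buckets[b][1] - below
    return start + (end - start) * (rank - below) / in_bucket
```

The precision is bounded by your bucket boundaries. The Python client's default buckets top out at 10 seconds, so a slow LLM call lands in +Inf and the p99 simply reports 10s; passing wider `buckets=` to `Histogram` avoids that.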

Step 7: Blue-Green Deployments

Deploy new versions without downtime:


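# Deploy new version (green) as a separate Compose project so it doesn't
# replace the running (blue) containers; this file maps the app to port 8001
docker compose -p ai-app-green -f docker-compose.green.yml up -d

# Test it on separate port (-f makes curl fail on HTTP errors)
curl -f http://localhost:8001/health

# If good, point Caddy's reverse_proxy at green (both must share a Docker
# network) and reload Caddy without dropping connections:
docker compose exec caddy caddy reload --config /etc/caddy/Caddyfile

# If bad, kill green and keep blue
docker compose -p ai-app-green -f docker-compose.green.yml down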

Or use Kubernetes for automatic rolling updates.
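A single curl can pass while the new version is still warming up (loading models, filling caches). A slightly stricter gate, sketched below with only the standard library, waits for several consecutive healthy responses before you switch traffic. The helper name, the three-in-a-row rule, and the timeouts are assumptions to tune.

```python
import time
import urllib.request


def wait_until_healthy(url: str, required_passes: int = 3, interval: float = 2.0,
                       timeout: float = 120.0) -> bool:
    """Hypothetical helper: True once `url` returns 2xx `required_passes` times in a row."""
    deadline = time.monotonic() + timeout
    streak = 0
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                healthy = 200 <= resp.status < 300
        except OSError:  # URLError/HTTPError (4xx/5xx), refused connections, timeouts
            healthy = False
        streak = streak + 1 if healthy else 0  # any failure resets the streak
        if streak >= required_passes:
            return True
        time.sleep(interval)
    return False
```

In a deploy script, `sys.exit(0 if wait_until_healthy("http://localhost:8001/health") else 1)` stops the rollout before Caddy is touched.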

Common Production Issues (And Fixes)

Issue: Out of Memory

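Symptom: Container keeps restarting, and docker inspect reports State.OOMKilled as true (the exit code is often 137)
Fix: Set explicit memory limits in docker-compose.yml so one container can’t starve the host, then size them from observed usage: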


services:
  app:
    deploy:
      resources:
        limits:
          memory: 4G
        reservations:
          memory: 2G

Issue: Slow Performance

Symptom: Requests timing out
Fix: Add Redis caching, increase worker processes, use async I/O
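For LLM-heavy apps, "use async I/O" mostly means letting one worker wait on many slow model calls at once instead of one at a time. Unbounded fan-out can trip provider rate limits, though, so this sketch caps in-flight calls with an asyncio.Semaphore. fake_llm is a stand-in for a real call such as call_llm from Step 6, and run_bounded and its default limit of 8 are hypothetical choices.

```python
import asyncio


async def fake_llm(prompt: str) -> str:
    """Stand-in for a real API call such as call_llm(); sleeps to mimic latency."""
    await asyncio.sleep(0.1)
    return prompt.upper()


async def run_bounded(prompts: list[str], limit: int = 8, call=fake_llm) -> list[str]:
    """Hypothetical helper: run calls concurrently, at most `limit` in flight.

    asyncio.gather returns results in the same order as `prompts`.
    """
    sem = asyncio.Semaphore(limit)

    async def one(prompt: str) -> str:
        async with sem:  # waits here while `limit` calls are already running
            return await call(prompt)

    return await asyncio.gather(*(one(p) for p in prompts))
```

Usage: `results = asyncio.run(run_bounded(prompts))`. With 16 prompts, a limit of 8, and 0.1s per call, the batch takes roughly two rounds of latency instead of sixteen.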

Issue: Database Connection Exhaustion

Symptom: “Too many connections” errors
Fix: Use connection pooling (SQLAlchemy, asyncpg), increase DB max connections
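Connection exhaustion is usually arithmetic. Every worker process gets its own pool, so the worst case is replicas × workers × (pool_size + max_overflow), and that must stay under Postgres' max_connections (100 by default, with 3 reserved for superusers by default). SQLAlchemy's default QueuePool uses pool_size=5 and max_overflow=10. The hypothetical helpers below just do that check.

```python
def max_db_connections(replicas: int, workers_per_replica: int,
                       pool_size: int = 5, max_overflow: int = 10) -> int:
    """Worst-case connections if every worker's pool is fully checked out.

    Defaults match SQLAlchemy's QueuePool (pool_size=5, max_overflow=10).
    """
    return replicas * workers_per_replica * (pool_size + max_overflow)


def fits(replicas: int, workers: int, max_connections: int = 100,
         reserved: int = 3, **pool) -> bool:
    """Hypothetical check: does the worst case fit in Postgres' usable slots?"""
    return max_db_connections(replicas, workers, **pool) <= max_connections - reserved
```

For example, 2 replicas running 4 uvicorn workers each can open 2 × 4 × 15 = 120 connections with default pools, more than the 97 usable slots. Shrinking each pool to 5 + 5 brings the worst case to 80; the alternative is raising max_connections, as the fix above suggests.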

Issue: Secrets Leaked in Logs

Symptom: API keys visible in logs
Fix: Scrub logs, use structured logging with sensitive field redaction
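Redaction works best as a safety net in the logging layer, so even an accidental `log.info(f"{settings}")` can't leak a key. A minimal sketch with the standard logging module: a Filter that masks anything shaped like an Anthropic key (the sk-ant- prefix from the .env example) or a password inside a connection URL. RedactSecretsFilter is hypothetical and the two patterns are assumptions; extend them for your own secrets.

```python
import logging
import re

# Assumed secret shapes: Anthropic-style keys and passwords inside URLs
SECRET_PATTERNS = [
    (re.compile(r"sk-ant-[A-Za-z0-9_\-]+"), "sk-ant-[REDACTED]"),
    (re.compile(r"(\w+://[^:/\s]+:)[^@\s]+@"), r"\1[REDACTED]@"),
]


class RedactSecretsFilter(logging.Filter):
    """Hypothetical filter: rewrite each record's message so secrets never reach output."""

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()  # merges msg % args into one string
        for pattern, replacement in SECRET_PATTERNS:
            message = pattern.sub(replacement, message)
        record.msg, record.args = message, None
        return True  # keep the record, just scrubbed
```

Attach it with `handler.addFilter(RedactSecretsFilter())` on each handler rather than on a logger: logger-level filters don't see records propagated up from child loggers, while handler filters see everything that handler emits.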

The Production Deployment Runbook

When deploying a major change:

☐ Test locally with docker compose up
☐ Run full test suite including LLM evals
☐ Deploy to staging environment first
☐ Run smoke tests on staging
☐ Deploy to 10% of prod traffic (canary)
☐ Monitor for 1 hour
☐ If metrics look good, deploy to 100%
☐ If anything breaks, rollback immediately
☐ Keep watching metrics for 24 hours, with the previous image ready for a quick rollback
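For the 10% canary step, routing by a stable hash of the user (rather than randomly per request) keeps each user on one version, so their experience stays consistent and the two groups' metrics stay clean. A minimal sketch; in_canary and its salt parameter are hypothetical, and how you obtain user_id and act on the result (choose an upstream, set a header your proxy routes on) is left open.

```python
import hashlib


def in_canary(user_id: str, percent: float, salt: str = "release-2") -> bool:
    """Hypothetical helper: deterministically put `percent`% of users in the canary.

    The same user always gets the same answer for a given salt; change the salt
    (e.g. per release) to reshuffle who lands in the canary group.
    """
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000  # 0..9999, 0.01% steps
    return bucket < percent * 100
```

Going from 10% to 100% is just a change of `percent`, and every user already in the canary stays in it as the share grows.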

Cost Optimization

Running AI in production can get expensive. Optimize:

Use spot instances for interruptible batch jobs (often a steep discount versus on-demand pricing)
Auto-scale workers based on queue depth
Cache LLM responses aggressively (Redis with 1-hour TTL)
Use smaller models where quality difference is minimal
Batch similar requests to save on API calls
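Response caching only pays off if identical requests map to the same key, so hash everything that affects the output (model, messages, and parameters like temperature), not just the prompt text. In this minimal sketch, make_cache_key and TTLCache are hypothetical; TTLCache is an in-memory stand-in for Redis, where the equivalent write is a SET with a 3600-second expiry (in redis-py, `r.set(key, value, ex=3600)`).

```python
import hashlib
import json
import time


def make_cache_key(model: str, messages: list[dict], **params) -> str:
    """Hypothetical helper: same model + messages + params always give the same key."""
    payload = json.dumps({"model": model, "messages": messages, "params": params},
                         sort_keys=True, separators=(",", ":"))  # canonical JSON
    return "llm:" + hashlib.sha256(payload.encode()).hexdigest()


class TTLCache:
    """In-memory stand-in for Redis SET ... EX; entries expire after `ttl` seconds."""

    def __init__(self, ttl: float = 3600, clock=time.monotonic):
        self.ttl, self.clock = ttl, clock
        self._data: dict[str, tuple[float, str]] = {}

    def get(self, key: str):
        item = self._data.get(key)
        if item is None or self.clock() >= item[0]:
            self._data.pop(key, None)  # drop expired entries lazily
            return None
        return item[1]

    def set(self, key: str, value: str) -> None:
        self._data[key] = (self.clock() + self.ttl, value)
```

Caching suits repeated, low-temperature requests; for creative outputs, a cache hit returns the exact same text every time, which may not be what users expect.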

Security Hardening

Essential security practices:

Run containers as non-root user
Use secrets management (Vault, AWS Secrets Manager)
Enable rate limiting (prevent abuse)
Scan Docker images for vulnerabilities (Trivy, Snyk)
Use network policies (isolate services)
Enable audit logging (track all API calls)
Rotate API keys regularly

Backup and Disaster Recovery

What happens if your server dies?

Database backups: Automated daily backups to S3
Vector DB backups: Regular snapshots of Qdrant/Weaviate
Configuration backups: Store in git (Infrastructure as Code)
Recovery time objective: Can you restore in under 1 hour?

The Bottom Line

Deploying AI applications is more complex than traditional apps because:

They depend on external APIs (LLMs)
They have ML-specific failure modes
They can be expensive to run at scale
They require continuous monitoring and improvement

But with the right DevOps practices, you can run AI in production with confidence.

Start simple (Docker + Docker Compose), add complexity only when needed (Kubernetes), and always measure what matters (latency, cost, user satisfaction).

Now go deploy that AI app. The world is waiting.

Frequently Asked Questions

Do I really need Docker for AI apps?

Yes for production. AI apps depend on specific Python, CUDA, and library versions that conflict on shared servers. Docker isolates the dependency tree and gives you reproducible deploys.

What’s the smallest production stack I can use?

A single VPS with Docker Compose, Caddy as reverse proxy and TLS terminator, plus your app and a Postgres or Redis container. When the model itself runs on a hosted API, a setup like this on a roughly $20/month server can serve thousands of daily users for many apps.

How do I deploy LLM apps with zero downtime?

Run two containers behind Caddy, swap traffic with a config reload, then drain the old one. Most teams use Docker Compose plus a deploy script; Kubernetes is overkill for under 5 services.

What about GPU workloads?

Use the NVIDIA Container Toolkit to expose GPUs to Docker. For inference at scale, hosted endpoints (Replicate, Fireworks, Together) are often cheaper than running your own GPUs unless you can keep them busy.

How do I handle secrets like API keys?

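Keep them in an .env file that stays out of the image (list it in .dockerignore) and load it at runtime, for example with Compose’s env_file: or docker run --env-file. Never bake secrets into the image. For larger setups, use Doppler, AWS Secrets Manager, or 1Password CLI.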