## 1. Introduction

Container technology has become standard infrastructure for modern software delivery. According to a 2026 cloud-native survey report, 92% of enterprises already run containers in production, and Docker continues to lead the container runtime market with a 78% share.
Docker's core value lies in:

- ✅ **Environment consistency**: identical development, test, and production environments
- ✅ **Fast deployment**: second-level startup, elastic scaling
- ✅ **Resource isolation**: process, network, and filesystem isolation
- ✅ **Portability**: build once, run anywhere
- ✅ **Microservices foundation**: containers are the natural vehicle for microservice architectures
Yet many teams run into the following problems with Docker:

- ❌ Oversized images (hundreds of MB, even GB)
- ❌ Slow builds (10+ minutes per build)
- ❌ Security risks (running as root, vulnerable images)
- ❌ Wasted resources (no CPU/memory limits)
- ❌ Chaotic logs (no unified collection and management)
- ❌ Complex networking (containers struggle to communicate)
This article systematically covers Docker containerization best practices, from core concepts through production deployment. With 3 architecture diagrams, 2 hands-on case studies, and extensive code samples, it will help you build **efficient, secure, and maintainable** containerized applications.
## 2. Docker Core Concepts

### 2.1 Docker Architecture

```mermaid
graph TB
    A[Docker Client] --> B[Docker Daemon]
    B --> C[Image management]
    B --> D[Container management]
    B --> E[Network management]
    B --> F[Storage management]
    C --> C1[Build<br/>Pull<br/>Push]
    D --> D1[Create<br/>Start<br/>Stop]
    E --> E1[Bridge<br/>Host<br/>Overlay]
    F --> F1[Volumes<br/>Bind mounts<br/>tmpfs]
    G[Docker Registry] --> B
    H[Container app 1] --> D
    I[Container app 2] --> D
    J[Container app 3] --> D
```
### 2.2 Images and Containers

**Image**: a read-only template containing all the dependencies the application needs to run.

**Container**: a running, writable instance of an image.
```bash
# Image operations
docker images                    # list local images
docker pull nginx:1.25           # pull an image
docker push myapp:1.0            # push an image
docker build -t myapp:1.0 .      # build from a Dockerfile
docker rmi myapp:1.0             # remove an image

# Container operations
docker ps                            # list running containers
docker run -d --name myapp myapp:1.0 # create and start
docker stop myapp                    # stop
docker start myapp                   # start again
docker rm myapp                      # remove
docker logs myapp                    # view logs
docker exec -it myapp bash           # open a shell inside
```
### 2.3 Container Lifecycle

```mermaid
sequenceDiagram
    participant Dev as Developer
    participant Docker as Docker CLI
    participant Daemon as Docker Daemon
    participant Registry as Docker Registry
    Dev->>Docker: docker build -t myapp:1.0 .
    Docker->>Daemon: build request
    Daemon->>Daemon: execute Dockerfile
    Daemon-->>Docker: return image ID
    Dev->>Docker: docker push myapp:1.0
    Docker->>Daemon: push request
    Daemon->>Registry: upload image layers
    Registry-->>Daemon: push succeeded
    Dev->>Docker: docker run -d myapp:1.0
    Docker->>Daemon: run request
    Daemon->>Daemon: create container
    Daemon->>Daemon: start container
    Daemon-->>Docker: return container ID
    Dev->>Docker: docker stop myapp
    Docker->>Daemon: stop request
    Daemon->>Daemon: stop container
    Daemon-->>Docker: stopped
```
## 3. Dockerfile Best Practices

### 3.1 Choose the Right Base Image

**Principle**: pick the smallest secure, well-maintained base image that meets your needs.
```dockerfile
# Full-distribution base (large)
FROM ubuntu:22.04

# Minimal bases (preferred where possible)
FROM alpine:3.19
FROM gcr.io/distroless/nodejs18
FROM node:18-slim
FROM python:3.11-slim
FROM golang:1.21-alpine
```
**Base image comparison**:

| Image type | Size | Security | Use case |
|---|---|---|---|
| Ubuntu/Debian | 70–100 MB | Medium | Full system tooling needed |
| Alpine | 5–10 MB | High | General recommendation |
| Slim | 30–50 MB | Medium-high | Some system tooling needed |
| Distroless | 2–5 MB | Very high | Production, no in-container debugging |
### 3.2 Multi-Stage Builds

Multi-stage builds significantly shrink the final image.
```dockerfile
# ❌ Before: single stage — build tooling ships in the final image
FROM node:18
WORKDIR /app
COPY . .
RUN npm install
RUN npm run build
EXPOSE 3000
CMD ["node", "dist/index.js"]
```

```dockerfile
# ✅ After: build stage + slim runtime stage
FROM node:18 AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci --only=production
COPY . .
RUN npm run build

FROM node:18-slim
WORKDIR /app
COPY --from=builder /app/node_modules ./node_modules
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/package.json ./
EXPOSE 3000
CMD ["node", "dist/index.js"]
```
### 3.3 Optimize Layer Caching

Docker images are composed of layers; ordering instructions to exploit the layer cache speeds up builds.
```dockerfile
# ❌ Bad: COPY . . comes first, so any code change invalidates npm install
FROM node:18
WORKDIR /app
COPY . .
RUN npm install
CMD ["node", "index.js"]
```

```dockerfile
# ✅ Good: the dependency layer stays cached until package*.json changes
FROM node:18
WORKDIR /app
COPY package*.json ./
RUN npm ci --only=production
COPY . .
CMD ["node", "index.js"]
```
**How layer caching works**:

```mermaid
graph LR
    A[Dockerfile instructions] --> B[Layer 1: FROM]
    B --> C[Layer 2: WORKDIR]
    C --> D[Layer 3: COPY package.json]
    D --> E[Layer 4: RUN npm install]
    E --> F[Layer 5: COPY . .]
    F --> G[Layer 6: CMD]
    H[First build] --> I[All layers execute]
    J[Second build<br/>code change only] --> K[Layers 1–4 from cache]
    K --> L[Layers 5–6 re-execute]
```
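The caching rule the diagram illustrates can be modeled in a few lines. This is a deliberately simplified sketch — Docker's real cache keys also cover instruction metadata, parent-layer IDs, and file permissions — but it captures why putting `COPY package*.json ./` before `RUN npm install` keeps the install layer cached across code-only changes:

```python
import hashlib

def layer_key(instruction, file_contents=()):
    """Simplified model of layer-cache keying: a layer is reused only if
    the instruction text AND the content of every file it references are
    unchanged. Illustration only, not Docker's exact algorithm."""
    h = hashlib.sha256(instruction.encode())
    for content in file_contents:
        h.update(hashlib.sha256(content).digest())
    return h.hexdigest()

# COPY package.json: the key changes only when package.json changes,
# so the `RUN npm install` layer below it stays cached across
# code-only edits to src/.
```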
### 3.4 Reduce the Number of Layers

Every instruction creates a new layer; combining instructions keeps the layer count down.
```dockerfile
# ❌ Bad: four layers, and the cleanup layer cannot shrink earlier layers
RUN apt-get update
RUN apt-get install -y curl
RUN apt-get install -y wget
RUN rm -rf /var/lib/apt/lists/*

# ✅ Good: one layer, so the cleanup actually reduces image size
RUN apt-get update && \
    apt-get install -y --no-install-recommends \
        curl \
        wget && \
    rm -rf /var/lib/apt/lists/*
```
### 3.5 Use .dockerignore

Exclude unnecessary files to shrink the build context.
```
# Dependency directories
node_modules
npm-debug.log
venv
__pycache__

# Build artifacts
dist
build
*.pyc
*.pyo

# Config files
.env
.env.local
.env.production
*.log

# Developer tooling
.vscode
.idea
*.swp
*.swo

# Tests
test
tests
*.test.js
*.spec.js
coverage

# Docs
README.md
docs
*.md

# Git
.git
.gitignore

# Docker
Dockerfile
docker-compose.yml
.dockerignore
```
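To see how much a `.dockerignore` actually saves, you can estimate how much of the directory tree the patterns would exclude. A rough Python sketch — note that `fnmatch` only approximates Docker's real matching rules (no `**` handling, no `!` exceptions), so treat the numbers as an estimate:

```python
import fnmatch
import os

def context_size(root, ignore_patterns):
    """Sum file sizes under `root`, split into (kept, skipped) according
    to the ignore patterns. Approximation only: fnmatch does not
    implement Docker's full .dockerignore semantics."""
    kept = skipped = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root)
            parts = rel.split(os.sep)
            ignored = any(
                fnmatch.fnmatch(rel, pat) or pat in parts
                for pat in ignore_patterns
            )
            if ignored:
                skipped += os.path.getsize(full)
            else:
                kept += os.path.getsize(full)
    return kept, skipped
```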
### 3.6 Run as a Non-Root User

Running containers as a non-root user improves security.
```dockerfile
FROM node:18-slim
WORKDIR /app

# Create an unprivileged user and group
RUN groupadd -r appgroup && useradd -r -g appgroup appuser

COPY package*.json ./
RUN npm ci --only=production
COPY . .

# Hand ownership to the unprivileged user and switch to it
RUN chown -R appuser:appgroup /app
USER appuser

EXPOSE 3000
CMD ["node", "dist/index.js"]
```
### 3.7 Health Checks

Add a health check so Docker can tell whether the container is actually working.
```dockerfile
FROM node:18-slim
WORKDIR /app
COPY package*.json ./
RUN npm ci --only=production
COPY . .

HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
    CMD node healthcheck.js || exit 1

EXPOSE 3000
CMD ["node", "dist/index.js"]
```
```javascript
// healthcheck.js — exits 0 when /health answers 200, 1 otherwise
const http = require('http');

const options = {
  hostname: 'localhost',
  port: 3000,
  path: '/health',
  method: 'GET',
  timeout: 2000
};

const req = http.request(options, (res) => {
  if (res.statusCode === 200) {
    process.exit(0);
  } else {
    process.exit(1);
  }
});

req.on('error', () => {
  process.exit(1);
});

req.on('timeout', () => {
  req.destroy();
  process.exit(1);
});

req.end();
```
## 4. Image Optimization Techniques

### 4.1 Size Optimization at a Glance

```mermaid
graph TB
    A[Optimization strategies] --> B[Small base image]
    A --> C[Multi-stage build]
    A --> D[Fewer layers]
    A --> E[Clean caches]
    A --> F[Compress assets]
    B --> B1[Alpine: -90%<br/>Distroless: -95%]
    C --> C1[Separate build and runtime<br/>-70%]
    D --> D1[Merge RUN instructions<br/>-10%]
    E --> E1[apt/yum/npm caches<br/>-20%]
    F --> F1[Compress static assets<br/>-30%]
    G[Before: 1.2GB] --> H[After: 150MB]
    H --> I[87.5% smaller]
```
### 4.2 A Complete Optimization Example

**Before** (1.2 GB):

```dockerfile
FROM node:18
WORKDIR /app
COPY . .
RUN npm install
RUN npm run build
EXPOSE 3000
CMD ["node", "dist/index.js"]
```
**After** (150 MB):

```dockerfile
# Build stage
FROM node:18-alpine AS builder
WORKDIR /app
RUN apk add --no-cache python3 make g++
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

# Runtime stage
FROM node:18-alpine
WORKDIR /app

# tini must be installed while still running as root
RUN apk add --no-cache tini

RUN addgroup -g 1001 -S nodejs && \
    adduser -S nodejs -u 1001

COPY package*.json ./
RUN npm ci --only=production && npm cache clean --force

COPY --from=builder /app/dist ./dist
RUN chown -R nodejs:nodejs /app
USER nodejs

HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
    CMD node -e "require('http').get('http://localhost:3000/health', r => process.exit(r.statusCode === 200 ? 0 : 1))"

EXPOSE 3000
ENTRYPOINT ["/sbin/tini", "--"]
CMD ["node", "dist/index.js"]
```
### 4.3 Image Scanning

Scan images for vulnerabilities with tools such as:
```bash
# Docker Scout
docker scout cves myapp:1.0

# Trivy
trivy image myapp:1.0

# Grype
grype myapp:1.0

# Clair
clair-scanner -c clair.yaml myapp:1.0
```
## 5. Docker Networking

### 5.1 Network Types

```mermaid
graph TB
    A[Docker networks] --> B[Bridge]
    A --> C[Host]
    A --> D[None]
    A --> E[Overlay]
    A --> F[Macvlan]
    B --> B1[Default network<br/>NAT forwarding<br/>container-to-container traffic]
    C --> C1[Shares host network<br/>best performance<br/>port-conflict risk]
    D --> D1[No network<br/>full isolation]
    E --> E1[Cross-host traffic<br/>used by Swarm/K8s]
    F --> F1[Attaches to the physical network<br/>gets an external IP]
```
### 5.2 Custom Networks

```bash
# Create a bridge network
docker network create --driver bridge my-network

# Containers on the same network can reach each other by name
docker run -d --name app1 --network my-network myapp:1.0
docker run -d --name app2 --network my-network myapp:1.0

# Inspect networks
docker network ls
docker network inspect my-network

# Attach / detach a running container
docker network connect my-network container1
docker network disconnect my-network container1
```
### 5.3 Port Mapping

```bash
# Map host port 3000 to container port 3000
docker run -p 3000:3000 myapp:1.0

# Bind to localhost only
docker run -p 127.0.0.1:3000:3000 myapp:1.0

# Map multiple ports
docker run -p 3000:3000 -p 3001:3001 myapp:1.0

# Publish all EXPOSEd ports to random host ports
docker run -P myapp:1.0
```
## 6. Docker Storage

### 6.1 Storage Types

```mermaid
graph TB
    A[Docker storage] --> B[Volume]
    A --> C[Bind mount]
    A --> D[tmpfs]
    B --> B1[Managed by Docker<br/>cross-platform<br/>easy to back up and migrate]
    C --> C1[Host path<br/>high performance<br/>platform-dependent]
    D --> D1[In-memory<br/>fast<br/>lost on restart]
    B --> E[Use for:<br/>database persistence<br/>config files<br/>shared data]
    C --> F[Use for:<br/>development<br/>log files<br/>config files]
    D --> G[Use for:<br/>temporary caches<br/>sensitive data<br/>high-performance needs]
```
### 6.2 Working with Volumes

```bash
# Create a volume
docker volume create my-volume

# Mount it into a container
docker run -d --name db \
  -v my-volume:/var/lib/mysql \
  mysql:8.0

# Inspect volumes
docker volume ls
docker volume inspect my-volume

# Back up a volume to a tarball on the host
docker run --rm \
  -v my-volume:/data \
  -v $(pwd):/backup \
  alpine tar czf /backup/backup.tar.gz -C /data .

# Restore it
docker run --rm \
  -v my-volume:/data \
  -v $(pwd):/backup \
  alpine tar xzf /backup/backup.tar.gz -C /data
```
### 6.3 Bind Mounts

```bash
# Development: mount the source tree, keep the container's node_modules
docker run -d --name app \
  -v $(pwd):/app \
  -v /app/node_modules \
  myapp:1.0

# Mount config read-only
docker run -d --name app \
  -v $(pwd)/config:/app/config:ro \
  myapp:1.0
```
## 7. Docker Compose

### 7.1 Orchestrating a Multi-Container App

```yaml
version: '3.8'

services:
  app:
    build:
      context: .
      dockerfile: Dockerfile
    ports:
      - "3000:3000"
    environment:
      - NODE_ENV=production
      - DATABASE_URL=postgresql://user:pass@db:5432/mydb
      - REDIS_URL=redis://redis:6379
    depends_on:
      db:
        condition: service_healthy
      redis:
        condition: service_healthy
    networks:
      - app-network
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "node", "healthcheck.js"]
      interval: 30s
      timeout: 3s
      retries: 3

  db:
    image: postgres:15-alpine
    volumes:
      - postgres-data:/var/lib/postgresql/data
      - ./init.sql:/docker-entrypoint-initdb.d/init.sql
    environment:
      - POSTGRES_USER=user
      - POSTGRES_PASSWORD=pass
      - POSTGRES_DB=mydb
    networks:
      - app-network
    restart: unless-stopped
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U user -d mydb"]
      interval: 10s
      timeout: 5s
      retries: 5

  redis:
    image: redis:7-alpine
    command: redis-server --appendonly yes
    volumes:
      - redis-data:/data
    networks:
      - app-network
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 3s
      retries: 3

  nginx:
    image: nginx:alpine
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf:ro
      - ./ssl:/etc/nginx/ssl:ro
    depends_on:
      - app
    networks:
      - app-network
    restart: unless-stopped

networks:
  app-network:
    driver: bridge

volumes:
  postgres-data:
  redis-data:
```
### 7.2 Common Commands

```bash
docker-compose up -d                 # start everything in the background
docker-compose logs -f               # follow all logs
docker-compose logs -f app           # follow one service's logs
docker-compose down                  # stop and remove containers
docker-compose down -v               # ...including volumes
docker-compose up -d --build         # rebuild and restart
docker-compose ps                    # list services
docker-compose exec app bash         # shell into a service
docker-compose restart app           # restart one service
docker-compose up -d --scale app=3   # run three replicas of app
```
## 8. Security Hardening

### 8.1 Security Best Practices

```mermaid
graph TB
    A[Docker security] --> B[Image security]
    A --> C[Container security]
    A --> D[Network security]
    A --> E[Runtime security]
    B --> B1[Official images<br/>regular updates<br/>vulnerability scanning]
    C --> C1[Non-root user<br/>read-only filesystem<br/>dropped capabilities]
    D --> D1[Network isolation<br/>minimal exposed ports<br/>TLS encryption]
    E --> E1[Resource limits<br/>log collection<br/>monitoring and alerting]
```
### 8.2 Container Security Configuration

```dockerfile
FROM alpine:3.19

# Create and switch to an unprivileged user
RUN addgroup -S appgroup && adduser -S appuser -G appgroup
USER appuser
```
### 8.3 Resource Limits

```bash
# CPU
docker run -d --cpus="1.5" myapp:1.0          # hard CPU limit
docker run -d --cpu-shares=512 myapp:1.0      # relative weight
docker run -d --cpuset-cpus="0,1" myapp:1.0   # pin to specific cores

# Memory
docker run -d --memory="512m" myapp:1.0
docker run -d --memory="512m" --memory-swap="1g" myapp:1.0
docker run -d --memory-reservation="256m" myapp:1.0

# Combined hardening example
docker run -d \
  --name myapp \
  --cpus="1" \
  --memory="512m" \
  --memory-swap="1g" \
  --pids-limit=100 \
  --read-only \
  --tmpfs /tmp:rw,noexec,nosuid,size=64m \
  myapp:1.0
```
### 8.4 Docker Compose Security Configuration

```yaml
version: '3.8'

services:
  app:
    image: myapp:1.0
    deploy:
      resources:
        limits:
          cpus: '1'
          memory: 512M
        reservations:
          cpus: '0.5'
          memory: 256M
    security_opt:
      - no-new-privileges:true
    cap_drop:
      - ALL
    cap_add:
      - NET_BIND_SERVICE
    read_only: true
    tmpfs:
      - /tmp:rw,noexec,nosuid,size=64m
    user: "1000:1000"
```
## 9. Case Studies

### Case 1: Containerizing a Node.js Microservice

**Background**: an e-commerce platform's user service needs to be containerized, with requirements for a small image, fast startup, and strong security.
**Project structure**:

```
user-service/
├── src/
│   ├── index.js
│   ├── routes/
│   ├── models/
│   └── utils/
├── tests/
├── package.json
├── Dockerfile
├── docker-compose.yml
├── .dockerignore
└── README.md
```
**Dockerfile**:

```dockerfile
# Build stage
FROM node:18-alpine AS builder
WORKDIR /app
RUN apk add --no-cache python3 make g++
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm test
RUN npm run build

# Runtime stage
FROM node:18-alpine

LABEL maintainer="john@example.com"
LABEL version="1.0.0"
LABEL description="User Service for E-commerce Platform"

WORKDIR /app

# tini must be installed while still running as root
RUN apk add --no-cache tini

RUN addgroup -g 1001 -S nodejs && \
    adduser -S nodejs -u 1001

COPY package*.json ./
RUN npm ci --only=production && \
    npm cache clean --force

COPY --from=builder /app/dist ./dist
RUN chown -R nodejs:nodejs /app
USER nodejs

EXPOSE 3000

HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
    CMD node -e "require('http').get('http://localhost:3000/health', r => process.exit(r.statusCode === 200 ? 0 : 1))"

ENTRYPOINT ["/sbin/tini", "--"]
CMD ["node", "dist/index.js"]
```
**docker-compose.yml**:

```yaml
version: '3.8'

services:
  user-service:
    build:
      context: .
      dockerfile: Dockerfile
    ports:
      - "3000:3000"
    environment:
      - NODE_ENV=production
      - PORT=3000
      - DATABASE_URL=postgresql://user:pass@postgres:5432/users
      - REDIS_URL=redis://redis:6379
      - JWT_SECRET=${JWT_SECRET}
    depends_on:
      postgres:
        condition: service_healthy
      redis:
        condition: service_healthy
    networks:
      - backend
    restart: unless-stopped
    deploy:
      resources:
        limits:
          cpus: '1'
          memory: 512M

  postgres:
    image: postgres:15-alpine
    volumes:
      - postgres-data:/var/lib/postgresql/data
    environment:
      - POSTGRES_USER=user
      - POSTGRES_PASSWORD=pass
      - POSTGRES_DB=users
    networks:
      - backend
    restart: unless-stopped
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U user"]
      interval: 10s
      timeout: 5s
      retries: 5

  redis:
    image: redis:7-alpine
    command: redis-server --appendonly yes
    volumes:
      - redis-data:/data
    networks:
      - backend
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 3s
      retries: 3

networks:
  backend:
    driver: bridge

volumes:
  postgres-data:
  redis-data:
```
**Performance comparison**:

| Metric | Before | After | Improvement |
|---|---|---|---|
| Image size | 1.2 GB | 150 MB | 87.5% ↓ |
| Build time | 8 min | 2 min | 75% ↓ |
| Startup time | 15 s | 3 s | 80% ↓ |
| Vulnerabilities | 23 | 2 | 91% ↓ |
### Case 2: Containerizing a Python Data-Analysis Service

**Background**: a data-analytics platform needs to containerize a Python data-science application that depends on many scientific-computing libraries.

**Challenges**:

- Large scientific libraries (NumPy, Pandas, Scikit-learn, etc.)
- GPU support required (CUDA)
- Data persistence needs
- Multiple environment configurations (development/production)
**Dockerfile (multi-environment)**:

```dockerfile
# Shared base
FROM python:3.11-slim AS base

ENV PYTHONDONTWRITEBYTECODE=1 \
    PYTHONUNBUFFERED=1 \
    PIP_NO_CACHE_DIR=1 \
    PIP_DISABLE_PIP_VERSION_CHECK=1

WORKDIR /app

RUN apt-get update && apt-get install -y --no-install-recommends \
    build-essential \
    curl \
    git \
    && rm -rf /var/lib/apt/lists/*

RUN groupadd -r datauser && useradd -r -g datauser datauser

# Development stage: Jupyter plus extra tooling
FROM base AS development

RUN apt-get update && apt-get install -y --no-install-recommends \
    vim \
    htop \
    && rm -rf /var/lib/apt/lists/*

COPY requirements-dev.txt ./
RUN pip install --no-cache-dir -r requirements-dev.txt

COPY . .
USER datauser
EXPOSE 8888
CMD ["jupyter", "lab", "--ip=0.0.0.0", "--port=8888", "--no-browser", "--allow-root"]

# Production stage: FastAPI service
FROM base AS production

COPY requirements-prod.txt ./
RUN pip install --no-cache-dir -r requirements-prod.txt

COPY --chown=datauser:datauser . .
USER datauser

HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
    CMD curl -f http://localhost:8000/health || exit 1

EXPOSE 8000
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]

# GPU training stage
FROM nvidia/cuda:12.0-base-ubuntu22.04 AS gpu

ENV PYTHONDONTWRITEBYTECODE=1 \
    PYTHONUNBUFFERED=1 \
    PIP_NO_CACHE_DIR=1

WORKDIR /app

RUN apt-get update && apt-get install -y --no-install-recommends \
    python3.11 \
    python3.11-dev \
    python3-pip \
    && rm -rf /var/lib/apt/lists/*

RUN pip3 install --no-cache-dir \
    torch \
    torchvision \
    torchaudio \
    --index-url https://download.pytorch.org/whl/cu120

COPY requirements-gpu.txt ./
RUN pip3 install --no-cache-dir -r requirements-gpu.txt

COPY . .
EXPOSE 8000
CMD ["python", "train.py"]
```
**requirements-prod.txt**:

```
# Web framework
fastapi==0.109.0
uvicorn[standard]==0.27.0

# Data processing
pandas==2.1.4
numpy==1.26.3

# Machine learning
scikit-learn==1.4.0
xgboost==2.0.3

# Database
sqlalchemy==2.0.25
psycopg2-binary==2.9.9

# Cache
redis==5.0.1

# Monitoring
prometheus-client==0.19.0

# Utilities
python-dotenv==1.0.0
pydantic==2.5.3
```
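The production stage's HEALTHCHECK curls `/health` and expects HTTP 200. The real service implements this in FastAPI, but the contract itself is tiny, as this stdlib-only sketch shows (the endpoint path and JSON body follow the convention used in this article, not any fixed standard):

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class HealthHandler(BaseHTTPRequestHandler):
    """Minimal handler satisfying `curl -f http://localhost:8000/health`."""

    def do_GET(self):
        if self.path == "/health":
            body = json.dumps({"status": "ok"}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, fmt, *args):
        pass  # keep container logs quiet for frequent health probes

def serve(port=8000):
    # Bind to 0.0.0.0 so the probe works from inside the container
    HTTPServer(("0.0.0.0", port), HealthHandler).serve_forever()
```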
**docker-compose.yml**:

```yaml
version: '3.8'

services:
  dev:
    build:
      context: .
      target: development
    ports:
      - "8888:8888"
    volumes:
      - .:/app
      - dev-data:/home/datauser/.data
    environment:
      - PYTHON_ENV=development
    networks:
      - data-network
    profiles:
      - development

  prod:
    build:
      context: .
      target: production
    ports:
      - "8000:8000"
    environment:
      - PYTHON_ENV=production
      - DATABASE_URL=postgresql://user:pass@postgres:5432/analytics
      - REDIS_URL=redis://redis:6379
    depends_on:
      - postgres
      - redis
    networks:
      - data-network
    restart: unless-stopped
    deploy:
      resources:
        limits:
          cpus: '2'
          memory: 2G
    profiles:
      - production

  gpu-train:
    build:
      context: .
      target: gpu
    volumes:
      - model-data:/app/models
      - data-storage:/data
    environment:
      - CUDA_VISIBLE_DEVICES=0
    networks:
      - data-network
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    profiles:
      - gpu

  postgres:
    image: postgres:15-alpine
    volumes:
      - postgres-data:/var/lib/postgresql/data
    environment:
      - POSTGRES_USER=user
      - POSTGRES_PASSWORD=pass
      - POSTGRES_DB=analytics
    networks:
      - data-network

  redis:
    image: redis:7-alpine
    volumes:
      - redis-data:/data
    networks:
      - data-network

networks:
  data-network:
    driver: bridge

volumes:
  dev-data:
  postgres-data:
  redis-data:
  model-data:
  data-storage:
```
**Usage**:

```bash
# Development environment (Jupyter Lab)
docker-compose --profile development up -d

# Production service
docker-compose --profile production up -d

# GPU training
docker-compose --profile gpu up -d
```
## 10. CI/CD Integration

### 10.1 GitHub Actions Example

```yaml
name: Docker Build and Push

on:
  push:
    branches: [main, develop]
    tags: ['v*.*.*']
  pull_request:
    branches: [main]

env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}

jobs:
  build:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3

      - name: Log in to Container Registry
        uses: docker/login-action@v3
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Extract metadata
        id: meta
        uses: docker/metadata-action@v5
        with:
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
          tags: |
            type=ref,event=branch
            type=ref,event=pr
            type=semver,pattern={{version}}
            type=semver,pattern={{major}}.{{minor}}
            type=sha

      - name: Build and push
        uses: docker/build-push-action@v5
        with:
          context: .
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          cache-from: type=gha
          cache-to: type=gha,mode=max
          platforms: linux/amd64,linux/arm64

      - name: Run Trivy vulnerability scanner
        uses: aquasecurity/trivy-action@master
        with:
          image-ref: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ github.sha }}
          format: 'sarif'
          output: 'trivy-results.sarif'

      - name: Upload Trivy scan results
        uses: github/codeql-action/upload-sarif@v2
        with:
          sarif_file: 'trivy-results.sarif'
```
### 10.2 GitLab CI Example

```yaml
stages:
  - test
  - build
  - scan
  - deploy

variables:
  DOCKER_IMAGE: $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA
  DOCKER_TLS_CERTDIR: "/certs"

test:
  stage: test
  image: node:18-alpine
  script:
    - npm ci
    - npm test
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"

build:
  stage: build
  image: docker:24
  services:
    - docker:24-dind
  before_script:
    - docker login -u $CI_REGISTRY_USER -p $CI_REGISTRY_PASSWORD $CI_REGISTRY
  script:
    - docker build -t $DOCKER_IMAGE .
    - docker push $DOCKER_IMAGE
  rules:
    - if: $CI_COMMIT_BRANCH == "main"

scan:
  stage: scan
  image: aquasec/trivy:latest
  script:
    - trivy image --exit-code 0 --format sarif --output trivy-results.sarif $DOCKER_IMAGE
  artifacts:
    reports:
      container_scanning: trivy-results.sarif
  rules:
    - if: $CI_COMMIT_BRANCH == "main"

deploy:
  stage: deploy
  image: bitnami/kubectl:latest
  script:
    - kubectl set image deployment/user-service user-service=$DOCKER_IMAGE
  environment:
    name: production
    url: https://api.example.com
  rules:
    - if: $CI_COMMIT_BRANCH == "main"
      when: manual
```
## 11. Monitoring and Logging

### 11.1 Log Collection

```yaml
version: '3.8'

services:
  app:
    image: myapp:1.0
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "3"
    labels:
      - "logging=true"

  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.11.0
    environment:
      - discovery.type=single-node
      - xpack.security.enabled=false
    volumes:
      - es-data:/usr/share/elasticsearch/data
    networks:
      - logging

  logstash:
    image: docker.elastic.co/logstash/logstash:8.11.0
    volumes:
      - ./logstash.conf:/usr/share/logstash/pipeline/logstash.conf:ro
    networks:
      - logging
    depends_on:
      - elasticsearch

  kibana:
    image: docker.elastic.co/kibana/kibana:8.11.0
    ports:
      - "5601:5601"
    networks:
      - logging
    depends_on:
      - elasticsearch

  fluentd:
    image: fluent/fluentd:v1.16
    volumes:
      - ./fluentd.conf:/fluentd/etc/fluent.conf:ro
    networks:
      - logging

networks:
  logging:
    driver: bridge

volumes:
  es-data:
```
### 11.2 Prometheus Monitoring

```yaml
# prometheus.yml
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'docker'
    static_configs:
      - targets: ['host.docker.internal:9323']

  - job_name: 'cadvisor'
    static_configs:
      - targets: ['cadvisor:8080']

  - job_name: 'node-exporter'
    static_configs:
      - targets: ['node-exporter:9100']
```
```bash
# cAdvisor: per-container metrics
docker run -d \
  --name=cadvisor \
  --volume=/:/rootfs:ro \
  --volume=/var/run:/var/run:ro \
  --volume=/sys:/sys:ro \
  --volume=/var/lib/docker/:/var/lib/docker:ro \
  --volume=/dev/disk/:/dev/disk:ro \
  --publish=8080:8080 \
  gcr.io/cadvisor/cadvisor:latest

# node-exporter: host metrics
docker run -d \
  --name=node-exporter \
  --volume="/proc:/host/proc:ro" \
  --volume="/sys:/host/sys:ro" \
  --volume="/:/rootfs:ro" \
  --publish=9100:9100 \
  prom/node-exporter:latest
```
## 12. Summary

**Key takeaways**:

- ✅ **Dockerfile optimization**: multi-stage builds, layer caching, small base images
- ✅ **Image security**: non-root user, vulnerability scanning, least privilege
- ✅ **Networking and storage**: custom networks, persistent volumes
- ✅ **Resource limits**: CPU/memory caps to prevent resource exhaustion
- ✅ **Health checks**: verify containers are actually working
- ✅ **Logging and monitoring**: unified collection, centralized management
- ✅ **CI/CD integration**: automated build, test, and deployment
### Checklist

Before deploying to production, confirm:
**Last updated**: 2026-03-12

**Tags**: #Docker #Containerization #DevOps #Microservices #CI/CD

**Category**: DevOps / Containerization