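Overview
A Deployment is the most commonly used workload resource in Kubernetes. It manages Pods and ReplicaSets declaratively and provides rolling updates, rollbacks, and scaling.
Core features:
- 🔄 Declarative updates and rollbacks
- 📊 Replica count management
- 🚀 Rolling update strategies
- 💚 Health checks and self-healing
- 🎯 Resource requests and limits
Use cases: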
Full Deployment configuration
Basic structure
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
  namespace: default
  labels:
    app: my-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  strategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: app
        image: nginx:1.21
```
Full configuration in detail
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-application
  namespace: production
  labels:
    app: my-app
    version: v1
  annotations:
    description: "My Application Deployment"

spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  template:
    metadata:
      labels:
        app: my-app
        version: v1
      annotations:
        prometheus.io/scrape: "true"
    spec:
      containers:
      - name: app
        image: nginx:1.21
        imagePullPolicy: Always
        args:
        - "--config=/etc/app/config.yaml"
        env:
        - name: ENV
          value: "production"
        ports:
        - name: http
          containerPort: 8080
          protocol: TCP
        - name: metrics
          containerPort: 9090
          protocol: TCP
        resources:
          limits:
            cpu: "1000m"
            memory: "2Gi"
          requests:
            cpu: "500m"
            memory: "1Gi"
        readinessProbe:
          httpGet:
            path: /health
            port: 8080
            scheme: HTTP
          initialDelaySeconds: 30
          periodSeconds: 10
          timeoutSeconds: 5
          successThreshold: 1
          failureThreshold: 3
        livenessProbe:
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 60
          periodSeconds: 10
          timeoutSeconds: 5
          failureThreshold: 3
        lifecycle:
          preStop:
            exec:
              command:
              - /bin/sh
              - -c
              - "sleep 15"
        volumeMounts:
        - name: config
          mountPath: /etc/app
      restartPolicy: Always
      terminationGracePeriodSeconds: 30
      dnsPolicy: ClusterFirst
      volumes:
      - name: config
        configMap:
          name: app-config
```
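Configuration details
Key differences between metadata and template.metadata:
| Field | Describes | Required field | Purpose |
|---|---|---|---|
| metadata | The Deployment itself | name | Identifies the Deployment resource |
| template.metadata | The Pods to be created | labels | Identifies and selects Pods |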
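Side-by-side example:
```yaml
# Metadata of the Deployment object itself
metadata:
  name: my-deployment
  namespace: default
  labels:
    tier: backend

spec:
  selector:
    matchLabels:
      app: my-app        # must match template.metadata.labels

  # Metadata copied onto every Pod the Deployment creates
  template:
    metadata:
      labels:
        app: my-app
        version: v1
```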
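Naming rules:
```yaml
metadata:
  name: my-app
# Pods created from the template get generated names of the form
# <deployment-name>-<pod-template-hash>-<random-suffix>
```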
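Notes:
| Rule | Explanation |
|---|---|
| ✅ metadata.name is required | A Deployment must have a name |
| ✅ template.metadata.labels is required | The labels must match spec.selector |
| ❌ template.metadata.name is normally not set | Pod names are generated automatically |
| ⚠️ Job and similar resources | Not an exception: their Pod names are also generated from the owning object's name |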
Update strategy
Rolling update (RollingUpdate):
```yaml
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxSurge: 1
    maxUnavailable: 0
```
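Parameters:
| Parameter | Meaning | Example | Effect (replicas=3) |
|---|---|---|---|
| maxSurge | Maximum number of Pods created above the desired replica count | 1 or 25% | With 1: at most 4 Pods exist during the update |
| maxUnavailable | Maximum number of Pods that may be unavailable | 0 or 25% | With 0: at least 3 Pods stay available |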
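When maxSurge and maxUnavailable are given as percentages, Kubernetes converts them to absolute Pod counts: maxSurge rounds up and maxUnavailable rounds down. The fragment below is a small sketch of that arithmetic. The replica count of 10 is an assumption chosen for illustration and does not come from the examples in this article.

```yaml
# Illustrative values only: replicas=10 is an assumed example.
spec:
  replicas: 10
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 25%        # 10 × 25% = 2.5, rounded up   → up to 3 extra Pods (13 total)
      maxUnavailable: 25%  # 10 × 25% = 2.5, rounded down → up to 2 Pods unavailable (8 available)
```

Both fields default to 25% when omitted, and they cannot both be set to 0.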
Example update sequence (replicas=3):
```text
Initial state: [Pod1] [Pod2] [Pod3]

maxSurge=1, maxUnavailable=0:
Step 1: [Pod1] [Pod2] [Pod3] [Pod4-new]          ← a new Pod is created
Step 2: [Pod2] [Pod3] [Pod4-new] [Pod5-new]      ← an old Pod is deleted once the new one is ready
Step 3: [Pod3] [Pod4-new] [Pod5-new] [Pod6-new]
Step 4: [Pod4-new] [Pod5-new] [Pod6-new]         ← done
```
Recreate strategy (Recreate):
```yaml
strategy:
  type: Recreate
```
Comparison:
```text
Rolling update: [Old] → [Old+New] → [New]  ✅ no downtime
Recreate:       [Old] → [none]    → [New]  ❌ downtime
```
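Image pull policy
Policy comparison:
| Policy | Behavior | Typical use |
|---|---|---|
| Always | The registry is checked on every container start; the image is pulled unless the same digest is already cached | Production (ensures the tag's current image) |
| IfNotPresent | The image is pulled only if it is not already on the node | Development (saves time) |
| Never | The image is never pulled and must already be on the node | Private images preloaded on the node |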
Default behavior (when imagePullPolicy is omitted):
```text
image: nginx:latest  →  imagePullPolicy: Always

image: nginx:1.21    →  imagePullPolicy: IfNotPresent
```
Resource configuration
```yaml
resources:
  limits:
    cpu: "1000m"
    memory: "2Gi"
  requests:
    cpu: "500m"
    memory: "1Gi"
```
Units:
| Resource | Unit | Conversion |
|---|---|---|
| CPU | m (millicore) | 1 core = 1000m |
| Memory | Mi, Gi | 1Gi = 1024Mi = 1024×1024Ki |
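The conversions above mean that the same quantity can be written in more than one way. The fragment below is a sketch of equivalent notations; the values are illustrative and not taken from a specific workload.

```yaml
resources:
  requests:
    cpu: "0.5"        # same as "500m" (half a core)
    memory: "1024Mi"  # same as "1Gi"
  limits:
    cpu: "1"          # same as "1000m"
    memory: "2Gi"     # 2 × 1024Mi; "2G" (without the i) would mean 2 × 10^9 bytes, slightly less
```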
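Recommended configuration:
```yaml
resources:
  # requests: what the scheduler reserves for the container
  requests:
    cpu: "500m"
    memory: "1Gi"
  # limits: hard ceiling; CPU above the limit is throttled,
  # memory above the limit gets the container OOM-killed
  limits:
    cpu: "2000m"
    memory: "2Gi"
```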
Health checks
Readiness probe:
```yaml
readinessProbe:
  httpGet:
    path: /health
    port: 8080
    scheme: HTTP
  initialDelaySeconds: 30
  periodSeconds: 10
  timeoutSeconds: 5
  successThreshold: 1
  failureThreshold: 3
```
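Probe types compared:
| Probe | Purpose | On failure |
|---|---|---|
| readinessProbe | Is the container ready to receive traffic? | The Pod is removed from Service endpoints; no restart |
| livenessProbe | Is the container still alive? | The container is restarted |
| startupProbe | Has the application finished starting? | Other probes are held off until it succeeds; if it fails, the container is restarted |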
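The table lists startupProbe, but the examples in this article only configure readiness and liveness probes. A minimal sketch of a startup probe for a slow-starting container could look like the following. The /healthz path and port 8080 are borrowed from the earlier examples, and the thresholds are assumptions to adjust for the application.

```yaml
startupProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 10
  failureThreshold: 30   # allows up to 30 × 10s = 300s for startup
# Liveness and readiness probes do not run until the startup probe succeeds.
# If it never succeeds within that budget, the container is killed and
# handled according to restartPolicy.
```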
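Check mechanisms compared:
```yaml
# HTTP GET: succeeds on a response status from 200 to 399
httpGet:
  path: /health
  port: 8080

# TCP: succeeds if the port accepts a connection
tcpSocket:
  port: 3306

# exec: succeeds if the command exits with status 0
exec:
  command:
  - cat
  - /tmp/healthy
```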
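Parameters:
| Parameter | Meaning | Suggested value |
|---|---|---|
| initialDelaySeconds | Delay after container start before the first check | 30-60s |
| periodSeconds | Interval between checks | 10s |
| timeoutSeconds | Time after which a check counts as failed | 5s |
| successThreshold | Consecutive successes needed to be considered healthy | 1 |
| failureThreshold | Consecutive failures needed to be considered unhealthy | 3 |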
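Taken together, these parameters determine how long a broken container can go unnoticed. As a rough sketch using the suggested values from the table (the path and port are reused from the earlier examples), the detection time is on the order of failureThreshold × periodSeconds:

```yaml
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 60  # first check about 60s after the container starts
  periodSeconds: 10        # then one check every 10s
  timeoutSeconds: 5        # a check with no response within 5s counts as a failure
  failureThreshold: 3      # 3 consecutive failures trigger a restart,
                           # i.e. roughly 3 × 10s = 30s after the container becomes unhealthy
```

The exact delay depends on where in the probe cycle the failure begins, so treat the 30s figure as an estimate rather than a guarantee.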
Lifecycle hooks
```yaml
lifecycle:
  preStop:
    exec:
      command:
      - /bin/sh
      - -c
      - "sleep 15 && curl -X POST http://localhost:8080/shutdown"
  postStart:
    exec:
      command:
      - /bin/sh
      - -c
      - "echo 'Started' > /tmp/started"
```
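Graceful shutdown sequence:
```text
1. The Pod is marked Terminating and is removed from Service endpoints (asynchronously)
2. The preStop hook runs
3. The kubelet waits for preStop to finish
4. SIGTERM is sent to the container
5. The kubelet waits for the rest of terminationGracePeriodSeconds (the countdown starts at step 1, so preStop time counts against it)
6. If the container is still running, SIGKILL is sent
```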
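Best practice:
```yaml
# Container level: the sleep gives endpoints and load balancers
# time to stop routing traffic to this Pod
lifecycle:
  preStop:
    exec:
      command:
      - /bin/sh
      - -c
      - "sleep 15"

# Pod level: must exceed the preStop sleep plus the application's own shutdown time
terminationGracePeriodSeconds: 30
```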
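DNS policy
Policy comparison:
| Policy | Behavior | Typical use |
|---|---|---|
| ClusterFirst | Cluster DNS first (default) | Most workloads |
| Default | Inherits the node's DNS configuration | Special network setups |
| None | Ignores cluster DNS settings; dnsConfig must supply them | Fully custom DNS |
| ClusterFirstWithHostNet | Cluster DNS for Pods that use host networking | When hostNetwork: true |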
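With dnsPolicy: None the Pod receives no DNS settings from the cluster, so they have to be supplied through dnsConfig. A minimal sketch follows. The nameserver address and search domain are placeholder values, not real endpoints.

```yaml
spec:
  template:
    spec:
      dnsPolicy: "None"
      dnsConfig:
        nameservers:
        - 192.0.2.1                  # placeholder address from the documentation range
        searches:
        - my-ns.svc.cluster.local    # placeholder search domain
        options:
        - name: ndots
          value: "2"
```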
Restart policy
Policy comparison:
| Policy | Behavior | Applies to |
|---|---|---|
| Always | Always restart (default) | Deployment, StatefulSet |
| OnFailure | Restart on failure | Job |
| Never | Never restart | Job (one-shot) |
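A Deployment's Pod template only accepts restartPolicy: Always, so OnFailure and Never show up in Job manifests instead. For contrast, here is a minimal Job sketch; the name, image, and command are hypothetical.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: one-off-task            # hypothetical name
spec:
  backoffLimit: 3               # mark the Job failed after 3 retries
  template:
    spec:
      restartPolicy: OnFailure  # restart the container in place when it exits non-zero
      containers:
      - name: task
        image: busybox:1.36     # illustrative image
        command: ["sh", "-c", "echo processing && exit 0"]
```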
Practical examples
Complete production configuration
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: production-app
  namespace: production
  labels:
    app: myapp
    tier: backend
    env: production
  annotations:
    description: "Production application deployment"
    contact: "team@example.com"

spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
      tier: backend
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  template:
    metadata:
      labels:
        app: myapp
        tier: backend
        version: v1.2.0
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "9090"
    spec:
      containers:
      - name: app
        image: registry.example.com/myapp:1.2.0
        imagePullPolicy: Always
        ports:
        - name: http
          containerPort: 8080
          protocol: TCP
        - name: metrics
          containerPort: 9090
          protocol: TCP
        env:
        - name: JAVA_OPTS
          value: "-Xmx1g -Xms1g"
        - name: ENV
          value: "production"
        - name: DB_HOST
          valueFrom:
            secretKeyRef:
              name: db-secret
              key: host
        resources:
          requests:
            cpu: "500m"
            memory: "1Gi"
          limits:
            memory: "2Gi"
        readinessProbe:
          httpGet:
            path: /actuator/health/readiness
            port: 8080
          initialDelaySeconds: 30
          periodSeconds: 10
          timeoutSeconds: 5
          failureThreshold: 3
        livenessProbe:
          httpGet:
            path: /actuator/health/liveness
            port: 8080
          initialDelaySeconds: 60
          periodSeconds: 10
          failureThreshold: 3
        lifecycle:
          preStop:
            exec:
              command:
              - /bin/sh
              - -c
              - "sleep 15"
        volumeMounts:
        - name: config
          mountPath: /app/config
        - name: logs
          mountPath: /app/logs
      restartPolicy: Always
      terminationGracePeriodSeconds: 30
      dnsPolicy: ClusterFirst
      volumes:
      - name: config
        configMap:
          name: app-config
      - name: logs
        emptyDir: {}
```
Common commands
```bash
# Create or update the Deployment
kubectl apply -f deployment.yaml

# List Deployments
kubectl get deployment
kubectl get deploy -o wide

# Show details
kubectl describe deployment my-app

# List the Deployment's Pods
kubectl get pods -l app=my-app

# Scale
kubectl scale deployment my-app --replicas=5

# Update the image of the container named "app"
kubectl set image deployment/my-app app=nginx:1.22

# Watch rollout status
kubectl rollout status deployment/my-app

# Show rollout history
kubectl rollout history deployment/my-app

# Roll back to the previous revision, or to a specific one
kubectl rollout undo deployment/my-app
kubectl rollout undo deployment/my-app --to-revision=2

# Pause and resume a rollout
kubectl rollout pause deployment/my-app
kubectl rollout resume deployment/my-app

# Delete the Deployment
kubectl delete deployment my-app
```
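Best practices
Resource configuration recommendations
```yaml
# Pattern used in the production example: requests plus a memory limit
resources:
  requests:
    cpu: "500m"
    memory: "1Gi"
  limits:
    memory: "2Gi"

# Contrast: limits only. Requests then default to the limits,
# so every replica reserves a full 10Gi of memory.
resources:
  limits:
    cpu: "1000m"
    memory: "10Gi"
```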
Health check recommendations
```yaml
# Use a dedicated endpoint for each probe
readinessProbe:
  httpGet:
    path: /health/readiness
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10
  failureThreshold: 3

livenessProbe:
  httpGet:
    path: /health/liveness
    port: 8080
  initialDelaySeconds: 60
  periodSeconds: 10
  failureThreshold: 3
```
Label management recommendations
```yaml
# Deployment labels: used for organizing and querying Deployments
metadata:
  labels:
    app: myapp
    version: v1.2.0
    tier: backend
    env: production
    team: platform
    cost-center: engineering

# Pod template labels: applied to every Pod
template:
  metadata:
    labels:
      app: myapp
      version: v1.2.0
      tier: backend
```
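Labels on the Pod template are what other objects select on. As a sketch, a Service routing to Pods labelled app: myapp and tier: backend might look like this. The Service name and port 80 are assumptions, and targetPort: http assumes the container declares a port named http, as the production example does.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: myapp            # hypothetical Service name
spec:
  selector:              # every label here must be present on the Pods
    app: myapp
    tier: backend
  ports:
  - name: http
    port: 80             # assumed Service port
    targetPort: http     # refers to the container port named "http"
```

Leaving version out of the selector keeps both old and new Pods behind the Service during a rolling update.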
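Troubleshooting
Common problems
Problem 1: a Pod stays in Pending
```bash
# Check the Events section for scheduling failures (insufficient CPU or memory, unmatched node selectors or taints, unbound volumes)
kubectl describe pod <pod-name>
```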
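Problem 2: a Pod restarts frequently
```bash
# Logs of the current container
kubectl logs <pod-name>
# Logs of the previous container instance, before the restart
kubectl logs <pod-name> --previous
```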
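Problem 3: an update fails
```bash
# Check rollout status
kubectl rollout status deployment/my-app

# Inspect conditions and events
kubectl describe deployment my-app

# Roll back to the previous revision
kubectl rollout undo deployment/my-app
```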
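Summary
Deployment key points:
| Area | Key point | Recommendation |
|---|---|---|
| Metadata | metadata vs template.metadata | Understand the difference |
| Update strategy | Rolling update vs Recreate | Use rolling updates in most cases |
| Resources | requests + limits | Set realistic values |
| Health checks | readiness + liveness | Use separate endpoints |
| Graceful shutdown | preStop + terminationGracePeriodSeconds | Allow time for traffic to drain |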
Quick start:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.21
        ports:
        - containerPort: 80
```