Restart Policies

Policy      Description           Applies to
Always      Always restart        Deployment/ReplicaSet (long-running workloads)
OnFailure   Restart on failure    Job/CronJob
Never       Never restart         Job/CronJob
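
To show where the policy is set, here is a minimal sketch of a Job that uses OnFailure. The Job name, image, and command are placeholders, not taken from any real workload. The policy sits in the Pod template, and a Job accepts only OnFailure or Never there.

```yaml
# Hypothetical Job; the name, image, and command are placeholders.
apiVersion: batch/v1
kind: Job
metadata:
  name: demo-job
spec:
  backoffLimit: 4                # mark the Job as failed after 4 retries
  template:
    spec:
      restartPolicy: OnFailure   # Jobs accept only OnFailure or Never
      containers:
      - name: worker
        image: busybox
        command: ["sh", "-c", "echo working && exit 0"]
```

A Deployment's Pod template, by contrast, only accepts Always, which is also the default when restartPolicy is omitted.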

Restart delay: 10s → 20s → 40s → 80s → 160s → 300s (maximum). The kubelet doubles the delay after each restart until it reaches the 300s cap.
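
The sequence above can be written as a capped doubling. This is only a compact restatement of the listed values, assuming the delay starts at 10s and doubles on every restart; the symbol d_n (delay before restart number n+1) is introduced here for illustration.

```latex
d_n = \min\left(10 \cdot 2^{n},\ 300\right)\ \text{seconds}, \qquad n = 0, 1, 2, \dots
% n = 0..4 gives 10, 20, 40, 80, 160; at n = 5 the raw value 320 is capped at 300.
```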

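Three Probe Types

Startup Probe

Purpose: determines whether the application in the container has finished starting. Until it succeeds, the liveness and readiness probes do not run.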

startupProbe:
  httpGet:
    path: /doc.html
    port: 40017
  initialDelaySeconds: 10
  failureThreshold: 10
  periodSeconds: 5
# Up to 50s of probing (10 attempts × 5s), after the 10s initial delay
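
The startup budget is roughly failureThreshold × periodSeconds, so a slower application can be given more time by raising either value. The sketch below reuses the path and port from the example above; the two numbers are illustrative assumptions, not recommendations.

```yaml
# Sketch for a hypothetical slow-starting app; the threshold and period are illustrative.
startupProbe:
  httpGet:
    path: /doc.html
    port: 40017
  failureThreshold: 30   # allow up to 30 failed checks...
  periodSeconds: 10      # ...10s apart: 30 × 10s = 300s startup budget
```

If the probe still has not succeeded when the budget runs out, the container is killed and handled according to the restart policy.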

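Liveness Probe

Purpose: checks whether the container needs to be restarted (on failure, the kubelet kills the container and restarts it according to the restart policy).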

livenessProbe:
  httpGet:
    path: /doc.html
    port: 40017
  failureThreshold: 1
  periodSeconds: 10
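
With failureThreshold: 1, a single slow or dropped response is enough to restart the container. A more tolerant variant might look like the sketch below; the timeout and threshold are assumed values chosen for illustration, to be tuned per application.

```yaml
# Sketch of a more tolerant liveness probe; timeoutSeconds and failureThreshold are assumed values.
livenessProbe:
  httpGet:
    path: /doc.html
    port: 40017
  periodSeconds: 10
  timeoutSeconds: 3      # the default is 1s
  failureThreshold: 3    # restart only after 3 consecutive failures (about 30s)
```

The trade-off is slower detection of a truly hung process in exchange for fewer restarts caused by transient hiccups.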

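Readiness Probe

Purpose: checks whether the service is ready to receive traffic (on failure, the Pod is removed from the Service endpoints but is not restarted).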

readinessProbe:
  httpGet:
    path: /doc.html
    port: 40017
  initialDelaySeconds: 10
  failureThreshold: 3
  periodSeconds: 5
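
To make "removed from the Service" concrete, here is a sketch of a Service that could sit in front of such a Pod. The Service name and the app label are hypothetical; only the target port comes from the example above.

```yaml
# Hypothetical Service; the name and the app label are placeholders.
apiVersion: v1
kind: Service
metadata:
  name: demo-svc
spec:
  selector:
    app: demo            # matches Pods labeled app=demo
  ports:
  - port: 80
    targetPort: 40017    # the container port probed above
```

A matching Pod that fails its readiness probe keeps running, but it stops receiving traffic through this Service until the probe succeeds again.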

Three Check Mechanisms

1. exec (command check)

livenessProbe:
  exec:
    command:
    - cat
    - /tmp/healthy
# Exit code 0 = healthy
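
One way to watch this probe fail is a container that creates /tmp/healthy and deletes it a little later. The sketch below follows that idea; the Pod name, image, and timings are placeholders chosen for illustration.

```yaml
# Hypothetical Pod; the name, image, and timings are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: liveness-exec-demo
spec:
  containers:
  - name: demo
    image: busybox
    args:
    - /bin/sh
    - -c
    - touch /tmp/healthy; sleep 30; rm -f /tmp/healthy; sleep 600
    livenessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy
      initialDelaySeconds: 5
      periodSeconds: 5
```

For the first 30 seconds cat exits with 0. Once the file is removed it exits non-zero, and after the default three consecutive failures the container is restarted.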

2. httpGet (HTTP check)

livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
    scheme: HTTP
# Status code 200-399 = healthy
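
If the health endpoint needs to tell probe traffic apart from user traffic, httpGet can also send custom request headers through httpHeaders. The header name and value in this sketch are made up for illustration.

```yaml
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
    scheme: HTTP
    httpHeaders:
    - name: X-Probe-Source   # hypothetical header name
      value: kubelet         # hypothetical value
```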

3. tcpSocket (TCP check)

livenessProbe:
  tcpSocket:
    port: 8080
# Successful TCP connection = healthy
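
The port may also be given by name instead of number, which keeps the probe in sync if the container port changes. The fragment below sits inside a container spec; the port name app-port is a hypothetical choice.

```yaml
# Fragment of a container spec; the port name is a placeholder.
ports:
- name: app-port          # hypothetical port name
  containerPort: 8080
livenessProbe:
  tcpSocket:
    port: app-port        # refers to the named container port above
```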

Configuration Parameters

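initialDelaySeconds: 0   # Delay after container start before the first probe (seconds)
periodSeconds: 10        # Interval between probes (seconds)
timeoutSeconds: 1        # Probe timeout (seconds)
successThreshold: 1      # Consecutive successes needed to count as healthy
failureThreshold: 3      # Consecutive failures needed to count as failed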
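
As a worked example of how these parameters combine, the sketch below annotates a readiness probe with the timing each value implies. The path, port, and numbers are assumptions for illustration only.

```yaml
readinessProbe:
  httpGet:
    path: /healthz           # path and port are placeholders
    port: 8080
  initialDelaySeconds: 5     # first check 5s after the container starts
  periodSeconds: 10          # then one check every 10s
  timeoutSeconds: 2          # no response within 2s counts as a failure
  successThreshold: 1        # one success marks the Pod ready again
  failureThreshold: 3        # 3 consecutive failures (roughly 3 × 10s = 30s) mark it unready
```

For liveness and startup probes successThreshold must stay at 1; only readiness probes may use a higher value.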

Complete Example (Spring Boot)

apiVersion: v1
kind: Pod
metadata:
  name: springboot-app
spec:
  containers:
  - name: app
    image: springboot:latest
    ports:
    - containerPort: 8080

    # 1. Startup probe (30s of probing: 6 attempts × 5s, after the 10s initial delay)
    startupProbe:
      httpGet:
        path: /actuator/health
        port: 8080
      initialDelaySeconds: 10
      failureThreshold: 6
      periodSeconds: 5

    # 2. Readiness probe
    readinessProbe:
      httpGet:
        path: /actuator/health
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 5

    # 3. Liveness probe
    livenessProbe:
      httpGet:
        path: /actuator/health
        port: 8080
      initialDelaySeconds: 60
      periodSeconds: 10
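
The example points all three probes at the same /actuator/health endpoint. If the application runs Spring Boot 2.3 or later, Actuator can expose separate liveness and readiness health groups, which lets the two probes report different things. The fragment below is a sketch under that assumption; it also assumes the probe endpoints are enabled (for instance with management.endpoint.health.probes.enabled=true when they are not switched on automatically).

```yaml
# Fragment of the container spec; assumes Spring Boot 2.3+ with Actuator health probes enabled.
readinessProbe:
  httpGet:
    path: /actuator/health/readiness
    port: 8080
  periodSeconds: 5
livenessProbe:
  httpGet:
    path: /actuator/health/liveness
    port: 8080
  periodSeconds: 10
```

With this split, a failing downstream dependency can make the Pod unready without also triggering a restart through the liveness probe.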

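Best Practices

Probe            Recommendation
StartupProbe     Required for slow-starting applications
ReadinessProbe   Required for every service
LivenessProbe    Configure with care to avoid killing healthy containers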

Production recommendation: always configure a ReadinessProbe, and configure a LivenessProbe with care.