005 Manually create a Pod that is guaranteed to fail, and track it
Deploy a Deployment named mytest with 10 replicas, then simulate a release that fails. We use a readiness probe to ensure that unhealthy Pods never receive traffic.
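A note on why readiness matters for traffic: a Pod that fails its readiness probe is removed from the endpoints of any Service that selects it, so requests stop reaching it. The walkthrough below does not create a Service, but a minimal sketch of one that would select these Pods looks like this (the port numbers are placeholders, since the busybox container serves nothing):
```yaml
# hypothetical Service, for illustration only; not used in the steps below
apiVersion: v1
kind: Service
metadata:
  name: mytest
spec:
  selector:
    app: mytest     # matches the Pod label used by the Deployment
  ports:
  - port: 80        # placeholder; busybox here does not actually listen
    targetPort: 80
```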
1. Prepare two Deployment manifests
```yaml
# cat myapp-v1.yaml (v1 passes the readiness check)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mytest
spec:
  replicas: 10  # start with 10 Pods
  selector:
    matchLabels:
      app: mytest
  template:
    metadata:
      labels:
        app: mytest
    spec:
      containers:
      - name: mytest
        image: registry.cn-hangzhou.aliyuncs.com/acs/busybox:v1.29.2
        args:
        - /bin/sh
        - -c
        - sleep 10; touch /tmp/healthy; sleep 30000
        readinessProbe:
          exec:
            command:
            - cat
            - /tmp/healthy
          initialDelaySeconds: 10
          periodSeconds: 5

# cat myapp-v2.yaml (v2 cannot pass the check, simulating a failed release)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mytest
spec:
  strategy:
    rollingUpdate:
      maxSurge: 35%        # cap on total Pods during the rollout: 10 + ceil(10 * 35%) = 10 + 4 = 14
      maxUnavailable: 35%  # cap on unavailable Pods (both default to 25%): floor(10 * 35%) = 3, so at least 7 stay available
  replicas: 10
  selector:
    matchLabels:
      app: mytest
  template:
    metadata:
      labels:
        app: mytest
    spec:
      containers:
      - name: mytest
        image: registry.cn-hangzhou.aliyuncs.com/acs/busybox:v1.29.2
        args:
        - /bin/sh
        - -c
        - sleep 30000  # note that /tmp/healthy is never created, so the probe below is bound to fail
        readinessProbe:
          exec:
            command:
            - cat
            - /tmp/healthy
          initialDelaySeconds: 10
          periodSeconds: 5
```
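Before applying anything, it can help to sanity-check the manifests and the strategy fields; both commands below are standard kubectl:
```shell
# client-side validation without touching the cluster
kubectl apply --dry-run=client -f myapp-v1.yaml
kubectl apply --dry-run=client -f myapp-v2.yaml
# built-in documentation for the rounding behavior of the two strategy fields
kubectl explain deployment.spec.strategy.rollingUpdate
```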
2. Apply myapp-v1.yaml
```shell
kubectl apply -f myapp-v1.yaml
# don't forget to record the change cause
kubectl annotate deployment/mytest kubernetes.io/change-cause="kubectl apply --filename=myapp-v1.yaml"
# after a short wait every Pod shows Running
root@k8s-192-168-0-17:~# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
mytest-59887f89f5-fq6hv 1/1 Running 0 112s 172.20.182.159 k8s-192-168-0-11 <none> <none>
mytest-59887f89f5-gpsnx 1/1 Running 0 113s 172.20.182.157 k8s-192-168-0-11 <none> <none>
mytest-59887f89f5-gwkmg 1/1 Running 0 113s 172.20.177.33 k8s-192-168-0-19 <none> <none>
mytest-59887f89f5-ltdw9 1/1 Running 0 115s 172.20.182.156 k8s-192-168-0-11 <none> <none>
mytest-59887f89f5-m4vkn 1/1 Running 0 112s 172.20.177.37 k8s-192-168-0-19 <none> <none>
mytest-59887f89f5-m9z2t 1/1 Running 0 112s 172.20.182.160 k8s-192-168-0-11 <none> <none>
mytest-59887f89f5-mq9n6 1/1 Running 0 113s 172.20.177.35 k8s-192-168-0-19 <none> <none>
mytest-59887f89f5-nwsc9 1/1 Running 0 115s 172.20.177.34 k8s-192-168-0-19 <none> <none>
mytest-59887f89f5-pzm68 1/1 Running 0 115s 172.20.177.36 k8s-192-168-0-19 <none> <none>
mytest-59887f89f5-qd74c 1/1 Running 0 113s 172.20.182.158 k8s-192-168-0-11 <none> <none>
```
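To confirm that /tmp/healthy is what gates READY, you can check the file inside a running Pod; kubectl exec accepts deploy/mytest and picks one of its Pods (Pod names on your cluster will differ from the listing above):
```shell
# run the probe command itself; exit code 0 means the Pod reports Ready
kubectl exec deploy/mytest -- cat /tmp/healthy
# the probe settings as the Deployment describes them
kubectl describe deploy mytest | grep -i readiness
```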
3. Apply myapp-v2.yaml
```shell
kubectl apply -f myapp-v2.yaml
# don't forget to record the change cause
kubectl annotate deployment/mytest kubernetes.io/change-cause="kubectl apply --filename=myapp-v2.yaml"
# after a while the Deployment settles into the following state
root@k8s-192-168-0-17:~# kubectl get deployment mytest
NAME READY UP-TO-DATE AVAILABLE AGE
mytest 7/10 7 7 3m43s
# READY: only 7 of the 10 desired Pods are ready
# UP-TO-DATE: number of replicas updated to the new template, i.e. the 7 new ones
# AVAILABLE: number of replicas currently available (i.e. Ready)
# list the Pods
root@k8s-192-168-0-17:~# kubectl get pod
NAME READY STATUS RESTARTS AGE
mytest-59887f89f5-fq6hv 1/1 Running 0 5m9s
mytest-59887f89f5-gpsnx 1/1 Running 0 5m10s
mytest-59887f89f5-gwkmg 1/1 Running 0 5m10s
mytest-59887f89f5-ltdw9 1/1 Running 0 5m12s
mytest-59887f89f5-m9z2t 1/1 Running 0 5m9s
mytest-59887f89f5-pzm68 1/1 Running 0 5m12s
mytest-59887f89f5-qd74c 1/1 Running 0 5m10s
mytest-8586c6547d-6sqwt 0/1 Running 0 2m19s
mytest-8586c6547d-b9kql 0/1 Running 0 2m20s
mytest-8586c6547d-cgkrj 0/1 Running 0 2m7s
mytest-8586c6547d-dw6kv 0/1 Running 0 2m18s
mytest-8586c6547d-ht4dq 0/1 Running 0 2m19s
mytest-8586c6547d-v7rh9 0/1 Running 0 2m8s
mytest-8586c6547d-vqn6w 0/1 Running 0 2m7s
# inspect the Deployment
root@k8s-192-168-0-17:~# kubectl describe deployment mytest
...
Replicas: 10 desired | 7 updated | 14 total | 7 available | 7 unavailable
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal ScalingReplicaSet 5m46s deployment-controller Scaled up replica set mytest-59887f89f5 from 0 to 10
Normal ScalingReplicaSet 2m52s deployment-controller Scaled up replica set mytest-8586c6547d from 0 to 4
Normal ScalingReplicaSet 2m50s deployment-controller Scaled down replica set mytest-59887f89f5 from 10 to 7
Normal ScalingReplicaSet 2m45s deployment-controller Scaled up replica set mytest-8586c6547d from 4 to 7
```
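Two more views make the stuck rollout obvious. rollout status blocks until the Deployment completes, which never happens here, and the ReplicaSet listing shows the old set holding its 7 ready Pods while the new set has none ready:
```shell
# blocks waiting for the rollout to finish; interrupt with Ctrl-C
kubectl rollout status deployment mytest
# compare the old and new ReplicaSets side by side
kubectl get rs -l app=mytest
```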
4. In this way we guaranteed that 7 Pods in the cluster stayed available.
Let's break down the whole process.
maxSurge:
Caps how far the Pod count may exceed the desired replica count during a rolling update. It can be an absolute number or a percentage; a percentage is rounded up.
In our example the desired count is 10 and maxSurge is 35%, so ceil(10 * 35%) = 4 extra Pods are allowed, for a cap of 10 + 4 = 14.
Hence the replica summary in the Deployment description: Replicas: 10 desired | 7 updated | 14 total | 7 available | 7 unavailable
10 desired, 7 updated, 14 total at the cap, 7 available, 7 unavailable.
maxUnavailable:
Caps how many Pods may be unavailable during the rollout. It too can be an absolute number or a percentage, but a percentage is rounded down.
In our example maxUnavailable is 35%, so floor(10 * 35%) = 3 Pods may be unavailable, meaning at least 10 - 3 = 7 must stay available.
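To double-check the strategy values the controller is actually using, read them off the live object (the jsonpath fields are the real API fields; the exact output formatting may vary by kubectl version):
```shell
kubectl get deploy mytest -o jsonpath='{.spec.strategy.rollingUpdate}{"\n"}'
# expected along the lines of: {"maxSurge":"35%","maxUnavailable":"35%"}
```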
The full sequence of this rolling update:
1) Per maxSurge the cap is 14 Pods, so 4 new-version Pods are created first, bringing the total to 14.
2) Per maxUnavailable at least 7 Pods must remain available, so 3 old Pods are destroyed (leaving 7 old + 4 new).
3) Once those 3 old Pods are gone, 3 more new Pods are created to bring the total back to the cap of 14 (7 old + 7 new).
4) When new Pods pass the readiness probe, the number of available Pods rises above 7.
5) More old Pods are then destroyed, keeping at least 7 available.
6) As old Pods are destroyed, new Pods are created to keep the total at 14.
7) And so on until the update completes.
In our case the rollout got stuck at step 4: the new Pods can never pass the readiness probe.
At this point, in a real production environment, we would run rollout undo to roll back to the previous revision and restore the cluster's overall availability.
```shell
root@k8s-192-168-0-17:~# kubectl rollout history deployment mytest
deployment.apps/mytest
REVISION CHANGE-CAUSE
1 kubectl apply --filename=myapp-v1.yaml
2 kubectl apply --filename=myapp-v2.yaml
root@k8s-192-168-0-17:~# kubectl rollout undo deployment mytest --to-revision=1
deployment.apps/mytest rolled back
# then watch the Pods change as the rollback proceeds
kubectl get pod -w
```
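After the rollback, confirming recovery is straightforward; rollout status now waits for the restored revision to become fully available:
```shell
# returns once all 10 replicas of the rolled-back revision are available
kubectl rollout status deployment mytest
kubectl get deployment mytest   # READY should return to 10/10
```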