Check your Helm deployments!
The Deployment resource is the de-facto way to handle application deployments in Kubernetes, but there are many tools to manage them. One way to manage them safely is to use kubectl directly as demonstrated in my previous article.
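For reference, that pattern boils down to something like the following (a minimal sketch; the manifest file and Deployment names are just placeholders):

# Apply the manifest, then block until the rollout finishes or fails.
kubectl apply -f deployment.yaml
kubectl rollout status deployment demo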
Another popular way to deploy resources to Kubernetes is to use Helm, a package manager for Kubernetes. In this article, I’ll talk about how to repeat the deployment pattern demonstrated in the previous post using Helm. We’ll be using Helm version 2.14.2 for the demonstration.
Example Helm chart
In Helm, Kubernetes resources are distributed as charts: collections of templated Kubernetes resources in YAML or JSON format. Charts can be deployed from an external Helm repository, a chart archive file, or a local chart directory. Each chart has its own set of variables that can be used for customising the deployment. Let’s generate a Helm chart into a local directory that we can use for testing both failing and successful deployments.
helm create demo
This creates a simple chart in the directory demo/, which contains a deployment for a web server. The template at demo/templates/deployment.yaml generates the deployment manifest. Let’s parametrise the readiness probe so that we can simulate a failing deployment by changing a Helm chart parameter.
readinessProbe:
httpGet:
path: {{ .Values.readinessPath | default "/" }}
port: http
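If we want to double-check the parametrisation without touching a cluster, we can render the chart locally with helm template (a quick sanity check, nothing more; the grep just trims the output down to the probe):

# Render the chart locally and confirm the readiness path is substituted.
helm template --set readinessPath=/fail demo/ | grep -A 2 readinessProbe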
Unchecked deployment
There are two ways to install Helm charts using the Helm CLI: helm install and helm upgrade --install. The install sub-command always installs a brand new chart, while the upgrade sub-command can upgrade an existing chart and install a new one if the chart hasn’t been installed before. With the upgrade feature, we can use a single command for installs and upgrades, which is handy for automation. Let’s use it to install the demo Helm chart we created earlier.
$ helm upgrade --install demo demo/
Release "demo" does not exist. Installing it now.
NAME: demo
LAST DEPLOYED: Fri Aug 16 13:48:06 2019
NAMESPACE: default
STATUS: DEPLOYED
RESOURCES:
==> v1/Deployment
NAME READY UP-TO-DATE AVAILABLE AGE
demo 0/1 1 0 1s
==> v1/Pod(related)
NAME READY STATUS RESTARTS AGE
demo-69c7467798-84nr9 0/1 ContainerCreating 0 1s
==> v1/Service
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
demo ClusterIP 10.96.196.73 <none> 80/TCP 1s
As we can see from the output, the Helm chart was installed, but the deployment is still in progress. Helm didn’t check that our deployment finished successfully.
When we create a failing deployment, we should see the same result. Let’s break the deployment on purpose by changing the path of the readiness probe to something that we know doesn’t work.
$ helm upgrade --install --set readinessPath=/fail demo demo/
Release "demo" does not exist. Installing it now.
NAME: demo
LAST DEPLOYED: Fri Aug 16 13:53:26 2019
NAMESPACE: default
STATUS: DEPLOYED
RESOURCES:
==> v1/Deployment
NAME READY UP-TO-DATE AVAILABLE AGE
demo 0/1 1 0 5m
==> v1/Pod(related)
NAME READY STATUS RESTARTS AGE
demo-54df8f97bb-ffp4b 0/1 ContainerCreating 0 0s
demo-69c7467798-84nr9 1/1 Running 0 5m
==> v1/Service
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
demo ClusterIP 10.96.196.73 <none> 80/TCP 5m
The output shows that the chart was "deployed", but the updated Pod wasn’t launched successfully. We can verify that the deployment didn’t finish successfully by viewing the deployment rollout status.
$ kubectl rollout status deployment demo
Waiting for deployment "demo" rollout to finish: 1 old replicas are pending termination...
However, the chart deployment history will show that the first deployment was superseded by the second one.
$ helm history demo
REVISION STATUS DESCRIPTION
1 SUPERSEDED Install complete
2 DEPLOYED Upgrade complete
Let’s delete the chart, and start fresh.
helm delete --purge demo
Wait and timeout
It seems that Helm chart deployments work similarly to how kubectl apply works: the resources are created, but the actual deployment is not verified. With kubectl, we can use kubectl rollout status to further check the status of the deployment. So what would be the Helm equivalent in this case?
Helm install and upgrade commands include two CLI options to assist in checking the deployments: --wait and --timeout. When using --wait, Helm will wait until the minimum expected number of Pods in the deployment are in a ready state before marking the release as successful. It will wait for as long as the duration set with --timeout, which defaults to 300 seconds. Let’s try it out!
$ helm upgrade --install --wait --timeout 20 demo demo/
Release "demo" does not exist. Installing it now.
NAME: demo
LAST DEPLOYED: Fri Aug 16 15:47:10 2019
NAMESPACE: default
STATUS: DEPLOYED
RESOURCES:
==> v1/Deployment
NAME READY UP-TO-DATE AVAILABLE AGE
demo 1/1 1 1 8s
==> v1/Pod(related)
NAME READY STATUS RESTARTS AGE
demo-69c7467798-4tkqf 1/1 Running 0 8s
==> v1/Service
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
demo ClusterIP 10.102.127.162 <none> 80/TCP 8s
The deployment finished successfully as expected. Let’s see what happens when we try this with a failing deployment.
$ helm upgrade --install --wait --timeout 20 --set readinessPath=/fail demo demo/
UPGRADE FAILED
Error: timed out waiting for the condition
Error: UPGRADE FAILED: timed out waiting for the condition
Very nice! We finally got feedback from a failing deployment. We can also see this in the Helm chart history.
$ helm history demo
REVISION STATUS DESCRIPTION
1 DEPLOYED Install complete
2 FAILED Upgrade "demo" failed: timed out waiting for the condition
Manual rollbacks
Now that we have a failed upgrade, you might think that you can just deploy the previous version of the chart and be done with it. Unfortunately, that doesn’t get you back to a working version.
$ helm upgrade --install --wait --timeout 20 demo demo/
UPGRADE FAILED
Error: timed out waiting for the condition
Error: UPGRADE FAILED: timed out waiting for the condition
$ helm history demo
REVISION STATUS DESCRIPTION
1 DEPLOYED Install complete
2 FAILED Upgrade "demo" failed: timed out waiting for the condition
3 FAILED Upgrade "demo" failed: timed out waiting for the condition
I’m not sure why this is the case, but it gets worse! If you try to issue another update for your chart, it will fail as well.
$ helm upgrade --install --wait --timeout 20 --set replicaCount=2 demo demo/
UPGRADE FAILED
Error: timed out waiting for the condition
Error: UPGRADE FAILED: timed out waiting for the condition
The only way I can think of to get out of this situation is to delete the chart deployment entirely and start fresh.
$ helm delete --purge demo
Instead of trying to use the upgrade command, we can use helm rollback. It’s specifically designed for rolling out a version of a chart that you’ve deployed before. To use the rollback sub-command, we need to give it the revision to roll back to. It also accepts the same wait and timeout options as install and upgrade, which we can use to verify that the rollback itself succeeds. Note that rollback can’t be used to recover from the situation mentioned above. Let’s roll back to the first revision.
$ helm upgrade --install --wait --timeout 20 demo demo/
$ helm upgrade --install --wait --timeout 20 --set readinessPath=/fail demo demo/
$ helm rollback --wait --timeout 20 demo 1
Rollback was a success.
Awesome! Again, this should be visible in the Helm chart history as well.
$ helm history demo
REVISION STATUS DESCRIPTION
1 SUPERSEDED Install complete
2 SUPERSEDED Upgrade "demo" failed: timed out waiting for the condition
3 DEPLOYED Rollback to 1
Automated rollbacks with atomic
Automating deployment and rollback this way is a bit cumbersome, because you need to figure out how to parse the last successful revision from the Helm history so that you can issue a rollback. Using the -o json option with the history command, you can get the history in JSON format, which should help. However, there is a shortcut to avoid all that.
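Before getting to that shortcut, here’s roughly what the manual approach could look like (a rough sketch only: it assumes jq is available, and the field names come from Helm 2’s JSON output, so they’re worth verifying against your Helm version):

# Pick the most recent revision that was successfully deployed and roll back to it.
REVISION=$(helm history demo -o json \
  | jq -r 'map(select(.status == "DEPLOYED" or .status == "SUPERSEDED")) | last | .revision')
helm rollback --wait --timeout 20 demo "$REVISION"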
Helm install and upgrade commands include an --atomic CLI option, which will cause the chart deployment to automatically roll back when it fails. Enabling the atomic option will automatically enable wait. Let’s try it!
$ helm upgrade --install --atomic --timeout 20 --set readinessPath=/fail demo demo/
UPGRADE FAILED
Error: timed out waiting for the condition
ROLLING BACK
Rollback was a success.
Error: UPGRADE FAILED: timed out waiting for the condition
$ helm history demo
REVISION STATUS DESCRIPTION
1 SUPERSEDED Install complete
2 SUPERSEDED Upgrade "demo" failed: timed out waiting for the condition
3 SUPERSEDED Rollback to 1
4 SUPERSEDED Upgrade "demo" failed: timed out waiting for the condition
5 DEPLOYED Rollback to 3
Perfect! This also works when the chart fails to install for the first time. If there’s no revision to revert to, the chart deployment will be deleted.
$ helm delete --purge demo
release "demo" deleted
$ helm upgrade --install --atomic --timeout 20 --set readinessPath=/fail demo demo/
Release "demo" does not exist. Installing it now.
INSTALL FAILED
PURGING CHART
Error: release demo failed: timed out waiting for the condition
Successfully purged a chart!
Error: release demo failed: timed out waiting for the condition
$ helm history demo
Error: release: "demo" not found
Awkwardness in Helm
The fact that Helm supports checking deployments and automated rollbacks out of the box is awesome, but it has a couple of caveats compared to traditional kubectl-based deployments.
First, there’s no official command, separate from the install and upgrade procedure, for waiting for a deployment to finish, similar to kubectl rollout status. This would be useful in situations where you suspect another deployment of the same Helm chart might be ongoing: you could wait for the existing deployment to finish before attempting to apply your changes. However, it is possible to work around this caveat by creating a script that continuously polls the status of the chart deployment using the helm status sub-command.
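A very rough version of such a script could look like this (it simply greps the STATUS line printed by helm status, so it can’t tell a failed release apart from one that’s still in progress):

# Crude polling loop: wait until the release reports STATUS: DEPLOYED,
# giving up after roughly two minutes.
for _ in $(seq 1 24); do
  helm status demo | grep -q 'STATUS: DEPLOYED' && break
  sleep 5
done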
Second, the deployment timeout is global across all the resources within the chart. Compare this to the progressDeadlineSeconds feature of the Deployment resource, which effectively sets the timeout per Pod, since the deadline applies to rollout progress rather than to the rollout as a whole. With Helm, a single timeout has to cover every Pod in the deployment, which makes it much harder to estimate a correct value for the chart deployment. If you estimate it too low, the deployment fails too early even when it could still make progress. If you estimate it too high, the chart deployment has to wait a long time before noticing that the deployment isn’t getting anywhere.
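To make the contrast concrete, here’s an illustrative kubectl-only sketch using the demo Deployment created by the chart:

# Give the Deployment a 30-second progress deadline and watch the rollout.
# Once no progress is made for 30 seconds, "kubectl rollout status" exits
# with an error about the progress deadline being exceeded.
kubectl patch deployment demo -p '{"spec": {"progressDeadlineSeconds": 30}}'
kubectl rollout status deployment demo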
Conclusion
In this article, I’ve demonstrated how to safely deploy Helm charts containing Kubernetes Deployments, with automated rollbacks. I’ve also talked about the inherent caveats in Helm’s approach to monitoring deployment health. One area I haven’t covered is how safe deployments work for Helm charts that contain resources other than the Kubernetes Deployment resource.
Thanks for reading!