Is there really no automated upgrade strategy to release a change to a Consul Server cluster running in Kubernetes without causing an outage? Are these onerous manual instructions (Upgrade | Consul by HashiCorp) the only way?
It’s highly unusual for an official Helm chart not to support bumping an image version or changing resource limits as a simple rolling upgrade. The default configuration of the Helm chart (updatePartition: 0) means that by default the chart causes an outage for a simple image upgrade of Consul Server (or any other change to the StatefulSet manifest). Is there really no way to refine the upgrade strategy and readiness probe to achieve automated rolling upgrades with no outage?
The manual instructions are the safest way to make sure there is no window during which services can’t be registered. Otherwise, a regular rolling upgrade will work fine.
But, why are the official instructions so manual?
Was integrating more closely with Kubernetes health checks and letting Kubernetes manage the rollout automatically considered and discarded for some reason, or is it just that no one had time to develop and test it?
The official instructions are manual because we wanted to show the safest way.
We do integrate with k8s health checks (we use the /status/leader endpoint), which is why a regular rolling deployment does work. However, we see an operator as the best way to fully manage the rollout safely, so that’s what we plan to work on in the future.
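For anyone wondering what a /status/leader based health check amounts to, here is a minimal Python sketch. It is not the chart's actual probe, just an illustration of the idea: Consul's `/v1/status/leader` HTTP endpoint returns a JSON string with the leader's address, or an empty string when there is no leader. The function names and the `http://127.0.0.1:8500` address are assumptions for the example.

```python
import json
import urllib.request


def has_leader(body: str) -> bool:
    """Return True if a /v1/status/leader response body names a leader.

    The endpoint returns a JSON string such as "10.1.10.12:8300",
    or "" when the cluster currently has no leader.
    """
    try:
        leader = json.loads(body)
    except json.JSONDecodeError:
        return False
    return isinstance(leader, str) and leader != ""


def server_is_ready(base_url: str = "http://127.0.0.1:8500",
                    timeout: float = 2.0) -> bool:
    """Hypothetical readiness check: ask the local agent who the leader is.

    base_url is an assumed local agent address; adjust for your setup.
    Any connection error or timeout is treated as "not ready".
    """
    url = f"{base_url}/v1/status/leader"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return has_leader(resp.read().decode("utf-8"))
    except OSError:  # covers URLError, HTTPError, and socket timeouts
        return False
```

A check like this only tells you that the pod can see a leader. It does not wait for the restarted server to catch up with the rest of the cluster, which is presumably part of why the manual steps are described as the safest route.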