Question about "helm upgrading" a consul cluster

evilin13 · November 2, 2021, 6:04pm

Hello!!

I would like to discuss the proper way to upgrade a running Consul cluster.
In more detail: in our product, Consul is deployed as a DaemonSet. There are 3 pods in total, and a Consul server runs in each of them.
From time to time we need to upgrade to a newer version of this Helm chart. The Consul version itself doesn’t necessarily change in the context of this helm upgrade; most of the time it’s the Consul Docker image that changes, because the base image is updated for security fixes etc.

During the “helm upgrade”, even though the 3 Consul pods are upgraded sequentially (the DaemonSet’s updateStrategy is set to RollingUpdate), quorum is lost and it takes around 4-5 minutes until a new leader is elected. The new leader is only elected once the new/upgraded pods have started.
This causes several issues for the applications that need to access Consul’s KV store during this timeframe.
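
As a rough illustration of why the timing of the pod restarts matters here: Consul servers use Raft, which needs a strict majority of the servers to be available. The sketch below only does that arithmetic for a 3-server cluster; the function names are made up for this example and are not part of any Consul tooling.

```python
def quorum_size(servers: int) -> int:
    """Raft quorum: a strict majority of the voting servers."""
    return servers // 2 + 1


def has_quorum(total_servers: int, available: int) -> bool:
    """True if enough servers are up to elect a leader and commit writes."""
    return available >= quorum_size(total_servers)


if __name__ == "__main__":
    total = 3  # the 3 Consul server pods described above
    for down in range(total + 1):
        up = total - down
        print(f"{down} down, {up} up -> quorum held: {has_quorum(total, up)}")
```

With 3 servers the quorum is 2, so the cluster tolerates exactly one server being down. If a second pod is restarted before the first one has rejoined the cluster, quorum is lost even though the pods are nominally replaced one at a time.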

So now I’m trying to figure out the proper way to upgrade the Consul pods without losing quorum and the cluster leader.
I’ve read the instructions about upgrading the Consul version in a running Consul cluster.
I think the reason we lose quorum and the cluster leader for some minutes is that we simply “helm upgrade” our chart and do not follow any of the steps described in those instructions (e.g. leaving the upgrade/re-creation of the Consul leader’s pod for last).
But how could we overcome this?
Has anyone else faced a similar issue?
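
To make the "leader last" idea concrete, here is a minimal sketch of the ordering only. It assumes the role of each server is already known (in practice it could be read from something like the output of `consul operator raft list-peers`); the pod names and the `upgrade_order` helper are hypothetical.

```python
from typing import Dict, List


def upgrade_order(pod_roles: Dict[str, str]) -> List[str]:
    """Return pod names with followers first and the current leader last."""
    followers = sorted(name for name, role in pod_roles.items() if role != "leader")
    leaders = sorted(name for name, role in pod_roles.items() if role == "leader")
    return followers + leaders


if __name__ == "__main__":
    # Hypothetical pod names and roles, for illustration only.
    roles = {"consul-a": "follower", "consul-b": "leader", "consul-c": "follower"}
    print(upgrade_order(roles))
```

A plain “helm upgrade” of a DaemonSet gives no control over this order, which is the part we are missing.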

Thank you,

Evi

Amier · November 2, 2021, 8:10pm

Hello!

First off, welcome to the Consul community!

For helm upgrades (like what you’re attempting) we have an ‘Upgrading Consul on Kubernetes’ doc linked here. It outlines some upgrade considerations specific to Helm and the steps needed for helm upgrade to work without any downtime.

I would recommend reading through that doc and attempting the upgrade steps there. If any issues occur with that method, feel free to come back and let us know.

evilin13 · November 3, 2021, 9:39am

Thank you very much for the quick response, Amier!

I read the document and understood that, according to the instructions, the Consul servers should first of all be deployed as a StatefulSet.
Unfortunately, in our case the Consul servers are deployed as a DaemonSet. It is a pretty old design (~5 years), so I think we should reconsider this and change to a StatefulSet now, so that we can take advantage of the rollingUpdate: partition option during the upgrade.
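
For my own understanding, a small sketch of how the partition option behaves: with a StatefulSet rolling update, only pods whose ordinal is greater than or equal to the partition value are updated. Stepping the partition down one value at a time, and checking cluster health between steps, is my assumption about how we would use it; the helper names are made up.

```python
from typing import List


def pods_updated(replicas: int, partition: int) -> List[int]:
    """Ordinals a StatefulSet rolling update will touch: those >= partition."""
    return [ordinal for ordinal in range(replicas) if ordinal >= partition]


def partition_steps(replicas: int) -> List[int]:
    """Partition values to apply one at a time, from replicas - 1 down to 0."""
    return list(range(replicas - 1, -1, -1))


if __name__ == "__main__":
    replicas = 3  # our 3 Consul servers
    for partition in partition_steps(replicas):
        updated = pods_updated(replicas, partition)
        print(f"partition={partition}: ordinals on the new revision {updated}")
```

Each step brings exactly one more server onto the new revision, so at most one server is restarting at any time as long as we wait for it to rejoin before lowering the partition again.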

We’ll check it in more detail and get back to you if we have more questions.

Thanks again for the support,
–Evi
