Question about "helm upgrading" a consul cluster

Hello!!

I would like to discuss the proper way to upgrade a running Consul cluster.
In more detail: in our product, Consul is deployed as a DaemonSet. There are 3 pods in total, and a Consul server runs on each of them.
From time to time we need to upgrade to a newer version of this Helm chart - the Consul version itself doesn’t necessarily change in the context of this helm upgrade. Most of the time it’s the Consul Docker image that changes, because the base image is updated due to security fixes etc.

During the “helm upgrade”, although the 3 Consul pods are upgraded sequentially (the DaemonSet’s updateStrategy is set to RollingUpdate), the quorum is lost and it takes around 4-5 minutes until a new leader is elected. The new leader is elected when the new/upgraded pods start.
This causes several issues for the applications that need to access Consul’s KV store during this timeframe.
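
To make the quorum arithmetic concrete: Consul servers use Raft, which needs a majority of servers to be available in order to keep or elect a leader. A minimal sketch of that calculation follows; the function names are illustrative only and are not part of any Consul API.

```python
def quorum_size(servers: int) -> int:
    """Smallest majority of a cluster with `servers` voting members."""
    return servers // 2 + 1


def failure_tolerance(servers: int) -> int:
    """How many servers can be down while a majority is still available."""
    return servers - quorum_size(servers)


if __name__ == "__main__":
    for n in (3, 5):
        print(f"servers={n} quorum={quorum_size(n)} tolerates={failure_tolerance(n)}")
```

With 3 servers the quorum is 2, so only one server may be unavailable at a time. If a second pod is restarted before the first one has rejoined the cluster, there is no majority and no leader until enough servers are back.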

So now I’m trying to figure out what the proper way is to upgrade the Consul pods without losing the quorum and the cluster leader.
I’ve read the instructions about upgrading the Consul version in a running Consul cluster.
I think the reason we lose the quorum and the cluster leader for some minutes is that we simply “helm upgrade” our chart and do not follow any of the steps described in the aforementioned link (e.g. leaving the upgrade/re-creation of the Consul leader’s pod for last).
But how could we overcome this?
Has anyone else faced a similar issue?
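
The "leader last" ordering mentioned above can be sketched as follows. This is only an illustration of the ordering itself: the pod names are hypothetical, and finding the current leader (for example with `consul operator raft list-peers`) and actually restarting the pods are left out.

```python
def upgrade_order(pods: list[str], leader: str) -> list[str]:
    """Return the pods in upgrade order: followers first, current leader last."""
    if leader not in pods:
        raise ValueError(f"leader {leader!r} is not one of the given pods")
    followers = [pod for pod in pods if pod != leader]
    return followers + [leader]


if __name__ == "__main__":
    # Hypothetical pod names; in a real cluster they come from the workload.
    pods = ["consul-a", "consul-b", "consul-c"]
    print(upgrade_order(pods, leader="consul-b"))
```

Each pod in the returned list would be upgraded only after the previous one has rejoined the cluster, so that at most one server is down at any time.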

Thank you,

Evi

Hello!

First off, welcome to the consul community! :confetti_ball:

For helm upgrades (like what you’re attempting), we have an ‘Upgrading Consul on Kubernetes’ doc, linked here, that outlines some upgrade considerations specific to Helm and the steps needed for a helm upgrade to work without any downtime.

I would recommend reading through that doc and attempting the upgrade steps there. If any issues occur with that method, feel free to come back and let us know :slight_smile:

Thank you very much for the quick response, Amier! :slight_smile:

I read the document and understood that, according to the instructions, the Consul servers should first of all be deployed as a StatefulSet.
Unfortunately, in our case, the Consul servers are deployed as a DaemonSet. It is a pretty old design (~5 years), so I think we should reconsider it and switch to a StatefulSet now, so that we can take advantage of the rollingUpdate: partition option during the upgrade.
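
For reference, here is a small simulation of how the StatefulSet partition setting stages a rollout: only pods whose ordinal is greater than or equal to the partition are updated, highest ordinal first. The sketch below assumes 3 replicas and assumes the partition is lowered by one at each stage; it models the behaviour only and does not call Kubernetes or Helm.

```python
def pods_eligible_for_update(replicas: int, partition: int) -> list[int]:
    """Ordinals the StatefulSet controller may update for a given partition.

    Pods with ordinal >= partition are updated, in descending ordinal order.
    """
    return [o for o in range(replicas - 1, -1, -1) if o >= partition]


def staged_rollout(replicas: int) -> list[tuple[int, list[int]]]:
    """Lower the partition from `replicas` to 0, one step at a time."""
    return [
        (partition, pods_eligible_for_update(replicas, partition))
        for partition in range(replicas, -1, -1)
    ]


if __name__ == "__main__":
    for partition, ordinals in staged_rollout(3):
        print(f"partition={partition} -> pods on the new revision: {ordinals}")
```

Starting with the partition equal to the replica count means the upgrade itself restarts nothing; each later decrement lets exactly one more server pod move to the new revision, which leaves time to confirm that it has rejoined before continuing.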

We’ll check it in more detail and get back to you if we have more questions.

Thanks again for the support,
–Evi