Production deployment - questions around discrepancies between the official guides

I’ve been studying the official Helm chart and the official CloudFormation template for deploying Consul on Kubernetes and AWS, and I can’t wrap my head around a few discrepancies between the two deployment models.

The Helm chart runs Consul using a StatefulSet that retains the identity of the node (IP and data disk), while the CloudFormation template preserves neither of those.

The Helm chart generates agent client certificates itself instead of leveraging the auto_encrypt feature that Consul has for distributing certificates to the agents.

Both deployment strategies obviously work, but I wonder what the recommendations are around retaining data disks, agent certs, and Consul server IP addresses.

What are the drawbacks of using one or the other?


I guess the reason to use the StatefulSet API in Kubernetes didn’t come from the requirement for a stable network identity, but rather from the need to have Kubernetes create a new PersistentVolumeClaim for every pod, something that isn’t possible when using a Deployment.

Hello @tongpu, that totally makes sense but doesn’t fully answer my questions:

  • should I retain EBS disks? They do in the k8s Helm chart but not in the AWS CloudFormation template
  • should I retain Consul server IPs? They do in k8s but not in the AWS CloudFormation template
  • should I manually distribute agent certs? They do in the k8s Helm chart but not in the AWS CloudFormation template

I would just like to understand the overall best practices/recommendations, which I couldn’t find anywhere, especially since the official installation methods differ from each other.

Thanks Lukas, that’s correct. We’re using the StatefulSet for the PVC, which gives us a persistent disk for the Consul servers.

The Helm chart is not using auto encrypt right now because of a bug in auto encrypt that caused the certificates to not have the right IP SAN. This meant we had to generate the certificates ourselves with the IP SAN. This bug was fixed in Consul 1.7.0 and we are now working on auto encrypt support.
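For reference, once that lands, enabling auto encrypt is just agent configuration; roughly something like the sketch below, where the file paths are placeholders rather than what the chart will actually use:

    # Servers: allow clients to request TLS certificates over RPC
    verify_incoming = true
    verify_outgoing = true
    ca_file   = "/consul/tls/consul-agent-ca.pem"   # placeholder path
    cert_file = "/consul/tls/server.pem"            # placeholder path
    key_file  = "/consul/tls/server-key.pem"        # placeholder path

    auto_encrypt {
      allow_tls = true
    }

    # Clients: fetch their certificate from the servers at startup
    verify_outgoing = true
    ca_file = "/consul/tls/consul-agent-ca.pem"     # placeholder path

    auto_encrypt {
      tls = true
    }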

I recently answered a question about Consul servers and durable storage here: Do Consul servers need durable storage? Consul server IPs can change without issue as long as the join URLs are still routable.
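On AWS, for example, you can avoid depending on fixed server IPs entirely by using cloud auto-join in retry_join; a rough sketch, with a made-up tag key/value:

    # Agents discover the servers by EC2 tag instead of hard-coded IP addresses
    retry_join = ["provider=aws tag_key=consul-server tag_value=true"]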

Thank you @lkysow, now I have a better understanding of the requirements for a successful production deployment.

Apart from the bug in the previous version of auto encrypt… the auto encrypt feature should be the right way to go, right? When will I be able to use it in k8s?

I’m currently using a self-signed CA but I’d like to move to Vault in the future… from what I’ve read, if I use auto encrypt it should be easier to migrate to Vault, shouldn’t it?
Today I’m also using pre-generated ACL tokens; I’d like to offload those to Vault as well.

Do you have any documentation on how to migrate auto encrypt and tokens to Vault?

@lkysow any update on the CA fix?
For my understanding: if you don’t use auto_encrypt, then you won’t easily be able to rotate the CA cert if you need to, or to move to Vault.

Hi Luigi,
The auto encrypt fixes were released in 1.7.1 but we’re still working on the Helm chart to support auto encrypt.

Hello @lkysow, thank you for the reply. Is there an issue/PR I can follow? This is really important for me.
Is the current Helm setup without auto_encrypt able to support cert rotation/CA offload to Vault, with no downtime?

I’ve created Support auto-encrypt · Issue #373 · hashicorp/consul-helm · GitHub to track.

Is the current Helm setup without auto_encrypt able to support cert rotation/CA offload to Vault, with no downtime?

Vault isn’t supported right now, although there might be a way to make it work if you fork the chart and use the Vault sidecar. As written, the CA cert is generated and saved as a Kubernetes secret. When the pods restart they regenerate their certs from the CA, so certificate rotation is supported via pod restart. As for rotating the CA itself, we haven’t gotten to that yet.

Hello Luke, thank you for creating the issue so that I can watch it.

About the Helm chart… my setup is a bit different:

  • I’m running Consul on EC2 with the internal CA using a self-signed keypair.
  • I’ve forked the Helm chart and kept only the agent parts… I’m injecting the CA key/certs via k8s secrets… and using the current initContainer to create a cert for the agent.

Based on what I just described:

  • Is there a way I can rotate the Consul CA and agent certs with no downtime?
  • Is there a way, with no downtime, to move the CA from the internal one to Vault? (planning Vault integration for next quarter)

Thanks a lot for the support! Really appreciated!

Are you talking about the Connect CA or the Server CA?

Is there a way I can rotate the Consul CA and agent certs with no downtime?

You can pass -ca-path to Consul to have it accept multiple CAs, so you could probably create your secondary CA, make sure all the components have it, and then switch over to the secondary. I haven’t tested this though.
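Untested, but roughly: point the agents at a directory of CA certificates rather than a single file, drop the new CA in next to the old one, roll all the components, then re-issue the agent certs from the new CA. The directory path here is just an example:

    # Trust every CA certificate found in this directory (old and new),
    # so certs signed by either CA are accepted during the rollover
    ca_path = "/consul/tls/cas"
    verify_incoming = true
    verify_outgoing = true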

Is there a way, with no downtime, to move the CA from the internal one to Vault? (planning Vault integration for next quarter)

You should be able to put the same CA into Vault that you’re using right now. So if that works, there would be no downtime to switch.

I assume you’re referring to the built-in Connect CA. If so, the high level process for migrating from one CA provider to another is documented in Connect Certificate Management: Root Certificate Rotation.
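At a high level the provider switch is just a change to the connect stanza of the server configuration; a rough sketch, where the Vault address, token, and PKI mount paths are placeholders you’d replace with your own:

    connect {
      enabled     = true
      ca_provider = "vault"

      ca_config {
        address               = "https://vault.example.com:8200"   # placeholder
        token                 = "<vault token>"                    # placeholder
        root_pki_path         = "connect-root"                     # placeholder PKI mount
        intermediate_pki_path = "connect-intermediate"              # placeholder PKI mount
      }
    }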

Hope this helps.

Thank you @blake and @lkysow, I’ve definitely mixed up the CAs here…
After reading the docs more carefully I can see that there are:
"ca_file", "cert_file", "key_file" for the HTTPS API
and
connect -> ca_config -> consul/vault for Connect?
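If I read the configuration reference correctly, the two sit side by side in the agent config, roughly like this (paths are placeholders)?

    # Agent TLS: RPC and the HTTPS API, using the CA I distribute (or auto_encrypt)
    ca_file   = "/consul/tls/consul-agent-ca.pem"   # placeholder
    cert_file = "/consul/tls/agent.pem"             # placeholder
    key_file  = "/consul/tls/agent-key.pem"         # placeholder

    # Connect (service mesh) CA: managed separately by Consul's built-in CA or Vault
    connect {
      enabled     = true
      ca_provider = "consul"   # or "vault"
    }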

For the moment my question is about the agents… if I want to rotate the CA used for HTTPS, how can I do it for both agents and servers with no downtime?

Would it be possible to use the internal CA for both HTTPS and mTLS in Connect?