Consul in multi-tenant K8S installation

Hello,

Some information about our cluster first (in case it helps):

  • Self-hosted
  • Currently 16 nodes (still growing)
  • Created using Rancher; we have multiple projects in it, with a few namespaces per project. Each project uses a separate Consul instance in order to fully isolate its workload.
  • We run a microservice workload

Currently we are using the Consul-agent-per-pod architecture, as this was the easiest for us. However, every project has between 10 and 20 services, each with at least two replicas, which generates between 20 and 40 Consul agent nodes. We see that the documentation (https://www.consul.io/docs/k8s/installation/overview) mentions that this is not the preferred configuration and that it is recommended to use a DaemonSet instead.

However, this raises a few questions for us:
1 - As we are multi-tenant and want to keep the isolation between all the project workloads, switching to a DaemonSet will cause a port clash, since each Consul cluster will try to bind port 8500 for its agent. I guess we could override the port per project, but is that the advised solution?
2 - We fear that we will run into the same kind of issue later when the cluster is bigger. Currently we would go from 20-40 agents to a static count of 16 (as we have 16 K8s nodes at the moment), but this resource advantage will quickly shrink the more K8s nodes we get.

Do you have any advice on what would be the best practice for our type of installation/workload?

Kr,

Hi, at a high level the answer to your question is that the Helm chart does not support running multiple Consul installations in a single cluster separated by namespaces. This is because Consul is meant to be run across the whole cluster.

Our advice would be to run a single Consul installation in your cluster and use other Consul primitives like tags to separate your projects. Maybe you can expand upon why you need separate Consul installations?
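
For example, each project could register its services with a project-specific tag and then filter on that tag when looking services up. Here’s a rough sketch using the Go API client (the `project-a` tag and `billing` service name are placeholders, not something the chart sets up for you):

```go
package main

import (
	"fmt"
	"log"

	"github.com/hashicorp/consul/api"
)

func main() {
	// Talk to the local Consul agent (defaults to 127.0.0.1:8500).
	client, err := api.NewClient(api.DefaultConfig())
	if err != nil {
		log.Fatal(err)
	}

	// Register the service with a tag identifying the project it belongs to.
	err = client.Agent().ServiceRegister(&api.AgentServiceRegistration{
		Name: "billing",
		Port: 8080,
		Tags: []string{"project-a"},
	})
	if err != nil {
		log.Fatal(err)
	}

	// Only look up healthy instances that carry the "project-a" tag,
	// so other projects' services with the same name are ignored.
	entries, _, err := client.Health().Service("billing", "project-a", true, nil)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("healthy project-a instances of billing: %d\n", len(entries))
}
```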

To answer your specific questions:

  1. Yes, the ports would clash and you’d need to set them manually. This isn’t supported in the chart, so you’d need to fork it, and other components like consul-k8s would still not work. This goes back to my high-level answer above: this really isn’t supported, so it’s not the advised solution.
  2. We chose to run Consul agents in a DaemonSet rather than in each pod for exactly this reason. If you’re running a ton of pods, then you’re going to waste a lot of resources running Consul agents.

Hi,

This is mostly due to some legacy reasons in our applications.
Our old setup (still being migrated) looks like this:

3 application servers running on VMs in the cloud that actively used Consul to register services and access a few paths in the KV store (such as /configs, /locks, /tasks). The old cluster didn’t contain any agents, just the servers (so as not to have too many VMs). Each application would contact its local agent (a server) on 127.0.0.1:8500.

We are now migrating this workload to our K8s installation, and as all the applications register with the same names and access static paths for their configuration, we assume that if we go for a single cluster everything will clash: the services will be mixed between projects and the KV paths will overlap.

At this time, the applications use Consul for only 3 things (a rough sketch of the calls follows below):

  1. Read configuration directly from the KV store
  2. Read configuration via consul-template
  3. Register the service (but we only check for the presence of the service, not its IP, port, or number of instances)
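
For (1) and (3), the calls look roughly like this (a simplified sketch via the Go API client, not our actual code; the key and service names are placeholders):

```go
package main

import (
	"fmt"
	"log"

	"github.com/hashicorp/consul/api"
)

func main() {
	// Each application talks to its local agent on 127.0.0.1:8500,
	// which is the default address in the client config.
	client, err := api.NewClient(api.DefaultConfig())
	if err != nil {
		log.Fatal(err)
	}

	// 1. Read configuration directly from the KV store.
	pair, _, err := client.KV().Get("configs/my-app", nil)
	if err != nil {
		log.Fatal(err)
	}
	if pair != nil {
		fmt.Printf("config: %s\n", pair.Value)
	}

	// 3. Register the service; consumers only check that it exists,
	//    they never use the registered IP or port.
	if err := client.Agent().ServiceRegister(&api.AgentServiceRegistration{Name: "my-app"}); err != nil {
		log.Fatal(err)
	}
	services, _, err := client.Catalog().Service("my-app", "", nil)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("my-app present: %v\n", len(services) > 0)
}
```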

What we are wondering right now is whether the extra agents are needed, or whether we could just tell our applications to connect directly to the server pods.

However, if I am reading the documentation correctly, this isn’t recommended. The use-case section mentions that it could cause issues with consul-template, but I don’t see where the issue would come from: we could just specify the service name and it would connect to a node via the K8s Service. Right now I can’t find any drawback for the consul-template use case.
Small edit: I checked again and our consul-template currently connects directly to the servers, not to an agent.

I am also assuming that this could cause odd behavior on the service registration side, as the K8s Service would route our application to a random node for the registration, and in case of issues this could change from one call to the next. I am not too sure whether this side effect is a real issue or not.

We are trying to find a good balance between the least amount of change in our applications and good practice for our kind of workload.

Kr,

Registering with the servers will cause the service to be associated with that server pod. If the server pod dies then Consul will assume all the services registered with that pod are also unavailable. If you’re using the service registrations then I think this won’t work for you.

If you’re only using consul-template with K/V values, then talking to the Consul servers isn’t terrible given the small workload you have. Client agents mainly help with scaling.

I’d recommend you re-architect your applications to work with a single Consul cluster. Then you can use our Helm chart and all the automation we provide.
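
One possible shape for that re-architecture (just a sketch; the `project-a/` prefix scheme is an example, not something the chart enforces) is to scope KV paths by project so they stop overlapping in a single cluster:

```go
package main

import (
	"log"

	"github.com/hashicorp/consul/api"
)

func main() {
	client, err := api.NewClient(api.DefaultConfig())
	if err != nil {
		log.Fatal(err)
	}

	// Instead of every project writing to /configs, each one writes
	// under its own prefix, e.g. project-a/configs, project-b/configs.
	_, err = client.KV().Put(&api.KVPair{
		Key:   "project-a/configs/my-app",
		Value: []byte(`{"debug": false}`),
	}, nil)
	if err != nil {
		log.Fatal(err)
	}
}
```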

Perhaps you could also look at Consul Enterprise, which has namespaces. Then you could keep the same behaviour and each service could live in its own namespace.
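
For illustration only (namespaces are an Enterprise-only feature, and `project-a` is a placeholder), the Go API client lets you set the namespace on registrations and queries, so two projects could each have their own `billing` service without clashing:

```go
package main

import (
	"fmt"
	"log"

	"github.com/hashicorp/consul/api"
)

func main() {
	client, err := api.NewClient(api.DefaultConfig())
	if err != nil {
		log.Fatal(err)
	}

	// Register the service into the project's namespace (Enterprise only).
	err = client.Agent().ServiceRegister(&api.AgentServiceRegistration{
		Name:      "billing",
		Namespace: "project-a",
	})
	if err != nil {
		log.Fatal(err)
	}

	// Queries are scoped to a namespace as well.
	entries, _, err := client.Health().Service("billing", "", true,
		&api.QueryOptions{Namespace: "project-a"})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("instances of billing in project-a: %d\n", len(entries))
}
```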