I’m trying to understand how to run clustered services with Consul Connect while still being able to use rolling updates. I tried with MongoDB (a 3-instance replica set) and Postgres (with Patroni, 1 leader + 2 followers), but this will most likely affect any other clustered service (Redis, Elasticsearch, RabbitMQ, etc.).
The first solution is to not use Consul Connect and just expose ports on the Nomad clients. I can create a single group with a count of 3, each instance running one task. No issue with rolling updates. But security (like TLS) must be provided by the service itself, as Connect is not used. There’s also no easy way to restrict which other services can reach my clustered service (because no intentions are available). So I must ensure every service is properly secured natively, and this pushes complexity onto every client of the service (distributing client certificates, etc.).
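To make the first option concrete, here is a rough sketch of what such a job could look like. The job and group names, the datacenter, the image tag, the replica set name and the timing values are all placeholders I picked for illustration, not something taken from a working setup.

```hcl
# Sketch only: names, image tag and timings are assumptions.
job "mongodb" {
  datacenters = ["dc1"]

  # Rolling update: replace one allocation at a time.
  update {
    max_parallel     = 1
    min_healthy_time = "30s"
    healthy_deadline = "5m"
  }

  group "mongod" {
    count = 3

    network {
      # A static port is exposed directly on the Nomad client.
      # No Connect sidecar, so no mTLS and no intentions.
      port "db" {
        static = 27017
      }
    }

    service {
      name = "mongodb"
      port = "db"
    }

    task "mongod" {
      driver = "docker"

      config {
        image = "mongo:4.4"
        ports = ["db"]
        args  = ["--replSet", "rs0", "--bind_ip_all"]
      }
    }
  }
}
```

Because the port is static, two allocations can't land on the same client, so the 3 instances end up on 3 different nodes without any extra constraint.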
The second solution is to try to use Consul Connect. But I can’t see any way to let the various instances of a task talk to each other (because we have to manually define the upstreams with a hardcoded local_bind_port). So we can’t use a single group with count=3. We also can’t define a single group with 3 tasks, because they would all be scheduled on the same node (which would defeat the purpose of having a highly available service). So we have to define 3 groups, each with 1 task. With this solution, we can define Consul Connect correctly, each task having 2 upstreams pointing at the other 2 instances. It’s a lot of duplicated definitions and a lot of intentions (task-1 → task-2, task-1 → task-3, task-2 → task-1, etc.), but it works network-wise. The problem with this approach is that we can’t use rolling updates: all groups are handled in parallel. Only tasks in the same group are subject to the rolling update policy defined in the job.
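For reference, this is roughly what one of the 3 groups looks like in the second option. It is a trimmed sketch: the service names (mongo-1, mongo-2, mongo-3), the local bind ports and the image tag are placeholders chosen for the example.

```hcl
# Sketch only: service names, ports and image tag are assumptions.
job "mongodb" {
  datacenters = ["dc1"]

  group "mongo-1" {
    network {
      mode = "bridge"
    }

    service {
      name = "mongo-1"
      port = "27017"

      connect {
        sidecar_service {
          proxy {
            # Each peer needs its own upstream with a fixed local port.
            upstreams {
              destination_name = "mongo-2"
              local_bind_port  = 27018
            }
            upstreams {
              destination_name = "mongo-3"
              local_bind_port  = 27019
            }
          }
        }
      }
    }

    task "mongod" {
      driver = "docker"

      config {
        image = "mongo:4.4"
        args  = ["--replSet", "rs0", "--bind_ip_all"]
      }
    }
  }

  # Groups "mongo-2" and "mongo-3" repeat the same block, each with
  # its own service name and upstreams pointing at the other two.
}
```

From inside the mongo-1 task, the peers are then reachable at 127.0.0.1:27018 and 127.0.0.1:27019, which is exactly the hardcoded mapping that prevents collapsing the three groups into one group with count=3.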
The last solution would be similar to the second one, except we define 3 jobs, each with 1 group, each with one task. It has the same problems as solution 2 (lots of duplicated definitions, lots of intentions), plus a messier list of jobs. We can handle rolling updates, but only manually, by submitting the 3 jobs separately. This would work but seems very awkward.
None of these solutions seems like the right one; they all have serious drawbacks (either security or manageability downsides). Is anyone else struggling with this kind of workload? Is there something obvious I’m missing?