Consul-sync-catalog failover, multiple replicas

Hi

I deployed the Consul Helm chart in my k8s cluster with an external master. I only want Consul on the nodes and the k8s-to-Consul sync.
Everything works, but I have only one consul-sync-catalog pod running, and when the node where it is running dies, I lose all the synced services in Consul and get a flood of errors in several apps until the pod restarts and (slowly) re-adds the missing synced services.

The Helm chart makes no reference to a replica count for consul-sync-catalog, and the template itself has replicas: 1 hardcoded.

Is there any problem with increasing this deployment's replicas to 3, so that the k8s services are always synced to Consul? Or is there another way to make this failsafe?

Thanks in advance for the help

So I found a workaround for this, since a single point of failure for consul-sync-catalog is not acceptable.

In my Helm setup, I copied values.yaml to consul-sync-failover, removed/disabled everything that wasn’t related to sync-catalog, and added these 2 settings:

global:
  name: consul-failover

syncCatalog:
  consulNodeName: "k8s-sync-failover"

Then deploy a new Helm release with these values.
The original release creates the normal Consul deployment with everything I need; the second release just deploys a new consul-sync-catalog with a new “virtual” consul-sync node name, which does the same job as the original consul-sync-catalog.
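
For illustration, the values file for the second, sync-only release might look roughly like the sketch below. It assumes a chart version where global.enabled switches all components off by default and syncCatalog.enabled turns the sync deployment back on; check both keys against the values.yaml of the chart version you actually run. Anything the sync pod needs to reach Consul (ACL tokens, TLS settings and so on) is not shown and would have to be carried over from the original values.

```yaml
# consul-sync-failover: values for the second, sync-only release (a sketch, not a tested config)
global:
  # Assumption: disables every chart component unless it is re-enabled below.
  enabled: false
  # Distinct name so the resources of this release do not clash with the first one.
  name: consul-failover

syncCatalog:
  # Re-enable only the catalog sync deployment.
  enabled: true
  # Separate "virtual" Consul node for the services registered by this instance.
  consulNodeName: "k8s-sync-failover"
```

The first release keeps its own global.name and the default consulNodeName, so the two sync deployments register the same k8s services under two different Consul nodes.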

So each running consul-sync-catalog instance updates the Consul services, but because they use different virtual node names, they don’t conflict with each other (as they would if you simply increased the deployment's pod count from 1 to 2). If one of them disappears, on startup it will still remove everything related to its own virtual node name, but it will not touch the other one. Since each service can be registered on multiple nodes, most of the time we have 2 hosts per service, and if one consul-sync-catalog dies, the other keeps the services registered on its own host.

So, TL;DR:

Deploy 2 Helm releases: one normal release with catalog-sync, and another with just catalog-sync and a different name and consulNodeName. Both catalog-sync deployments should then be redundant and work in parallel without collisions.