Vault / consul double registration weirdness on docker

I have a Consul cluster with Vault deployed to Docker Swarm, like this:

version: "3.9"

services:

  consul:
    image: consul:latest
    hostname: consul{{.Task.Slot}}
    environment:
      CONSUL_BIND_INTERFACE: eth1
    ports:
    - 8500:8500
    command: agent -server -bootstrap-expect 3 -ui -client 0.0.0.0 -retry-join consul1 -retry-join consul2 -retry-join consul3
    deploy:
      replicas: 3

  vault:
    image: vault:latest
    command: server
    hostname: vault{{.Task.Slot}}
    environment:
      VAULT_LOCAL_CONFIG: '{"storage": {"consul": {"address": "http://consul:8500" }}, "listener": { "tcp": { "address": "0.0.0.0:8200", "tls_disable": 1 } }, "ui": true, "default_lease_ttl": "168h", "max_lease_ttl": "720h", "disable_mlock": true }'
      VAULT_CLUSTER_INTERFACE: eth0
      SKIP_SETCAP: 1
      VAULT_ADDR: http://localhost:8200
      VAULT_API_ADDR: http://vault{{.Task.Slot}}:8200
    deploy:
      replicas: 2
    ports:
    - 8200:8200

For some reason, after this has been running for a while, the Consul dashboard shows 3 or more instances of Vault: 2 healthy entries, plus a number of unhealthy entries whose checks have timed out. As far as I can tell, at some point Docker recreated a Vault task (vault:vault2:8200, as an example), the new task connected to a different Consul instance, and the old vault:vault2:8200 entry never got deregistered.

But now these dead entries are stuck and I don’t know how to get rid of them. I also don’t know if there’s something I should do to make my deployment more resilient to these appearing in the first place.

I was wondering what you did to get this resolved, please, if you’re willing to share your compose file. Thanks

The Consul integration in Vault expects that Vault is configured to talk to a single Consul agent that is co-located on the same host as the Vault instance.

In the configuration shown here, that is not the case - Vault is being pointed at what looks like a DNS round robin address over multiple Consul servers. (I think - I’m not really familiar with Docker Swarm.)

Does Docker Swarm have something similar in concept to Kubernetes pods? If so, this could be addressed by running a Consul client agent as a sidecar to each Vault instance, and having them register with that.
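
As for the dead entries that are already stuck: a service registered through an agent generally has to be deregistered through that same agent, otherwise the agent's anti-entropy sync tends to put it back. Below is a rough Python sketch of that cleanup using Consul's HTTP API (GET /v1/health/service/<name> and PUT /v1/agent/service/deregister/<id>). The function names are made up for illustration, and it assumes no ACL tokens are in use and that each Consul agent is reachable on port 8500 at the node address reported by the health endpoint, which in this setup probably means running it from a container on the same overlay network. Treat it as a starting point, not a tested fix.

```python
import json
import urllib.request
from urllib.parse import quote

CONSUL_HTTP_PORT = 8500  # assumption: default HTTP port, as published in the compose file


def stale_instances(health_entries):
    """Return (node_address, service_id) pairs that have at least one critical check.

    `health_entries` is the decoded JSON list returned by
    GET /v1/health/service/<name>.
    """
    stale = []
    for entry in health_entries:
        statuses = [check["Status"] for check in entry.get("Checks", [])]
        if "critical" in statuses:
            stale.append((entry["Node"]["Address"], entry["Service"]["ID"]))
    return stale


def fetch_health(consul_addr, service="vault"):
    """Fetch every registered instance of `service`, healthy or not."""
    url = f"{consul_addr}/v1/health/service/{service}"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def deregister(node_address, service_id):
    """Ask the agent that holds the registration to drop it."""
    # Service IDs here look like "vault:vault2:8200", so keep the colons as-is.
    path = quote(service_id, safe=":")
    url = f"http://{node_address}:{CONSUL_HTTP_PORT}/v1/agent/service/deregister/{path}"
    req = urllib.request.Request(url, method="PUT")
    with urllib.request.urlopen(req) as resp:
        return resp.status


def cleanup(consul_addr, service="vault"):
    """Deregister every instance of `service` that reports a critical check."""
    for node_address, service_id in stale_instances(fetch_health(consul_addr, service)):
        print("deregistering", service_id, "on", node_address)
        deregister(node_address, service_id)
```

Calling cleanup("http://consul:8500") would walk the vault entries and drop the critical ones. It is worth printing the output of stale_instances first and checking it by hand, since a live Vault that is merely sealed may also show a critical check.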