Consul issue: part of wrong data center

Failed to join ip-addr Member ‘member-name’ part of wrong datacenter ‘us-central1’

We are getting the above error continuously and are not able to join the Consul server. All our servers and clients run on VMs.

server version : 1.7.4
client version : 1.4.0

Can you share your configuration file?

Hi all,

let me hijack this inconclusive thread with the evocative subject.

I’m getting:

2021-03-31T08:10:34.685Z [ERROR] agent.server.memberlist.lan: memberlist: Failed push/pull merge: Member ‘sit-itaag108-corporatesha01-eesb-consul-01’ part of wrong datacenter ‘sit-itaag108-corporatesha01-eesb-consul-01’ from=172.17.0.4:55718

where the consul config reads:

{
  "acl": {
    "default_policy": "deny",
    "enable_token_persistence" : true,
    "enabled": true,
    "tokens": {
      "master": "{{consul_secret}}",
      "agent": "{{consul_secret}}"
    }
  },
  "addresses": {
    "http": "0.0.0.0"
  },
  "bootstrap_expect": 1,
  "start_join": [
    "0.0.0.0"
  ],
  "ports": {
    "dns": -1,
    "serf_wan": -1
  },
  "data_dir": "/consul/data",
  "datacenter": "{{instance_name}}",
  "node_name": "{{instance_name}}",
  "server": true,
  "telemetry": {
    "disable_compat_1.9": true
  }
}

I’m not sure how this leads the Consul server to conclude that it’s in the wrong datacenter.

Any clue, please?

Best regards
Cc.

Hi @pekuz,

Just curious why you have the same variable for datacenter and node_name :thinking:. If you roll out this config into a cluster, every node will be configured to be in its own data center. This would explain why you are getting the wrong datacenter error.
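
To illustrate the point, here is a minimal sketch that groups rendered configs by their datacenter value. The instance names `instance-a` and `instance-b` and the helper `group_by_datacenter` are hypothetical and used only for illustration; the fallback of `dc1` is the default datacenter name.

```python
from collections import defaultdict


def group_by_datacenter(configs):
    """Map each datacenter name to the node names configured for it."""
    groups = defaultdict(list)
    for cfg in configs:
        # "dc1" is used when no datacenter is configured.
        dc = cfg.get("datacenter", "dc1")
        groups[dc].append(cfg.get("node_name", "<hostname>"))
    return dict(groups)


# Two configs rendered from a template where datacenter and node_name
# both come from the same variable (hypothetical instance names).
rendered = [
    {"datacenter": name, "node_name": name}
    for name in ("instance-a", "instance-b")
]

groups = group_by_datacenter(rendered)
for dc, nodes in groups.items():
    print(dc, "->", nodes)
```

Every instance ends up alone in its own datacenter, so no two of them can merge into one LAN cluster.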

Hi Ranjandas,

it’s intentional; note also bootstrap_expect = 1. Setting datacenter = node_name is intended to prevent a cluster of more than one node from forming. The options documentation does not prohibit the combination of datacenter = {{ instance_name }}, node_name = {{ instance_name }} and bootstrap_expect = 1.

I still do not understand how Consul can raise this error when the actual datacenter at run time equals the configured datacenter. The error message reads as if it were an interpolation of: [ERROR] agent.server.memberlist.lan: memberlist: Failed push/pull merge: Member ‘{{ options.node_name}}’ part of wrong datacenter ‘{{ options.datacenter }}’ from=172.17.0.4:55718

Best regards
Cc.

I don’t think I am understanding the question fully here.

I tried the equivalent of the above with the command below and I don’t see the error you described.

$ consul agent -server -bootstrap-expect 1 -node dc1 -datacenter dc1 -bind '{{ GetInterfaceIP "en0" }}' -data-dir /tmp/consul

$ consul members
Node  Address           Status  Type    Build      Protocol  DC   Segment
dc1   192.168.1.8:8301  alive   server  1.9.4+ent  2         dc1  <all>

I even tried with the exact configuration you shared, but was still not able to reproduce the error.

Could you let me know what I am doing wrong in reproducing the issue you are facing?

Hi Ranjandas,

thank you for investigating, I could speculate on differences:

  • our persisted data differ, and there might be a persisted state that makes the memberlist push/pull merge fail (I could share a zip of the data_dir if needed), or

  • on my side the 1.9.4 Consul image is Docker-hosted, or

  • actually I have several Docker containers of the 1.9.4 image on the same VM, each with a different {{ instance_name }} and each mounted with a private data_dir.

Hope it helps
Cc.

Hi @pekuz,

Could you share the client agent config, please? I was able to reproduce this error with the following two steps.

  • I - Start the Server Agent

    $ consul agent -dev -bind "{{ GetInterfaceIP \"enp0s2\" }}" -datacenter dc-0
    
  • II - Start the Client Agent

    # 192.168.64.86 is the server IP.
    $ consul agent -retry-join 192.168.64.86 -data-dir /tmp/consul
    

In this case, I got the following error when the agent was trying to join the server.

[ERROR] agent.server.memberlist.lan: memberlist: Failed push/pull merge: Member 'c-srv-1' part of wrong datacenter 'dc1' from=192.168.64.87:42126

In this case, it’s because the client agent was trying to join with the default datacenter name of dc1.
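
The reproduction suggests how to read the message: the quoted member name and datacenter belong to the remote agent that attempted the merge, not to the agent that logs the error. Below is a minimal sketch of that reading, assuming the check simply compares each remote member's datacenter with the local one. `check_merge` and the dict layout are hypothetical names for illustration, not Consul's actual code.

```python
def check_merge(local_dc, remote_members):
    """Return an error string for the first remote member whose
    datacenter differs from local_dc, or None if all of them match."""
    for member in remote_members:
        if member["dc"] != local_dc:
            return "Member '%s' part of wrong datacenter '%s'" % (
                member["name"],
                member["dc"],
            )
    return None


# The server runs in dc-0; the client joins with the default dc1.
err = check_merge("dc-0", [{"name": "c-srv-1", "dc": "dc1"}])
print(err)
```

Under this assumption, an agent whose own node_name and datacenter match its configuration would never report itself; the message always describes a peer.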

I feel it could be similar to this in your case.

It would be great if you could also share consul members from the server and also the final rendered config from the client agent.

I can’t think of any other reason why you would get this error :frowning_face:

If your docker setup is using docker-compose, sharing the compose file would be also helpful to reproduce the issue.

sit-itaag108-corporatehmb03-eesb-consul-01.zip.txt (281.9 KB)

Hi Ranjandas,

a subset of the docker inspect output follows, notably the Env object, which passes the whole configuration:

(I took another Consul instance on the same host)

docker inspect sit-itaag108-corporatehmb03-eesb-consul-01

....
    "Mounts": [
        {
            "Type": "bind",
            "Source": "/opt/eesb/sit-itaag108-corporatehmb03-eesb-consul-01/data",
            "Destination": "/consul/data",
            "Mode": "rw",
            "RW": true,
            "Propagation": "rprivate"
        }
    ],
    "Config": {
        "Hostname": "8578d0a6e894",
        "Domainname": "",
        "User": "",
        "AttachStdin": false,
        "AttachStdout": true,
        "AttachStderr": true,
        "ExposedPorts": {
            "8300/tcp": {},
            "8301/tcp": {},
            "8301/udp": {},
            "8302/tcp": {},
            "8302/udp": {},
            "8500/tcp": {},
            "8600/tcp": {},
            "8600/udp": {}
        },
        "Tty": false,
        "OpenStdin": false,
        "StdinOnce": false,
        "Env": [
            "CONSUL_LOCAL_CONFIG={\n  \"acl\": {\n    \"default_policy\": \"deny\",\n    \"enable_token_persistence\" : true,\n    \"enabled\": true,\n    \"tokens\": {\n      \"master\": \"?\",\n      \"agent\": \"aecaa9f9-6e85-ee6e-7e17-b1b3a7254e39\"\n    }\n  },\n  \"addresses\": {\n    \"http\": \"0.0.0.0\"\n  },\n  \"bootstrap_expect\": 1,\n  \"start_join\": [\n    \"0.0.0.0\"\n  ],\n  \"ports\": {\n    \"dns\": -1,\n    \"serf_wan\": -1\n  },\n  \"data_dir\": \"/consul/data\",\n  \"datacenter\": \"sit-itaag108-corporatehmb03-eesb-consul-01\",\n  \"node_name\": \"sit-itaag108-corporatehmb03-eesb-consul-01\",\n  \"ui_config\": {\n    \"enabled\": false\n  },\n  \"server\": true,\n  \"telemetry\": {\n    \"disable_compat_1.9\": true\n  }\n}",
            "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
            "HASHICORP_RELEASES=https://releases.hashicorp.com"
        ],
        "Cmd": [
            "agent",
            "-server"
        ],
        "Image": "docker.packages.eurofins.local/consul:1.9.4",
        "Volumes": {
            "/consul/data": {}
        },
        "WorkingDir": "",
        "Entrypoint": [
            "docker-entrypoint.sh"
        ],
        "OnBuild": null,
        "Labels": {
            "org.opencontainers.image.authors": "Consul Team <consul@hashicorp.com>",
            "org.opencontainers.image.version": "1.9.4"
        }
    },
    "NetworkSettings": {
        "Bridge": "",
        "SandboxID": "2a99040a325e1694c7614e84d99e2561c0e850baca59a2d0b32f9ec972310565",
        "HairpinMode": false,
        "LinkLocalIPv6Address": "",
        "LinkLocalIPv6PrefixLen": 0,
        "Ports": {
            "8300/tcp": null,
            "8301/tcp": null,
            "8301/udp": null,
            "8302/tcp": null,
            "8302/udp": null,
            "8500/tcp": [
                {
                    "HostIp": "0.0.0.0",
                    "HostPort": "8639"
                }
            ],
            "8600/tcp": null,
            "8600/udp": null
        },
        "SandboxKey": "/var/run/docker/netns/2a99040a325e",
        "SecondaryIPAddresses": null,
        "SecondaryIPv6Addresses": null,
        "EndpointID": "f1311b635394c501c74959f7a939ed45c0c6d5c8f242a446bc95e1452a53f8e4",
        "Gateway": "172.17.0.1",
        "GlobalIPv6Address": "",
        "GlobalIPv6PrefixLen": 0,
        "IPAddress": "172.17.0.2",
        "IPPrefixLen": 16,
        "IPv6Gateway": "",
        "MacAddress": "02:42:ac:11:00:02",
        "Networks": {
            "bridge": {
                "IPAMConfig": null,
                "Links": null,
                "Aliases": null,
                "NetworkID": "f45362d1f66a6e466d0404f37dc30c96b58ef35974989603c5cba24d719a30b5",
                "EndpointID": "f1311b635394c501c74959f7a939ed45c0c6d5c8f242a446bc95e1452a53f8e4",
                "Gateway": "172.17.0.1",
                "IPAddress": "172.17.0.2",
                "IPPrefixLen": 16,
                "IPv6Gateway": "",
                "GlobalIPv6Address": "",
                "GlobalIPv6PrefixLen": 0,
                "MacAddress": "02:42:ac:11:00:02",
                "DriverOpts": null
            }
        }
    }
}
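
The escaped CONSUL_LOCAL_CONFIG entry is hard to read in this form. A small sketch for decoding it is shown below; `local_config_from_env` is a hypothetical helper, and the Env list is abbreviated from the output above. It assumes the list was taken from the `Config.Env` field of the first element of the `docker inspect` JSON array.

```python
import json


def local_config_from_env(env):
    """Return the parsed CONSUL_LOCAL_CONFIG from an Env list, or None."""
    prefix = "CONSUL_LOCAL_CONFIG="
    for entry in env:
        if entry.startswith(prefix):
            return json.loads(entry[len(prefix):])
    return None


# Abbreviated Env list; the full one would come from
# json.loads(<docker inspect output>)[0]["Config"]["Env"].
env = [
    'CONSUL_LOCAL_CONFIG={\n  "bootstrap_expect": 1,\n'
    '  "start_join": ["0.0.0.0"],\n'
    '  "datacenter": "sit-itaag108-corporatehmb03-eesb-consul-01",\n'
    '  "node_name": "sit-itaag108-corporatehmb03-eesb-consul-01",\n'
    '  "server": true\n}',
    "HASHICORP_RELEASES=https://releases.hashicorp.com",
]

cfg = local_config_from_env(env)
for key in ("datacenter", "node_name", "start_join", "bootstrap_expect"):
    print(key, "=", cfg[key])
```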

logs extract:

docker logs sit-itaag108-corporatehmb03-eesb-consul-01

2021-03-31T08:10:24.155Z [WARN]  agent: BootstrapExpect is set to 1; this is the same as Bootstrap mode.
2021-03-31T08:10:24.155Z [WARN]  agent: bootstrap = true: do not enable unless necessary
2021-03-31T08:10:24.161Z [WARN]  agent.auto_config: BootstrapExpect is set to 1; this is the same as Bootstrap mode.
2021-03-31T08:10:24.161Z [WARN]  agent.auto_config: bootstrap = true: do not enable unless necessary
2021-03-31T08:10:24.170Z [INFO]  agent.server.raft: restored from snapshot: id=32-32768-1616894873285
2021-03-31T08:10:24.194Z [INFO]  agent.server.raft: initial configuration: index=29516 servers="[{Suffrage:Voter ID:d39c0ac1-6bff-e6bd-4dff-01f6a6153da7 Address:172.17.0.2:8300}]"
2021-03-31T08:10:24.195Z [INFO]  agent.server.raft: entering follower state: follower="Node at 172.17.0.2:8300 [Follower]" leader=
2021-03-31T08:10:24.195Z [INFO]  agent.server.serf.lan: serf: EventMemberJoin: sit-itaag108-corporatehmb03-eesb-consul-01 172.17.0.2
2021-03-31T08:10:24.195Z [INFO]  agent.router: Initializing LAN area manager
2021-03-31T08:10:24.195Z [WARN]  agent.server.serf.lan: serf: Failed to re-join any previously known node
2021-03-31T08:10:24.195Z [INFO]  agent.server: Adding LAN server: server="sit-itaag108-corporatehmb03-eesb-consul-01 (Addr: tcp/172.17.0.2:8300) (DC: sit-itaag108-corporatehmb03-eesb-consul-01)"
2021-03-31T08:10:24.196Z [INFO]  agent: Starting server: address=[::]:8500 network=tcp protocol=http
==> Joining cluster...
2021-03-31T08:10:24.196Z [INFO]  agent: (LAN) joining: lan_addresses=[0.0.0.0]
2021-03-31T08:10:24.197Z [INFO]  agent: (LAN) joined: number_of_nodes=1
2021-03-31T08:10:24.197Z [INFO]  agent: Join completed. Initial agents synced with: agent_count=1
2021-03-31T08:10:24.197Z [INFO]  agent: started state syncer
==> Consul agent running!
2021-03-31T08:10:31.535Z [ERROR] agent.anti_entropy: failed to sync remote state: error="No cluster leader"
2021-03-31T08:10:31.711Z [WARN]  agent.server.raft: heartbeat timeout reached, starting election: last-leader=
2021-03-31T08:10:31.711Z [INFO]  agent.server.raft: entering candidate state: node="Node at 172.17.0.2:8300 [Candidate]" term=36
2021-03-31T08:10:31.714Z [INFO]  agent.server.raft: election won: tally=1
2021-03-31T08:10:31.715Z [INFO]  agent.server.raft: entering leader state: leader="Node at 172.17.0.2:8300 [Leader]"
2021-03-31T08:10:31.715Z [INFO]  agent.server: cluster leadership acquired
2021-03-31T08:10:31.715Z [INFO]  agent.server: New leader elected: payload=sit-itaag108-corporatehmb03-eesb-consul-01
2021-03-31T08:10:31.752Z [INFO]  agent.server: initializing acls
2021-03-31T08:10:31.752Z [INFO]  agent.leader: started routine: routine="legacy ACL token upgrade"
2021-03-31T08:10:31.752Z [INFO]  agent.leader: started routine: routine="acl token reaping"
2021-03-31T08:10:31.753Z [INFO]  agent.server.serf.lan: serf: EventMemberUpdate: sit-itaag108-corporatehmb03-eesb-consul-01
2021-03-31T08:10:31.753Z [INFO]  agent.server: Updating LAN server: server="sit-itaag108-corporatehmb03-eesb-consul-01 (Addr: tcp/172.17.0.2:8300) (DC: sit-itaag108-corporatehmb03-eesb-consul-01)"
2021-03-31T08:10:31.754Z [INFO]  agent.leader: started routine: routine="federation state anti-entropy"
2021-03-31T08:10:31.754Z [INFO]  agent.leader: started routine: routine="federation state pruning"
2021-03-31T08:10:34.605Z [INFO]  agent: Synced node info
2021-03-31T08:10:34.685Z [ERROR] agent.server.memberlist.lan: memberlist: Failed push/pull merge: Member 'sit-itaag108-corporatesha01-eesb-consul-01' part of wrong datacenter 'sit-itaag108-corporatesha01-eesb-consul-01' from=172.17.0.4:55718
2021-03-31T12:39:59.436Z [ERROR] agent.http: Request error: method=GET url=/v1/config from=10.99.148.243:45718 error="method GET not allowed"
2021-04-02T05:20:46.941Z [ERROR] agent.server.memberlist.lan: memberlist: Failed push/pull merge: Member 'sit-itaag108-corporatesha01-eesb-consul-01' part of wrong datacenter 'sit-itaag108-corporatesha01-eesb-consul-01' from=172.17.0.4:43564
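
Note that this container is sit-itaag108-corporatehmb03-eesb-consul-01 at 172.17.0.2, while both errors name sit-itaag108-corporatesha01-eesb-consul-01 and come from 172.17.0.4. A rough sketch for pulling those fields out of a log follows; the regular expression is my own, written against the two lines above, and it assumes IPv4 source addresses.

```python
import re

# Pattern written against the log lines in this thread (IPv4 only).
MERGE_ERROR = re.compile(
    r"Failed push/pull merge: Member '(?P<member>[^']+)' "
    r"part of wrong datacenter '(?P<dc>[^']+)' "
    r"from=(?P<ip>[\d.]+):(?P<port>\d+)"
)


def merge_failures(log_text):
    """Return the distinct (member, datacenter, source IP) triples."""
    seen = []
    for match in MERGE_ERROR.finditer(log_text):
        triple = (match["member"], match["dc"], match["ip"])
        if triple not in seen:
            seen.append(triple)
    return seen


log_text = """\
2021-03-31T08:10:34.685Z [ERROR] agent.server.memberlist.lan: memberlist: Failed push/pull merge: Member 'sit-itaag108-corporatesha01-eesb-consul-01' part of wrong datacenter 'sit-itaag108-corporatesha01-eesb-consul-01' from=172.17.0.4:55718
2021-04-02T05:20:46.941Z [ERROR] agent.server.memberlist.lan: memberlist: Failed push/pull merge: Member 'sit-itaag108-corporatesha01-eesb-consul-01' part of wrong datacenter 'sit-itaag108-corporatesha01-eesb-consul-01' from=172.17.0.4:43564
"""

failures = merge_failures(log_text)
print(failures)
```

The source port changes between the two attempts, so only the IP is kept when collapsing duplicates.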

for the member list:

[eesbadmin@eu50mqvt011 eesb ]$ curl -H "X-Consul-Token: ?" http://localhost:8639/v1/agent/members
Your IP is issuing too many concurrent connections, please rate limit your calls

ah, it smells too…

likely some client leaks HTTP connections and Consul does not close inactive connections, or something similar… restarting Consul.

[eesbadmin@eu50mqvt011 eesb ]$ curl -H "X-Consul-Token: ?" http://localhost:8639/v1/agent/members
[{"Name":"sit-itaag108-corporatehmb03-eesb-consul-01","Addr":"172.17.0.2","Port":8301,"Tags":{"acls":"1","bootstrap":"1","build":"1.9.4:10bb6cb3","dc":"sit-itaag108-corporatehmb03-eesb-consul-01","ft_fs":"1","ft_si":"1","id":"d39c0ac1-6bff-e6bd-4dff-01f6a6153da7","port":"8300","raft_vsn":"3","role":"consul","segment":"","vsn":"2","vsn_max":"3","vsn_min":"2"},"Status":1,"ProtocolMin":1,"ProtocolMax":5,"ProtocolCur":2,"DelegateMin":2,"DelegateMax":5,"DelegateCur":4}]

Finally attaching the data folder zip (renamed to .txt).

Hope it helps
Cc.

Hi @pekuz,

Thank you for sharing this. I had a look at the contents of the data-dir you shared, but I could find only one host in the Serf member list.

Could you share the following:

  • ENV block from docker inspect of sit-itaag108-corporatesha01-eesb-consul-01?
  • full docker inspect of sit-itaag108-corporatesha01-eesb-consul-01 and sit-itaag108-corporatehmb03-eesb-consul-01
  • Output of consul members from sit-itaag108-corporatesha01-eesb-consul-01

I think I am almost out of ideas here, unfortunately. If you could share the above, I will give it one last try. :slight_smile:

Hi Ranjandas,

yes, the issue is a hard one: either the datacenter verification logic fails in some (yet unknown) case, or the error message is misleading.

The members:

curl -H "X-Consul-Token: ?" http://localhost:8640/v1/agent/members
[{"Name":"sit-itaag108-corporatesha01-eesb-consul-01","Addr":"172.17.0.4","Port":8301,"Tags":{"acls":"1","bootstrap":"1","build":"1.9.4:10bb6cb3","dc":"sit-itaag108-corporatesha01-eesb-consul-01","ft_fs":"1","ft_si":"1","id":"cef7cdec-30b2-4538-2b64-f1b3af03e278","port":"8300","raft_vsn":"3","role":"consul","segment":"","vsn":"2","vsn_max":"3","vsn_min":"2"},"Status":1,"ProtocolMin":1,"ProtocolMax":5,"ProtocolCur":2,"DelegateMin":2,"DelegateMax":5,"DelegateCur":4}]
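
To put the two member lists posted in this thread side by side, here is a small sketch. The JSON strings are abbreviated copies of the two /v1/agent/members responses, and `summarize` is a hypothetical helper.

```python
import json


def summarize(members_json):
    """Return (name, address, dc tag) per entry of a members response."""
    return [
        (m["Name"], m["Addr"], m["Tags"]["dc"])
        for m in json.loads(members_json)
    ]


# Abbreviated copies of the two responses posted in this thread.
hmb03 = (
    '[{"Name":"sit-itaag108-corporatehmb03-eesb-consul-01",'
    '"Addr":"172.17.0.2",'
    '"Tags":{"dc":"sit-itaag108-corporatehmb03-eesb-consul-01"}}]'
)
sha01 = (
    '[{"Name":"sit-itaag108-corporatesha01-eesb-consul-01",'
    '"Addr":"172.17.0.4",'
    '"Tags":{"dc":"sit-itaag108-corporatesha01-eesb-consul-01"}}]'
)

views = {"hmb03": summarize(hmb03), "sha01": summarize(sha01)}
for label, view in views.items():
    print(label, view)

# Collect the dc tags seen across both agents.
dcs = {dc for view in views.values() for _, _, dc in view}
print("distinct datacenters:", sorted(dcs))
```

Each agent lists only itself, the dc tags differ, and the address of the second instance (172.17.0.4) is the same as the from= address in the first instance's error log.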

All inspect details attached:
eu50mqvt011-docker-inspect-consuls.txt (21.4 KB)

Hope it helps
Cc.