New consul server cannot join consul cluster

Hello,

New consul server cannot join consul cluster.

Here is my scenario:

We have a 4-node Consul server cluster running in production. One of the instances has scheduled maintenance (AWS degraded hardware), and we need to stop/start that instance.
As this is a production cluster, I want to add another node so that the cluster can tolerate two node failures (with 5 nodes) instead of one (with 4 nodes) while the affected node is stopped and started. The Consul version is 0.9.3 on all server nodes.
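
The failure numbers above follow from the majority rule that Raft uses: a cluster of N servers needs floor(N/2) + 1 of them for quorum. A minimal sketch of that arithmetic, where the function names quorum and failure_tolerance are made up for illustration:

```shell
#!/bin/sh
# Quorum size for a Raft cluster of N servers: floor(N/2) + 1.
quorum() {
    echo $(( $1 / 2 + 1 ))
}

# Number of servers that can fail while a quorum is still available.
failure_tolerance() {
    echo $(( $1 - $(quorum "$1") ))
}

for n in 3 4 5; do
    echo "servers=$n quorum=$(quorum "$n") tolerated_failures=$(failure_tolerance "$n")"
done
```

With 4 servers the quorum is 3, so only one failure is tolerated; a fifth server keeps the quorum at 3 and raises the tolerance to two.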

I have created a new Consul server instance. This is the command used to start the new server (the other nodes use the same command, except the bootstrap node):

consul agent -server -advertise=172.25.1.6 -retry-join=172.25.1.4

(the exact command is the one below:)

docker run -d -v /etc/localtime:/etc/localtime:ro -v $(pwd)/consul-data:/consul/data --restart=unless-stopped --net=host consul:${version} agent -server -advertise=${advertise} -retry-join=${retry-join} -datacenter=${datacenter} -log-level=${log-level} -data-dir=/consul/data

Cluster IP addresses are:

172.25.1.4 (this is the bootstrap server and the one specified in -retry-join)
172.25.1.5
172.25.2.4
172.25.2.5

And the new node is 172.25.1.6

After creation, the new consul server cannot join the cluster.
Here is part of the log from 172.25.1.6 (the new Consul server):

* Failed to join 172.25.1.4: dial tcp 172.25.1.4:8301: i/o timeout
2020/09/16 18:16:54 [WARN] agent: Join LAN failed: , retrying in 30s
2020/09/16 18:16:56 [ERR] agent: failed to sync remote state: No cluster leader
2020/09/16 18:17:02 [ERR] agent: Coordinate update error: No cluster leader
2020/09/16 18:17:20 [ERR] agent: failed to sync remote state: No cluster leader
2020/09/16 18:17:24 [INFO] agent: (LAN) joining: [172.25.1.4]
2020/09/16 18:17:34 [INFO] agent: (LAN) joined: 0 Err: 1 error(s) occurred:

This new server has IP address 172.25.1.6 and the retry-join is to 172.25.1.4, so as you can see 172.25.1.6 cannot reach 172.25.1.4.

From 172.25.1.6, I can connect to 172.25.1.5, but not to 172.25.1.4

(connection to 172.25.1.5 works:)

$ telnet 172.25.1.5 8301
Trying 172.25.1.5...
Connected to 172.25.1.5.
Escape character is '^]'.

(connection to 172.25.1.4 does not work:)

$ telnet 172.25.1.4 8301
Trying 172.25.1.4...

(they are on the same subnet, it can connect to 1.5, should be able to connect to 1.4…)

These 4 nodes have the same security group and have ports TCP 8400, 8500, 8300-8302, and 8600 open to the members of that security group.
UDP ports: 8301-8302 and 8600. (as the new node has the same security group as the other nodes in the cluster, I don’t think there is a problem with a port being blocked)
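
A minimal sketch of how the telnet checks above could be scripted across several servers and TCP ports. It assumes bash (for its /dev/tcp pseudo-device) and the coreutils timeout command are installed; check_port and check_all are hypothetical helper names, and UDP ports are not covered.

```shell
#!/bin/sh
# check_port HOST PORT: succeeds if a TCP connection opens within 3 seconds.
# Relies on bash's /dev/tcp and coreutils `timeout` (both are assumptions).
check_port() {
    timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# check_all "PORT PORT ..." HOST...: report every host/port pair.
check_all() {
    ports=$1
    shift
    for host in "$@"; do
        for port in $ports; do
            if check_port "$host" "$port"; then
                echo "$host:$port open"
            else
                echo "$host:$port unreachable"
            fi
        done
    done
}
```

Run from the new node, it could be called as check_all "8300 8301 8302" 172.25.1.4 172.25.1.5 172.25.2.4 172.25.2.5, and then again from each existing server against the new node to test the reverse direction.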

I also checked the NACLs for both instances (new node and bootstrap node).

I also made a test in a staging environment with a similar configuration and a new node joins the cluster without a problem (also can telnet to the retry-join node specified)

Any idea why the new node can’t connect to the node specified in the retry-join and in consequence cannot join the cluster?
(the other server nodes are already connected to 172.25.1.4; for example, 172.25.1.5 is on the same subnet as 172.25.1.6 and has the same security group)

I thought about trying another address in the retry-join instead of 172.25.1.4. Since I can telnet to the other nodes on port 8301, I suppose the new node could join through those server nodes. What concerns me is that the new node cannot connect to 172.25.1.4, and I don’t know whether this could cause a cluster misconfiguration.

I suppose it is safe to stop/start the instance that has scheduled maintenance and run a three-node cluster in the meantime, but I would prefer to have another node so that if another node fails, the cluster doesn’t lose quorum.

Is it safe to try the retry-join to another node in the cluster instead of the 172.25.1.4, even if the new node cannot connect to the bootstrap node?

Thanks a lot!

Thanks for using Consul, and also for the very detailed message. Like you, I am concerned that, even if using a different server in the retry-join works, you’ll still have issues if you can’t talk to the leader.

Out of curiosity, can you telnet to 172.25.1.4 from some other server? I realize you said they are able to connect in general, but I am thinking testing telnet specifically could rule out any red herrings. It would be nice to make sure this new node is the only one experiencing this very specific behavior.

Given that your staging environment works as expected, and that I can’t share a real-time debugging session with you, my instinct has me leaning toward an environmental difference, which, sadly, is tough to spot.

What about adding yet another server in the production environment following your setup steps and seeing if it can connect?

Hello Derek,

Thanks for your reply.

172.25.1.4 is not the leader. The leader right now is 172.25.2.4 (leader_addr = 172.25.2.4:8300); 172.25.1.4 is the bootstrap server.

Yes, all 4 nodes in the cluster can telnet to each other on ports 8300, 8301, and 8302.

The newly created node (172.25.1.7) can connect to the other 3 nodes, but not to 172.25.1.4,
and node 172.25.1.4 can’t connect to 172.25.1.7.

[centos@consul-server-172-25-1-4 log]$ telnet 172.25.1.7 8301
Trying 172.25.1.7...
telnet: connect to address 172.25.1.7: No route to host

[centos@consul-server-172-25-1-4 log]$ telnet 172.25.1.5 8301
Trying 172.25.1.5...
Connected to 172.25.1.5.
Escape character is '^]'.

[centos@consul-server-172-25-2-5 ~]$ telnet 172.25.1.7 8301
Trying 172.25.1.7...
Connected to 172.25.1.7.
Escape character is '^]'.
^CConnection closed by foreign host.

[centos@consul-server-172-25-2-5 ~]$ telnet 172.25.2.4 8301
Trying 172.25.2.4...
Connected to 172.25.2.4.
Escape character is '^]'.

All servers in segment 172.25.1.0/24 have the same routing table.

These are the Consul logs from the new Consul server:

2020/09/17 03:05:11 [INFO] serf: EventMemberJoin: consul-server-172-25-1-7 172.25.1.7

2020/09/17 03:05:11 [INFO] agent: Started HTTP server on 127.0.0.1:8500
2020/09/17 03:05:11 [INFO] agent: Retry join LAN is supported for: aws azure gce softlayer
2020/09/17 03:05:11 [INFO] agent: Joining LAN cluster...
2020/09/17 03:05:11 [INFO] agent: (LAN) joining: [172.25.1.4]
2020/09/17 03:05:18 [ERR] agent: failed to sync remote state: No cluster leader
2020/09/17 03:05:18 [WARN] raft: no known peers, aborting election

I don’t know why it cannot connect to 172.25.1.4 …

I created another instance as you suggested, but the result is the same: it cannot connect to 172.25.1.4.

2020/09/17 20:47:26 [INFO] agent: Retry join LAN is supported for: aws azure gce softlayer
2020/09/17 20:47:26 [INFO] agent: Joining LAN cluster...
2020/09/17 20:47:26 [INFO] agent: (LAN) joining: [172.25.1.4]
2020/09/17 20:47:33 [ERR] agent: failed to sync remote state: No cluster leader
2020/09/17 20:47:35 [WARN] raft: no known peers, aborting election
2020/09/17 20:47:36 [INFO] agent: (LAN) joined: 0 Err: 1 error(s) occurred:

* Failed to join 172.25.1.4: dial tcp 172.25.1.4:8301: i/o timeout
2020/09/17 20:47:36 [WARN] agent: Join LAN failed: , retrying in 30s

I even created the instance on another subnet, but the result is the same: it cannot connect to 172.25.1.4.

172.25.1.4 is the instance with the scheduled maintenance.

From the other nodes I can reach 172.25.1.4.
From the new node I can reach all of the nodes except 172.25.1.4.

Thank you Derek!

Hello Derek,

I was wondering what will happen if I stop the bootstrap node instance in the cluster.
Will it rejoin on start?

I think the cluster will be up as there are 3 nodes up.

But I am not sure what will happen when the bootstrap node is up again.

I have stopped server nodes and when the instance is started, the instance joins the cluster again.

Thank you!

You can specify retry-join multiple times, and the agent trying to join will try until it finds one it can succeed with. See this link for an example.
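
For illustration only, a sketch of what passing several addresses could look like. The helper name retry_join_flags and the NEW_NODE_IP variable are hypothetical, the addresses are the ones mentioned in this thread, and the remaining agent options (datacenter, data-dir, and so on) are left out for brevity. The script only prints the command; it does not run it.

```shell
#!/bin/sh
# retry_join_flags ADDR...: print one -retry-join flag per server address.
retry_join_flags() {
    flags=""
    for addr in "$@"; do
        flags="$flags -retry-join=$addr"
    done
    # Strip the leading space left by the loop.
    echo "${flags# }"
}

# NEW_NODE_IP is a placeholder for the joining server's own address.
NEW_NODE_IP=172.25.1.7
echo "consul agent -server -advertise=$NEW_NODE_IP $(retry_join_flags 172.25.1.5 172.25.2.4 172.25.2.5)"
```

Listing every reachable server this way means a single unreachable address, such as 172.25.1.4 here, does not block the join.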

Manual bootstrapping is currently discouraged. Have you read this document? It looks like automatic bootstrapping is available in your version of Consul. Do I understand correctly that you are trying to bootstrap manually, or are you just concerned about what will happen if you remove your current leader? If a leader leaves, this should force a new election. When you bring the former leader back online, you will have to tell it which server(s) to join, and it will join as a follower.

Hi Derek,

Thank you for replying and for the links.

My concern is what will happen with the Consul server node that we need to stop/start.

This node is the one that we used to initially automatically bootstrap the consul server. Right now this node is not the leader. We are running consul in docker containers, and the configuration to run this bootstrap consul server is:

consul agent -server -advertise=172.25.1.4 -datacenter=dc1 -bootstrap-expect=4 -log-level=info -data-dir=/consul/data

This is the only server that starts with the bootstrap-expect option, and it does not have a retry-join option.

Nodes in our cluster:

172.25.1.4 (node used to bootstrap the consul cluster; this is the instance we have to stop/start)
172.25.1.5
172.25.2.4 (current leader)
172.25.2.5

As this node does not have a retry-join, I don’t know if it will rejoin the cluster. It also has the bootstrap-expect option “baked” in: cloud-init starts the consul container with this option (I was thinking of specifying only retry-join instead).

Will Serf on the other consul server nodes attempt to reconnect to the bootstrap server node?
(This is what I see when a consul server node loses its connection to the cluster, for example if the instance is stopped.)

If 172.25.1.4 does not join the cluster, I am thinking of executing one of these:

  • connect to the docker container in 172.25.1.4 and do:

root# consul join 172.25.1.5 172.25.2.4 172.25.2.5

or

  • stop the consul containers on nodes:
    172.25.1.5
    172.25.2.4
    172.25.2.5

Once consul on 172.25.1.4 is ready, start the consul containers again (172.25.1.5, 172.25.2.4, 172.25.2.5).

I would like to avoid this option, as I don’t want the cluster to end up broken (inconsistent or with data corruption).

Thanks a lot Derek!

I think you should read this document
It specifically states:
“To prevent inconsistencies and split-brain (clusters where multiple servers consider themselves leader) situations, you should either specify the same value for -bootstrap-expect or specify no value at all on all the servers. Only servers that specify a value will attempt to bootstrap the cluster.”

Your datacenter is already bootstrapped, so starting the node without -bootstrap-expect, as in the following command, seems to match the recommendation from the documentation:

consul agent -server -advertise=172.25.1.4 -datacenter=dc1 -retry-join "172.25.1.5" -retry-join "172.25.2.4" -retry-join "172.25.2.5" -log-level=info -data-dir=/consul/data

Hello Derek,

Thank you for the link.

Yes, datacenter is already bootstrapped and cluster working.
I don’t know if the bootstrap node (172.25.1.4) will join the cluster after the stop/start with the current docker run command baked into the image (AMI), which specifies bootstrap-expect=4.

I understand the command that you have written, and it makes sense: do not bootstrap, only join the stopped node to the cluster, since the cluster is already bootstrapped.

The other nodes have their retry-join pointing to 172.25.1.4, so maybe it will join automatically. But 172.25.1.4 has bootstrap-expect, so according to the documentation this node is the only one that will attempt to bootstrap a cluster. I guess we will only know whether the stopped/started node rejoins the cluster with the current bootstrap-expect configuration once I try.

If 172.25.1.4 does not join the cluster (or waits for the other 3 nodes, as expected with bootstrap-expect=4), I can try specifying multiple retry-join options as you mention.

Or the other option is to connect to consul on 172.25.1.4 and execute:

172.25.1.4# consul join 172.25.1.5 172.25.2.4 172.25.2.5

I can also try removing the container with the bootstrap-expect option on 172.25.1.4 and running docker run again with the command that you sent me (no bootstrap-expect option, multiple retry-join options). Maybe this is the better option.

Thanks a lot Derek, have a good day.

Hello Derek,

I have stopped / started the consul server node that had scheduled maintenance.
Here is what I did:

  • created a new server data/state backup (cluster already has snapshots on a persistent store, but just in case)

  • stopped the consul container on 172.25.1.4. As the restart policy of this docker container is --restart unless-stopped, it won’t start when the instance starts, so it won’t try to bootstrap (which is what I want, since the consul cluster is already up and running)

  • started the 172.25.1.4 instance (checked that the consul container wasn’t running)

  • ran the consul container without the bootstrap-expect option.

    consul agent -server -advertise=172.25.1.4 -retry-join=172.25.2.4 -retry-join=172.25.1.5 -datacenter=dc1 -log-level=info -data-dir=/consul/data

  • the new container on 172.25.1.4 joined the cluster: consul: member '172-25-1-4' joined, marking health alive

  • checked last_log_index and num_peers in the raft section of consul info to see whether the values were similar to those on another node.

  • queried the catalog and the key/value store to check if they are the same on new consul server.

  • consul server node has been stopped/started and now it is a member of the consul cluster.
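
The raft check in the steps above could be scripted roughly as follows. This is only a sketch: raft_field is a hypothetical helper, it assumes consul info prints section headers such as "raft:" followed by indented "key = value" lines, and the sample values are illustrative, not taken from this cluster.

```shell
#!/bin/sh
# raft_field NAME: read `consul info` output on stdin and print the value of
# the given key from the raft section (assumed layout: "key = value").
raft_field() {
    awk -v key="$1" '
        NF == 1 && /:$/ { in_raft = ($1 == "raft:"); next }  # section header
        in_raft && $1 == key && $2 == "=" { print $3 }
    '
}

# Illustrative sample only; real values come from `consul info`.
sample=$(printf 'raft:\n\tlast_log_index = 47244\n\tnum_peers = 3\nserf_lan:\n\tmembers = 4\n')

printf '%s\n' "$sample" | raft_field last_log_index
printf '%s\n' "$sample" | raft_field num_peers
```

On a real server the input would come from consul info | raft_field last_log_index, run on the restarted node and on one other server so the two values can be compared.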

Thanks a lot Derek!

Great! I’m glad everything went smoothly.