I’ve been running a homely three-node Consul cluster on Raspberry Pis for a number of years to manage and learn about service discovery; it’s a great tool.
The cluster servers all have this config (it has looked fundamentally like this since early Consul 1.x, with minor tweaks as Consul features changed):
{"bind_addr":"$PRIVATE_IP","bootstrap_expect":3,"client_addr":"127.0.0.1 $PRIVATE_IP","data_dir":"/opt/consul","datacenter":"molelab","log_file":"/var/log/consul/","log_level":"WARN","node_name":"$HOSTNAME","server":true,"ui_config":[{"enabled":true}]}
The two variables $PRIVATE_IP and $HOSTNAME are unique to each server and are filled in by config management, which is why I can be sure the config hasn’t changed: it’s deployed by config management.
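For illustration only (taking nog and its 10.11.216.182 address from the raft listing further down, and assuming $HOSTNAME renders as the FQDN), a rendered server config ends up looking like this:

{
  "bind_addr": "10.11.216.182",
  "bootstrap_expect": 3,
  "client_addr": "127.0.0.1 10.11.216.182",
  "data_dir": "/opt/consul",
  "datacenter": "molelab",
  "log_file": "/var/log/consul/",
  "log_level": "WARN",
  "node_name": "nog.no-dns.co.uk",
  "server": true,
  "ui_config": [{"enabled": true}]
}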
I have a NAS box that runs a few services in my lab, which are announced via Consul service discovery. This host has always been a client member, not a server. Its config is almost identical to the servers’, but with a few parameters missing or disabled (such as the UI and server parameters), and it lists the names of the servers to join as a member:
{"bind_addr":"$PRIVATE_IP","client_addr":"127.0.0.1 $PRIVATE_IP","data_dir":"/consul/data","datacenter":"molelab","log_level":"WARN","node_name":"$HOSTNAME","retry_join":["wesley.no-dns.co.uk","nog.no-dns.co.uk","jake.no-dns.co.uk"]}
The only things that have changed in this basic setup have been:
a.) the Consul version (currently on 1.19.1)
b.) the services the NAS offers via service discovery, which are defined in separate config files, one file per service it presents (see the sketch after this list).
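To give a concrete example of what one of those per-service files looks like, each is just a standard Consul service definition; the service name, port, and check below are made up purely for illustration and are not my real services:

{
  "service": {
    "name": "nfs",
    "port": 2049,
    "tags": ["nas"],
    "check": {
      "id": "nfs-tcp",
      "name": "NFS TCP check",
      "tcp": "localhost:2049",
      "interval": "30s"
    }
  }
}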
At some point recently (I’m guessing between 1.17 and 1.19) the member on the NAS host has started joining as a server, and not only that, it always seems to end up as the leader.
If I check the status of the cluster, you can see that the node ‘paris’, which is the NAS box, is in the server cluster and is the leader:
Node                 ID                                    Address             State     Voter  RaftProtocol  Commit Index  Trails Leader By
paris.no-dns.co.uk   abf56da5-a135-0da2-6ac9-2065ca6e2eb4  10.11.216.64:8300   leader    true   3             2414387       -
nog.no-dns.co.uk     13e64b39-9b82-5bbc-b46b-441052445bd7  10.11.216.182:8300  follower  true   3             2414387       0 commits
jake.no-dns.co.uk    582e89d2-e8c5-0ba9-e8db-795651367da4  10.11.216.234:8300  follower  true   3             2414387       0 commits
wesley.no-dns.co.uk  e3d0db6d-269d-1ff9-7428-e575afb02845  10.11.216.81:8300   follower  true   3             2414387       0 commits
I cannot explain this behaviour, other than that at some point in the version upgrades the default for some option may have changed (for example, server), so that instead of being able to omit server because it defaults to false, I would now have to explicitly set server: false on clients. However, I don’t think this is the case, as there are ~20 other clients using the same client config (managed by the same config management) that remain cluster members, not cluster servers, while running 1.19.1.
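For clarity, the workaround that hypothesis would imply (which I have not deployed, since I don’t believe the default actually changed) would be to add one explicit key to the NAS client config; if server really does still default to false, this should be a no-op:

{
  "bind_addr": "$PRIVATE_IP",
  "client_addr": "127.0.0.1 $PRIVATE_IP",
  "data_dir": "/consul/data",
  "datacenter": "molelab",
  "log_level": "WARN",
  "node_name": "$HOSTNAME",
  "retry_join": ["wesley.no-dns.co.uk", "nog.no-dns.co.uk", "jake.no-dns.co.uk"],
  "server": false
}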
What should I be looking at to work out why this one node has suddenly changed from member to server, and why it oddly always ends up as the leader?