I’ve been running a homely three-node Consul cluster on Raspberry Pis for a number of years to manage and learn about service discovery; it’s a great tool.
The cluster servers all have this config (it has looked fundamentally like this since early Consul 1.x, with minor tweaks as Consul features changed):
{"bind_addr":"$PRIVATE_IP","bootstrap_expect":3,"client_addr":"127.0.0.1 $PRIVATE_IP","data_dir":"/opt/consul","datacenter":"molelab","log_file":"/var/log/consul/","log_level":"WARN","node_name":"$HOSTNAME","server":true,"ui_config":[{"enabled":true}]}
The two variables $PRIVATE_IP and $HOSTNAME are unique to each server and are filled in by config management, which is why I can be sure the config hasn’t changed: it’s deployed by config management.
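For illustration only (taking nog and its 10.11.216.182 address from the raft listing further down, and assuming $HOSTNAME renders as the FQDN), a rendered server config ends up looking like this:

{
  "bind_addr": "10.11.216.182",
  "bootstrap_expect": 3,
  "client_addr": "127.0.0.1 10.11.216.182",
  "data_dir": "/opt/consul",
  "datacenter": "molelab",
  "log_file": "/var/log/consul/",
  "log_level": "WARN",
  "node_name": "nog.no-dns.co.uk",
  "server": true,
  "ui_config": [{"enabled": true}]
}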
I have a NAS box that runs a few services in my lab, which are announced via Consul service discovery. This host has always been a client member, not a server. Its config is almost identical to the servers’, but with a few parameters missing or disabled (such as the UI and server parameters), and it lists the names of the servers to join as a member:
{"bind_addr":"$PRIVATE_IP","client_addr":"127.0.0.1 $PRIVATE_IP","data_dir":"/consul/data","datacenter":"molelab","log_level":"WARN","node_name":"$HOSTNAME","retry_join":["wesley.no-dns.co.uk","nog.no-dns.co.uk","jake.no-dns.co.uk"]}
The only things that have changed in this basic setup have been:
a.) the Consul version (currently on 1.19.1)
b.) the services the NAS offers via service discovery, which are defined in separate config files, one file per service it presents (see the sketch after this list).
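To give a concrete example of what one of those per-service files looks like, each is just a standard Consul service definition; the service name, port, and check below are made up purely for illustration and are not my real services:

{
  "service": {
    "name": "nfs",
    "port": 2049,
    "tags": ["nas"],
    "check": {
      "id": "nfs-tcp",
      "name": "NFS TCP check",
      "tcp": "localhost:2049",
      "interval": "30s"
    }
  }
}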
At some point recently (I’m guessing between 1.17 and 1.19) the member on the NAS host has started joining as a server, and not only that, it always seems to end up as the leader.
If I check the status of the cluster, you can see that the node ‘paris’, which is the NAS box, is in the server cluster and is the leader:
Node                 ID                                    Address             State     Voter  RaftProtocol  Commit Index  Trails Leader By
paris.no-dns.co.uk   abf56da5-a135-0da2-6ac9-2065ca6e2eb4  10.11.216.64:8300   leader    true   3             2414387       -
nog.no-dns.co.uk     13e64b39-9b82-5bbc-b46b-441052445bd7  10.11.216.182:8300  follower  true   3             2414387       0 commits
jake.no-dns.co.uk    582e89d2-e8c5-0ba9-e8db-795651367da4  10.11.216.234:8300  follower  true   3             2414387       0 commits
wesley.no-dns.co.uk  e3d0db6d-269d-1ff9-7428-e575afb02845  10.11.216.81:8300   follower  true   3             2414387       0 commits
I cannot explain this behaviour, other than that at some point in the version upgrades the default for some option may have changed (for example, server), so that instead of being able to omit server because it defaults to false, I would now have to explicitly set server: false on clients. However, I don’t think this is the case, as there are ~20 other clients using the same client config (managed by the same config management) that remain cluster members, not cluster servers, while running 1.19.1.
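For clarity, the workaround that hypothesis would imply (which I have not deployed, since I don’t believe the default actually changed) would be to add one explicit key to the NAS client config; if server really does still default to false, this should be a no-op:

{
  "bind_addr": "$PRIVATE_IP",
  "client_addr": "127.0.0.1 $PRIVATE_IP",
  "data_dir": "/consul/data",
  "datacenter": "molelab",
  "log_level": "WARN",
  "node_name": "$HOSTNAME",
  "retry_join": ["wesley.no-dns.co.uk", "nog.no-dns.co.uk", "jake.no-dns.co.uk"],
  "server": false
}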
What should I be looking at to work out why this one node has suddenly changed from member to server, and why it oddly always ends up as the leader?