I have got a bit confused about the best thing to use as ExecStop= in the service file for a Consul server service…
I have read the documents though it feels like they conflict somewhat due to my limited understanding…
When a Server is gracefully exited, the server will not be marked as left. This is to minimally impact the consensus quorum. Instead, the Server will be marked as failed.
For nodes in server mode, the node is removed from the Raft peer set in a graceful manner. This is critical, as in certain situations a non-graceful leave can affect cluster availability.
From the above, the first document makes me think that the server should be left to fail, to keep the cluster size… But then the second document makes me think that the ‘leave’ command should be used for a graceful leave?
Can someone please let me know the best practice… Currently I have:
When a server explicitly leaves the cluster, the change to the Raft consensus peer set needs to be pushed to a consensus of other servers.
When it rejoins, the same applies, and in addition joining servers go through a probationary non-voter period before being promoted to voters.
This all takes time, and causes a lot more turbulence in the cluster state than is warranted if, for example, you’re just doing a rolling restart for some maintenance.
On top of this, consider the common case of a 3 node cluster, where the normal failure resilience is 1. If 1 server gracefully leaves, you now have a temporary 2 node cluster, which needs both remaining servers for quorum and so cannot tolerate a further failure. That is the same position as a 3 node cluster with 1 server down.
No practical operational benefit was realised from the graceful leave, but it slowed things down - hence why it’s not the default.
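To make the arithmetic concrete, here is a small Python sketch comparing how many further failures the cluster can absorb in each case. It assumes the usual Raft majority rule (quorum = floor(n/2) + 1); the function names are made up for illustration and are not part of Consul.

```python
def quorum(voters: int) -> int:
    """Smallest majority of a Raft peer set with `voters` voting members."""
    return voters // 2 + 1


def failures_tolerated(voters: int, alive: int) -> int:
    """How many more servers can be lost before quorum is lost (never below 0)."""
    return max(alive - quorum(voters), 0)


# Healthy 3 node cluster.
healthy = failures_tolerated(voters=3, alive=3)
# One server stopped without leaving: the peer set stays at 3, with 2 alive.
stopped = failures_tolerated(voters=3, alive=2)
# One server left gracefully: the peer set shrinks to 2, with 2 alive.
left = failures_tolerated(voters=2, alive=2)

print(healthy, stopped, left)
```

Whether the server leaves or is simply stopped, the remaining two servers can still form a quorum, and in neither case can the cluster lose another server during the restart.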
Servers gracefully leaving the cluster on shutdown doesn’t make sense unless there’s a larger, varying number of servers, which is itself not recommended, as the amount of chatter between individual servers in the consensus scales inefficiently as the consensus size increases.
All that makes sense to me, but in my experience when a leader leaves, say for patching, there’s a brief period (<=1s) when DNS fails. For our nginx instances that require consistent and reliable DNS resolution we’ve resorted to a temporary hosts file to avoid service outages. I was hoping that a graceful leave might avoid that painful extra step, but perhaps not.
I don’t think it would. A new leader still has to be elected, anyway.
There are two issues which would matter more:
Is there a loadbalancer in front of the DNS traffic, dispatching incoming requests to Consul nodes, and is it being updated to remove nodes from routing before they go down?
Are the remaining Consul servers operating in a configuration which allows for stale reads (see DNS Caching | Consul - HashiCorp Learn) so that they can continue to serve DNS requests whilst the cluster leader election is ongoing?
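For the second point, the relevant agent settings live under dns_config. The Python sketch below just builds and prints a JSON fragment using the allow_stale, max_stale and service_ttl options. The values shown are purely illustrative assumptions, not recommendations, and the helper function is hypothetical. Defaults differ between Consul versions, so check the documentation for the version you run before changing anything.

```python
import json


def dns_config_fragment(max_stale: str = "10s", service_ttl: str = "5s") -> dict:
    """Build a Consul agent config fragment for stale DNS reads (illustrative values)."""
    return {
        "dns_config": {
            # Let any server answer DNS queries, not only the leader.
            "allow_stale": True,
            # Upper bound on how far behind the leader an answer may be.
            "max_stale": max_stale,
            # TTL on service lookups, so resolvers such as dnsmasq may cache them.
            "service_ttl": {"*": service_ttl},
        }
    }


fragment = dns_config_fragment()
print(json.dumps(fragment, indent=2))
```

The printed JSON could be saved as an extra file in the agent’s configuration directory. A non-zero TTL means clients may keep using an answer for that long after a service becomes unhealthy, so that is the trade-off to weigh.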
Good question. I should have clarified before that we’re actually using dnsmasq locally to forward *.consul queries to the local Consul agent. So, in the case of some nginx instances we’re running, they are performing DNS queries to dnsmasq which then queries the local Consul agent to find the list of healthy upstream servers to pass on requests. When the Consul leader is rebooted (we don’t run a leave command) the nginx instance is unable to get a list of upstream servers.
We are currently not modifying any DNS defaults in our Consul config other than providing a recursor for non-*.consul requests.
Enabling caching might resolve our current hurdle, though we’d have to think through whether there are any unintended consequences. Thanks for engaging on this.