My Nomad servers are behind an AWS Auto Scaling group. When I need to refresh them, I’ve simply been using the AWS CLI to do this:
aws autoscaling start-instance-refresh --auto-scaling-group-name nomad-servers
This initiates a replacement of each instance, one at a time. Per this AWS documentation, that is supposed to initiate a graceful shutdown of all services before terminating the instance.
So why, then, is my Nomad cluster left in a state where some of the 3 servers still list servers that no longer exist? If Nomad was shut down gracefully, shouldn’t it have gracefully left the cluster?
I have systemd set up to manage Nomad using this service. When the instance launches, this is invoked:
sudo systemctl enable nomad
sudo systemctl start nomad
One thing I find questionable about that service is KillSignal=SIGINT. Would SIGTERM be more appropriate here?
Here’s what my cluster looks like after this AWS instance refresh. All of the servers shown with status left are dead and gone at this point.
$ NOMAD_ADDR=http://10.30.11.6:4646 nomad server members
Name Address Port Status Leader Raft Version Build Datacenter Region
ip-10-30-11-6.eu-central-1.compute.internal.global 10.30.11.6 4648 alive true 3 1.3.5 dc1 global
ip-10-30-21-145.eu-central-1.compute.internal.global 10.30.21.145 4648 alive false 3 1.3.5 dc1 global
ip-10-30-21-37.eu-central-1.compute.internal.global 10.30.21.37 4648 left false 3 1.3.5 dc1 global
ip-10-30-31-168.eu-central-1.compute.internal.global 10.30.31.168 4648 left false 3 1.3.5 dc1 global
ip-10-30-31-246.eu-central-1.compute.internal.global 10.30.31.246 4648 alive false 3 1.3.5 dc1 global
$ NOMAD_ADDR=http://10.30.21.145:4646 nomad server members
Name Address Port Status Leader Raft Version Build Datacenter Region
ip-10-30-11-6.eu-central-1.compute.internal.global 10.30.11.6 4648 alive true 3 1.3.5 dc1 global
ip-10-30-21-145.eu-central-1.compute.internal.global 10.30.21.145 4648 alive false 3 1.3.5 dc1 global
ip-10-30-31-246.eu-central-1.compute.internal.global 10.30.31.246 4648 alive false 3 1.3.5 dc1 global
$ NOMAD_ADDR=http://10.30.31.246:4646 nomad server members
Name Address Port Status Leader Raft Version Build Datacenter Region
ip-10-30-11-6.eu-central-1.compute.internal.global 10.30.11.6 4648 alive true 3 1.3.5 dc1 global
ip-10-30-21-145.eu-central-1.compute.internal.global 10.30.21.145 4648 alive false 3 1.3.5 dc1 global
ip-10-30-21-37.eu-central-1.compute.internal.global 10.30.21.37 4648 left false 3 1.3.5 dc1 global
ip-10-30-31-246.eu-central-1.compute.internal.global 10.30.31.246 4648 alive false 3 1.3.5 dc1 global
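To make the three views easier to compare, a small sketch like the one below could tally members by status for each server. It assumes the column layout shown above, with Status as the fourth whitespace-separated field. The name summarize_members is a hypothetical helper of my own, not part of Nomad.

```shell
#!/usr/bin/env bash
# Tally members by status from `nomad server members` output read on stdin.
# Assumption: Status is the 4th whitespace-separated column, as in the output above.
# summarize_members is a hypothetical helper name, not a Nomad command.
summarize_members() {
  # Skip the header row (NR > 1), count each status value, print "status count".
  awk 'NR > 1 && NF >= 4 { count[$4]++ }
       END { for (s in count) print s, count[s] }' | sort
}

# Possible usage against the three server addresses shown above:
# for addr in 10.30.11.6 10.30.21.145 10.30.31.246; do
#   echo "== $addr"
#   NOMAD_ADDR="http://$addr:4646" nomad server members | summarize_members
# done
```

Run against the first server's output above, this should report 3 alive and 2 left, which makes it quicker to spot that the servers disagree about the stale entries.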