Trouble getting Consul servers to rejoin when AWS marks a node as impaired

Hello,

This is my first time posting here.

Sometimes AWS will have a VM fail (I assume some kind of hardware failure in a datacenter) and automatically migrate it to another host. After the VM has finished booting on the new host, the Consul server on it refuses to rejoin its previous cluster. I use hard-coded node IDs, and as far as I can tell the migrated node is identical to the others. I have to manually issue the leave command on the rebooted node to get it to rejoin.

I don’t really understand why the leave command is needed, since nodes that reboot in the usual manner always rejoin properly. How could this AWS node migration be any different? Does anybody have any experience with this problem?

It’s hard to debug because there doesn’t seem to be a way to artificially trigger this kind of VM migration, and lots of nodes have to run for many months before it is seen. I have managed to gather some logs from a node where this happened and I’ve posted a summary below. These logs repeat indefinitely until the leave command is used.

The failed/rebooted node prints this repeatedly:

consul: http: Request GET /v1/session/list?consistent, error: No cluster leader from=@
consul: raft: Election timeout reached, restarting election
consul: raft: Election timeout reached, restarting election
consul: raft: Election timeout reached, restarting election
consul: raft: Election timeout reached, restarting election
consul: agent: failed to sync remote state: No cluster leader

The other two nodes print the following:

consul: raft: Rejecting vote request from 111.11.111.11:8300 since we have a leader: 222.22.222.22:8300
consul: raft: Rejecting vote request from 111.11.111.11:8300 since we have a leader: 222.22.222.22:8300
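
In case it helps anyone compare notes, here is a minimal sketch of how the rejection lines above could be tallied from a saved log, to confirm that every rejected vote request comes from the same candidate and names the same leader. The function name `tally_rejections` and the regular expression are my own assumptions, written only against the exact wording quoted in this post; other Consul versions may phrase the line differently.

```python
import re
from collections import Counter

# Assumed pattern: matches only the "Rejecting vote request" wording quoted in this post.
REJECT_RE = re.compile(
    r"Rejecting vote request from (?P<candidate>\S+) "
    r"since we have a leader: (?P<leader>\S+)"
)


def tally_rejections(lines):
    """Count (candidate, leader) pairs across raft vote-rejection log lines."""
    counts = Counter()
    for line in lines:
        match = REJECT_RE.search(line)
        if match:
            counts[(match.group("candidate"), match.group("leader"))] += 1
    return counts


if __name__ == "__main__":
    # Sample input copied from the log excerpts in this post.
    sample = [
        "consul: raft: Rejecting vote request from 111.11.111.11:8300 since we have a leader: 222.22.222.22:8300",
        "consul: raft: Rejecting vote request from 111.11.111.11:8300 since we have a leader: 222.22.222.22:8300",
        "consul: raft: Election timeout reached, restarting election",
    ]
    for (candidate, leader), n in tally_rejections(sample).items():
        print(f"{candidate} was rejected {n} time(s); leader reported as {leader}")
```

If more than one candidate or leader address shows up in the tally, that would suggest the cluster is less stable than the excerpt here implies.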

Even rebooting the ostracized node a second time changes nothing.

Another interesting log:
[DEBUG] raft: Failed to contact 9795d352-d7d4-3c0a-8706-97e350dfdbdb in 15h43m27.276888265s

This is strange, because I could see TCP and UDP traffic on the Consul connections to that server.