Under what circumstances does `EventMemberLeave (forced)` happen?

Hi! One of my Consul server nodes had an outage due to hardware issues. In the logs of the other Consul servers I can see events like this:

Mar 30 05:47:01 consul-147-75-196-239 consul[17654]: 2021-03-30T05:47:01.770Z [INFO] agent.server.serf.lan: serf: EventMemberLeave (forced): consul-147-75-39-15 10.99.31.141

Under what circumstances can this event happen? The possibilities, afaik, are:

  1. The other Consul servers kept trying to reconnect to the failed one and gave up after 72h, which produced that message.
  2. Someone issued a `consul force-leave`.

Number 1 is unlikely because less than 72h had passed since the failure when the event occurred. The event also occurred several times within an hour, and in between the failed server somehow managed to rejoin.
Number 2 is almost impossible, because I know nobody could have run that command.

Are there some other possibilities that I missed?

Some logs (newest first):

Mar 30 05:47:01 consul-147-75-196-239 consul[17654]:     2021-03-30T05:47:01.770Z [INFO]  agent.server.serf.lan: serf: EventMemberLeave (forced): consul-147-75-39-15 10.99.31.141
Mar 30 05:47:01 consul-147-75-196-239 consul[17654]: 2021-03-30T05:47:01.693Z [INFO]  agent.server: Removing LAN server: server="consul-147-75-39-15 (Addr: tcp/10.99.31.141:8300) (DC: packet-ewr1-352-prod)"
Mar 30 05:47:01 consul-147-75-196-239 consul[17654]: 2021-03-30T05:47:01.693Z [INFO]  agent.server.serf.lan: serf: EventMemberFailed: consul-147-75-39-15 10.99.31.141
Mar 30 05:47:01 consul-147-75-196-239 consul[17654]: 2021-03-30T05:47:01.693Z [INFO]  agent.server.memberlist.lan: memberlist: Marking consul-147-75-39-15 as failed, suspect timeout reached (2 peer confirmations)
Mar 30 05:47:01 consul-147-75-196-239 consul[17654]: 2021-03-30T05:47:01.535Z [INFO]  agent.server.memberlist.lan: memberlist: Suspect consul-147-75-39-15 has failed, no acks received
Mar 30 05:46:57 consul-147-75-196-239 consul[17654]: 2021-03-30T05:46:57.535Z [INFO]  agent.server.memberlist.lan: memberlist: Suspect consul-147-75-39-15 has failed, no acks received
Mar 30 05:45:51 consul-147-75-196-239 consul[17654]: 2021-03-30T05:45:51.747Z [INFO]  agent.server: Handled event for server in area: event=member-join server=consul-147-75-39-15.packet-ewr1-352-prod area=wan
Mar 30 05:45:51 consul-147-75-196-239 consul[17654]: 2021-03-30T05:45:51.747Z [INFO]  agent.server.serf.wan: serf: EventMemberJoin: consul-147-75-39-15.packet-ewr1-352-prod 147.75.39.15
Mar 30 05:45:00 consul-147-75-196-239 consul[17654]: 2021-03-30T05:45:00.770Z [INFO]  agent.server: New leader elected: payload=consul-147-75-196-93
Mar 30 05:45:00 consul-147-75-196-239 consul[17654]: 2021-03-30T05:45:00.548Z [WARN]  agent.server.raft: failed to get previous log: previous-index=5443567 last-index=5443565 error="log not found"
Mar 30 05:44:56 consul-147-75-196-239 consul[17654]: 2021-03-30T05:44:56.908Z [WARN]  agent.server.raft: rejecting vote request since we have a leader: from=10.99.31.141:8300 leader=10.99.31.129:8300
Mar 30 05:44:55 consul-147-75-196-239 consul[17654]: 2021-03-30T05:44:55.536Z [INFO]  agent.server: New leader elected: payload=consul-147-75-196-93
Mar 30 05:44:48 consul-147-75-196-239 consul[17654]: 2021-03-30T05:44:48.529Z [INFO]  agent.server: Adding LAN server: server="consul-147-75-39-15 (Addr: tcp/10.99.31.141:8300) (DC: packet-ewr1-352-prod)"
Mar 30 05:44:48 consul-147-75-196-239 consul[17654]: 2021-03-30T05:44:48.529Z [INFO]  agent.server.serf.lan: serf: EventMemberJoin: consul-147-75-39-15 10.99.31.141
Mar 30 05:44:47 consul-147-75-196-239 consul[17654]: 2021-03-30T05:44:47.774Z [WARN]  agent.server.raft: rejecting vote request since we have a leader: from=10.99.31.141:8300 leader=10.99.31.129:8300
Mar 30 05:44:39 consul-147-75-196-239 consul[17654]: 2021-03-30T05:44:39.062Z [WARN]  agent.server.raft: rejecting vote request since we have a leader: from=10.99.31.141:8300 leader=10.99.31.129:8300
Mar 30 05:44:30 consul-147-75-196-239 consul[17654]: 2021-03-30T05:44:30.460Z [WARN]  agent.server.raft: rejecting vote request since we have a leader: from=10.99.31.141:8300 leader=10.99.31.129:8300
Mar 30 05:44:22 consul-147-75-196-239 consul[17654]: 2021-03-30T05:44:22.083Z [WARN]  agent.server.raft: rejecting vote request since we have a leader: from=10.99.31.141:8300 leader=10.99.31.129:8300
Mar 30 05:43:57 consul-147-75-196-239 consul[17654]: 2021-03-30T05:43:57.468Z [ERROR] agent.server.raft: failed to decode incoming command: error="read tcp 10.99.31.131:8300->10.99.31.141:43550: read: connection timed out"
Mar 30 05:42:56 consul-147-75-196-239 consul[17654]: 2021-03-30T05:42:56.630Z [INFO]  agent.server: Handled event for server in area: event=member-leave server=consul-147-75-39-15.packet-ewr1-352-prod area=wan
Mar 30 05:42:56 consul-147-75-196-239 consul[17654]: 2021-03-30T05:42:56.630Z [INFO]  agent.server.serf.wan: serf: EventMemberLeave: consul-147-75-39-15.packet-ewr1-352-prod 147.75.39.15
Mar 30 05:42:18 consul-147-75-196-239 consul[17654]: 2021-03-30T05:42:18.765Z [ERROR] agent.server.rpc: multiplex conn accept failed: conn=from=147.75.39.15:51088 error="keepalive timeout"
Mar 30 05:42:18 consul-147-75-196-239 consul[17654]: 2021-03-30T05:42:18.765Z [ERROR] agent.server: yamux: keepalive failed: i/o deadline reached
Mar 30 05:41:53 consul-147-75-196-239 consul[17654]: 2021-03-30T05:41:53.528Z [INFO]  agent.server: Removing LAN server: server="consul-147-75-39-15 (Addr: tcp/10.99.31.141:8300) (DC: packet-ewr1-352-prod)"
Mar 30 05:41:53 consul-147-75-196-239 consul[17654]: 2021-03-30T05:41:53.528Z [INFO]  agent.server.serf.lan: serf: EventMemberLeave (forced): consul-147-75-39-15 10.99.31.141

Thanks for helping.

Hi @lackhoa,

Welcome to the forums, and thanks for the detailed description of the issue.

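The forced removal you are seeing is issued by Consul Autopilot. Its dead server cleanup, which is enabled by default, removes a failed server from the cluster without waiting for the 72h timeout, and that removal shows up in Serf as a forced leave. You can read more about this feature in the Autopilot tutorial on HashiCorp Learn (Autopilot | Consul - HashiCorp Learn).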
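
If you want to confirm that dead server cleanup is switched on in your cluster, something like the following Python sketch may help. It reads the Autopilot configuration from the operator HTTP endpoint and looks at the `CleanupDeadServers` field. The agent address and the optional ACL token are assumptions about your setup, and the helper names are made up for this example.

```python
import json
import urllib.request
from typing import Optional

# Assumption: a local agent on the default HTTP port; adjust for your cluster.
CONSUL_ADDR = "http://127.0.0.1:8500"


def cleanup_dead_servers_enabled(config: dict) -> bool:
    """Return True if the Autopilot config has dead server cleanup turned on."""
    return bool(config.get("CleanupDeadServers", False))


def fetch_autopilot_config(addr: str = CONSUL_ADDR,
                           token: Optional[str] = None) -> dict:
    """Read the Autopilot configuration from the operator HTTP endpoint."""
    req = urllib.request.Request(f"{addr}/v1/operator/autopilot/configuration")
    if token:
        # Only needed when ACLs are enabled (requires operator read access).
        req.add_header("X-Consul-Token", token)
    with urllib.request.urlopen(req, timeout=5) as resp:
        return json.load(resp)
```

Calling `cleanup_dead_servers_enabled(fetch_autopilot_config())` against one of your servers should tell you whether Autopilot is allowed to remove failed servers on its own.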

The logs you shared look like they come from a follower node. If you have the leader's logs from the same period, you should see Autopilot in action, as in the sample log line below.

[INFO]  agent.server.autopilot: Attempting removal of failed server node: id=d637f6c1-ab46-1cc8-8fe6-74d1d39765ed name=consul-server-3 address=172.27.0.4:8300
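
To line up the forced leaves with Autopilot's removals, a small log scan along these lines may be enough. This is only a sketch: the two patterns are taken from the log lines quoted in this thread and may need adjusting for other Consul versions, and `find_removals` is a made-up helper name.

```python
import re

# Patterns based on the log lines quoted in this thread (assumed stable).
FORCED_LEAVE = re.compile(r"serf: EventMemberLeave \(forced\): (?P<node>\S+)")
AUTOPILOT_REMOVAL = re.compile(
    r"autopilot: Attempting removal of failed server node: .*\bname=(?P<node>\S+)"
)


def find_removals(lines):
    """Yield (kind, node, line) for forced leaves and Autopilot removals."""
    for line in lines:
        line = line.strip()
        match = FORCED_LEAVE.search(line)
        if match:
            yield ("forced-leave", match.group("node"), line)
            continue
        match = AUTOPILOT_REMOVAL.search(line)
        if match:
            yield ("autopilot-removal", match.group("node"), line)
```

Run it over both the follower and the leader log files (for example by passing an open file object) and compare the timestamps of the matching lines for the same node name.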

Hope this helps.

You were right: I can see Autopilot logging “attempting removal of failed server node”.
Thank you very much for your help!
