Hi! One of my Consul server nodes had an outage due to hardware issues. In the logs of the other Consul servers, I can see events like this:
Mar 30 05:47:01 consul-147-75-196-239 consul[17654]: 2021-03-30T05:47:01.770Z [INFO] agent.server.serf.lan: serf: EventMemberLeave (forced): consul-147-75-39-15 10.99.31.141
Under what circumstances can this event happen? As far as I know, the possibilities are:
- The other Consul servers kept trying to reach the failed one and gave up after 72h (the default `reconnect_timeout`), hence the message.
- Someone issued a `consul force-leave`.
Option 1 is unlikely because the event happened less than 72h after the failure. The event also happened multiple times within a single hour, and in between, the failed server somehow managed to rejoin the cluster.
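One way to check the 72h argument is to compare Consul's own timestamps in the log lines rather than the syslog prefix. The sketch below assumes the 72h figure is the default `reconnect_timeout` and that every line contains a timestamp in the `2021-03-30T05:47:01.770Z` format seen in these logs; `parse_ts` and `timeout_elapsed` are hypothetical helpers, not part of Consul.

```python
import re
from datetime import datetime, timedelta

# Assumption: 72h is Consul's default reconnect_timeout for failed members.
RECONNECT_TIMEOUT = timedelta(hours=72)

# Matches the ISO-8601 timestamp Consul writes after the journald/syslog prefix.
TS_RE = re.compile(r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+Z)")


def parse_ts(line: str) -> datetime:
    """Extract Consul's own timestamp (UTC) from a pasted log line."""
    match = TS_RE.search(line)
    if match is None:
        raise ValueError(f"no Consul timestamp in: {line!r}")
    return datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S.%fZ")


def timeout_elapsed(failed_at: datetime, event_at: datetime,
                    timeout: timedelta = RECONNECT_TIMEOUT) -> bool:
    """True only if at least `timeout` passed between the failure and the event."""
    return event_at - failed_at >= timeout


failed = parse_ts("2021-03-30T05:47:01.693Z [INFO] agent.server.serf.lan: "
                  "serf: EventMemberFailed: consul-147-75-39-15 10.99.31.141")
left = parse_ts("2021-03-30T05:47:01.770Z [INFO] agent.server.serf.lan: "
                "serf: EventMemberLeave (forced): consul-147-75-39-15 10.99.31.141")
print(left - failed, timeout_elapsed(failed, left))
```

Applied to the two 05:47:01 lines in the logs, the forced leave comes about 77 ms after `EventMemberFailed`, nowhere near 72h.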
Option 2 is almost impossible because I know nobody could have run that command.
Are there other possibilities that I missed?
Some logs (newest first):
Mar 30 05:47:01 consul-147-75-196-239 consul[17654]: 2021-03-30T05:47:01.770Z [INFO] agent.server.serf.lan: serf: EventMemberLeave (forced): consul-147-75-39-15 10.99.31.141
Mar 30 05:47:01 consul-147-75-196-239 consul[17654]: 2021-03-30T05:47:01.693Z [INFO] agent.server: Removing LAN server: server="consul-147-75-39-15 (Addr: tcp/10.99.31.141:8300) (DC: packet-ewr1-352-prod)"
Mar 30 05:47:01 consul-147-75-196-239 consul[17654]: 2021-03-30T05:47:01.693Z [INFO] agent.server.serf.lan: serf: EventMemberFailed: consul-147-75-39-15 10.99.31.141
Mar 30 05:47:01 consul-147-75-196-239 consul[17654]: 2021-03-30T05:47:01.693Z [INFO] agent.server: Removing LAN server: server="consul-147-75-39-15 (Addr: tcp/10.99.31.141:8300) (DC: packet-ewr1-352-prod)"
Mar 30 05:47:01 consul-147-75-196-239 consul[17654]: 2021-03-30T05:47:01.693Z [INFO] agent.server.memberlist.lan: memberlist: Marking consul-147-75-39-15 as failed, suspect timeout reached (2 peer confirmations)
Mar 30 05:47:01 consul-147-75-196-239 consul[17654]: 2021-03-30T05:47:01.693Z [INFO] agent.server.serf.lan: serf: EventMemberFailed: consul-147-75-39-15 10.99.31.141
Mar 30 05:47:01 consul-147-75-196-239 consul[17654]: 2021-03-30T05:47:01.693Z [INFO] agent.server.memberlist.lan: memberlist: Marking consul-147-75-39-15 as failed, suspect timeout reached (2 peer confirmations)
Mar 30 05:47:01 consul-147-75-196-239 consul[17654]: 2021-03-30T05:47:01.535Z [INFO] agent.server.memberlist.lan: memberlist: Suspect consul-147-75-39-15 has failed, no acks received
Mar 30 05:47:01 consul-147-75-196-239 consul[17654]: 2021-03-30T05:47:01.535Z [INFO] agent.server.memberlist.lan: memberlist: Suspect consul-147-75-39-15 has failed, no acks received
Mar 30 05:46:57 consul-147-75-196-239 consul[17654]: 2021-03-30T05:46:57.535Z [INFO] agent.server.memberlist.lan: memberlist: Suspect consul-147-75-39-15 has failed, no acks received
Mar 30 05:46:57 consul-147-75-196-239 consul[17654]: 2021-03-30T05:46:57.535Z [INFO] agent.server.memberlist.lan: memberlist: Suspect consul-147-75-39-15 has failed, no acks received
Mar 30 05:45:51 consul-147-75-196-239 consul[17654]: 2021-03-30T05:45:51.747Z [INFO] agent.server: Handled event for server in area: event=member-join server=consul-147-75-39-15.packet-ewr1-352-prod area=wan
Mar 30 05:45:51 consul-147-75-196-239 consul[17654]: 2021-03-30T05:45:51.747Z [INFO] agent.server.serf.wan: serf: EventMemberJoin: consul-147-75-39-15.packet-ewr1-352-prod 147.75.39.15
Mar 30 05:45:51 consul-147-75-196-239 consul[17654]: 2021-03-30T05:45:51.747Z [INFO] agent.server: Handled event for server in area: event=member-join server=consul-147-75-39-15.packet-ewr1-352-prod area=wan
Mar 30 05:45:51 consul-147-75-196-239 consul[17654]: 2021-03-30T05:45:51.747Z [INFO] agent.server.serf.wan: serf: EventMemberJoin: consul-147-75-39-15.packet-ewr1-352-prod 147.75.39.15
Mar 30 05:45:00 consul-147-75-196-239 consul[17654]: 2021-03-30T05:45:00.770Z [INFO] agent.server: New leader elected: payload=consul-147-75-196-93
Mar 30 05:45:00 consul-147-75-196-239 consul[17654]: 2021-03-30T05:45:00.770Z [INFO] agent.server: New leader elected: payload=consul-147-75-196-93
Mar 30 05:45:00 consul-147-75-196-239 consul[17654]: 2021-03-30T05:45:00.548Z [WARN] agent.server.raft: failed to get previous log: previous-index=5443567 last-index=5443565 error="log not found"
Mar 30 05:45:00 consul-147-75-196-239 consul[17654]: 2021-03-30T05:45:00.548Z [WARN] agent.server.raft: failed to get previous log: previous-index=5443567 last-index=5443565 error="log not found"
Mar 30 05:44:56 consul-147-75-196-239 consul[17654]: 2021-03-30T05:44:56.908Z [WARN] agent.server.raft: rejecting vote request since we have a leader: from=10.99.31.141:8300 leader=10.99.31.129:8300
Mar 30 05:44:56 consul-147-75-196-239 consul[17654]: 2021-03-30T05:44:56.908Z [WARN] agent.server.raft: rejecting vote request since we have a leader: from=10.99.31.141:8300 leader=10.99.31.129:8300
Mar 30 05:44:55 consul-147-75-196-239 consul[17654]: 2021-03-30T05:44:55.536Z [INFO] agent.server: New leader elected: payload=consul-147-75-196-93
Mar 30 05:44:55 consul-147-75-196-239 consul[17654]: 2021-03-30T05:44:55.536Z [INFO] agent.server: New leader elected: payload=consul-147-75-196-93
Mar 30 05:44:48 consul-147-75-196-239 consul[17654]: 2021-03-30T05:44:48.529Z [INFO] agent.server: Adding LAN server: server="consul-147-75-39-15 (Addr: tcp/10.99.31.141:8300) (DC: packet-ewr1-352-prod)"
Mar 30 05:44:48 consul-147-75-196-239 consul[17654]: 2021-03-30T05:44:48.529Z [INFO] agent.server.serf.lan: serf: EventMemberJoin: consul-147-75-39-15 10.99.31.141
Mar 30 05:44:48 consul-147-75-196-239 consul[17654]: 2021-03-30T05:44:48.529Z [INFO] agent.server: Adding LAN server: server="consul-147-75-39-15 (Addr: tcp/10.99.31.141:8300) (DC: packet-ewr1-352-prod)"
Mar 30 05:44:48 consul-147-75-196-239 consul[17654]: 2021-03-30T05:44:48.529Z [INFO] agent.server.serf.lan: serf: EventMemberJoin: consul-147-75-39-15 10.99.31.141
Mar 30 05:44:47 consul-147-75-196-239 consul[17654]: 2021-03-30T05:44:47.774Z [WARN] agent.server.raft: rejecting vote request since we have a leader: from=10.99.31.141:8300 leader=10.99.31.129:8300
Mar 30 05:44:47 consul-147-75-196-239 consul[17654]: 2021-03-30T05:44:47.774Z [WARN] agent.server.raft: rejecting vote request since we have a leader: from=10.99.31.141:8300 leader=10.99.31.129:8300
Mar 30 05:44:39 consul-147-75-196-239 consul[17654]: 2021-03-30T05:44:39.062Z [WARN] agent.server.raft: rejecting vote request since we have a leader: from=10.99.31.141:8300 leader=10.99.31.129:8300
Mar 30 05:44:39 consul-147-75-196-239 consul[17654]: 2021-03-30T05:44:39.062Z [WARN] agent.server.raft: rejecting vote request since we have a leader: from=10.99.31.141:8300 leader=10.99.31.129:8300
Mar 30 05:44:30 consul-147-75-196-239 consul[17654]: 2021-03-30T05:44:30.460Z [WARN] agent.server.raft: rejecting vote request since we have a leader: from=10.99.31.141:8300 leader=10.99.31.129:8300
Mar 30 05:44:30 consul-147-75-196-239 consul[17654]: 2021-03-30T05:44:30.460Z [WARN] agent.server.raft: rejecting vote request since we have a leader: from=10.99.31.141:8300 leader=10.99.31.129:8300
Mar 30 05:44:22 consul-147-75-196-239 consul[17654]: 2021-03-30T05:44:22.083Z [WARN] agent.server.raft: rejecting vote request since we have a leader: from=10.99.31.141:8300 leader=10.99.31.129:8300
Mar 30 05:44:22 consul-147-75-196-239 consul[17654]: 2021-03-30T05:44:22.083Z [WARN] agent.server.raft: rejecting vote request since we have a leader: from=10.99.31.141:8300 leader=10.99.31.129:8300
Mar 30 05:43:57 consul-147-75-196-239 consul[17654]: 2021-03-30T05:43:57.468Z [ERROR] agent.server.raft: failed to decode incoming command: error="read tcp 10.99.31.131:8300->10.99.31.141:43550: read: connection timed out"
Mar 30 05:43:57 consul-147-75-196-239 consul[17654]: 2021-03-30T05:43:57.468Z [ERROR] agent.server.raft: failed to decode incoming command: error="read tcp 10.99.31.131:8300->10.99.31.141:43550: read: connection timed out"
Mar 30 05:42:56 consul-147-75-196-239 consul[17654]: 2021-03-30T05:42:56.630Z [INFO] agent.server: Handled event for server in area: event=member-leave server=consul-147-75-39-15.packet-ewr1-352-prod area=wan
Mar 30 05:42:56 consul-147-75-196-239 consul[17654]: 2021-03-30T05:42:56.630Z [INFO] agent.server.serf.wan: serf: EventMemberLeave: consul-147-75-39-15.packet-ewr1-352-prod 147.75.39.15
Mar 30 05:42:56 consul-147-75-196-239 consul[17654]: 2021-03-30T05:42:56.630Z [INFO] agent.server: Handled event for server in area: event=member-leave server=consul-147-75-39-15.packet-ewr1-352-prod area=wan
Mar 30 05:42:56 consul-147-75-196-239 consul[17654]: 2021-03-30T05:42:56.630Z [INFO] agent.server.serf.wan: serf: EventMemberLeave: consul-147-75-39-15.packet-ewr1-352-prod 147.75.39.15
Mar 30 05:42:18 consul-147-75-196-239 consul[17654]: 2021-03-30T05:42:18.765Z [ERROR] agent.server.rpc: multiplex conn accept failed: conn=from=147.75.39.15:51088 error="keepalive timeout"
Mar 30 05:42:18 consul-147-75-196-239 consul[17654]: 2021-03-30T05:42:18.765Z [ERROR] agent.server: yamux: keepalive failed: i/o deadline reached
Mar 30 05:42:18 consul-147-75-196-239 consul[17654]: 2021-03-30T05:42:18.765Z [ERROR] agent.server.rpc: multiplex conn accept failed: conn=from=147.75.39.15:51088 error="keepalive timeout"
Mar 30 05:42:18 consul-147-75-196-239 consul[17654]: 2021-03-30T05:42:18.765Z [ERROR] agent.server: yamux: keepalive failed: i/o deadline reached
Mar 30 05:41:53 consul-147-75-196-239 consul[17654]: 2021-03-30T05:41:53.528Z [INFO] agent.server: Removing LAN server: server="consul-147-75-39-15 (Addr: tcp/10.99.31.141:8300) (DC: packet-ewr1-352-prod)"
Mar 30 05:41:53 consul-147-75-196-239 consul[17654]: 2021-03-30T05:41:53.528Z [INFO] agent.server.serf.lan: serf: EventMemberLeave (forced): consul-147-75-39-15 10.99.31.141
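Because the lines above are pasted newest first and each one appears twice, a deduplicated, chronological timeline of membership events for the failed node can make the join/leave cycle easier to follow. This is a minimal sketch that assumes the exact log format shown here; `timeline` and `MEMBERSHIP_MARKERS` are hypothetical names chosen for illustration.

```python
import re
from datetime import datetime

# Consul timestamp, log level, and the message that follows, e.g.
# "2021-03-30T05:47:01.693Z [INFO] agent.server.serf.lan: serf: EventMemberFailed: ..."
LINE_RE = re.compile(r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+Z) \[(\w+)\] (.*)$")

# Substrings that mark a membership change worth putting on the timeline.
MEMBERSHIP_MARKERS = ("EventMemberJoin", "EventMemberLeave",
                      "EventMemberFailed", "Marking", "Suspect")


def timeline(lines, node):
    """Return (timestamp, message) pairs for `node`, oldest first, without duplicates."""
    seen = set()
    events = []
    # The pasted logs are newest first; walking them in reverse keeps
    # same-millisecond events in their original order after the stable sort.
    for line in reversed(lines):
        match = LINE_RE.search(line)
        if match is None:
            continue
        ts, _level, msg = match.groups()
        if node not in msg or not any(m in msg for m in MEMBERSHIP_MARKERS):
            continue
        if (ts, msg) in seen:
            continue  # every line was logged twice
        seen.add((ts, msg))
        events.append((datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%fZ"), msg))
    events.sort(key=lambda event: event[0])
    return events


sample = [
    "Mar 30 05:47:01 consul-147-75-196-239 consul[17654]: 2021-03-30T05:47:01.770Z [INFO] agent.server.serf.lan: serf: EventMemberLeave (forced): consul-147-75-39-15 10.99.31.141",
    "Mar 30 05:47:01 consul-147-75-196-239 consul[17654]: 2021-03-30T05:47:01.693Z [INFO] agent.server.serf.lan: serf: EventMemberFailed: consul-147-75-39-15 10.99.31.141",
    "Mar 30 05:47:01 consul-147-75-196-239 consul[17654]: 2021-03-30T05:47:01.693Z [INFO] agent.server.serf.lan: serf: EventMemberFailed: consul-147-75-39-15 10.99.31.141",
    "Mar 30 05:44:48 consul-147-75-196-239 consul[17654]: 2021-03-30T05:44:48.529Z [INFO] agent.server.serf.lan: serf: EventMemberJoin: consul-147-75-39-15 10.99.31.141",
    "Mar 30 05:41:53 consul-147-75-196-239 consul[17654]: 2021-03-30T05:41:53.528Z [INFO] agent.server.serf.lan: serf: EventMemberLeave (forced): consul-147-75-39-15 10.99.31.141",
]
for when, msg in timeline(sample, "consul-147-75-39-15"):
    print(when.time(), msg)
```

Run over the full log above, it shows the forced LAN leave at 05:41:53, the LAN rejoin at 05:44:48, the first "Suspect" at 05:46:57, and the second forced leave at 05:47:01.770, right after the node was marked failed.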
Thanks for helping.