Consul[407]: agent: Coordinate update error: error="No cluster leader"

I have created a new 3-node cluster and completed the Consul install. I am constantly getting the following error messages in the logs on all 3 servers:

consul: 2023-05-26T04:18:08.730-0400 [ERROR] agent.anti_entropy: failed to sync remote state: error="No cluster leader"
consul[407]: agent.anti_entropy: failed to sync remote state: error="No cluster leader"
consul: 2023-05-26T04:18:17.033-0400 [ERROR] agent: Coordinate update error: error="No cluster leader"
consul[407]: agent: Coordinate update error: error="No cluster leader"

I ran a couple of commands from this forum:

consul operator raft list-peers -stale 
Error getting peers: Failed to retrieve raft configuration: Unexpected response code: 500 (No cluster leader)

However, I am able to see the Consul members:

consul members
Node         Address        Status  Type    Build   Protocol  DC    Partition  Segment
Server1host  x.x.x.x.:8301  alive   server  1.15.1  2         prod  default    <all>
Server2host  x.x.x.x.:8301  alive   server  1.15.1  2         prod  default    <all>
Server3host  x.x.x.x.:8301  alive   server  1.15.1  2         prod  default    <all>

consul.hcl snippet

client_addr    = "0.0.0.0"
bind_addr      = "0.0.0.0"
advertise_addr = "x.x.x.x."
addresses {
  http = "127.0.0.1"
}
ports {
  server   = 8303
  http     = 8500
  https    = 8501
  grpc_tls = 8502
  serf_lan = 8301
}
connect {
  enabled = true
}

Any ideas why they are not electing a leader? My UI is not working either; it also says “No cluster leader”.

This surprises me, a lot. I’m pretty sure I’ve successfully used the -stale flag to retrieve the Raft peers without a leader in the past. It’s possible this has regressed, which would be really bad, since it’s a critical diagnostic tool in understanding a broken Raft configuration.

Without this information, it’s really difficult to make any useful suggestions. However, it’s possible the current Raft configuration may be logged during startup - I think I remember seeing it there.

Can you restart a Consul server process, collect a few minutes of logs, starting with the startup, and post them here?

Please do not fully obfuscate IP addresses or other node identifiers, as they may be relevant to understanding the problem.

I am not sure I can paste full logs without obfuscating, but can I just remove the data_dir and start fresh? I can see them join the cluster, but they have trouble electing a leader:

2023-05-26T13:24:54.010-0400 [INFO]  agent: Joining cluster...: cluster=LAN
2023-05-26T13:24:54.010-0400 [INFO]  agent: (LAN) joining: lan_addresses=["host1", "host2"]
2023-05-26T13:24:54.010-0400 [INFO]  agent: started state syncer
2023-05-26T13:24:54.010-0400 [INFO]  agent: Consul agent running!
2023-05-26T13:24:54.024-0400 [INFO]  agent: (LAN) joined: number_of_nodes=1
2023-05-26T13:24:54.024-0400 [INFO]  agent: Join cluster completed. Synced with initial agents: cluster=LAN num_agents=1
2023-05-26T13:24:59.412-0400 [WARN]  agent.server.raft: no known peers, aborting election
2023-05-26T13:25:01.468-0400 [WARN]  agent.cache: handling error in Cache.Notify: cache-type=connect-ca-leaf error="No cluster leader" index=0
2023-05-26T13:25:01.468-0400 [ERROR] agent.server.cert-manager: failed to handle cache update event: error="leaf cert watch returned an error: No cluster leader"

I completely wiped the data_dir and restarted all 3 servers manually, but still the same problem. This is a brand-new cluster and has similar settings to my other cluster. A bit stumped, but could it be a blocked port? I have opened the firewall for all ports.
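One quick way to rule out a blocked port is to test TCP reachability of the server RPC port from each node. A rough sketch using `nc` (the hostnames are placeholders; this cluster's config sets `server = 8303`, while the Consul default is 8300):

```shell
# Rough sketch: check that each server's Raft RPC port is reachable.
# Hostnames below are placeholders; substitute your real server names.
for host in server1 server2 server3; do
  if nc -z -w 2 "$host" 8303 2>/dev/null; then
    echo "$host:8303 reachable"
  else
    echo "cannot reach $host:8303"
  fi
done
```

Serf gossip on 8301 evidently works (otherwise `consul members` would show nothing), so the server RPC port is the one worth checking.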

If you can’t show more logs, I can’t help.

Thanks @maxb. There are company-relevant hostnames and IPs that I cannot disclose, but I am looking for any advice. I mean, I completely wiped everything out, and “consul members” shows active members; I also had a client join without problems. I can even access the GUI, but in the end it is useless because there is no cluster leader and nothing works.

Is there a certain port or config parameter I should focus on? Any pointers appreciated.

Even this command fails

consul operator raft list-peers

Error getting peers: Failed to retrieve raft configuration: Unexpected response code: 500 (No cluster leader)

The only Raft-relevant messages I found in the log were:

2023-05-26T14:04:03.549-0400 [INFO]  agent.server.raft: initial configuration: index=0 servers=[]
2023-05-26T14:04:03.549-0400 [INFO]  agent.server.raft: entering follower state: follower="Node at x.x.x.x.:8303 [Follower]" leader-address= leader-id=
2023-05-26T14:04:09.730-0400 [WARN]  agent.server.raft: no known peers, aborting election

If you do need to obfuscate, because it’s too hard to talk sense into people imposing requirements, then the way to do it is to replace hostnames and IPs with other generic hostnames and IPs that:

  • Still look like hostnames/IPs, so they communicate what was replaced
  • Always replace the same hostname/IP with the same unique replacement, so that someone reading the obfuscated logs can still identify that the same node is being referenced across multiple lines of logs.
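Following that advice, one quick way to do consistent replacement is a single `sed` pass that maps every real identifier to one fixed placeholder (the names and addresses below are made up; 192.0.2.0/24 is reserved for documentation):

```shell
# Sketch: scrub logs while keeping node identities distinguishable.
# Each real hostname/IP always maps to the same placeholder.
sed -e 's/server1host/consul-a/g' \
    -e 's/10\.1\.2\.3/192.0.2.1/g' <<'EOF'
2023-05-26T13:24:54.010-0400 [INFO]  agent: (LAN) joining: lan_addresses=["server1host"]
EOF
```

This prints the same log line with `lan_addresses=["consul-a"]`, and any later line mentioning the same host gets the same replacement.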

Well, yes, it would. Without the -stale option, it by definition tries to reach a cluster leader.


I’m beginning to wonder… has this cluster ever worked?

Could you paste your entire Consul server configuration file, not just the “snippet” you showed earlier?

Have you perhaps not done anything to bootstrap the cluster, either via the configuration file or CLI command?

@maxb, thanks for all the help. This was a brand-new cluster; I ran into this issue while following this document: Deployment Guide | Consul | HashiCorp Developer. Since I was not going to use ACLs, I had not done any bootstrapping, but after your comments above I went back and read another doc, Bootstrap a Datacenter | Consul | HashiCorp Developer, and then I did the following 3 steps and it started working:

* consul join <Node A Address> <Node B Address> <Node C Address>
* add bootstrap_expect = 3 to consul.hcl (note the underscore in the config-file key)
* restart Consul
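For reference, the relevant consul.hcl additions sketched out (the server names are placeholders; `retry_join` is the config-file equivalent of running `consul join` by hand after every restart):

```hcl
# Sketch only; hostnames are placeholders.
server           = true
bootstrap_expect = 3   # do not elect a leader until 3 servers have joined
retry_join       = ["Server1host", "Server2host", "Server3host"]
```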

In Consul, there are multiple bootstraps:

  • Bootstrapping the Raft clustering and consensus system - i.e. creating a cluster
  • Bootstrapping the ACL system - i.e. turning on permissions enforcement and creating the first superuser token
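Put concretely (a sketch; only the first kind was at play in this thread):

```hcl
# 1. Raft/cluster bootstrap: set in every server's consul.hcl before
#    first start, so the servers know how many peers to expect:
bootstrap_expect = 3

# 2. ACL bootstrap is unrelated: it is run once from the CLI after a
#    leader exists, and only matters when ACLs are enabled:
#      consul acl bootstrap
```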