Consul[407]: agent: Coordinate update error: error="No cluster leader"

sammy676776 · May 26, 2023, 8:29am

I have created a new 3-node cluster and completed the Consul install. I am constantly getting the following error messages in the logs on all 3 servers:

consul: 2023-05-26T04:18:08.730-0400 [ERROR] agent.anti_entropy: failed to sync remote state: error="No cluster leader"
consul[407]: agent.anti_entropy: failed to sync remote state: error="No cluster leader"
consul: 2023-05-26T04:18:17.033-0400 [ERROR] agent: Coordinate update error: error="No cluster leader"
consul[407]: agent: Coordinate update error: error="No cluster leader"

I ran a couple of commands from this forum

consul operator raft list-peers -stale 
Error getting peers: Failed to retrieve raft configuration: Unexpected response code: 500 (No cluster leader)

However I am able to see the consul members

consul members
Node         Address         Status  Type    Build   Protocol  DC    Partition  Segment
Server1host  x.x.x.x.:8301   alive   server  1.15.1  2         prod  default    <all>
Server2host  x.x.x.x.:8301   alive   server  1.15.1  2         prod  default    <all>
Server3host  x.x.x.x.:8301   alive   server  1.15.1  2         prod  default    <all>

consul.hcl snippet

client_addr    = "0.0.0.0"
bind_addr      = "0.0.0.0"
advertise_addr = "x.x.x.x."
addresses {
  http = "127.0.0.1"
}
ports {
  server   = 8303
  http     = 8500
  https    = 8501
  grpc_tls = 8502
  serf_lan = 8301
}
connect {
  enabled = true
}

Any ideas why they are not electing a leader? My UI is not working well either; it also says “No cluster leader”.

maxb · May 26, 2023, 9:30am

This surprises me, a lot. I’m pretty sure I’ve successfully used the -stale flag to retrieve the Raft peers without a leader in the past. It’s possible this has regressed, which would be really bad, since it’s a critical diagnostic tool in understanding a broken Raft configuration.

Without this information, it’s really difficult to make any useful suggestions. However, it’s possible the current Raft configuration may be logged during startup - I think I remember seeing it there.

Can you restart a Consul server process, collect a few minutes of logs, starting with the startup, and post them here?

Please do not fully obfuscate IP addresses or other node identifiers, as they may be relevant to understanding the problem.

sammy676776 · May 26, 2023, 5:28pm

I am not sure I can paste full logs without obfuscating them, but can I just wipe the data_dir and start fresh? I can see the servers join the cluster, but they have trouble electing a leader.

2023-05-26T13:24:54.010-0400 [INFO]  agent: Joining cluster...: cluster=LAN
2023-05-26T13:24:54.010-0400 [INFO]  agent: (LAN) joining: lan_addresses=["host1", "host2"]
2023-05-26T13:24:54.010-0400 [INFO]  agent: started state syncer
2023-05-26T13:24:54.010-0400 [INFO]  agent: Consul agent running!
2023-05-26T13:24:54.024-0400 [INFO]  agent: (LAN) joined: number_of_nodes=1
2023-05-26T13:24:54.024-0400 [INFO]  agent: Join cluster completed. Synced with initial agents: cluster=LAN num_agents=1
2023-05-26T13:24:59.412-0400 [WARN]  agent.server.raft: no known peers, aborting election
2023-05-26T13:25:01.468-0400 [WARN]  agent.cache: handling error in Cache.Notify: cache-type=connect-ca-leaf error="No cluster leader" index=0
2023-05-26T13:25:01.468-0400 [ERROR] agent.server.cert-manager: failed to handle cache update event: error="leaf cert watch returned an error: No cluster leader"

sammy676776 · May 26, 2023, 5:43pm

I completely wiped out the data_dir and restarted all 3 servers manually, but I still have the same problem. This is a brand new cluster with settings similar to my other cluster, so I am a bit stumped. Could it be a blocked port? I have opened the firewall for all ports.

maxb · May 26, 2023, 6:04pm

If you can’t show more logs, I can’t help.

sammy676776 · May 26, 2023, 6:18pm

Thanks @maxb. There are company-specific hostnames and IPs that I cannot disclose, but I am looking for any advice. I completely wiped the data directory, “consul members” shows active members, and I also had a “client” join without a problem. I can even access the GUI, but in the end it is useless because the cluster has no leader and nothing works.

Is there a certain port or config parameter I should focus on? Any pointers appreciated.

Even this command fails

consul operator raft list-peers

Error getting peers: Failed to retrieve raft configuration: Unexpected response code: 500 (No cluster leader)

The only Raft-relevant messages I found in the log were:

2023-05-26T14:04:03.549-0400 [INFO]  agent.server.raft: initial configuration: index=0 servers=[]
2023-05-26T14:04:03.549-0400 [INFO]  agent.server.raft: entering follower state: follower="Node at x.x.x.x.:8303 [Follower]" leader-address= leader-id=
2023-05-26T14:04:09.730-0400 [WARN]  agent.server.raft: no known peers, aborting election

maxb · May 27, 2023, 7:55am

If you do need to obfuscate, because it’s too hard to talk sense into people imposing requirements, then the way to do it is to replace hostnames and IPs with other generic hostnames and IPs that:

* Still look like hostnames/IPs, so they communicate what was replaced
* Always replace the same hostname/IP with the same unique replacement, so that someone reading the obfuscated logs can still identify that the same node is being referenced across multiple lines of logs.
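
The two rules above can be sketched in a few lines of Python. This is only an illustration, not a vetted scrubbing tool: the function name `obfuscate_ips` and the `10.0.0.N` placeholder range are assumptions made for the example, and it handles IPv4 addresses only. Hostnames would need a similar mapping built from a list of known names.

```python
import re

# Matches dotted-quad IPv4 addresses such as 192.168.5.10 (no range validation).
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def obfuscate_ips(text, mapping=None):
    """Replace each distinct IPv4 address with a stable placeholder address.

    The same real address always maps to the same placeholder, so a reader
    can still follow one node across many log lines.
    """
    mapping = {} if mapping is None else mapping

    def replace(match):
        ip = match.group(0)
        if ip not in mapping:
            # 10.0.0.N is an arbitrary placeholder range chosen for this sketch;
            # pick another range if your real logs already contain 10.0.0.x.
            mapping[ip] = f"10.0.0.{len(mapping) + 1}"
        return mapping[ip]

    return IPV4.sub(replace, text), mapping


if __name__ == "__main__":
    sample = "joining 192.168.5.10:8301 and 192.168.5.11:8301; leader 192.168.5.10:8300"
    scrubbed, table = obfuscate_ips(sample)
    print(scrubbed)
```

Passing the returned `mapping` back in on later calls keeps the replacements consistent across several log files.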

Well, yes, consul operator raft list-peers would fail when run that way. Without the -stale option, it by definition tries to reach a cluster leader.

I’m beginning to wonder… has this cluster ever worked?

Could you paste your entire Consul server configuration file, not just the “snippet” you showed earlier?

Have you perhaps not done anything to bootstrap the cluster, either via the configuration file or CLI command?
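
As a rough aid for checking this from logs, here is a small Python sketch that scans server log text for the two Raft messages posted earlier in this thread. The function name and the labels are made up for the example, and treating these messages as a sign of a never-bootstrapped cluster is a heuristic, not an official diagnostic.

```python
# Substrings copied from the log lines posted earlier in this thread.
SYMPTOMS = {
    "empty initial Raft configuration": "initial configuration: index=0 servers=[]",
    "election aborted, no peers": "no known peers, aborting election",
}


def raft_bootstrap_symptoms(log_text):
    """Return the labels of the known symptoms that appear in log_text."""
    return [label for label, needle in SYMPTOMS.items() if needle in log_text]


if __name__ == "__main__":
    sample = (
        "[INFO]  agent.server.raft: initial configuration: index=0 servers=[]\n"
        "[WARN]  agent.server.raft: no known peers, aborting election\n"
    )
    for label in raft_bootstrap_symptoms(sample):
        print("found:", label)
```

If both symptoms show up on every server, the Raft configuration is probably empty everywhere, which would fit a cluster that was never bootstrapped.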

sammy676776 · May 27, 2023, 7:13pm

@maxb Thanks for all the help. This was a brand new cluster, and I ran into this issue while following this document: Deployment Guide | Consul | HashiCorp Developer. Since I was not going to use ACLs, I had not done any bootstrap, but after your comments above I went back and read another doc, Bootstrap a Datacenter | Consul | HashiCorp Developer, and then I did the following 3 steps and it started working.

* consul join <Node A Address> <Node B Address> <Node C Address>
* Add bootstrap_expect = 3 to consul.hcl
* Restart Consul
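
For anyone who wants to confirm the setting is present on every server before restarting, a minimal Python sketch follows. It assumes the HCL key is spelled `bootstrap_expect` (the command-line flag form is `-bootstrap-expect`) and uses a naive regular expression rather than a real HCL parser; the function name is hypothetical.

```python
import re

# Naive match for an uncommented top-level line such as: bootstrap_expect = 3
BOOTSTRAP_EXPECT = re.compile(r"^\s*bootstrap_expect\s*=\s*(\d+)", re.MULTILINE)


def bootstrap_expect_value(config_text):
    """Return the bootstrap_expect value found in config_text, or None."""
    match = BOOTSTRAP_EXPECT.search(config_text)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    sample = 'server           = true\nbootstrap_expect = 3\nbind_addr        = "0.0.0.0"\n'
    print(bootstrap_expect_value(sample))
```

A result of `None` for a server's config would match the situation in this thread, where no Raft bootstrap setting had been configured.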

maxb · May 27, 2023, 7:18pm

In Consul, there are multiple bootstraps:

* Bootstrapping the Raft clustering and consensus system - i.e. creating a cluster
* Bootstrapping the ACL system - i.e. turning on permissions enforcement and creating the first superuser token
