Installed Consul through the latest Helm chart and I am receiving "No cluster leader" errors

I set up a home lab with a new Kubernetes cluster for testing. The cluster is stood up, and now I am attempting to install Consul onto it via the Helm chart. I pulled the latest stable release of the consul-helm git repo per the documentation (v0.17). All the pods start up and go into a Running state, but they fail to reach a Ready state. The logs state that no cluster leader is available. Below are the logs from one of the pods.

2020/03/10 13:24:45 [ERR] agent: Coordinate update error: No cluster leader
2020/03/10 13:25:11 [ERR] agent: failed to sync remote state: No cluster leader
2020/03/10 13:25:13 [WARN] memberlist: Failed to resolve consul-consul-server-2.consul-consul-server.default.svc: lookup consul-consul-server-2.consul-consul-server.default.svc on 10.96.0.10:53: read udp 10.244.2.41:53127->10.96.0.10:53: i/o timeout
2020/03/10 13:25:13 [WARN] agent: (LAN) couldn’t join: 0 Err: 3 errors occurred:
* Failed to resolve consul-consul-server-0.consul-consul-server.default.svc: lookup consul-consul-server-0.consul-consul-server.default.svc on 10.96.0.10:53: read udp 10.244.2.41:51210->10.96.0.10:53: i/o timeout
* Failed to resolve consul-consul-server-1.consul-consul-server.default.svc: lookup consul-consul-server-1.consul-consul-server.default.svc on 10.96.0.10:53: read udp 10.244.2.41:39328->10.96.0.10:53: i/o timeout
* Failed to resolve consul-consul-server-2.consul-consul-server.default.svc: lookup consul-consul-server-2.consul-consul-server.default.svc on 10.96.0.10:53: read udp 10.244.2.41:53127->10.96.0.10:53: i/o timeout

2020/03/10 13:25:13 [WARN] agent: Join LAN failed: <nil>, retrying in 30s
2020/03/10 13:25:16 [ERR] agent: Coordinate update error: No cluster leader
2020/03/10 13:25:34 [ERR] agent: failed to sync remote state: No cluster leader
2020/03/10 13:25:43 [INFO] agent: (LAN) joining: [consul-consul-server-0.consul-consul-server.default.svc consul-consul-server-1.consul-consul-server.default.svc consul-consul-server-2.consul-consul-server.default.svc]
2020/03/10 13:25:45 [ERR] agent: Coordinate update error: No cluster leader
2020/03/10 13:26:00 [ERR] agent: failed to sync remote state: No cluster leader
2020/03/10 13:26:20 [ERR] agent: Coordinate update error: No cluster leader
2020/03/10 13:26:24 [ERR] agent: failed to sync remote state: No cluster leader
2020/03/10 13:26:25 [WARN] memberlist: Failed to resolve consul-consul-server-0.consul-consul-server.default.svc: lookup consul-consul-server-0.consul-consul-server.default.svc on 10.96.0.10:53: read udp 10.244.2.41:47117->10.96.0.10:53: i/o timeout
2020/03/10 13:26:49 [ERR] agent: Coordinate update error: No cluster leader
2020/03/10 13:26:53 [ERR] agent: failed to sync remote state: No cluster leader

Hello @Diggs27

Welcome to the Discuss forum! Thanks for posting for the first time 🙂 Can you post your values file, with sensitive information removed, so we can take a look at the configuration? What version of Kubernetes have you installed on the cluster, and what hardware configuration is your home cluster using?

Looking forward to hearing back from you!

Hi @jsosulska, thanks for the reply! I really appreciate you responding.

Here is a little information about my setup.

  • 4-node cluster setup
  • Fresh Debian 10 builds with the latest updates, running on the same ESXi host (ESXi on an Intel NUC)
  • Kubernetes version v1.17.3
  • Docker CE engine installed
  • kubeadm used to set up the cluster

I've set up an NFS server and have been using the nfs-provisioner Helm chart. That seems to be working, and I can see the PVCs created for all the Consul servers. For networking I was using kube-router; I tried rebuilding the cluster with Flannel instead and ran into the same issue. I've confirmed CoreDNS is set up and resolving pods successfully.
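For the CoreDNS check I just ran an ad-hoc lookup from a throwaway pod, roughly like this (the busybox tag and test name are just whatever I had handy, so treat the exact command as approximate):

    kubectl run dns-test -it --rm --restart=Never --image=busybox:1.28 \
      -- nslookup kubernetes.default.svc.cluster.local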

I'm using the vanilla values.yaml from a clone of the v0.17.0 tag of the git repo.
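For reference, the install itself was roughly the following (paths and release name from memory, so approximate):

    git clone --branch v0.17.0 https://github.com/hashicorp/consul-helm.git
    helm install consul ./consul-helm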


I have also tried v0.16.0 without any luck. I am using Helm version 3, not version 2; I didn't see anywhere that that isn't supported. Should I be using Helm 2?

All nodes are healthy and all pods are in a Running state as well, just not a Ready state.

Hi Diggs27!

Sorry for the lag on responding to this. As you can imagine, things are a bit hectic right now for everyone.

Helm 3 definitely works (up to 3.1), so that shouldn't be an issue. Could you do me a favor and collect a few bits of information? I was hoping to see the output of kubectl get pods -A and kubectl get pvc -A. I'm curious to see how things look cluster-wise (i.e., are all pods pulling down images, etc.). Another thing that would be useful is the output of kubectl describe pod consul-consul-server-2; specifically the event messages at the end.
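To spell the commands out (adjust the pod name and namespace flags to match your install):

    kubectl get pods -A
    kubectl get pvc -A
    kubectl describe pod consul-consul-server-2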


I've encountered the same issue while installing Consul through Helm v3.2.1.

Error encountered:
==> Starting Consul agent…
Version: ‘v1.7.2’
Node ID: ‘5e7ae9ce-570e-a474-85fb-c9db6c9264ed’
Node name: ‘docker-desktop’
Datacenter: ‘dc1’ (Segment: ‘’)
Server: false (Bootstrap: false)
Client Addr: [0.0.0.0] (HTTP: 8500, HTTPS: -1, gRPC: 8502, DNS: 8600)
Cluster Addr: 10.1.1.69 (LAN: 8301, WAN: 8302)
Encrypt: Gossip: false, TLS-Outgoing: false, TLS-Incoming: false, Auto-Encrypt-TLS: false

==> Log data will now stream in as it occurs:

2020-05-12T17:50:49.309Z [INFO]  agent.client.serf.lan: serf: EventMemberJoin: docker-desktop 10.1.1.69
2020-05-12T17:50:49.310Z [INFO]  agent: Started DNS server: address=0.0.0.0:8600 network=udp
2020-05-12T17:50:49.311Z [INFO]  agent: Started DNS server: address=0.0.0.0:8600 network=tcp
2020-05-12T17:50:49.312Z [INFO]  agent: Started HTTP server: address=[::]:8500 network=tcp
2020-05-12T17:50:49.312Z [INFO]  agent: Started gRPC server: address=[::]:8502 network=tcp
2020-05-12T17:50:49.312Z [INFO]  agent: Retry join is supported for the following discovery methods: cluster=LAN discovery_methods="aliyun aws azure digitalocean gce k8s linode mdns os packet scaleway softlayer tencentcloud triton vsphere"
2020-05-12T17:50:49.312Z [INFO]  agent: Joining cluster...: cluster=LAN
2020-05-12T17:50:49.312Z [INFO]  agent: (LAN) joining: lan_addresses=[consul-server-0.consul-server.default.svc, consul-server-1.consul-server.default.svc, consul-server-2.consul-server.default.svc]
2020-05-12T17:50:49.312Z [INFO]  agent: started state syncer

==> Consul agent running!
2020-05-12T17:50:49.313Z [WARN] agent.client.manager: No servers available
2020-05-12T17:50:49.313Z [ERROR] agent.anti_entropy: failed to sync remote state: error=“No known Consul servers”
2020-05-12T17:50:49.320Z [WARN] agent.client.memberlist.lan: memberlist: Failed to resolve consul-server-0.consul-server.default.svc: lookup consul-server-0.consul-server.default.svc on 10.96.0.10:53: no such host
2020-05-12T17:50:49.327Z [WARN] agent.client.memberlist.lan: memberlist: Failed to resolve consul-server-1.consul-server.default.svc: lookup consul-server-1.consul-server.default.svc on 10.96.0.10:53: no such host
2020-05-12T17:50:49.335Z [WARN] agent.client.memberlist.lan: memberlist: Failed to resolve consul-server-2.consul-server.default.svc: lookup consul-server-2.consul-server.default.svc on 10.96.0.10:53: no such host
2020-05-12T17:50:49.335Z [WARN] agent: (LAN) couldn’t join: number_of_nodes=0 error="3 errors occurred:
* Failed to resolve consul-server-0.consul-server.default.svc: lookup consul-server-0.consul-server.default.svc on 10.96.0.10:53: no such host
* Failed to resolve consul-server-1.consul-server.default.svc: lookup consul-server-1.consul-server.default.svc on 10.96.0.10:53: no such host
* Failed to resolve consul-server-2.consul-server.default.svc: lookup consul-server-2.consul-server.default.svc on 10.96.0.10:53: no such host

"
2020-05-12T17:50:49.335Z [WARN] agent: Join cluster failed, will retry: cluster=LAN retry_interval=30s error=
2020-05-12T17:50:50.893Z [ERROR] agent: Newer Consul version available: new_version=1.7.3 current_version=1.7.2
2020-05-12T17:50:52.238Z [WARN] agent.client.manager: No servers available
2020-05-12T17:50:52.239Z [ERROR] agent.http: Request error: method=GET url=/v1/status/leader from=127.0.0.1:32834 error=“No known Consul servers”
2020-05-12T17:51:02.255Z [WARN] agent.client.manager: No servers available
2020-05-12T17:51:02.255Z [ERROR] agent.http: Request error: method=GET url=/v1/status/leader from=127.0.0.1:32912 error=“No known Consul servers”
2020-05-12T17:51:12.211Z [WARN] agent.client.manager: No servers available
2020-05-12T17:51:12.211Z [ERROR] agent.http: Request error: method=GET url=/v1/status/leader from=127.0.0.1:32986 error=“No known Consul servers”
2020-05-12T17:51:13.749Z [WARN] agent.client.manager: No servers available
2020-05-12T17:51:13.750Z [ERROR] agent.anti_entropy: failed to sync remote state: error=“No known Consul servers”
2020-05-12T17:51:19.303Z [INFO] agent: (LAN) joining: lan_addresses=[consul-server-0.consul-server.default.svc, consul-server-1.consul-server.default.svc, consul-server-2.consul-server.default.svc]
2020-05-12T17:51:19.319Z [INFO] agent.client.serf.lan: serf: EventMemberJoin: consul-server-0 10.1.1.70
2020-05-12T17:51:19.319Z [INFO] agent.client: adding server: server=“consul-server-0 (Addr: tcp/10.1.1.70:8300) (DC: dc1)”
2020-05-12T17:51:19.327Z [WARN] agent.client.memberlist.lan: memberlist: Failed to resolve consul-server-1.consul-server.default.svc: lookup consul-server-1.consul-server.default.svc on 10.96.0.10:53: no such host
2020-05-12T17:51:19.333Z [WARN] agent.client.memberlist.lan: memberlist: Failed to resolve consul-server-2.consul-server.default.svc: lookup consul-server-2.consul-server.default.svc on 10.96.0.10:53: no such host
2020-05-12T17:51:19.333Z [INFO] agent: (LAN) joined: number_of_nodes=1
2020-05-12T17:51:19.333Z [INFO] agent: Join cluster completed. Synced with initial agents: cluster=LAN num_agents=1
2020-05-12T17:51:28.782Z [ERROR] agent.client: RPC failed to server: method=Catalog.NodeServiceList server=10.1.1.70:8300 error=“rpc error making call: No cluster leader”
2020-05-12T17:51:28.782Z [ERROR] agent.anti_entropy: failed to sync remote state: error=“rpc error making call: No cluster leader”
2020-05-12T17:51:48.446Z [ERROR] agent.client: RPC failed to server: method=Coordinate.Update server=10.1.1.70:8300 error=“rpc error making call: No cluster leader”
2020-05-12T17:51:48.446Z [ERROR] agent: Coordinate update error: error=“rpc error making call: No cluster leader”
2020-05-12T17:51:57.599Z [ERROR] agent.client: RPC failed to server: method=Catalog.NodeServiceList server=10.1.1.70:8300 error=“rpc error making call: No cluster leader”
2020-05-12T17:51:57.599Z [ERROR] agent.anti_entropy: failed to sync remote state: error=“rpc error making call: No cluster leader”
2020-05-12T17:52:11.204Z [ERROR] agent.client: RPC failed to server: method=Coordinate.Update server=10.1.1.70:8300 error=“rpc error making call: No cluster leader”
2020-05-12T17:52:11.205Z [ERROR] agent: Coordinate update error: error=“rpc error making call: No cluster leader”
2020-05-12T17:52:26.245Z [ERROR] agent.client: RPC failed to server: method=Catalog.NodeServiceList server=10.1.1.70:8300 error=“rpc error making call: No cluster leader”
2020-05-12T17:52:26.246Z [ERROR] agent.anti_entropy: failed to sync remote state: error=“rpc error making call: No cluster leader”
2020-05-12T17:52:47.077Z [ERROR] agent.client: RPC failed to server: method=Coordinate.Update server=10.1.1.70:8300 error=“rpc error making call: No cluster leader”
2020-05-12T17:52:47.077Z [ERROR] agent: Coordinate update error: error=“rpc error making call: No cluster leader”
2020-05-12T17:52:52.137Z [ERROR] agent.client: RPC failed to server: method=Catalog.NodeServiceList server=10.1.1.70:8300 error=“rpc error making call: No cluster leader”
2020-05-12T17:52:52.137Z [ERROR] agent.anti_entropy: failed to sync remote state: error=“rpc error making call: No cluster leader”
2020-05-12T17:53:19.968Z [ERROR] agent.client: RPC failed to server: method=Coordinate.Update server=10.1.1.70:8300 error=“rpc error making call: No cluster leader”
2020-05-12T17:53:19.969Z [ERROR] agent: Coordinate update error: error=“rpc error making call: No cluster leader”
2020-05-12T17:53:27.188Z [ERROR] agent.client: RPC failed to server: method=Catalog.NodeServiceList server=10.1.1.70:8300 error=“rpc error making call: No cluster leader”
2020-05-12T17:53:27.190Z [ERROR] agent.anti_entropy: failed to sync remote state: error=“rpc error making call: No cluster leader”
2020-05-12T17:53:51.836Z [ERROR] agent.client: RPC failed to server: method=Coordinate.Update server=10.1.1.70:8300 error=“rpc error making call: No cluster leader”
2020-05-12T17:53:51.837Z [ERROR] agent: Coordinate update error: error=“rpc error making call: No cluster leader”
2020-05-12T17:54:00.764Z [ERROR] agent.client: RPC failed to server: method=Catalog.NodeServiceList server=10.1.1.70:8300 error=“rpc error making call: No cluster leader”
2020-05-12T17:54:00.764Z [ERROR] agent.anti_entropy: failed to sync remote state: error=“rpc error making call: No cluster leader”
2020-05-12T17:54:21.190Z [ERROR] agent.client: RPC failed to server: method=Coordinate.Update server=10.1.1.70:8300 error=“rpc error making call: No cluster leader”
2020-05-12T17:54:21.190Z [ERROR] agent: Coordinate update error: error=“rpc error making call: No cluster leader”
2020-05-12T17:54:31.947Z [ERROR] agent.client: RPC failed to server: method=Catalog.NodeServiceList server=10.1.1.70:8300 error=“rpc error making call: No cluster leader”
2020-05-12T17:54:31.948Z [ERROR] agent.anti_entropy: failed to sync remote state: error=“rpc error making call: No cluster leader”
2020-05-12T17:54:44.682Z [ERROR] agent.client: RPC failed to server: method=Coordinate.Update server=10.1.1.70:8300 error=“rpc error making call: No cluster leader”
2020-05-12T17:54:44.683Z [ERROR] agent: Coordinate update error: error=“rpc error making call: No cluster leader”
2020-05-12T17:55:06.821Z [ERROR] agent.client: RPC failed to server: method=Catalog.NodeServiceList server=10.1.1.70:8300 error=“rpc error making call: No cluster leader”
2020-05-12T17:55:06.821Z [ERROR] agent.anti_entropy: failed to sync remote state: error=“rpc error making call: No cluster leader”
2020-05-12T17:55:07.960Z [ERROR] agent.client: RPC failed to server: method=Coordinate.Update server=10.1.1.70:8300 error=“rpc error making call: No cluster leader”
2020-05-12T17:55:07.960Z [ERROR] agent: Coordinate update error: error=“rpc error making call: No cluster leader”
2020-05-12T17:55:38.128Z [ERROR] agent.client: RPC failed to server: method=Catalog.NodeServiceList server=10.1.1.70:8300 error=“rpc error making call: No cluster leader”
2020-05-12T17:55:38.128Z [ERROR] agent.anti_entropy: failed to sync remote state: error=“rpc error making call: No cluster leader”
2020-05-12T17:55:42.570Z [ERROR] agent.client: RPC failed to server: method=Coordinate.Update server=10.1.1.70:8300 error=“rpc error making call: No cluster leader”
2020-05-12T17:55:42.571Z [ERROR] agent: Coordinate update error: error=“rpc error making call: No cluster leader”
2020-05-12T17:56:10.311Z [ERROR] agent.client: RPC failed to server: method=Catalog.NodeServiceList server=10.1.1.70:8300 error=“rpc error making call: No cluster leader”
2020-05-12T17:56:10.311Z [ERROR] agent.anti_entropy: failed to sync remote state: error=“rpc error making call: No cluster leader”
2020-05-12T17:56:13.576Z [ERROR] agent.client: RPC failed to server: method=Coordinate.Update server=10.1.1.70:8300 error=“rpc error making call: No cluster leader”
2020-05-12T17:56:13.577Z [ERROR] agent: Coordinate update error: error=“rpc error making call: No cluster leader”

Details for the Consul pods:

NAME READY STATUS RESTARTS AGE
consul-72mzd 0/1 Running 0 2m58s
consul-server-0 0/1 Running 0 2m58s
consul-server-1 0/1 Pending 0 2m58s
consul-server-2 0/1 Pending 0 2m58s
Bibeks-MacBook-Pro-2:bin bibek$ kubectl get pvc -A
NAMESPACE NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
default data-consul-0 Bound pvc-e4d644f2-0938-4d31-8e09-b26200763bcf 8Gi RWO hostpath 26h
default data-consul-1 Bound pvc-fdbab729-f078-40e5-839e-725a8211226f 8Gi RWO hostpath 26h
default data-consul-2 Bound pvc-05682aa6-e4e4-4dae-87ac-383b9073186f 8Gi RWO hostpath 26h
default data-default-consul-server-0 Bound pvc-5efe39a4-cf7c-4f85-a62c-6e76f3befffa 10Gi RWO hostpath 56m
default data-default-consul-server-1 Bound pvc-8b851807-22a8-42f2-8e3f-e172edf60a2f 10Gi RWO hostpath 56m
default data-default-consul-server-2 Bound pvc-1617edcc-bb22-4e3c-8033-ec543a33dcbb 10Gi RWO hostpath 56m
Bibeks-MacBook-Pro-2:bin bibek$ kubectl describe pod consul-consul-server-2
Error from server (NotFound): pods “consul-consul-server-2” not found
Bibeks-MacBook-Pro-2:bin bibek$ kubectl describe pod consul-server-2
Name: consul-server-2
Namespace: default
Priority: 0
Node:
Labels: app=consul
chart=consul-helm
component=server
controller-revision-hash=consul-server-789d977fb9
hasDNS=true
release=consul
statefulset.kubernetes.io/pod-name=consul-server-2
Annotations: consul.hashicorp.com/connect-inject: false
Status: Pending
IP:
IPs:
Controlled By: StatefulSet/consul-server
Containers:
consul:
Image: consul:1.7.2
Ports: 8500/TCP, 8301/TCP, 8302/TCP, 8300/TCP, 8600/TCP, 8600/UDP
Host Ports: 0/TCP, 0/TCP, 0/TCP, 0/TCP, 0/TCP, 0/UDP
Command:
/bin/sh
-ec
CONSUL_FULLNAME="consul"

  exec /bin/consul agent \
    -advertise="${POD_IP}" \
    -bind=0.0.0.0 \
    -bootstrap-expect=3 \
    -client=0.0.0.0 \
    -config-dir=/consul/config \
    -datacenter=dc1 \
    -data-dir=/consul/data \
    -domain=consul \
    -hcl="connect { enabled = true }" \
    -ui \
    -retry-join=${CONSUL_FULLNAME}-server-0.${CONSUL_FULLNAME}-server.${NAMESPACE}.svc \
    -retry-join=${CONSUL_FULLNAME}-server-1.${CONSUL_FULLNAME}-server.${NAMESPACE}.svc \
    -retry-join=${CONSUL_FULLNAME}-server-2.${CONSUL_FULLNAME}-server.${NAMESPACE}.svc \
    -server
  
Readiness:  exec [/bin/sh -ec curl http://127.0.0.1:8500/v1/status/leader \
2>/dev/null | grep -E '".+"'
] delay=5s timeout=5s period=3s #success=1 #failure=2
Environment:
POD_IP: (v1:status.podIP)
NAMESPACE: default (v1:metadata.namespace)
Mounts:
/consul/config from config (rw)
/consul/data from data-default (rw)
/var/run/secrets/kubernetes.io/serviceaccount from consul-server-token-x6gh2 (ro)
Conditions:
Type Status
PodScheduled False
Volumes:
data-default:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: data-default-consul-server-2
ReadOnly: false
config:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: consul-server-config
Optional: false
consul-server-token-x6gh2:
Type: Secret (a volume populated by a Secret)
SecretName: consul-server-token-x6gh2
Optional: false
QoS Class: BestEffort
Node-Selectors:
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message


Warning FailedScheduling default-scheduler 0/1 nodes are available: 1 node(s) didn’t match pod affinity/anti-affinity, 1 node(s) didn’t satisfy existing pods anti-affinity rules.
Warning FailedScheduling default-scheduler 0/1 nodes are available: 1 node(s) didn’t match pod affinity/anti-affinity, 1 node(s) didn’t satisfy existing pods anti-affinity rules.

Your error is because 2 of your Consul server pods are still Pending. You can see it in their describe output:

Warning FailedScheduling default-scheduler 0/1 nodes are available: 1 node(s) didn’t match pod affinity/anti-affinity, 1 node(s) didn’t satisfy existing pods anti-affinity rules.
Warning FailedScheduling default-scheduler 0/1 nodes are available: 1 node(s) didn’t match pod affinity/anti-affinity, 1 node(s) didn’t satisfy existing pods anti-affinity rules.

This is the issue you need to fix. The server pods have an anti-affinity rule so that they're not scheduled on the same node. Do you only have 1 node?

If so then you will need to helm delete and then use the values:

server:
  bootstrapExpect: 1
  replicas: 1

Otherwise you’ll need 3 nodes.
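For example, assuming the release is named consul and you're installing from a local clone of consul-helm (adjust the chart reference to match however you installed originally):

    helm delete consul
    helm install consul ./consul-helm \
      --set server.replicas=1 \
      --set server.bootstrapExpect=1

Putting those two settings in your values.yaml and passing it with -f works just as well.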


If so then you will need to helm delete …

Also adding in here - you may need to delete the PVC before relaunching.
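Something along these lines, with the claim names taken from your earlier kubectl get pvc output (yours may differ depending on release name and namespace):

    kubectl get pvc
    kubectl delete pvc data-default-consul-server-0 data-default-consul-server-1 data-default-consul-server-2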

Hi @jsosulska,
I have the same issue with Consul installed via Helm: helm install consul hashicorp/consul --set global.name=iocc-x-consul --set server.storageClass=consul-storage -n consul
Before installing Consul, I created the StorageClass, PVs, and PVCs in a k8s cluster (1 master, 3 nodes):
[root@master ~]# kubectl get pvc -n consul
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
data-consul-iocc-x-consul-server-0 Bound data-consul-iocc-x-consul-server-2 10Gi RWO consul-storage 5h37m
data-consul-iocc-x-consul-server-1 Bound data-consul-iocc-x-consul-server-0 10Gi RWO consul-storage 5h37m
data-consul-iocc-x-consul-server-2 Bound data-consul-iocc-x-consul-server-1 10Gi RWO consul-storage 5h37m
[root@master ~]# kubectl get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
data-consul-iocc-x-consul-server-0 10Gi RWO Retain Bound consul/data-consul-iocc-x-consul-server-1 consul-storage 5h38m
data-consul-iocc-x-consul-server-1 10Gi RWO Retain Bound consul/data-consul-iocc-x-consul-server-2 consul-storage 5h38m
data-consul-iocc-x-consul-server-2 10Gi RWO Retain Bound consul/data-consul-iocc-x-consul-server-0 consul-storage

[root@master ~]# kubectl get pod -n consul
NAME READY STATUS RESTARTS AGE
iocc-x-consul-h5pz6 0/1 Running 0 5h37m
iocc-x-consul-server-0 0/1 Running 0 5h37m
iocc-x-consul-server-1 0/1 Running 0 5h37m
iocc-x-consul-server-2 0/1 Running 0 5h37m
iocc-x-consul-tjxmb 0/1 Running 0 5h37m
iocc-x-consul-w4g8n 0/1 Running 0 5h37m
iocc-x-consul-xc9gb 0/1 Running 0 5h37m

[root@master ~]# kubectl describe pod iocc-x-consul-server-0 -n consul
Name: iocc-x-consul-server-0
Namespace: consul
Priority: 0
Node: node3.iocc-test.com/9.112.160.52
Start Time: Thu, 02 Jul 2020 18:19:18 +0800
Labels: app=consul
chart=consul-helm
component=server
controller-revision-hash=iocc-x-consul-server-855cfb8db9
hasDNS=true
release=consul
statefulset.kubernetes.io/pod-name=iocc-x-consul-server-0
Annotations: consul.hashicorp.com/connect-inject: false
Status: Running
IP: 10.244.3.13
IPs:
IP: 10.244.3.13
Controlled By: StatefulSet/iocc-x-consul-server
Containers:
consul:
Container ID: docker://9a63e82d43bd8c15caa00f86c7457ac353e54f367abbd2bb5f49230f56f7d1d2
Image: consul:1.8.0
Image ID: docker-pullable://consul@sha256:0e660ca8ae28d864e3eaaed0e273b2f8cd348af207e2b715237e869d7a8b5dcc
Ports: 8500/TCP, 8301/TCP, 8302/TCP, 8300/TCP, 8600/TCP, 8600/UDP
Host Ports: 0/TCP, 0/TCP, 0/TCP, 0/TCP, 0/TCP, 0/UDP
Command:
/bin/sh
-ec
CONSUL_FULLNAME="iocc-x-consul"

  exec /bin/consul agent \
    -advertise="${POD_IP}" \
    -bind=0.0.0.0 \
    -bootstrap-expect=3 \
    -client=0.0.0.0 \
    -config-dir=/consul/config \
    -datacenter=dc1 \
    -data-dir=/consul/data \
    -domain=consul \
    -hcl="connect { enabled = true }" \
    -ui \
    -retry-join=${CONSUL_FULLNAME}-server-0.${CONSUL_FULLNAME}-server.${NAMESPACE}.svc \
    -retry-join=${CONSUL_FULLNAME}-server-1.${CONSUL_FULLNAME}-server.${NAMESPACE}.svc \
    -retry-join=${CONSUL_FULLNAME}-server-2.${CONSUL_FULLNAME}-server.${NAMESPACE}.svc \
    -server

State:          Running
  Started:      Thu, 02 Jul 2020 18:19:23 +0800
Ready:          False
Restart Count:  0
Limits:
  cpu:     100m
  memory:  100Mi
Requests:
  cpu:      100m
  memory:   100Mi
Readiness:  exec [/bin/sh -ec curl http://127.0.0.1:8500/v1/status/leader \
2>/dev/null | grep -E '".+"'
] delay=5s timeout=5s period=3s #success=1 #failure=2
Environment:
POD_IP: (v1:status.podIP)
NAMESPACE: consul (v1:metadata.namespace)
Mounts:
/consul/config from config (rw)
/consul/data from data-consul (rw)
/var/run/secrets/kubernetes.io/serviceaccount from iocc-x-consul-server-token-g9g2x (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
data-consul:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: data-consul-iocc-x-consul-server-0
ReadOnly: false
config:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: iocc-x-consul-server-config
Optional: false
iocc-x-consul-server-token-g9g2x:
Type: Secret (a volume populated by a Secret)
SecretName: iocc-x-consul-server-token-g9g2x
Optional: false
QoS Class: Guaranteed
Node-Selectors:
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message


Warning Unhealthy 3m (x6699 over 5h37m) kubelet, node3.iocc-test.com Readiness probe failed:

[root@master ~]# kubectl logs iocc-x-consul-server-0 -n consul
bootstrap_expect > 0: expecting 3 servers
==> Starting Consul agent…
Version: ‘v1.8.0’
Node ID: ‘3ac79701-c715-3b15-4b8e-b63bbd59b746’
Node name: ‘iocc-x-consul-server-0’
Datacenter: ‘dc1’ (Segment: ‘’)
Server: true (Bootstrap: false)
Client Addr: [0.0.0.0] (HTTP: 8500, HTTPS: -1, gRPC: -1, DNS: 8600)
Cluster Addr: 10.244.3.13 (LAN: 8301, WAN: 8302)
Encrypt: Gossip: false, TLS-Outgoing: false, TLS-Incoming: false, Auto-Encrypt-TLS: false

==> Log data will now stream in as it occurs:

2020-07-02T10:19:26.515Z [INFO]  agent.server.raft: initial configuration: index=0 servers=[]
2020-07-02T10:19:26.516Z [INFO]  agent.server.raft: entering follower state: follower="Node at 10.244.3.13:8300 [Follower]" leader=
2020-07-02T10:19:26.574Z [INFO]  agent.server.serf.wan: serf: EventMemberJoin: iocc-x-consul-server-0.dc1 10.244.3.13
2020-07-02T10:19:26.576Z [INFO]  agent.server.serf.lan: serf: EventMemberJoin: iocc-x-consul-server-0 10.244.3.13
2020-07-02T10:19:26.576Z [INFO]  agent.server: Handled event for server in area: event=member-join server=iocc-x-consul-server-0.dc1 area=wan
2020-07-02T10:19:26.576Z [INFO]  agent.server: Adding LAN server: server="iocc-x-consul-server-0 (Addr: tcp/10.244.3.13:8300) (DC: dc1)"
2020-07-02T10:19:26.577Z [INFO]  agent: Started DNS server: address=0.0.0.0:8600 network=tcp
2020-07-02T10:19:26.577Z [INFO]  agent: Started DNS server: address=0.0.0.0:8600 network=udp
2020-07-02T10:19:26.663Z [INFO]  agent: Started HTTP server: address=[::]:8500 network=tcp
2020-07-02T10:19:26.669Z [INFO]  agent: Retry join is supported for the following discovery methods: cluster=LAN discovery_methods="aliyun aws azure digitalocean gce k8s linode mdns os packet scaleway softlayer tencentcloud triton vsphere"
2020-07-02T10:19:26.669Z [INFO]  agent: Joining cluster...: cluster=LAN
2020-07-02T10:19:26.669Z [INFO]  agent: (LAN) joining: lan_addresses=[iocc-x-consul-server-0.iocc-x-consul-server.consul.svc, iocc-x-consul-server-1.iocc-x-consul-server.consul.svc, iocc-x-consul-server-2.iocc-x-consul-server.consul.svc]
2020-07-02T10:19:26.669Z [INFO]  agent: started state syncer

==> Consul agent running!
2020-07-02T10:19:26.767Z [WARN] agent.server.memberlist.lan: memberlist: Failed to resolve iocc-x-consul-server-0.iocc-x-consul-server.consul.svc: lookup iocc-x-consul-server-0.iocc-x-consul-server.consul.svc on 10.96.0.10:53: no such host
2020-07-02T10:19:26.772Z [WARN] agent.server.memberlist.lan: memberlist: Failed to resolve iocc-x-consul-server-1.iocc-x-consul-server.consul.svc: lookup iocc-x-consul-server-1.iocc-x-consul-server.consul.svc on 10.96.0.10:53: no such host
2020-07-02T10:19:27.047Z [WARN] agent: (LAN) couldn’t join: number_of_nodes=0 error="3 errors occurred:
* Failed to resolve iocc-x-consul-server-0.iocc-x-consul-server.consul.svc: lookup iocc-x-consul-server-0.iocc-x-consul-server.consul.svc on 10.96.0.10:53: no such host
* Failed to resolve iocc-x-consul-server-1.iocc-x-consul-server.consul.svc: lookup iocc-x-consul-server-1.iocc-x-consul-server.consul.svc on 10.96.0.10:53: no such host
* Failed to join 10.244.2.141: dial tcp 10.244.2.141:8301: connect: connection refused

"
2020-07-02T10:19:27.047Z [WARN] agent: Join cluster failed, will retry: cluster=LAN retry_interval=30s error=
2020-07-02T10:19:33.751Z [ERROR] agent.anti_entropy: failed to sync remote state: error=“No cluster leader”
2020-07-02T10:19:34.446Z [WARN] agent.server.raft: no known peers, aborting election
2020-07-02T10:19:53.881Z [INFO] agent.server.serf.lan: serf: EventMemberJoin: node3.iocc-test.com 10.244.3.12
2020-07-02T10:19:54.063Z [INFO] agent.server.serf.wan: serf: EventMemberJoin: iocc-x-consul-server-1.dc1 10.244.1.131
2020-07-02T10:19:54.064Z [INFO] agent.server: Handled event for server in area: event=member-join server=iocc-x-consul-server-1.dc1 area=wan
2020-07-02T10:19:54.163Z [INFO] agent.server.serf.lan: serf: EventMemberJoin: node2.iocc-test.com 10.244.2.140
2020-07-02T10:19:54.263Z [INFO] agent.server.serf.lan: serf: EventMemberJoin: master.iocc-test.com 10.244.0.38
2020-07-02T10:19:54.263Z [INFO] agent.server.serf.lan: serf: EventMemberJoin: iocc-x-consul-server-1 10.244.1.131
2020-07-02T10:19:54.264Z [INFO] agent.server: Adding LAN server: server=“iocc-x-consul-server-1 (Addr: tcp/10.244.1.131:8300) (DC: dc1)”
2020-07-02T10:19:54.564Z [INFO] agent.server.serf.wan: serf: EventMemberJoin: iocc-x-consul-server-2.dc1 10.244.2.141
2020-07-02T10:19:54.564Z [INFO] agent.server.serf.lan: serf: EventMemberJoin: iocc-x-consul-server-2 10.244.2.141
2020-07-02T10:19:54.564Z [INFO] agent.server: Handled event for server in area: event=member-join server=iocc-x-consul-server-2.dc1 area=wan
2020-07-02T10:19:54.564Z [INFO] agent.server: Adding LAN server: server=“iocc-x-consul-server-2 (Addr: tcp/10.244.2.141:8300) (DC: dc1)”
2020-07-02T10:19:54.763Z [INFO] agent.server: Existing Raft peers reported by server, disabling bootstrap mode: server=iocc-x-consul-server-2
2020-07-02T10:19:54.896Z [INFO] agent.server.serf.lan: serf: EventMemberJoin: node1.iocc-test.com 10.244.1.130
2020-07-02T10:19:57.048Z [INFO] agent: (LAN) joining: lan_addresses=[iocc-x-consul-server-0.iocc-x-consul-server.consul.svc, iocc-x-consul-server-1.iocc-x-consul-server.consul.svc, iocc-x-consul-server-2.iocc-x-consul-server.consul.svc]
2020-07-02T10:19:57.532Z [INFO] agent: (LAN) joined: number_of_nodes=3
2020-07-02T10:19:57.532Z [INFO] agent: Join cluster completed. Synced with initial agents: cluster=LAN num_agents=3
2020-07-02T10:20:00.368Z [ERROR] agent: Coordinate update error: error=“No cluster leader”
2020-07-02T10:20:06.163Z [ERROR] agent.anti_entropy: failed to sync remote state: error=“No cluster leader”
2020-07-02T10:20:28.153Z [ERROR] agent: Coordinate update error: error=“No cluster leader”
2020-07-02T10:20:36.214Z [ERROR] agent.anti_entropy: failed to sync remote state: error=“No cluster leader”
2020-07-02T10:20:52.291Z [ERROR] agent: Coordinate update error: error=“No cluster leader”
2020-07-02T10:21:04.196Z [ERROR] agent.anti_entropy: failed to sync remote state: error=“No cluster leader”
2020-07-02T10:21:18.507Z [ERROR] agent: Coordinate update error: error=“No cluster leader”
2020-07-02T10:21:39.368Z [ERROR] agent.anti_entropy: failed to sync remote state: error=“No cluster leader”
2020-07-02T10:21:45.468Z [ERROR] agent: Coordinate update error: error=“No cluster leader”
2020-07-02T10:22:10.447Z [ERROR] agent: Coordinate update error: error=“No cluster leader”

Hi @jsosulska, I have tried installing it many times, and I've even tried a different cluster, but I hit the same issue again.

Not an expert here, but I found out that when running only 1 node (not sure if this relates to the cluster), I had to add connectInject.imageEnvoy to the yaml below, and then it worked. All pods went to a Running state.

global:
  name: consul
  datacenter: dc1

server:
  replicas: 1
  bootstrapExpect: 1
  storage: 64Mi
  storageClass: local-path

client:
  enabled: true
  grpc: true

ui:
  enabled: true

connectInject:
  enabled: true
  imageEnvoy: envoyproxy/envoy-alpine:v1.14.2
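For completeness, I applied it with something along the lines of the following (I'm assuming the chart comes from the HashiCorp Helm repo; adjust if you install from a git clone of consul-helm instead):

    helm repo add hashicorp https://helm.releases.hashicorp.com
    helm install consul hashicorp/consul -f values.yaml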