Hi, I’m deploying Consul with the Helm chart.
The following errors are showing on consul-server-0, consul-server-1, and consul-server-2:
[ERROR] agent.http: Request error: method=GET url=/v1/agent/connect/ca/roots from=192.168.10.66:59128 error="No cluster leader"
[ERROR] agent.http: Request error: method=GET url=/v1/agent/connect/ca/roots from=192.168.115.28:43628 error="No cluster leader"
[ERROR] agent.server.raft: failed to make requestVote RPC: target="{Voter 67f9af62-ea6f-7f10-fcf8-0ddd76e5fc66 192.168.10.68:8300}" error="dial tcp <nil>->192.168.10.68:8300: i/o timeout"
[ERROR] agent.server.raft: failed to make requestVote RPC: target="{Voter d3d1a401-c506-3cf4-4fec-3afcb7603219 192.168.115.42:8300}" error="dial tcp <nil>->192.168.115.42:8300: i/o timeout"
[WARN] agent.server.raft: Election timeout reached, restarting election
===================================
The following errors appear for consul-connect-injector-webhook-deployment and consul-controller:
Failed to load logs: container "sidecar-injector" in pod "consul-connect-injector-webhook-deployment-7cdb8b8bcf-wtnlq" is waiting to start: PodInitializing
Reason: BadRequest (400)
Using Helm chart config.yaml:
global:
  name: consul
  datacenter: dc1
  gossipEncryption:
    secretName: 'consul-gossip-encryption-key'
    secretKey: 'key'
  tls:
    enabled: true
    enableAutoEncrypt: true
    verify: false
  # acls:
  #   manageSystemACLs: true
server:
  replicas: 3
  bootstrapExpect: 3
  disruptionBudget:
    enabled: true
    maxUnavailable: 0
  updatePartition:
  securityContext:
    runAsNonRoot: false
    runAsUser: 0
ui:
  # service:
  #   type: "LoadBalancer"
  enabled: true
connectInject:
  enabled: true
controller:
  enabled: true
prometheus:
  enabled: true
grafana:
  enabled: true
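The values above reference a pre-existing gossip encryption secret. For completeness, a sketch of how such a secret could be created (the secret and key names simply mirror the `secretName`/`secretKey` values in the config; this command is not from the thread and assumes `consul` and `kubectl` are available locally):

```shell
# Create the Kubernetes secret that gossipEncryption.secretName points at,
# with a freshly generated gossip key under the key name "key".
kubectl create secret generic consul-gossip-encryption-key \
  --from-literal=key="$(consul keygen)"
```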
====================================
Used the following commands on the master node:
consul tls ca create
consul tls cert create -server -dc dc1
Moved agent-ca.pem, agent-ca-key, server-consul-0-key.pem, and server-consul-0.pem to /etc/consul.d/
Copied agent-ca.pem, server-consul-0-key.pem, and server-consul-0.pem to all Consul servers
systemctl restart consul
consul reload
lkysow
August 24, 2021, 5:24pm
2
Hi, first of all, you don’t need to run any consul tls
commands; this is all handled automatically for you.
Second, the issue right now is that the servers can’t reach one another (dial tcp <nil>->192.168.10.68:8300: i/o timeout).
If you run kubectl get pods -o wide,
is that the correct IP for one of the servers?
Also, is there a chance you re-installed Consul without deleting the PVCs? See our uninstall guide here: Uninstall | Consul by HashiCorp
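For reference, a full uninstall that also clears stale server state might look roughly like this (the release name and PVC label are assumptions based on the chart's defaults; the uninstall guide linked above has the exact steps):

```shell
# Remove the Helm release, then delete the server PVCs that hold
# old raft data from the previous install.
helm uninstall consul
kubectl delete pvc -l chart=consul-helm
```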
@lkysow I removed the manually created TLS certs and tried reinstalling again.
And yes, I will delete and recreate the PVCs for every fresh installation.
Finally, no pod has the IP that the Consul server pod is trying to connect to (dial tcp <nil>->192.168.115.20:8300); it
doesn’t exist:
kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
consul-connect-injector-webhook-deployment-7cdb8b8bcf-828d6 0/1 Init:0/1 0 52s 192.168.112.218 tmp-k8c1w1 <none> <none>
consul-connect-injector-webhook-deployment-7cdb8b8bcf-zb9mn 0/1 Init:0/1 0 53s 192.168.10.114 tmp-k8c1w3 <none> <none>
consul-controller-6796bb8886-2wq2s 0/1 Init:0/1 0 53s 192.168.115.32 tmp-k8c1w2 <none> <none>
consul-dd5jb 0/1 Running 0 52s 192.168.115.25 tmp-k8c1w2 <none> <none>
consul-dkzv2 0/1 Running 0 53s 192.168.112.222 tmp-k8c1w1 <none> <none>
consul-server-0 0/1 Running 0 53s 192.168.115.39 tmp-k8c1w2 <none> <none>
consul-server-1 0/1 Running 0 52s 192.168.10.68 tmp-k8c1w3 <none> <none>
consul-server-2 0/1 Running 0 51s 192.168.112.230 tmp-k8c1w1 <none> <none>
consul-webhook-cert-manager-57bb5c668d-cz8dp 1/1 Running 0 53s 192.168.10.126 tmp-k8c1w3 <none> <none>
consul-xs8gp 0/1 Running 0 52s 192.168.10.119 tmp-k8c1w3 <none> <none>
prometheus-server-5cbddcc44b-kqfjf 2/2 Running 0 53s 192.168.112.213 tmp-k8c1w1 <none> <none>
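One way to cross-check where the stale addresses come from (a sketch; the pod name is assumed) is to compare the raft peer list a server has persisted against the pod IPs above:

```shell
# List the raft peers consul-server-0 knows about; -stale lets the query
# succeed even without a cluster leader. Entries pointing at IPs that no
# longer match any pod indicate leftover state from a previous install.
kubectl exec consul-server-0 -- consul operator raft list-peers -stale=true
```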
Error logs from consul-server-2:
[ERROR] agent.server.raft: failed to make requestVote RPC: target="{Voter 49c04b99-25fb-3589-fb02-375e5e5590ca 192.168.115.20:8300}" error="dial tcp <nil>->192.168.115.20:8300: i/o timeout"
[ERROR] agent.server.raft: failed to make requestVote RPC: target="{Voter 0b49673d-a431-b406-e30c-f719e5f5727a 192.168.10.109:8300}" error="dial tcp <nil>->192.168.10.109:8300: i/o timeout"
[ERROR] agent: Coordinate update error: error="No cluster leader"
Error logs from a Consul client pod:
[ERROR] agent.auto_config: AutoEncrypt.Sign RPC failed: addr=192.168.115.39:8300 error="rpcinsecure error making call: No cluster leader"
[ERROR] agent.auto_config: AutoEncrypt.Sign RPC failed: addr=192.168.10.68:8300 error="rpcinsecure error making call: No cluster leader"
[ERROR] agent.auto_config: AutoEncrypt.Sign RPC failed: addr=192.168.112.230:8300 error="rpcinsecure error making call: No cluster leader"
[ERROR] agent.auto_config: No servers successfully responded to the auto-encrypt request
Error logs from consul-controller and consul-connect-injector-webhook-deployment:
Failed to load logs: container "controller" in pod "consul-controller-6796bb8886-2wq2s" is waiting to start: PodInitializing
Reason: BadRequest (400)
lkysow
August 24, 2021, 9:50pm
4
Hmm, I’m very curious where those IPs are coming from then. This is typically only seen in situations where there are old PVCs.
Fixed the issue by manually deleting all the files in the persistent volume folders under /mnt/data/pv.
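The same cleanup can be done through Kubernetes instead of deleting files on the node directly (the label selector is an assumption based on the chart's defaults; verify it against your PVCs first):

```shell
# Inspect, then delete, the Consul server PVCs so that fresh volumes
# are provisioned on the next install.
kubectl get pvc -l app=consul
kubectl delete pvc -l app=consul
```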