TLS errors in K8s using Helm chart deployment

Hi, I’m using the Helm chart deployment.

The following errors are showing on consul-server-0, consul-server-1, and consul-server-2:

[ERROR] agent.http: Request error: method=GET url=/v1/agent/connect/ca/roots from= error="No cluster leader"
[ERROR] agent.http: Request error: method=GET url=/v1/agent/connect/ca/roots from= error="No cluster leader"
[ERROR] agent.server.raft: failed to make requestVote RPC: target="{Voter 67f9af62-ea6f-7f10-fcf8-0ddd76e5fc66}" error="dial tcp <nil>-> i/o timeout"
[ERROR] agent.server.raft: failed to make requestVote RPC: target="{Voter d3d1a401-c506-3cf4-4fec-3afcb7603219}" error="dial tcp <nil>-> i/o timeout"
[WARN]  agent.server.raft: Election timeout reached, restarting election
The following errors are shown for consul-connect-injector-webhook-deployment and consul-controller:

Failed to load logs: container "sidecar-injector" in pod "consul-connect-injector-webhook-deployment-7cdb8b8bcf-wtnlq" is waiting to start: PodInitializing
Reason: BadRequest (400)

I’m using this Helm chart config.yaml:

  name: consul
  datacenter: dc1
    secretName: 'consul-gossip-encryption-key'
    secretKey: 'key'
    enabled: true
    enableAutoEncrypt: true
    verify: false
#  acls:
#    manageSystemACLs: true
  replicas: 3
  bootstrapExpect: 3
    enabled: true
    maxUnavailable: 0
    runAsNonRoot: false
    runAsUser: 0
# service:
#    type: "LoadBalancer"
  enabled: true
  enabled: true
  enabled: true
  enabled: true
  enabled: true

I ran the following commands on the master node:
consul tls ca create

consul tls cert create -server -dc dc1

I moved agent-ca.pem, agent-ca-key, server-consul-0-key.pem and server-consul-0.pem to /etc/consul.d/

copied agent-ca.pem, server-consul-0-key.pem and server-consul-0.pem to all consul servers

systemctl restart consul
consul reload

Hi, first of all you don’t need to run any consul tls commands. This is all handled automatically for you.

Second, the issue right now is that the servers can’t seem to reach one another (dial tcp <nil>-> i/o timeout). If you run kubectl get pods -o wide, does the IP in that error match the IP of one of the servers?
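
One rough way to check this is to pull the dialed addresses out of the server logs and compare them with the pod IPs the cluster currently reports. This is only a sketch: raft_dial_targets and stale_targets are hypothetical helper names, and it assumes the unredacted log lines look like dial tcp <nil>->10.0.0.5:8300: i/o timeout.

```shell
#!/bin/sh
# Hypothetical helpers, not part of Consul or kubectl.

# Read server log lines on stdin and print the unique IPs raft is dialing.
# Assumes lines contain: dial tcp <nil>-><ip>:<port>: i/o timeout
raft_dial_targets() {
  grep -oE 'dial tcp <nil>->[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+' \
    | sed 's/.*->//' \
    | sort -u
}

# Read dialed IPs on stdin and print those that are NOT listed in the
# file given as $1 (one current pod IP per line).
stale_targets() {
  grep -vxF -f "$1" || true
}

# Usage against a live cluster (left commented out on purpose):
#   kubectl logs consul-server-0 | raft_dial_targets > dialed.txt
#   kubectl get pods -o jsonpath='{range .items[*]}{.status.podIP}{"\n"}{end}' > pod-ips.txt
#   stale_targets pod-ips.txt < dialed.txt
```

Any address printed by the last command is one the servers are trying to reach that no running pod owns, which would point at stale raft state rather than a networking problem.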

Also, is there a chance you re-installed Consul without deleting the PVCs? See our uninstall guide here: Uninstall | Consul by HashiCorp
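
To see whether any server PVCs survived an uninstall, something like the following could be used. It is a sketch only: consul_server_pvcs is a hypothetical helper, and it assumes the server PVC names end in consul-server-<ordinal>, which is what the pod names in this thread suggest.

```shell
#!/bin/sh
# Hypothetical helper, not part of kubectl or the Helm chart.

# Read PVC names on stdin (one per line) and keep only those that look
# like they belong to a Consul server pod.
consul_server_pvcs() {
  grep -E 'consul-server-[0-9]+$' || true
}

# Usage against a live cluster (left commented out on purpose):
#   kubectl get pvc -o name | consul_server_pvcs
# Anything listed after an uninstall is a leftover and can be removed with:
#   kubectl delete <name-from-the-list>
```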

@lkysow I removed the manually created TLS certs and tried reinstalling again.

And yes, I delete and recreate the PVCs for every fresh installation.

Finally, the IP that the Consul server pods are trying to connect to (dial tcp <nil>-> i/o timeout) doesn’t belong to any existing pod:

kubectl get pods -o wide

NAME                                                          READY   STATUS     RESTARTS   AGE   IP                NODE         NOMINATED NODE   READINESS GATES
consul-connect-injector-webhook-deployment-7cdb8b8bcf-828d6   0/1     Init:0/1   0          52s   tmp-k8c1w1   <none>           <none>
consul-connect-injector-webhook-deployment-7cdb8b8bcf-zb9mn   0/1     Init:0/1   0          53s    tmp-k8c1w3   <none>           <none>
consul-controller-6796bb8886-2wq2s                            0/1     Init:0/1   0          53s    tmp-k8c1w2   <none>           <none>
consul-dd5jb                                                  0/1     Running    0          52s    tmp-k8c1w2   <none>           <none>
consul-dkzv2                                                  0/1     Running    0          53s   tmp-k8c1w1   <none>           <none>
consul-server-0                                               0/1     Running    0          53s    tmp-k8c1w2   <none>           <none>
consul-server-1                                               0/1     Running    0          52s     tmp-k8c1w3   <none>           <none>
consul-server-2                                               0/1     Running    0          51s   tmp-k8c1w1   <none>           <none>
consul-webhook-cert-manager-57bb5c668d-cz8dp                  1/1     Running    0          53s    tmp-k8c1w3   <none>           <none>
consul-xs8gp                                                  0/1     Running    0          52s    tmp-k8c1w3   <none>           <none>
prometheus-server-5cbddcc44b-kqfjf                            2/2     Running    0          53s   tmp-k8c1w1   <none>           <none>

Error log from consul-server-2:

[ERROR] agent.server.raft: failed to make requestVote RPC: target="{Voter 49c04b99-25fb-3589-fb02-375e5e5590ca}" error="dial tcp <nil>-> i/o timeout"

[ERROR] agent.server.raft: failed to make requestVote RPC: target="{Voter 0b49673d-a431-b406-e30c-f719e5f5727a}" error="dial tcp <nil>-> i/o timeout"

[ERROR] agent: Coordinate update error: error="No cluster leader"

Error log from a Consul client pod:

[ERROR] agent.auto_config: AutoEncrypt.Sign RPC failed: addr= error="rpcinsecure error making call: No cluster leader"
[ERROR] agent.auto_config: AutoEncrypt.Sign RPC failed: addr= error="rpcinsecure error making call: No cluster leader"
[ERROR] agent.auto_config: AutoEncrypt.Sign RPC failed: addr= error="rpcinsecure error making call: No cluster leader"
[ERROR] agent.auto_config: No servers successfully responded to the auto-encrypt request

Error log from consul-controller and consul-connect-injector-webhook-deployment:

Failed to load logs: container "controller" in pod "consul-controller-6796bb8886-2wq2s" is waiting to start: PodInitializing

Reason: BadRequest (400)

Hmm, I’m very curious where those IPs are coming from then. This is typically only seen in situations where there are old PVCs.

I fixed the issue by manually deleting all the files in the persistent volume folders under /mnt/data/pv.
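
For anyone hitting the same thing, the cleanup can be sketched roughly as below. The layout is an assumption (one subfolder per volume under /mnt/data/pv), and clear_volume_dir is a hypothetical helper name. Presumably the volumes here were host-backed, so the old raft data stayed on disk even after the PVCs were deleted.

```shell
#!/bin/sh
# Hypothetical helper: empty one volume directory but keep the directory
# itself, so the PersistentVolume that points at it stays usable.
clear_volume_dir() {
  dir=${1:?usage: clear_volume_dir <dir>}
  [ -d "$dir" ] || { echo "not a directory: $dir" >&2; return 1; }
  # -mindepth 1 skips the directory itself; -delete removes children first.
  find "$dir" -mindepth 1 -delete
}

# Usage on each node, after uninstalling Consul and deleting the PVCs
# (left commented out on purpose; the path layout is an assumption):
#   for d in /mnt/data/pv/*/; do clear_volume_dir "$d"; done
```

This deletes data irreversibly, so it should only be run once Consul is uninstalled and nothing else uses those folders.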
