Ahoy community,
I have an issue that I have seen others hit in the past, but I could not find a solution that works in my environment.
Story: I have a fresh installation of Docker (20.10.7) and Kubernetes (v1.21.3) on CentOS (release 7.9.2009). Helm is v3.6.3.
This is a one node k8s cluster:
[root@home3vm3 deploy]# k get nodes
NAME       STATUS   ROLES                  AGE     VERSION
home3vm3   Ready    control-plane,master   5h10m   v1.21.3
I am following the procedure "Getting Started with Consul Service Mesh for Kubernetes" (Getting Started with Consul Service Mesh for Kubernetes | Consul - HashiCorp Learn). All pods are Running except the consul-connect-injector-webhook-deployment pod, which is stuck in CrashLoopBackOff.
k get pods:
NAME                                                          READY   STATUS             RESTARTS   AGE
consul-b5jv8                                                  1/1     Running            0          3h21m
consul-connect-injector-webhook-deployment-77b574c5cc-mkw9s   0/1     CrashLoopBackOff   76         3h7m
consul-controller-5788b8f6c7-khs5f                            1/1     Running            0          3h21m
consul-server-0                                               1/1     Running            0          3h21m
consul-webhook-cert-manager-5745cbb9d-ztpnv                   1/1     Running            0          3h21m
Logs for this pod:
[root@home3vm3 deploy]# k logs consul-connect-injector-webhook-deployment-77b574c5cc-mkw9s
Listening on ":8080"...
Error loading TLS keypair: tls: failed to find any PEM data in certificate input
2021/07/28 04:22:37 http: TLS handshake error from 10.244.0.1:39872: No certificate available.
Error loading TLS keypair: tls: failed to find any PEM data in certificate input
2021/07/28 04:22:37 http: TLS handshake error from 10.244.0.1:39870: No certificate available.
terminated received, shutting down
Error listening: http: Server closed
E0728 04:22:38.066550 1 controller.go:124] error syncing cache
E0728 04:22:38.066560 1 controller.go:124] error syncing cache
2021-07-28T04:22:38.165Z [ERROR] healthCheckResource: unable to get pods: err="Get "https://10.96.0.1:443/api/v1/pods?labelSelector=consul.hashicorp.com%2Fconnect-inject-status": context canceled"
2021-07-28T04:22:38.165Z [INFO] healthCheckResource: received stop signal, shutting down
2021-07-28T04:22:38.265Z [ERROR] cleanupResource: unable to get nodes: error="Get "https://10.96.0.1:443/api/v1/nodes": context canceled"
2021-07-28T04:22:38.265Z [INFO] cleanupResource: received stop signal, shutting down
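The first error suggests the certificate material the injector is being handed is empty or not valid PEM. One thing I was planning to do is pull whatever cert the webhook machinery produced and sanity-check it with openssl. The sketch below only demonstrates the PEM check itself, using a throwaway self-signed cert (the CN is just illustrative); on the cluster I would feed it the actual decoded cert instead:

```shell
# Generate a throwaway self-signed keypair purely to demonstrate the check.
# On the cluster, cert.pem/key.pem would come from the webhook's secret
# (base64-decoded), not from this command.
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
  -subj "/CN=consul-connect-injector-svc" \
  -keyout key.pem -out cert.pem 2>/dev/null

# A valid keypair parses cleanly here; empty or garbled input instead
# reproduces errors like "failed to find any PEM data in certificate input".
openssl x509 -in cert.pem -noout -subject
openssl rsa -in key.pem -check -noout
```

If either openssl command complains about missing PEM data on the real material, that would point at the cert generation/distribution step rather than the injector itself.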
Describe for this pod:
[root@home3vm3 deploy]# k describe pod consul-connect-injector-webhook-deployment-77b574c5cc-mkw9s
Name: consul-connect-injector-webhook-deployment-77b574c5cc-mkw9s
Namespace: default
Priority: 0
Node: home3vm3/192.168.2.246
Start Time: Tue, 27 Jul 2021 18:13:46 -0700
Labels: app=consul
chart=consul-helm
component=connect-injector
pod-template-hash=77b574c5cc
release=consul
Annotations: consul.hashicorp.com/connect-inject: false
Status: Running
IP: 10.244.0.18
IPs:
IP: 10.244.0.18
Controlled By: ReplicaSet/consul-connect-injector-webhook-deployment-77b574c5cc
Containers:
sidecar-injector:
Container ID: docker://19540028abadc3e387087618f5604bb1800f08b985510b150f0e7b9dd75cf600
Image: hashicorp/consul-k8s:0.25.0
Image ID: docker-pullable://hashicorp/consul-k8s@sha256:66a1dfd964e9a8fe2477803462fd08cb83744a65f2b8083e1c51c580f6930c7d
Port:
Host Port:
Command:
/bin/sh
-ec
CONSUL_FULLNAME="consul"
consul-k8s inject-connect \
-default-inject=true \
-consul-image="hashicorp/consul:1.9.7" \
-envoy-image="envoyproxy/envoy:v1.16.4" \
-consul-k8s-image="hashicorp/consul-k8s:0.25.0" \
-listen=:8080 \
-log-level=info \
-enable-health-checks-controller=true \
-health-checks-reconcile-period=1m \
-cleanup-controller-reconcile-period=5m \
-default-enable-metrics=false \
-default-enable-metrics-merging=false \
-default-merged-metrics-port=20100 \
-default-prometheus-scrape-port=20200 \
-default-prometheus-scrape-path="/metrics" \
-allow-k8s-namespace="*" \
-tls-auto=${CONSUL_FULLNAME}-connect-injector-cfg \
-tls-auto-hosts=${CONSUL_FULLNAME}-connect-injector-svc,${CONSUL_FULLNAME}-connect-injector-svc.${NAMESPACE},${CONSUL_FULLNAME}-connect-injector-svc.${NAMESPACE}.svc \
-init-container-memory-limit=150Mi \
-init-container-memory-request=25Mi \
-init-container-cpu-limit=50m \
-init-container-cpu-request=50m \
-consul-sidecar-memory-limit=50Mi \
-consul-sidecar-memory-request=25Mi \
-consul-sidecar-cpu-limit=20m \
-consul-sidecar-cpu-request=20m \
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Completed
Exit Code: 0
Started: Tue, 27 Jul 2021 21:22:31 -0700
Finished: Tue, 27 Jul 2021 21:22:38 -0700
Ready: False
Restart Count: 78
Limits:
cpu: 50m
memory: 50Mi
Requests:
cpu: 50m
memory: 50Mi
Liveness: http-get https://:8080/health/ready delay=1s timeout=5s period=2s #success=1 #failure=2
Readiness: http-get https://:8080/health/ready delay=2s timeout=5s period=2s #success=1 #failure=2
Environment:
NAMESPACE: default (v1:metadata.namespace)
HOST_IP: (v1:status.hostIP)
CONSUL_HTTP_ADDR: http://$(HOST_IP):8500
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-942t7 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
kube-api-access-942t7:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional:
DownwardAPI: true
QoS Class: Guaranteed
Node-Selectors:
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
Normal Started 50m (x60 over 3h9m) kubelet Started container sidecar-injector
Normal Pulled 44m (x61 over 3h10m) kubelet Container image "hashicorp/consul-k8s:0.25.0" already present on machine
Warning BackOff 4m50s (x844 over 3h9m) kubelet Back-off restarting failed container
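One idea I had for getting more detail: the container command above already passes -log-level=info, so bumping it to debug on the live deployment should be possible without reinstalling the chart. An untested sketch (note a later helm upgrade would revert this edit):

```shell
# Switch the injector's log level from info to debug in place.
# This only changes the flag already present in the pod spec above.
kubectl get deployment consul-connect-injector-webhook-deployment -o yaml \
  | sed 's/-log-level=info/-log-level=debug/' \
  | kubectl apply -f -

# Logs from the crashed (previous) container instance:
kubectl logs deployment/consul-connect-injector-webhook-deployment --previous
```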
I have tried many things, including changing the CNI (it is flannel now, but I also tried Calico). The Consul common-errors page suggests the issue is probably CNI-related, but I could not pin anything down.
I would appreciate any tips on how to dig into this, e.g. how to enable more verbose logging on this pod, or what else to check.
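In case it helps anyone suggest next steps, these are the TLS-related objects I can inspect next; the webhook configuration name comes from the -tls-auto flag in the pod spec above (with CONSUL_FULLNAME="consul"):

```shell
# The cert manager pod is Running, but its logs may show why no usable
# cert ever reaches the injector:
kubectl logs deployment/consul-webhook-cert-manager

# The -tls-auto flag points at this MutatingWebhookConfiguration; an empty
# caBundle here would line up with the "No certificate available" errors:
kubectl get mutatingwebhookconfiguration consul-connect-injector-cfg \
  -o jsonpath='{.webhooks[0].clientConfig.caBundle}' | head -c 60; echo
```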
Thank you,
zvik