Vault agent injector auto-tls isn't working when replica > 1

msalman899 · October 28, 2022, 1:13pm

Hello,

The Vault agent injector pods crash when I increase the replica count above 1 (with auto-TLS enabled). I deployed the injector using the hashicorp/vault-helm Helm chart. It works fine with replicas = 1, but as soon as I increase it to 2, the new pod goes into the CrashLoopBackOff state.

Environment / Versions details -
K8s version = 1.21.14
Vault helm chart version = 0.16.1
Vault server version = 1.8.3 (deployed via helm chart)
Vault agent version = 0.13.1

vault agent injector pod logs

Using internal leader elector logic for webhook certificate management
2022-10-28T12:49:33.239Z [DEBUG] handler.auto-tls: Currently a follower
2022-10-28T12:49:33.239Z [INFO]  handler.certwatcher: Updated certificate bundle received. Updating certs...
2022-10-28T12:49:33.239Z [WARN]  handler.certwatcher: Could not load TLS keypair: tls: failed to find any PEM data in certificate input. Trying again...
Listening on ":8080"...
2022-10-28T12:49:33.239Z [INFO]  handler: Starting handler..
2022-10-28T12:49:34.240Z [WARN]  handler.certwatcher: Could not load TLS keypair: tls: failed to find any PEM data in certificate input. Trying again...
I1028 12:49:34.282494       1 request.go:668] Waited for 1.047085831s due to client-side throttling, not priority and fairness, request: GET:https://172.20.0.1:443/apis/kafka.strimzi.io/v1beta2?time
2022-10-28T12:49:35.240Z [WARN]  handler.certwatcher: Could not load TLS keypair: tls: failed to find any PEM data in certificate input. Trying again...
2022-10-28T12:49:36.241Z [WARN]  handler.certwatcher: Could not load TLS keypair: tls: failed to find any PEM data in certificate input. Trying again...
2022-10-28T12:49:37.241Z [WARN]  handler.certwatcher: Could not load TLS keypair: tls: failed to find any PEM data in certificate input. Trying again...
2022-10-28T12:49:38.240Z [DEBUG] handler.auto-tls: Currently a follower
2022-10-28T12:49:38.242Z [WARN]  handler.certwatcher: Could not load TLS keypair: tls: failed to find any PEM data in certificate input. Trying again...
2022-10-28T12:49:39.242Z [WARN]  handler.certwatcher: Could not load TLS keypair: tls: failed to find any PEM data in certificate input. Trying again...
2022-10-28T12:49:39.814Z [ERROR] handler: http: TLS handshake error from x.x.x.x:35154: no certificate available
2022-10-28T12:49:39.814Z [ERROR] handler: http: TLS handshake error from x.x.x.x:35156: no certificate available
2022-10-28T12:49:40.242Z [WARN]  handler.certwatcher: Could not load TLS keypair: tls: failed to find any PEM data in certificate input. Trying again...
2022-10-28T12:49:41.242Z [WARN]  handler.certwatcher: Could not load TLS keypair: tls: failed to find any PEM data in certificate input. Trying again...

$ kubectl get pods -n vault 
NAME                                                        READY   STATUS             RESTARTS   AGE
xxxx-vault-agent-injector-7d9dcbdc6b-6xl74   0/1     CrashLoopBackOff   7          10m
xxxx-vault-agent-injector-7d9dcbdc6b-8cgcd   0/1     CrashLoopBackOff   7          10m

$ kubectl -n vault get cm/vault-k8s-leader -oyaml
apiVersion: v1
kind: ConfigMap
metadata:
  creationTimestamp: "2022-10-28T12:43:03Z"
  name: vault-k8s-leader
  namespace: vault
  ownerReferences:
  - apiVersion: v1
    kind: Pod
    name: xxxx-vault-agent-injector-7d9dcbdc6b-6xl74
    uid: 249c6d22-9412-4979-89c6-79ed6b34f0c9
  resourceVersion: "2851482587"
  uid: d1a68634-cb5e-40e0-87ea-d4eb74e9808e

$ kubectl describe pod xxxx-vault-agent-injector-7d9dcbdc6b-6xl74 -n vault

Events:
  Type     Reason     Age                  From               Message
  ----     ------     ----                 ----               -------
  Normal   Pulled     13m (x3 over 14m)    kubelet            Container image "hashicorp/vault-k8s:0.13.1" already present on machine
  Normal   Created    13m (x3 over 14m)    kubelet            Created container sidecar-injector
  Normal   Killing    13m (x2 over 14m)    kubelet            Container sidecar-injector failed liveness probe, will be restarted
  Normal   Started    13m (x3 over 14m)    kubelet            Started container sidecar-injector
  Warning  Unhealthy  13m (x6 over 14m)    kubelet            Liveness probe failed: Get "https://x.x.x.x:8080/health/ready": remote error: tls: internal error
  Warning  Unhealthy  13m (x8 over 14m)    kubelet            Readiness probe failed: Get "https://x.x.x.x:8080/health/ready": remote error: tls: internal error
  Warning  BackOff    4m7s (x44 over 13m)  kubelet            Back-off restarting failed container

ercindemir0 · November 18, 2022, 5:00pm

Did you manage to solve it? I’m facing the same problem. The same deployment works fine on AWS, but I’m getting this error on GCP.

siwyroot · November 28, 2022, 3:14pm

Same issue here. I deployed the latest chart, 0.22.1, on multiple OpenShift clusters running 4.8, but on a few of them I cannot deploy the injector in HA mode (all clusters share the same configuration). On the clusters where the deployment failed, it looks like it’s related to vault-injector-certs: on the affected clusters it’s empty. Not sure why.

ercindemir0 · December 13, 2022, 1:13am

In my case, I managed to fix the issue by adjusting the readiness probe and liveness probe to have a longer period for the vault agent injector to react. It worked. @siwyroot
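
For anyone wanting to try the same workaround, here is a minimal sketch of Helm values that relax the injector probes. It assumes a vault-helm chart version that exposes the `injector.livenessProbe` and `injector.readinessProbe` settings (older charts, such as the 0.16.1 used in the original post, may hard-code the probes). The numbers are illustrative guesses, not the values used in the post above.

```yaml
# Hypothetical values override; adjust the numbers to your cluster.
injector:
  replicas: 2
  livenessProbe:
    initialDelaySeconds: 10   # assumed: give the follower time to receive the cert
    periodSeconds: 10         # assumed: probe less often
    failureThreshold: 6       # assumed: tolerate more failures before a restart
    timeoutSeconds: 5
  readinessProbe:
    initialDelaySeconds: 10
    periodSeconds: 10
    failureThreshold: 6
    timeoutSeconds: 5
```

The idea is only to keep the kubelet from restarting a follower pod before it has loaded the certificate bundle produced by the leader. It does not address the case where the certificate secret stays empty.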
