Vault agent injector auto-tls isn't working when replica > 1

Hello,

The Vault agent injector pods crash when I increase the replica count above 1 with auto-TLS enabled. I deployed the injector using the hashicorp/vault-helm Helm chart from GitHub. It works fine with replica = 1, but as soon as I increase it to 2, the new pod goes into the CrashLoopBackOff state.

Environment / version details:
K8s version = 1.21.14
Vault helm chart version = 0.16.1
Vault server version = 1.8.3 (deployed via helm chart)
Vault agent version = 0.13.1

vault agent injector pod logs

Using internal leader elector logic for webhook certificate management
2022-10-28T12:49:33.239Z [DEBUG] handler.auto-tls: Currently a follower
2022-10-28T12:49:33.239Z [INFO]  handler.certwatcher: Updated certificate bundle received. Updating certs...
2022-10-28T12:49:33.239Z [WARN]  handler.certwatcher: Could not load TLS keypair: tls: failed to find any PEM data in certificate input. Trying again...
Listening on ":8080"...
2022-10-28T12:49:33.239Z [INFO]  handler: Starting handler..
2022-10-28T12:49:34.240Z [WARN]  handler.certwatcher: Could not load TLS keypair: tls: failed to find any PEM data in certificate input. Trying again...
I1028 12:49:34.282494       1 request.go:668] Waited for 1.047085831s due to client-side throttling, not priority and fairness, request: GET:https://172.20.0.1:443/apis/kafka.strimzi.io/v1beta2?time
2022-10-28T12:49:35.240Z [WARN]  handler.certwatcher: Could not load TLS keypair: tls: failed to find any PEM data in certificate input. Trying again...
2022-10-28T12:49:36.241Z [WARN]  handler.certwatcher: Could not load TLS keypair: tls: failed to find any PEM data in certificate input. Trying again...
2022-10-28T12:49:37.241Z [WARN]  handler.certwatcher: Could not load TLS keypair: tls: failed to find any PEM data in certificate input. Trying again...
2022-10-28T12:49:38.240Z [DEBUG] handler.auto-tls: Currently a follower
2022-10-28T12:49:38.242Z [WARN]  handler.certwatcher: Could not load TLS keypair: tls: failed to find any PEM data in certificate input. Trying again...
2022-10-28T12:49:39.242Z [WARN]  handler.certwatcher: Could not load TLS keypair: tls: failed to find any PEM data in certificate input. Trying again...
2022-10-28T12:49:39.814Z [ERROR] handler: http: TLS handshake error from x.x.x.x:35154: no certificate available
2022-10-28T12:49:39.814Z [ERROR] handler: http: TLS handshake error from x.x.x.x:35156: no certificate available
2022-10-28T12:49:40.242Z [WARN]  handler.certwatcher: Could not load TLS keypair: tls: failed to find any PEM data in certificate input. Trying again...
2022-10-28T12:49:41.242Z [WARN]  handler.certwatcher: Could not load TLS keypair: tls: failed to find any PEM data in certificate input. Trying again...
$ kubectl get pods -n vault 
NAME                                                        READY   STATUS             RESTARTS   AGE
xxxx-vault-agent-injector-7d9dcbdc6b-6xl74   0/1     CrashLoopBackOff   7          10m
xxxx-vault-agent-injector-7d9dcbdc6b-8cgcd   0/1     CrashLoopBackOff   7          10m
$ kubectl -n vault get cm/vault-k8s-leader -oyaml
apiVersion: v1
kind: ConfigMap
metadata:
  creationTimestamp: "2022-10-28T12:43:03Z"
  name: vault-k8s-leader
  namespace: vault
  ownerReferences:
  - apiVersion: v1
    kind: Pod
    name: xxxx-vault-agent-injector-7d9dcbdc6b-6xl74
    uid: 249c6d22-9412-4979-89c6-79ed6b34f0c9
  resourceVersion: "2851482587"
  uid: d1a68634-cb5e-40e0-87ea-d4eb74e9808e
$ kubectl describe pod xxxx-vault-agent-injector-7d9dcbdc6b-6xl74 -n vault

Events:
  Type     Reason     Age                  From               Message
  ----     ------     ----                 ----               -------
  Normal   Pulled     13m (x3 over 14m)    kubelet            Container image "hashicorp/vault-k8s:0.13.1" already present on machine
  Normal   Created    13m (x3 over 14m)    kubelet            Created container sidecar-injector
  Normal   Killing    13m (x2 over 14m)    kubelet            Container sidecar-injector failed liveness probe, will be restarted
  Normal   Started    13m (x3 over 14m)    kubelet            Started container sidecar-injector
  Warning  Unhealthy  13m (x6 over 14m)    kubelet            Liveness probe failed: Get "https://x.x.x.x:8080/health/ready": remote error: tls: internal error
  Warning  Unhealthy  13m (x8 over 14m)    kubelet            Readiness probe failed: Get "https://x.x.x.x:8080/health/ready": remote error: tls: internal error
  Warning  BackOff    4m7s (x44 over 13m)  kubelet            Back-off restarting failed container

Did you manage to solve it? I'm facing the same problem. The same deployment works fine on AWS, but I get this error on GCP.

Same issue here. I deployed the latest chart, 0.22.1, on multiple OpenShift clusters running 4.8, but on a few of them I cannot deploy the injector in HA mode (all clusters share the same configuration). On the clusters where the deployment failed, it looks related to vault-injector-certs: on the affected clusters it's empty. Not sure why.

In my case, I fixed the issue by lengthening the readiness and liveness probe periods so the Vault agent injector has more time to react. It worked. @siwyroot
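
For anyone who wants to try the same thing, here is a rough sketch of what relaxed probes could look like as a patch against the injector Deployment. The container name sidecar-injector comes from the pod events earlier in the thread. The file name and the timing values are my own assumptions for illustration, not the values that were actually used, so tune them for your cluster.

```yaml
# probe-patch.yaml (hypothetical file name)
# Strategic merge patch for the injector Deployment; containers are matched by name.
spec:
  template:
    spec:
      containers:
        - name: sidecar-injector      # container name as shown in the pod events
          livenessProbe:
            initialDelaySeconds: 30   # illustrative value, not from the thread
            periodSeconds: 10         # illustrative value, not from the thread
            failureThreshold: 6       # illustrative value, not from the thread
          readinessProbe:
            initialDelaySeconds: 30   # illustrative value, not from the thread
            periodSeconds: 10         # illustrative value, not from the thread
            failureThreshold: 6       # illustrative value, not from the thread
```

It could be applied with something like `kubectl -n vault patch deployment <injector-deployment-name> --patch-file probe-patch.yaml`, where the deployment name is a placeholder for whatever your release created. This only gives the pods more time before the kubelet restarts them. It does not explain why the certificate secret is empty in the first place.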