Vault Agent Injector auto-tls isn't working when replicas > 1

Hello,

The Vault Agent Injector pods crash when the replica count is increased above 1 while using auto-tls. I deployed the injector with the Helm chart (GitHub - hashicorp/vault-helm: Helm chart to install Vault and other associated components). Everything works fine with replicas = 1, but as soon as the replica count is raised to 2, the new pod goes into a CrashLoopBackOff state.
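For reference, this is roughly how the injector is scaled to two replicas. It is only a sketch: the release name "vault" and the --set flags stand in for my actual values file, and injector.certs.secretName is left unset so the injector manages its own certificate (auto-tls).

$ helm repo add hashicorp https://helm.releases.hashicorp.com
$ # Sketch only: release name and flags are placeholders for the real values file
$ helm upgrade --install vault hashicorp/vault --namespace vault \
    --version 0.16.1 \
    --set injector.replicas=2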

Environment / version details:
K8s version = 1.21.14
Vault helm chart version = 0.16.1
Vault server version = 1.8.3 (deployed via helm chart)
Vault Agent Injector (vault-k8s) version = 0.13.1

Vault Agent Injector pod logs:

Using internal leader elector logic for webhook certificate management
2022-10-28T12:49:33.239Z [DEBUG] handler.auto-tls: Currently a follower
2022-10-28T12:49:33.239Z [INFO]  handler.certwatcher: Updated certificate bundle received. Updating certs...
2022-10-28T12:49:33.239Z [WARN]  handler.certwatcher: Could not load TLS keypair: tls: failed to find any PEM data in certificate input. Trying again...
Listening on ":8080"...
2022-10-28T12:49:33.239Z [INFO]  handler: Starting handler..
2022-10-28T12:49:34.240Z [WARN]  handler.certwatcher: Could not load TLS keypair: tls: failed to find any PEM data in certificate input. Trying again...
I1028 12:49:34.282494       1 request.go:668] Waited for 1.047085831s due to client-side throttling, not priority and fairness, request: GET:https://172.20.0.1:443/apis/kafka.strimzi.io/v1beta2?time
2022-10-28T12:49:35.240Z [WARN]  handler.certwatcher: Could not load TLS keypair: tls: failed to find any PEM data in certificate input. Trying again...
2022-10-28T12:49:36.241Z [WARN]  handler.certwatcher: Could not load TLS keypair: tls: failed to find any PEM data in certificate input. Trying again...
2022-10-28T12:49:37.241Z [WARN]  handler.certwatcher: Could not load TLS keypair: tls: failed to find any PEM data in certificate input. Trying again...
2022-10-28T12:49:38.240Z [DEBUG] handler.auto-tls: Currently a follower
2022-10-28T12:49:38.242Z [WARN]  handler.certwatcher: Could not load TLS keypair: tls: failed to find any PEM data in certificate input. Trying again...
2022-10-28T12:49:39.242Z [WARN]  handler.certwatcher: Could not load TLS keypair: tls: failed to find any PEM data in certificate input. Trying again...
2022-10-28T12:49:39.814Z [ERROR] handler: http: TLS handshake error from x.x.x.x:35154: no certificate available
2022-10-28T12:49:39.814Z [ERROR] handler: http: TLS handshake error from x.x.x.x:35156: no certificate available
2022-10-28T12:49:40.242Z [WARN]  handler.certwatcher: Could not load TLS keypair: tls: failed to find any PEM data in certificate input. Trying again...
2022-10-28T12:49:41.242Z [WARN]  handler.certwatcher: Could not load TLS keypair: tls: failed to find any PEM data in certificate input. Trying again...
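From my reading of the logs, this pod loses the leader election ("Currently a follower") and then waits for the leader to publish a certificate bundle, but the bundle it receives contains no PEM data, so the TLS listener never gets a certificate. To see whether the elected leader ever published anything, the injector Secret and the webhook caBundle can be inspected; the object names below are assumptions based on the chart defaults, adjust to your release name.

$ # List injector-related secrets that the leader is expected to write
$ kubectl -n vault get secrets | grep -i injector
$ # Check whether the webhook's caBundle has been populated (webhook name assumed)
$ kubectl get mutatingwebhookconfiguration xxxx-vault-agent-injector-cfg \
    -o jsonpath='{.webhooks[0].clientConfig.caBundle}' | head -c 40; echo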
$ kubectl get pods -n vault 
NAME                                                        READY   STATUS             RESTARTS   AGE
xxxx-vault-agent-injector-7d9dcbdc6b-6xl74   0/1     CrashLoopBackOff   7          10m
xxxx-vault-agent-injector-7d9dcbdc6b-8cgcd   0/1     CrashLoopBackOff   7          10m
$ kubectl -n vault get cm/vault-k8s-leader -oyaml
apiVersion: v1
kind: ConfigMap
metadata:
  creationTimestamp: "2022-10-28T12:43:03Z"
  name: vault-k8s-leader
  namespace: vault
  ownerReferences:
  - apiVersion: v1
    kind: Pod
    name: xxxx-vault-agent-injector-7d9dcbdc6b-6xl74
    uid: 249c6d22-9412-4979-89c6-79ed6b34f0c9
  resourceVersion: "2851482587"
  uid: d1a68634-cb5e-40e0-87ea-d4eb74e9808e
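The ownerReference shows that leader election itself succeeded: one of the two crashing pods (…6xl74) currently owns the vault-k8s-leader ConfigMap. To cross-check which pod holds the lease against the pods that are actually running (the label selector is an assumption based on the chart defaults):

$ # Name of the pod recorded as leader, taken from the ConfigMap shown above
$ kubectl -n vault get cm vault-k8s-leader -o jsonpath='{.metadata.ownerReferences[0].name}'; echo
$ # Injector pods currently running (label selector assumed from chart defaults)
$ kubectl -n vault get pods -l app.kubernetes.io/name=vault-agent-injector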
$ kubectl describe pod xxxx-vault-agent-injector-7d9dcbdc6b-6xl74 -n vault

Events:
  Type     Reason     Age                  From               Message
  ----     ------     ----                 ----               -------
  Normal   Pulled     13m (x3 over 14m)    kubelet            Container image "hashicorp/vault-k8s:0.13.1" already present on machine
  Normal   Created    13m (x3 over 14m)    kubelet            Created container sidecar-injector
  Normal   Killing    13m (x2 over 14m)    kubelet            Container sidecar-injector failed liveness probe, will be restarted
  Normal   Started    13m (x3 over 14m)    kubelet            Started container sidecar-injector
  Warning  Unhealthy  13m (x6 over 14m)    kubelet            Liveness probe failed: Get "https://x.x.x.x:8080/health/ready": remote error: tls: internal error
  Warning  Unhealthy  13m (x8 over 14m)    kubelet            Readiness probe failed: Get "https://x.x.x.x:8080/health/ready": remote error: tls: internal error
  Warning  BackOff    4m7s (x44 over 13m)  kubelet            Back-off restarting failed container
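The probe failures line up with the log output above: with no certificate loaded, every TLS handshake on :8080 fails, so the liveness and readiness probes on https://…:8080/health/ready never succeed and kubelet keeps restarting the container. For now the only mitigation I have is scaling back to a single replica, which works as described above (deployment name assumed from the pod names):

$ # Temporary mitigation while debugging: go back to one injector replica
$ kubectl -n vault scale deployment xxxx-vault-agent-injector --replicas=1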

Did you manage to solve it? I'm facing the same problem. The same deployment works fine on AWS, but I'm getting this error on GCP.