Vault Agent Injector not injecting secrets into EKS pods

Hello,

I have a three-node Vault cluster with Raft storage running hashicorp/vault:1.8.0 on my EKS production cluster. In the same cluster, I have a Vault Agent Injector running vault-k8s:0.11.0, which is successfully mounting secrets into pods. The EKS version of this cluster is 1.22.

In my staging cluster, I have a Vault Agent Injector that is also running vault-k8s:0.11.0. It connects to the production Vault via its public ingress name. The EKS version of this cluster is 1.25. We upgraded from 1.21 → 1.25, and somewhere during this upgrade the Vault Agent Injector stopped injecting secrets into pods.

The logs I see in the staging Vault Agent Injector are:
2024-07-15T18:20:15.104Z [INFO] handler: Starting handler…
Listening on ":8080"…
2024-07-15T18:20:15.188Z [INFO] handler.auto-tls: Generated CA
2024-07-15T18:20:15.188Z [INFO] handler.certwatcher: Updated certificate bundle received. Updating certs…
2024-07-15T18:20:36.532Z [INFO] handler: Request received: Method=POST URL=/mutate?timeout=30s
2024-07-15T18:20:40.768Z [INFO] handler: Request received: Method=POST URL=/mutate?timeout=30s
2024-07-15T18:21:06.087Z [INFO] handler: Request received: Method=POST URL=/mutate?timeout=30s
2024-07-15T18:21:06.926Z [INFO] handler: Request received: Method=POST URL=/mutate?timeout=30s
2024-07-15T18:21:10.379Z [INFO] handler: Request received: Method=POST URL=/mutate?timeout=30s
2024-07-15T18:21:35.591Z [INFO] handler: Request received: Method=POST URL=/mutate?timeout=30s
2024-07-15T18:22:07.043Z [INFO] handler: Request received: Method=POST URL=/mutate?timeout=30s
2024-07-15T18:22:39.532Z [INFO] handler: Request received: Method=POST URL=/mutate?timeout=30s
2024-07-15T18:23:05.544Z [INFO] handler: Request received: Method=POST URL=/mutate?timeout=30s
2024-07-15T18:23:07.980Z [INFO] handler: Request received: Method=POST URL=/mutate?timeout=30s
2024-07-15T18:24:43.173Z [INFO] handler: Request received: Method=POST URL=/mutate?timeout=30s
2024-07-15T18:25:00.118Z [INFO] handler: Request received: Method=POST URL=/mutate?timeout=30s

I have looked through the EKS API server logs for errors related to the mutate requests, but these seem to be passing as expected. Nothing has changed in either of our Vault deployments other than the EKS version.

The mutating webhook configuration looks like this:
webhooks:
- admissionReviewVersions:
  - v1beta1
  - v1
  clientConfig:
    caBundle: {REDACTED}
    service:
      name: vault-agent-injector-svc
      namespace: app
      path: /mutate
      port: 443
  failurePolicy: Ignore
  matchPolicy: Exact
  name: vault.hashicorp.com
  namespaceSelector: {}
  objectSelector: {}
  reinvocationPolicy: Never
  rules:
  - apiGroups:
    - ""
    apiVersions:
    - v1
    operations:
    - CREATE
    - UPDATE
    resources:
    - pods
    scope: '*'
  sideEffects: None
  timeoutSeconds: 30

The pod where we are trying to have the secret mounted has the following annotations:
vault.hashicorp.com/agent-configmap: secrets-updater
vault.hashicorp.com/agent-inject: true

These are the same annotations used with the production Vault Agent Injector, where injection is working.

Does anyone know where best to look for further errors or information? I thought the kube-apiserver logs would be the best place, but I didn’t see any mutate errors there. Without the Vault Agent Injector reporting any errors, it is very difficult to troubleshoot. Setting the log level to debug doesn’t help either.
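One check I can run in the meantime is whether the webhook mutated the pod at all. This is a minimal sketch: it relies on the injector adding the vault.hashicorp.com/agent-inject-status annotation with the value injected after a successful mutation. The function name check_injected is just something I made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: reads a pod manifest on stdin (for example the output of
# `kubectl get pod ... -o yaml`) and reports whether the injector's
# status annotation is present with the value "injected".
# check_injected is a hypothetical helper name, not part of any tool.
check_injected() {
  if grep -Eq 'vault\.hashicorp\.com/agent-inject-status: "?injected"?'; then
    echo "injected"
  else
    echo "not injected"
  fi
}

# Example usage (pod and namespace names are placeholders):
#   kubectl get pod my-pod -n app -o yaml | check_injected
```

If this prints "not injected", the pod was never mutated, which would point at the webhook path rather than at Vault authentication.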

The change to service account tokens came with EKS 1.23 → 1.24. My understanding is that, because a service account token was already mapped to the service account before the EKS upgrade, it would remain in place afterwards.
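To test that understanding, I can at least inspect the claims (iss, aud, exp) of the token a pod in the staging cluster actually carries and compare them with what the Vault Kubernetes auth configuration expects. A rough sketch, assuming GNU base64 and the standard in-pod token path; decode_jwt_payload is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: print the (unverified) JSON payload of a JWT.
# decode_jwt_payload is a hypothetical helper; it does NOT verify the signature.
decode_jwt_payload() {
  local jwt="$1" payload
  # The payload is the second dot-separated segment of the token.
  payload="$(printf '%s' "$jwt" | cut -d '.' -f 2)"
  # JWTs use base64url; map it back to the standard base64 alphabet.
  payload="$(printf '%s' "$payload" | tr '_-' '/+')"
  # Restore the '=' padding that JWT encoding strips.
  case $(( ${#payload} % 4 )) in
    2) payload="${payload}==" ;;
    3) payload="${payload}=" ;;
  esac
  printf '%s' "$payload" | base64 -d
}

# Example usage from inside a pod (standard service account token path):
#   decode_jwt_payload "$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)"
```

This only decodes the token for inspection. It says nothing about whether Vault accepts it.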

Does anyone have some curl requests that can show whether the service account is valid against the Vault cluster running in the production environment?
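The kind of request I have in mind is a login against Vault's Kubernetes auth method, roughly as sketched below. The assumptions here are mine: the auth method is mounted at the default path auth/kubernetes, and the Vault address and role name are placeholders. The helper names build_login_payload and vault_k8s_login are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: the Vault address, role name and the auth/kubernetes mount
# path are placeholders and assumptions, not values from my setup.

# Build the JSON body for the Kubernetes auth login endpoint.
build_login_payload() {
  local role="$1" jwt="$2"
  printf '{"role": "%s", "jwt": "%s"}' "$role" "$jwt"
}

# POST the pod's service account token to Vault and print the raw response.
vault_k8s_login() {
  local vault_addr="$1" role="$2"
  local token_file="${3:-/var/run/secrets/kubernetes.io/serviceaccount/token}"
  local jwt
  jwt="$(cat "$token_file")"
  curl --silent --show-error \
    --request POST \
    --data "$(build_login_payload "$role" "$jwt")" \
    "${vault_addr}/v1/auth/kubernetes/login"
}

# Example usage from inside a pod in the staging cluster:
#   vault_k8s_login "https://vault.example.com" "my-role"
```

A response containing auth.client_token would mean Vault accepts the service account token, while an errors array would mean the login was rejected.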