Hello,
I have a three-node Vault cluster with Raft storage running hashicorp/vault:1.8.0 on my EKS production cluster. In my production cluster, I have a vault agent injector running vault-k8s:0.11.0 which is successfully mounting secrets into pods. The EKS version of this cluster is 1.22.
In my staging cluster, I have a vault agent injector running vault-k8s:0.11.0, which connects to the production Vault via its public ingress name. The EKS version of this cluster is 1.25. We upgraded it from 1.21 → 1.25, and somewhere during this upgrade the vault agent stopped injecting secrets into pods.
The logs I see in the stage vault agent injector are:
2024-07-15T18:20:15.104Z [INFO] handler: Starting handler...
Listening on ":8080"...
2024-07-15T18:20:15.188Z [INFO] handler.auto-tls: Generated CA
2024-07-15T18:20:15.188Z [INFO] handler.certwatcher: Updated certificate bundle received. Updating certs...
2024-07-15T18:20:36.532Z [INFO] handler: Request received: Method=POST URL=/mutate?timeout=30s
2024-07-15T18:20:40.768Z [INFO] handler: Request received: Method=POST URL=/mutate?timeout=30s
2024-07-15T18:21:06.087Z [INFO] handler: Request received: Method=POST URL=/mutate?timeout=30s
2024-07-15T18:21:06.926Z [INFO] handler: Request received: Method=POST URL=/mutate?timeout=30s
2024-07-15T18:21:10.379Z [INFO] handler: Request received: Method=POST URL=/mutate?timeout=30s
2024-07-15T18:21:35.591Z [INFO] handler: Request received: Method=POST URL=/mutate?timeout=30s
2024-07-15T18:22:07.043Z [INFO] handler: Request received: Method=POST URL=/mutate?timeout=30s
2024-07-15T18:22:39.532Z [INFO] handler: Request received: Method=POST URL=/mutate?timeout=30s
2024-07-15T18:23:05.544Z [INFO] handler: Request received: Method=POST URL=/mutate?timeout=30s
2024-07-15T18:23:07.980Z [INFO] handler: Request received: Method=POST URL=/mutate?timeout=30s
2024-07-15T18:24:43.173Z [INFO] handler: Request received: Method=POST URL=/mutate?timeout=30s
2024-07-15T18:25:00.118Z [INFO] handler: Request received: Method=POST URL=/mutate?timeout=30s
I have tried looking in the EKS api-server logs for any errors with the mutate requests, but these seem to be passing as expected. Nothing has changed in either of our vault deployments other than the EKS version.
The mutating webhook configuration looks like this:
webhooks:
- admissionReviewVersions:
  - v1beta1
  - v1
  clientConfig:
    caBundle: {REDACTED}
    service:
      name: vault-agent-injector-svc
      namespace: app
      path: /mutate
      port: 443
  failurePolicy: Ignore
  matchPolicy: Exact
  name: vault.hashicorp.com
  namespaceSelector: {}
  objectSelector: {}
  reinvocationPolicy: Never
  rules:
  - apiGroups:
    - ""
    apiVersions:
    - v1
    operations:
    - CREATE
    - UPDATE
    resources:
    - pods
    scope: '*'
  sideEffects: None
  timeoutSeconds: 30
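As a sanity check on how the rules in the configuration above are read, here is a minimal Python sketch of the matching logic. It is a simplified model under stated assumptions, not the API server's implementation: the `Rule` class and `rule_matches` function are hypothetical names, and `scope`, the namespace and object selectors, and subresources are ignored.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    """Hypothetical, simplified stand-in for one entry under `rules:`."""
    api_groups: list
    api_versions: list
    operations: list
    resources: list


def _matches(allowed, value):
    # "*" is the wildcard in admission webhook rules.
    return "*" in allowed or value in allowed


def rule_matches(rule, group, version, operation, resource):
    """Return True if the request attributes fall inside the rule."""
    return (
        _matches(rule.api_groups, group)
        and _matches(rule.api_versions, version)
        and _matches(rule.operations, operation)
        and _matches(rule.resources, resource)
    )


# The single rule from the configuration above: core-group v1 pods,
# on CREATE and UPDATE.
POD_RULE = Rule(
    api_groups=[""],
    api_versions=["v1"],
    operations=["CREATE", "UPDATE"],
    resources=["pods"],
)

if __name__ == "__main__":
    print(rule_matches(POD_RULE, "", "v1", "CREATE", "pods"))             # True
    print(rule_matches(POD_RULE, "apps", "v1", "CREATE", "deployments"))  # False
```

Under this model a pod CREATE matches the rule, which is consistent with the injector logging `/mutate` requests: the webhook is being called, so the failure is more likely in what happens after the request arrives.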
The pod where we are trying to have the secret mounted has the following annotations:
vault.hashicorp.com/agent-configmap: secrets-updater
vault.hashicorp.com/agent-inject: true
These are the same annotations used with the production vault agent injector, where injection is working.
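One thing worth ruling out is the annotation value itself. Kubernetes annotation values are always strings, so `agent-inject` has to reach the injector as the string "true" rather than a YAML boolean. The sketch below is a rough model of that check, assuming the injector parses the value with Go's `strconv.ParseBool` (an assumption on my part, not confirmed from the vault-k8s source); `requests_injection` is a hypothetical helper name.

```python
INJECT_ANNOTATION = "vault.hashicorp.com/agent-inject"

# The strings that Go's strconv.ParseBool accepts as true.
_TRUE_STRINGS = {"1", "t", "T", "TRUE", "true", "True"}


def requests_injection(annotations):
    """Return True if the pod annotations ask for agent injection.

    Assumption: the value is parsed like strconv.ParseBool. Non-string
    values are rejected because annotation values must be strings.
    """
    value = annotations.get(INJECT_ANNOTATION)
    if not isinstance(value, str):
        return False
    return value in _TRUE_STRINGS


if __name__ == "__main__":
    pod_annotations = {
        "vault.hashicorp.com/agent-configmap": "secrets-updater",
        "vault.hashicorp.com/agent-inject": "true",
    }
    print(requests_injection(pod_annotations))
```

Comparing the output of `kubectl get pod <name> -o yaml` for a pod in each cluster would show whether the annotations actually arrive on the pod in the same form.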
Does anyone know where best to look for further errors or information? I thought the kube-apiserver logs would be the best place, but I didn't see any mutate errors there. Without the vault agent injector reporting any errors, this is very difficult to troubleshoot. Setting the log level to debug doesn't help either.