Got 403 Permission Denied issue while deploying Vault in Kubernetes

I ran into two strange situations when deploying Vault in Kubernetes and using the Kubernetes auth method.
Kubernetes version: v1.25.6
Vault version: v1.12.1

1. The agent kept getting 403 permission denied from /v1/auth/kubernetes/login for about 30 minutes before suddenly fetching the desired secrets successfully at the vault-agent-init stage. Sometimes it never succeeded, even after several hours.

Error:

==> Vault agent started! Log data will stream in below:

==> Vault agent configuration:

                     Cgo: disabled
               Log Level: info
                 Version: Vault v1.12.1, built 2022-10-27T12:32:05Z
             Version Sha: e34f8a14fb7a88af4640b09f3ddbb5646b946d9c

2023-04-03T15:42:38.374Z [INFO]  sink.file: creating file sink
2023-04-03T15:42:38.374Z [INFO]  sink.file: file sink configured: path=/home/vault/.vault-token mode=-rw-r-----
2023-04-03T15:42:38.374Z [INFO]  template.server: starting template server
2023-04-03T15:42:38.375Z [INFO] (runner) creating new runner (dry: false, once: false)
2023-04-03T15:42:38.374Z [INFO]  sink.server: starting sink server
2023-04-03T15:42:38.375Z [INFO]  auth.handler: starting auth handler
2023-04-03T15:42:38.375Z [INFO]  auth.handler: authenticating
2023-04-03T15:42:38.375Z [INFO] (runner) creating watcher
2023-04-03T15:42:38.381Z [ERROR] auth.handler: error authenticating:
  error=
  | Error making API request.
  | 
  | URL: PUT http://vault.vault.svc:8200/v1/auth/kubernetes/login
  | Code: 403. Errors:
  | 
  | * permission denied
   backoff=1s
2023-04-03T15:42:39.381Z [INFO]  auth.handler: authenticating
2023-04-03T15:42:39.383Z [ERROR] auth.handler: error authenticating:
  error=
  | Error making API request.
  | 
  | URL: PUT http://vault.vault.svc:8200/v1/auth/kubernetes/login
  | Code: 403. Errors:
  | 
  | * permission denied
   backoff=1.62s

2. Sometimes it authenticated at /v1/auth/kubernetes/login quickly, but then threw an error like:

# (Can't get the output now, but something like this:)
vault.read(myapp/data/postgres/config), vault.read(myapp/data/postgres/config)

URL: GET http://vault.vault.svc:8200/v1/myapp/data/postgres/config
Code: 403. Errors:

* permission denied
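
One way to narrow down both symptoms is to replay the login by hand against each server pod separately, instead of going through the vault Service. The helper below is only a sketch under assumptions: the function name try_login is made up, kubectl create token needs Kubernetes 1.24 or newer, and the role and ServiceAccount names are the ones configured further down in this post.

```shell
# Hypothetical helper: replay the Kubernetes auth login against one Vault pod.
# Usage: try_login vault-0
try_login() {
  pod="$1"
  # Mint a short-lived token for the app's ServiceAccount (Kubernetes >= 1.24).
  jwt=$(kubectl create token myapp-sa -n myapp) || return 1
  # Send the same login request the agent sends, but to one specific pod.
  kubectl exec "$pod" -n vault -- \
    vault write auth/kubernetes/login role=myapp jwt="$jwt"
}
```

Running it for vault-0, vault-1 and vault-2 in turn shows whether every pod accepts the login or only some of them.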

How I installed Vault in namespace vault:

Helm values (vault-values.yaml):

ui:
  enabled: true

server:
  logLevel: trace

  ha:
    enabled: true
    replicas: 3
  
    raft:
      enabled: true
  
  dataStorage:
    storageClass: cstor-csi
  
  auditStorage:
    storageClass: cstor-csi
  
  authDelegator:
    enabled: true

injector:
  enabled: true
  logLevel: trace

helm repo add hashicorp https://helm.releases.hashicorp.com
helm repo update
# Install a specified version of Vault in namespace `vault`.
helm upgrade --install vault hashicorp/vault --namespace vault -f vault-values.yaml --version 0.23.0 --create-namespace

# Unseal
kubectl exec -ti vault-0 -n vault -- vault operator init > keys.txt
kubectl exec -ti vault-1 -n vault -- vault operator init >> keys.txt
kubectl exec -ti vault-2 -n vault -- vault operator init >> keys.txt
kubectl exec -ti vault-0 -n vault -- vault operator unseal
kubectl exec -ti vault-1 -n vault -- vault operator unseal
kubectl exec -ti vault-2 -n vault -- vault operator unseal

kubectl exec -it vault-0 -n vault -- /bin/sh

vault login

vault auth enable kubernetes

# Configure this as the documentation says:
vault write auth/kubernetes/config \
    kubernetes_host="https://$KUBERNETES_PORT_443_TCP_ADDR:443"

vault secrets enable -path=myapp kv-v2

vault kv put myapp/postgres/config POSTGRES_DB="myapp" POSTGRES_USER="myapp" POSTGRES_PASSWORD="myapp"

vault kv get myapp/postgres/config

vault policy write myapp - <<EOF
path "myapp/data/postgres/config" {
  capabilities = ["read"]
}
EOF

vault write auth/kubernetes/role/myapp \
    bound_service_account_names=myapp-sa \
    bound_service_account_namespaces=myapp \
    policies=myapp \
    ttl=3d

# Create the ServiceAccount myapp-sa in namespace myapp, not in the namespace where Vault runs.
kubectl create sa myapp-sa -n myapp
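
To double-check that a given pod really holds the configuration written above, the same objects can be read back. This is a rough sketch: show_setup is a made-up name, and it assumes a valid token is already stored inside the pod (for example from the vault login step above).

```shell
# Hypothetical helper: show what one Vault pod knows about the setup above.
# Usage: show_setup vault-0
show_setup() {
  pod="$1"
  for cmd in \
    "vault auth list" \
    "vault read auth/kubernetes/config" \
    "vault read auth/kubernetes/role/myapp" \
    "vault policy read myapp"
  do
    echo "== $pod: $cmd"
    # $cmd is intentionally unquoted so it splits into command and arguments.
    kubectl exec "$pod" -n vault -- $cmd
  done
}
```

If one pod lists the kubernetes auth method and another does not, the pods are not sharing the same storage.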

Deployment of myapp in namespace myapp:

# I don't know whether this ClusterRoleBinding is needed; the result is the same with or without it.
# apiVersion: rbac.authorization.k8s.io/v1
# kind: ClusterRoleBinding
# metadata:
#   name: myapp-sa-rbac
#   namespace: myapp
# roleRef:
#   apiGroup: rbac.authorization.k8s.io
#   kind: ClusterRole
#   name: system:auth-delegator
# subjects:
# - kind: ServiceAccount
#   name: myapp-sa
#   namespace: myapp
---
apiVersion: v1
kind: Service
metadata:
  name: myapp
  namespace: myapp
spec:
  type: ClusterIP
  ports:
    - name: myapp
      port: 8080
      targetPort: 8080
  selector:
    app: myapp
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
  namespace: myapp
spec:
  replicas: 1
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
      annotations:
        vault.hashicorp.com/agent-inject: "true"
        vault.hashicorp.com/agent-inject-status: 'update'
        vault.hashicorp.com/role: "myapp"
        vault.hashicorp.com/agent-inject-secret-database-config: "myapp/data/postgres/config"
        # Environment variable export template
        vault.hashicorp.com/agent-inject-template-database-config: |
          {{ with secret "myapp/data/postgres/config" -}}
          export POSTGRES_DB="{{ .Data.data.POSTGRES_DB }}"
          export POSTGRES_USER="{{ .Data.data.POSTGRES_USER }}"
          export POSTGRES_PASSWORD="{{ .Data.data.POSTGRES_PASSWORD }}"
          {{- end }}
    spec:
      serviceAccountName: myapp-sa
      containers:
        - name: myapp
          image: nginx:latest
          imagePullPolicy: Always
          ports:
            - containerPort: 8080
          command: ["sh", "-c"]
          args:
            - . /vault/secrets/database-config
          env:
            - name: POSTGRES_HOST
              value: postgres
            - name: POSTGRES_PORT
              value: "5432"

By configuring it like this, only the ServiceAccount used to run Vault’s server Pods needs the ClusterRoleBinding to system:auth-delegator. You can check whether that binding exists in your cluster. I thought there was a bug in the chart that could have prevented it from being created, but I edited my message after confirming that wasn’t true.

No, shouldn’t be needed.

Probably good to check Vault server Pods’ logs for errors as well.

Thank you for answering. I deleted that ClusterRoleBinding, but the result is still the same.

I also checked the logs of the Vault pods, but there is very little information:

pod vault-0, vault-1, vault-2:

2023-04-04T02:12:08.788Z [WARN]  core.raft: skipping new raft TLS config creation, keys are pending
2023-04-04T02:15:31.081Z [TRACE] activity: writing segment on timer expiration

pod vault-agent-injector-59b9c84fd8-lm5cj:

2023-04-04T02:14:46.757Z [INFO]  handler: Request received: Method=POST URL=/mutate?timeout=30s

That’s strange. When you get the 403 errors on the client, the server should log something about it too.
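
If the operational log stays quiet, an audit device is another place to look, since it records requests and responses. A minimal sketch, assuming a token allowed to manage audit devices is already stored in the pod; the function name is made up, and file_path=stdout sends audit entries to the container's standard output so they show up in kubectl logs.

```shell
# Hypothetical helper: enable a file audit device that writes to stdout.
# Usage: enable_stdout_audit vault-0
enable_stdout_audit() {
  kubectl exec "$1" -n vault -- vault audit enable file file_path=stdout
}
```

After that, following the pod's logs with kubectl logs while the agent retries should show whether the login request reaches that pod at all.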

Yes, I think so too, but even with the log level set to trace, it still logs very little.
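
One check that does not depend on server logs is to compare the cluster ID each pod reports. The sketch below assumes the pods are unsealed (vault status only prints a Cluster ID then) and parses the default table output; print_cluster_ids is a made-up name.

```shell
NS=vault

# Hypothetical helper: print "<pod> <cluster id>" for each server pod.
print_cluster_ids() {
  for pod in vault-0 vault-1 vault-2; do
    # The table output has a line like "Cluster ID    <uuid>".
    id=$(kubectl exec "$pod" -n "$NS" -- vault status | awk '/^Cluster ID/ {print $3}')
    echo "$pod ${id:-unknown}"
  done
}
```

Three different IDs mean the pods are three separate Vault clusters rather than one Raft cluster.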

Fixed:

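Each vault operator init creates a separate, independent Vault cluster, so the configuration done on vault-0 presumably existed on only one of the three pods behind the vault Service, which would explain the intermittent 403s. We should run init once on vault-0 and do raft join on the other pods, rather than running vault operator init three times: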

# Join the vault-1 pod to the Raft cluster.
kubectl exec -ti vault-1 -n vault -- vault operator raft join http://vault-0.vault-internal:8200

# Join the vault-2 pod to the Raft cluster.
kubectl exec -ti vault-2 -n vault -- vault operator raft join http://vault-0.vault-internal:8200
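
For completeness, the whole bootstrap with a single init could look roughly like the sketch below. It assumes a fresh install with the release name vault in namespace vault as above; the function names are made up, and the three unseal calls per pod assume the default init settings (5 key shares, threshold 3), with each call prompting for one key from keys.txt.

```shell
NS=vault
LEADER=http://vault-0.vault-internal:8200

# Unseal one pod; each call prompts for one unseal key.
unseal_pod() {
  for i in 1 2 3; do
    kubectl exec -ti "$1" -n "$NS" -- vault operator unseal
  done
}

bootstrap_raft() {
  # Initialize exactly once, on the first pod only.
  kubectl exec vault-0 -n "$NS" -- vault operator init > keys.txt
  unseal_pod vault-0
  # The remaining pods join the existing cluster instead of creating their own.
  for pod in vault-1 vault-2; do
    kubectl exec "$pod" -n "$NS" -- vault operator raft join "$LEADER"
    unseal_pod "$pod"
  done
}
```

Afterwards, vault operator raft list-peers (run with a valid token) should list all three pods.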