Vault deployment on Private AKS Cluster

I am attempting to deploy Vault to a private AKS cluster via the Helm chart (I can deploy the exact same chart/values to a non-private AKS cluster with no issues). The chart deploys, but initialization fails with a connection-refused error. I am sure there’s something I am missing with this being a private AKS cluster, but I can’t seem to place the problem.

kubectl exec -it vault-1 -- vault operator init
Error initializing: Put "http://127.0.0.1:8200/v1/sys/init": dial tcp 127.0.0.1:8200: connect: connection refused
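
(vault status goes through the same 127.0.0.1:8200 listener as operator init, so it makes a quick cross-check; the label selector here is the one from the affinity rules in the overrides below:)

kubectl get pods -n <namespace> -l app.kubernetes.io/name=vault
kubectl exec -it vault-1 -- vault status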

ADO Pipeline Deployment:
helm install --create-namespace --namespace $(namespace) --values $(build.stagingdirectory)/overrides/overrides.yaml --wait

overrides.yaml

server:
  auditStorage:
    accessMode: ReadWriteOnce
    annotations: {}
    enabled: true
    mountPath: /vault/audit
    size: 10Gi
    storageClass:
  extraEnvironmentVars:
  extraVolumes:
  affinity: |
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              app.kubernetes.io/name: vault           
              component: server
          topologyKey: kubernetes.io/hostname
      
  ha:
    enabled: true
    raft:
      enabled: true
      replicas: 3
      config: |
          ui = true
          listener "tcp" {
            tls_disable = 1
            address = "[::]:8200"
            cluster_address = "[::]:8201"
          }
          storage "raft" {
            path = "/vault/data"
              retry_join {
                leader_api_addr = "http://vault-0.vault-internal:8200"
              }
              retry_join {
                leader_api_addr = "http://vault-1.vault-internal:8200"
              }
              retry_join {
                leader_api_addr = "http://vault-2.vault-internal:8200"
              }          
          }
          seal "azurekeyvault" {
            tenant_id       = "XXXXXX"
            client_id       = "XXXXXXX"
            client_secret   = "XXXXXX"         
            vault_name      = "XXXXXX"
            key_name        = "vault-k8s-unsealer-key"
            subscription_id = "XXXXXX"
          }
          autopilot {
            cleanup_dead_servers = "true"
            last_contact_threshold = "200ms"
            last_contact_failure_threshold = "10m"
            max_trailing_logs = 250000
            min_quorum = 5
            server_stabilization_time = "10s"
          }
          service_registration "kubernetes" {}          
  image:
    repository: "hashicorp/vault"
    tag: "1.10.1"
  livenessProbe:
    enabled: true
    initialDelaySeconds: 60
    path: "/v1/sys/health?standbyok=true"
  readinessProbe:
    enabled: true
    path: "/v1/sys/health?standbyok=true&sealedcode=204&uninitcode=204" 
ui:
  enabled: true
  externalPort: 80
  serviceNodePort:
  serviceType: LoadBalancer

I have even verified that the overrides are being rendered into the config properly:

kubectl exec -it vault-1  -- cat vault/config/extraconfig-from-values.hcl
disable_mlock = true
ui = true
listener "tcp" {
  tls_disable = 1
  address = "[::]:8200"
  cluster_address = "[::]:8201"
}
storage "raft" {
  path = "/vault/data"
    retry_join {
      leader_api_addr = "http://vaultdev-0.vault-internal:8200"
    }
    retry_join {
      leader_api_addr = "http://vaultdev-1.vault-internal:8200"
    }
    retry_join {
      leader_api_addr = "http://vaultdev-2.vault-internal:8200"
    }
}
seal "azurekeyvault" {
  tenant_id       = "XXXXXX"
  client_id       = "XXXXXX"
  client_secret   = "XXXXXX"
  vault_name      = "XXXXXX"
  key_name        = "vault-dev-k8s-unsealer-key"
  subscription_id = "XXXXXX"
}
autopilot {
  cleanup_dead_servers = "true"
  last_contact_threshold = "200ms"
  last_contact_failure_threshold = "10m"
  max_trailing_logs = 250000
  min_quorum = 5
  server_stabilization_time = "10s"
}

There is a serious bug in this version; upgrade immediately!
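
A minimal way to roll just the image forward, assuming the release is named vault and was installed from the hashicorp/vault chart:

helm upgrade vault hashicorp/vault -n <namespace> --reuse-values \
  --set server.image.tag=1.10.3

--reuse-values keeps the rest of the overrides intact, so only the tag changes.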

OK, time for some debugging, then.

Is the process running at all?

What do the logs say?

What ports are listening inside the container? (netstat -tln, if it’s available in this container; concrete commands below)
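
Concretely, something like this (ps and netstat come from busybox in the stock Vault image, though that’s an assumption):

kubectl exec -it vault-1 -- ps aux          # is the vault process running at all?
kubectl logs vault-1 --previous             # output from the last restart, if it crashed
kubectl exec -it vault-1 -- netstat -tln    # what is actually listening inside the container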

I updated the image to 1.10.3.

Also, I had to take out the probes to stop the container from constantly restarting.
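
For reference, taking them out amounts to flipping the chart’s probe toggles (release and chart names are assumptions here):

helm upgrade vault hashicorp/vault -n <namespace> --reuse-values \
  --set server.livenessProbe.enabled=false \
  --set server.readinessProbe.enabled=false

With the probes out of the way, the pod stayed up long enough to capture these logs: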

2022-05-23T19:12:39.622Z [INFO]  proxy environment: http_proxy="" https_proxy="" no_proxy=""
Error parsing Seal configuration: error fetching Azure Key Vault wrapper key information: azure.BearerAuthorizer#WithAuthorization: Failed to refresh the Token for request to https://kv-ececd-hub03.vault.azure.net/keys/vault-dev-k8s-unsealer-key/?api-version=7.0: StatusCode=0 -- Original Error: adal: Failed to execute the refresh request. Error = 'Post "https://login.microsoftonline.com/04aa6bf4-d436-426f-bfa4-04b7a70e60ff/oauth2/token?api-version=1.0": dial tcp 20.190.151.8:443: i/o timeout'

It seems that, for some reason, the AKS cluster cannot reach the Azure AD token endpoint (and presumably Azure Key Vault itself) to pull the unseal key.
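
A quick way to confirm the egress theory from inside the cluster (the curl image is just an example):

kubectl run egress-test -n <namespace> --rm -it --restart=Never \
  --image=curlimages/curl -- curl -sv --max-time 5 https://login.microsoftonline.com/

If that times out too, the fix lives outside the chart: a private AKS cluster that routes outbound traffic through an Azure Firewall or proxy needs login.microsoftonline.com and *.vault.azure.net allowed on 443 (a private endpoint with matching private DNS can cover the Key Vault side, but the Azure AD token request is still a public call).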