Help Needed: Vault Deployment Issue with Consul as Storage Backend in Kubernetes Cluster

Hey everyone,

I’m not sure if there’s an existing issue for this particular problem, but after weeks of troubleshooting I decided to ask for help here. I’m a long-time Consul user and love the software. Recently, I decided to expand my knowledge and set up Vault in one of our demo systems, using Consul as the storage backend.

We run Consul in a Kubernetes cluster with the following configuration:

- name: consul
  namespace: {{ .Values.namespace }}
  chart: hashicorp/consul
  version: 1.1.14
  values:
    - global:
        name: consul
        datacenter: dc1
        metrics:
          enabled: true
        tls:
          enabled: true
          enableAutoEncrypt: true
          verify: true
          serverAdditionalDNSSANs:
            - "consul-server.{{ .Values.namespace }}.svc.cluster.local"
    - server:
        replicas: 1
        bootstrapExpect: 1
        disruptionBudget:
          maxUnavailable: 0
        securityContext:
          runAsNonRoot: false
          runAsUser: 0
    - ui:
        enabled: true
        service:
          enabled: true
          type: "LoadBalancer"
    - controller:
        enabled: true
    - prometheus:
        enabled: false
    - grafana:
        enabled: false
    - client:
        enabled: true
        grpc: true
    - connectInject:
        enabled: true
        default: false
        transparentProxy:
          defaultEnabled: true
    - syncCatalog:
        enabled: true
        default: false
        toConsul: true
        toK8S: false
    - apiGateway:
        managedGatewayClass:
          serviceType: LoadBalancer

As you can see, TLS is enabled (global.tls.enabled), with verify and enableAutoEncrypt both turned on.

The following configuration applies to my Vault deployment:

- name: vault
  namespace: {{ .Values.namespace }}
  chart: hashicorp/vault
  version: 0.28.1
  values:
    - affinity: ""
    - server:
        volumes:
          - name: consul-ca-cert
            secret:
              secretName: consul-ca-cert
          - name: consul-ca-key
            secret:
              secretName: consul-ca-key
          - name: consul-server-cert
            secret:
              secretName: consul-server-cert
        volumeMounts:
          - name: consul-ca-cert
            mountPath: /vault/consul/ca/cert
          - name: consul-ca-key
            mountPath: /vault/consul/ca/key
          - name: consul-server-cert
            mountPath: /vault/consul/tls/
        ha:
          enabled: true
          replicas: 2
          config: |
            ui = true
            listener "tcp" {
              tls_disable = 1
              address = "[::]:8200"
              cluster_address = "[::]:8201"
            }
            storage "consul" {
              address = "HOST_IP:8501"
              path    = "vault"
              scheme  = "https"
              tls_ca_file = "/vault/consul/ca/cert/tls.crt"
              tls_cert_file = "/vault/consul/tls/tls.crt"
              tls_key_file = "/vault/consul/tls/tls.key"
            }
            service_registration "kubernetes" {}

I’ve mounted the CA and server certificates from Consul and used them in my storage configuration as stated in the Vault documentation. However, I keep encountering this error:

2024-08-04T14:30:48.584Z [WARN]  storage migration check error: error="Get \"https://10.224.0.7:8501/v1/kv/vault/core/migration\": tls: failed to verify certificate: x509: certificate signed by unknown authority"

The error persists even though Vault is pointed at the Consul CA and server certificates. I’ve verified that the files exist at the configured paths:

/vault/consul/ca/cert/tls.crt
/vault/consul/tls/tls.crt
/vault/consul/tls/tls.key

Currently, I have tls_skip_verify enabled in the storage stanza just to keep the system running, but this is not a long-term solution.
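
For reference, the workaround is a single extra line in the storage stanza. This is a sketch, assuming the rest of the block is unchanged from the config above and that the option is spelled tls_skip_verify with a string value, as in the Vault Consul storage documentation:

```hcl
storage "consul" {
  address         = "HOST_IP:8501"
  path            = "vault"
  scheme          = "https"
  # Temporary workaround only: skips verification of the certificate
  # Consul presents, so tls_ca_file is not enforced while this is set.
  tls_skip_verify = "true"
  tls_ca_file     = "/vault/consul/ca/cert/tls.crt"
  tls_cert_file   = "/vault/consul/tls/tls.crt"
  tls_key_file    = "/vault/consul/tls/tls.key"
}
```

With that line removed, I am back to the x509 error shown above.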

Can someone explain what I might be doing wrong and suggest how to get certificate verification working?

Thank you!