Issues joining raft storage when using -tls-server-name

I am having an issue where I can’t get nodes to join the Raft cluster when setting the -tls-server-name flag.

We are trying to use a wildcard cert from Let's Encrypt.

I have tried setting leader_tls_servername in three places:
- in env vars, with extraEnvironmentVars: in the values.yaml
- in the retry_join stanza
- at the command line

I receive failures as if the flag wasn’t set.

vault operator raft join \
   -tls-server-name=* \

core: failed to get raft challenge: leader_addr= https://vault-0.vault-internal:8200 error="error during raft bootstrap init call: Put \" https://vault-0.vault-internal:8200/v1/sys/storage/raft/bootstrap/challenge\": x509: certificate is valid for *, not vault-0.vault-internal"

Our helm values file is here.
Thanks for your time.

# Vault Helm Chart Value Overrides
  enabled: true
  tlsDisable: false

  enabled: true

    repository: "hashicorp/vault"
    tag: "latest"
    enabled: true
    port: 8200
    failureThreshold: 2
    initialDelaySeconds: 5
    periodSeconds: 5
    successThreshold: 1
    timeoutSeconds: 3
    enabled: false
    path: "/v1/sys/health?standbyok=true"
    port: 8200
    failureThreshold: 2
    initialDelaySeconds: 60
    periodSeconds: 5
    successThreshold: 1
    timeoutSeconds: 3
    VAULT_ADDR: "https://localhost:8200"
    - name: userconfig-mgmt-vault-tls
        defaultMode: 420
        secretName: mgmt-vault-tls
    - mountPath: /vault/userconfig/mgmt-vault-tls
      name: userconfig-mgmt-vault-tls
      readOnly: true
    enabled: false
    secretName: mgmt-vault-tls
    enabled: false
      enabled: true
        enabled: true
        enabled: true
        enabled: true
      publishNotReadyAddresses: true
      externalTrafficPolicy: Local
      port: 8200
      targetPort: 8200
      annotations: {}
    enabled: true
    replicas: 3
      enabled: true
      setNodeId: true
      config: |
        ui = true
        seal "awskms" {
          region     = "us-east-1"
          kms_key_id = "alias/vault-kms-unseal-hive-mgmt"
        }
        listener "tcp" {
          address = ""
          cluster_address = ""
          tls_cert_file = "/vault/userconfig/mgmt-vault-tls/tls.crt"
          tls_key_file  = "/vault/userconfig/mgmt-vault-tls/tls.key"
        }
        storage "raft" {
          path = "/vault/data"
          retry_join {
            address = "https://localhost:8200"
            leader_tls_servername = "*"
            leader_api_addr = "https://vault-0.vault-internal:8200"
          }
          retry_join {
            address = "https://localhost:8200"
            leader_tls_servername = "*"
            leader_api_addr = "https://vault-1.vault-internal:8200"
          }
          retry_join {
            address = "https://localhost:8200"
            leader_tls_servername = "*"
            leader_api_addr = "https://vault-2.vault-internal:8200"
          }
        }
        disable_mlock = true
        service_registration "kubernetes" {}
    create: false
    name: "vault-kms-iam-role"
      enabled: true
# Vault UI
  enabled: true
  serviceType: "LoadBalancer"
  serviceNodePort: null
  externalPort: 8200
  externalTrafficPolicy: Local
  activeVaultPodOnly: true

  # For Added Security, edit the below
  #   - < Your IP RANGE Ex. >
  #   - < YOUR SINGLE IP Ex. >

You have not specified the version of Vault in use - there are changelog entries mentioning bug fixes related to leader_tls_servername in the past.

Setting leader_tls_servername through environment variables seems wrong: it is not a valid environment variable name for Vault. However, the values.yaml content you shared does not actually set it that way, so maybe that’s not a problem.

Passing -tls-server-name to vault operator raft join would not work: the flag specifies the TLS server name the Vault CLI should expect when sending the join instruction to the new Vault server, not the TLS server name the new Vault server should expect when it reaches out to find a leader.
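To illustrate why the x509 error quoted in the question appears, here is a simplified sketch of wildcard hostname matching. It is not the verification code Vault actually runs, and the domain example.com and the function name wildcard_matches are hypothetical. The point is that a TLS client checks the certificate against whatever server name it is using, which is normally the host part of the address it dials unless a server name override takes effect.

```python
def wildcard_matches(pattern: str, hostname: str) -> bool:
    """Simplified check: '*' may only stand in for the whole left-most label."""
    p_labels = pattern.lower().rstrip(".").split(".")
    h_labels = hostname.lower().rstrip(".").split(".")
    # A wildcard covers exactly one label, so the label counts must agree.
    if len(p_labels) != len(h_labels):
        return False
    # The left-most hostname label must be non-empty.
    if not h_labels[0]:
        return False
    # The left-most label must be the wildcard or an exact match.
    if p_labels[0] != "*" and p_labels[0] != h_labels[0]:
        return False
    # Every remaining label must match exactly.
    return p_labels[1:] == h_labels[1:]


cert_name = "*.example.com"  # hypothetical wildcard certificate name

# The in-cluster name from the error message is not covered by the wildcard.
print(wildcard_matches(cert_name, "vault-0.vault-internal"))  # False

# A concrete name under the certificate's domain is covered.
print(wildcard_matches(cert_name, "vault.example.com"))  # True
```

Under these assumptions, the server name override would need to be a concrete hostname that the certificate covers, and it has to be applied on the side that opens the connection to the leader.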

This is not related to your problem, but I strongly advise against using the latest image tag: it exposes you to unplanned upgrades to arbitrary newer Vault versions in the future. Pin an exact version here instead.

I am uncertain whether this environment variable, mostly used by Vault client code, would affect server-to-server communication.

address is not a valid key to have in a retry_join block.

Thanks for taking the time

To your points:
We are testing this on 1.13.1/latest. I have made the suggested change from latest to exact version. The other examples of misconfig have been cleaned up.
Thank you for better explaining those.


This does work for the client and I had added it in hopes it would also work server side. It didn’t.

What you explain here is what I am experiencing. My question is, how do I set the TLS server name the new Vault server should expect when it reaches out to find a leader?

I had initially assumed it was leader_tls_servername = "*" in the retry_join section.

With the retry_join sections, I expect not to need to run the join command manually.

It does look like the configuration in your retry_join blocks is correct to me, so it is weird that it is not taking effect.

It might even be worth opening a GitHub issue to report that, as it seems like it might be a bug.

If you were to need to trigger the join not via the configuration file, it appears the join HTTP API does support setting the servername - but that feature has not been exposed in the vault operator raft join CLI command. You could open a separate GitHub issue about that feature gap as well, if you felt like it.
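As a rough sketch of what calling the join HTTP API directly could look like, the snippet below only builds the request for the /v1/sys/storage/raft/join endpoint and does not send it. The body field name leader_tls_servername is an assumption borrowed from the retry_join key, so check it against the API for your Vault version. The helper name build_join_request and the hostname vault.example.com are hypothetical.

```python
import json
import urllib.request


def build_join_request(node_addr: str, leader_api_addr: str,
                       leader_tls_servername: str) -> urllib.request.Request:
    """Build (but do not send) a raft join request for the joining node."""
    payload = {
        # Address of the existing leader the new node should contact.
        "leader_api_addr": leader_api_addr,
        # ASSUMPTION: field name mirrors the retry_join config key.
        "leader_tls_servername": leader_tls_servername,
    }
    return urllib.request.Request(
        url=node_addr.rstrip("/") + "/v1/sys/storage/raft/join",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_join_request(
    node_addr="https://vault-1.vault-internal:8200",      # node that should join
    leader_api_addr="https://vault-0.vault-internal:8200",
    leader_tls_servername="vault.example.com",             # hypothetical name
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Sending it would be a call to urllib.request.urlopen(req, context=...) with an ssl context that trusts the certificate presented by the joining node.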