AWS KMS auto-unseal fails after enabling TLS

Hi,

I’m hitting a weird issue with AWS KMS auto-unseal after enabling TLS.

Here is what I see in the Vault logs:

[INFO]  core: [DEBUG] discover-aws: Found ip addresses: [10.20.4.176 10.20.4.240 10.20.4.73 10.20.4.42]
[INFO]  core: security barrier not initialized
[INFO]  core: attempting to join possible raft leader node: leader_addr=https://10.20.4.176:8200
[INFO]  core.cluster-listener.tcp: starting listener: listener_address=10.20.4.240:8201
[INFO]  core.cluster-listener.tcp: starting listener: listener_address=127.0.0.1:8201
[INFO]  core.cluster-listener: serving cluster requests: cluster_listen_address=10.20.4.240:8201
[INFO]  core.cluster-listener: serving cluster requests: cluster_listen_address=127.0.0.1:8201
[INFO]  storage.raft: creating Raft: config="&raft.Config{ProtocolVersion:3, HeartbeatTimeout:5000000000, ElectionTimeout:5000000000, CommitTimeout:50000000, MaxAppendEntries:64, BatchApplyCh:true, Shutdow
[INFO]  storage.raft: initial configuration: index=1 servers="[{Suffrage:Voter ID:a14449-vault Address:a14449-vault.mylab.com:8201} {Suffrage:Voter ID:a144b0-vault Address:a144b0-vault.mylab.com:8201} {Suffr
[INFO]  core: successfully joined the raft cluster: leader_addr=""
[INFO]  storage.raft: entering follower state: follower="Node at a144f0-vault.mylab.com:8201 [Follower]" leader=
[INFO]  core: stored unseal keys supported, attempting fetch
[WARN]  failed to unseal core: error="stored unseal keys are supported, but none were found"
[INFO]  core: stored unseal keys supported, attempting fetch
[WARN]  failed to unseal core: error="stored unseal keys are supported, but none were found"
[WARN]  storage.raft: heartbeat timeout reached, starting election: last-leader=
[INFO]  storage.raft: entering candidate state: node="Node at a144f0-vault.mylab.com:8201 [Candidate]" term=2
[INFO]  storage.raft: entering follower state: follower="Node at a144f0-vault.mylab.com:8201 [Follower]" leader=
[INFO]  core: security barrier not initialized

This doesn’t always happen, and I’m not sure why…

Here is my Vault server config:

listener "tcp" {
  address         = "a144f0-vault.mylab.com:8200"
  cluster_address = "a144f0-vault.mylab.com:8201"
  tls_cert_file   = "/etc/pki/tls/private/vault.crt"
  tls_key_file    = "/etc/pki/tls/private/vault.key"
}

listener "tcp" {
  address         = "127.0.0.1:8200"
  tls_disable = true
}


# HA config
storage "raft" {
  path = "/opt/vault"
  node_id = "a144f0-vault"

  retry_join {
    auto_join = "provider=aws region=us-east-1 tag_key=Name tag_value=VaultServer"
    auto_join_scheme = "https"
    leader_tls_servername = "vault.mylab.com"
    leader_client_cert_file = "/etc/pki/tls/private/vault.crt"
    leader_client_key_file = "/etc/pki/tls/private/vault.key"
  }
}

# Recommended for using integrated storage
disable_mlock = true

# cluster config
api_addr = "https://a144f0-vault.mylab.com:8200"
cluster_addr = "https://a144f0-vault.mylab.com:8201"

ui = true

# seal/unseal the vault using KMS
seal "awskms" {
  region = "us-east-1"
  kms_key_id = "<MY KMS KEY>"
}

I’m using Raft as my storage backend, following this doc (Integrated Storage | Vault by HashiCorp), and using the load balancer address as the leader_tls_servername. Is this the right config?
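
As a sanity check on that name: whatever goes in leader_tls_servername has to appear in the SAN list of the certificate the nodes serve on port 8200. A probe along these lines should show whether that holds (the IP is just one of the peer addresses from my logs):

openssl s_client -connect 10.20.4.176:8200 -servername vault.mylab.com </dev/null 2>/dev/null \
  | openssl x509 -noout -text | grep -A1 'Subject Alternative Name'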

TLS has nothing to do with it. TLS is for incoming connections from clients; AWS KMS is an outbound connection from Vault to AWS and is not affected by the TLS settings of the listener.

The error is saying that KMS is defined, but when Vault tries to read the stored key it either doesn’t exist or doesn’t contain the master key. So either you’re missing your AWS credentials (which aren’t in the config you provided [hopefully you just redacted them]), or the key doesn’t exist, or the IAM policy doesn’t allow access to the key, or the key is empty because you never migrated.
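
For reference, the awskms seal only needs a handful of KMS permissions, so an instance-profile policy along these lines should be enough (account ID and key ID are placeholders):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "kms:Encrypt",
        "kms:Decrypt",
        "kms:DescribeKey"
      ],
      "Resource": "arn:aws:kms:us-east-1:<ACCOUNT_ID>:key/<MY KMS KEY>"
    }
  ]
}

A quick aws kms describe-key --key-id <MY KMS KEY> run from the instance will tell you whether the credentials and the key are at least visible.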

Hi @aram,

Thanks for the reply! I did play around with restoring a Raft snapshot earlier, which might have messed up the key.

I will try to bootstrap a new cluster and see if that fixes my issue.

So I just tried to bootstrap a brand-new cluster, and the first node came up without any issues.

However, my second node is stuck on the exact same error. I’m using the same IAM instance profile for both instances, so I’m not sure why the second node would have trouble looking up the stored unseal keys…
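
Before digging further, I’m confirming the first node really is initialized and unsealed with vault status. On a healthy auto-unsealed leader the relevant fields should look roughly like this (trimmed example; recovery share counts depend on how you initialized):

Key                      Value
---                      -----
Recovery Seal Type       shamir
Initialized              true
Sealed                   false
Total Recovery Shares    5
Threshold                3
Storage Type             raft
HA Enabled               true
HA Mode                  active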

So after more digging, I think I’m getting closer… The issue seems to be related to my DNS config, which may be delaying resolution of the joining node’s hostname to its IP address.

I register all hosts in a private hosted zone in Route53. Not sure where the delay comes from yet; still trying to find out more.
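
To see where the time goes, I’m timing lookups both through the default resolver and directly against the VPC resolver (169.254.169.253 is the Amazon-provided DNS address inside a VPC):

time dig +short a144f0-vault.mylab.com
time dig +short a144f0-vault.mylab.com @169.254.169.253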

Your PerfStandby nodes should not be trying to unseal themselves, unless you’re in some Kubernetes setup. AFAIK, when you bring up the cluster, there is an election and one of the nodes becomes the leader; that’s the node that will open a connection to KMS to unseal the whole cluster.

If each node is contacting KMS, then the most likely issue is that they’re not acting as a “cluster” and are instead behaving as individual nodes. If you’re running 1.7+, you can check with vault operator raft list-peers to see the list of nodes in the cluster and what each one’s role is.
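
Using the node IDs from your logs, healthy output would look roughly like this (one leader, the rest followers, all voters):

$ vault operator raft list-peers
Node            Address                          State       Voter
----            -------                          -----       -----
a14449-vault    a14449-vault.mylab.com:8201      leader      true
a144b0-vault    a144b0-vault.mylab.com:8201      follower    true
a144f0-vault    a144f0-vault.mylab.com:8201      follower    true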
