New Vault EC2 instance can't join the cluster even though it has already found it

Hi, we have a Vault server cluster running on EC2 instances.

One of the EC2 instances was replaced with a new one after an EC2 health check failure triggered an ASG activity.

We then found that the new EC2 instance will not join the cluster no matter what, even though the old instance detects it.

vault status on the new instance:

Key                      Value
---                      -----
Recovery Seal Type       awskms
Initialized              true
Sealed                   true
Total Recovery Shares    0
Threshold                0
Unseal Progress          0/0
Unseal Nonce             n/a
Version                  1.11.2
Build Date               2022-07-29T09:48:47Z
Storage Type             raft
HA Enabled               true

vault status on the old instance:

Key                      Value
---                      -----
Recovery Seal Type       shamir
Initialized              true
Sealed                   false
Total Recovery Shares    5
Threshold                3
Version                  1.11.2
Build Date               2022-07-29T09:48:47Z
Storage Type             raft
Cluster Name             vault-cluster-8e51cf81
Cluster ID               4813eebf-4254-d876-036d-d910ee0e65a2
HA Enabled               true
HA Cluster               https://10.11.36.88:8201
HA Mode                  active
Active Since             2022-09-06T17:41:28.706117295Z
Raft Committed Index     2737746
Raft Applied Index       2737745

We then tried deleting the Vault data in the old instance.
After that, the new instance tries to join the old instance but shows this error:

Sep 06 18:11:46 ip-10-11-10-243 vault[17305]: 2022-09-06T18:11:46.423Z [INFO]  core: stored unseal keys supported, attempting fetch
Sep 06 18:11:46 ip-10-11-10-243 vault[17305]: 2022-09-06T18:11:46.424Z [WARN]  failed to unseal core: error="stored unseal keys are supported, but none were found"

On the leader (old instance):

Sep 06 18:18:06 ip-10-11-36-88 vault[22523]: 2022-09-06T18:18:06.573Z [ERROR] storage.raft: failed to heartbeat to: peer=10.11.10.243:8201 error="dial tcp 10.11.10.243:8201: connect: connection refused"
Sep 06 18:18:07 ip-10-11-36-88 vault[22523]: 2022-09-06T18:18:07.458Z [ERROR] storage.raft: failed to make requestVote RPC: target="{Voter 713bd1bc-1b5a-e013-2ba6-e64f23a37ca9 10.11.17.229:8201}" error="dial tcp 10.11.17.229:8201: connect: connection refused"

Our config is below:

disable_mlock = true
ui            = true

api_addr     = "https://vault-server.internal-domain.io:8200"
cluster_addr = "https://{{ GetPrivateIP }}:8201"

listener "tcp" {
  address         = "[::]:8200"
  cluster_address = "[::]:8201"

  tls_disable        = "false"
  tls_client_ca_file = "/opt/vault/tls/ca.crt"
  tls_cert_file      = "/opt/vault/tls/tls.crt"
  tls_key_file       = "/opt/vault/tls/tls.key"

  tls_require_and_verify_client_cert = "true"

  proxy_protocol_behavior = "allow_authorized"
  proxy_protocol_authorized_addrs = [
    "10.0.0.0/8"
  ]
}

storage "raft" {
  path = "/opt/vault/data"
  retry_join {
    auto_join        = "provider=\"aws\" region=\"us-west-2\" tag_key=\"retry_join\" tag_value=\"vault-server-34141-001\" addr_type=\"private_v4\""
    auto_join_port   = 8200
    auto_join_scheme = "https"

    leader_tls_servername   = "vault"
    leader_ca_cert_file     = "/opt/vault/tls/ca.crt"
    leader_client_cert_file = "/opt/vault/tls/tls.crt"
    leader_client_key_file  = "/opt/vault/tls/tls.key"
  }

  autopilot {
    cleanup_dead_servers           = "true"
    last_contact_threshold         = "200ms"
    last_contact_failure_threshold = "10m"
    max_trailing_logs              = 250
    min_quorum                     = 3
    server_stabilization_time      = "60s"
  }
}

seal "awskms" {
  region     = "us-west-2"
  kms_key_id = "alias/vault-server-kms-key"
}

telemetry {
  prometheus_retention_time = "30s"
  disable_hostname          = true
}

Is there anything I can do to get the new node to join the leader (old instance)?
The new instance already tries to join the old instance / cluster, but it always fails to unseal:

Sep 06 18:19:03 ip-10-11-10-243 vault[17350]: 2022-09-06T18:19:03.499Z [INFO]  core: stored unseal keys supported, attempting fetch
Sep 06 18:19:03 ip-10-11-10-243 vault[17350]: 2022-09-06T18:19:03.499Z [WARN]  failed to unseal core: error="stored unseal keys are supported, but none were found"

Full log from the new instance:

Sep 06 18:40:02 ip-10-11-10-243 vault[17422]: 2022-09-06T18:40:02.954Z [INFO]  storage.raft: creating Raft: config="&raft.Config{ProtocolVersion:3, HeartbeatTimeout:15000000000, ElectionTimeout:15000000000, CommitTimeout:50000000, MaxAppendEntries:64, BatchApplyCh:true, ShutdownOnRemove:true, TrailingLogs:0x2800, SnapshotInterval:120000000000, SnapshotThreshold:0x2000, LeaderLeaseTimeout:2500000000, LocalID:\"5f917b7c-e84d-8b5e-825f-0370afb2b993\", NotifyCh:(chan<- bool)(0x4000bc0310), LogOutput:io.Writer(nil), LogLevel:\"DEBUG\", Logger:(*hclog.interceptLogger)(0x4000c35770), NoSnapshotRestoreOnStart:true, skipStartup:false}"
Sep 06 18:40:02 ip-10-11-10-243 vault[17422]: 2022-09-06T18:40:02.955Z [INFO]  storage.raft: initial configuration: index=1 servers="[{Suffrage:Voter ID:88592729-6d11-69ff-d47a-db87809717f7 Address:10.11.36.88:8201} {Suffrage:Voter ID:713bd1bc-1b5a-e013-2ba6-e64f23a37ca9 Address:10.11.17.229:8201} {Suffrage:Voter ID:e836b268-a1e3-2fb0-22f0-0cf740d55c2c Address:10.11.7.133:8201} {Suffrage:Nonvoter ID:5f917b7c-e84d-8b5e-825f-0370afb2b993 Address:10.11.10.243:8201}]"
Sep 06 18:40:02 ip-10-11-10-243 vault[17422]: 2022-09-06T18:40:02.955Z [INFO]  core: successfully joined the raft cluster: leader_addr=https://10.11.36.88:8200
Sep 06 18:40:02 ip-10-11-10-243 vault[17422]: 2022-09-06T18:40:02.955Z [INFO]  storage.raft: entering follower state: follower="Node at 10.11.10.243:8201 [Follower]" leader-address= leader-id=
Sep 06 18:40:03 ip-10-11-10-243 vault[17422]: 2022-09-06T18:40:03.121Z [WARN]  storage.raft: failed to get previous log: previous-index=2745136 last-index=1 error="log not found"
Sep 06 18:40:05 ip-10-11-10-243 vault[17422]: 2022-09-06T18:40:05.182Z [INFO]  http: TLS handshake error from 10.11.59.81:5497: EOF
Sep 06 18:40:06 ip-10-11-10-243 vault[17422]: 2022-09-06T18:40:06.481Z [WARN]  storage.raft: failed to get previous log: previous-index=2739064 last-index=1 error="log not found"
Sep 06 18:40:06 ip-10-11-10-243 vault[17422]: 2022-09-06T18:40:06.699Z [INFO]  core: stored unseal keys supported, attempting fetch
Sep 06 18:40:06 ip-10-11-10-243 vault[17422]: 2022-09-06T18:40:06.699Z [WARN]  failed to unseal core: error="stored unseal keys are supported, but none were found"
Sep 06 18:40:11 ip-10-11-10-243 vault[17422]: 2022-09-06T18:40:11.700Z [INFO]  core: stored unseal keys supported, attempting fetch
Sep 06 18:40:11 ip-10-11-10-243 vault[17422]: 2022-09-06T18:40:11.700Z [WARN]  failed to unseal core: error="stored unseal keys are supported, but none were found"
Sep 06 18:40:12 ip-10-11-10-243 vault[17422]: 2022-09-06T18:40:12.070Z [INFO]  http: TLS handshake error from 10.11.7.91:2938: EOF
Sep 06 18:40:12 ip-10-11-10-243 vault[17422]: 2022-09-06T18:40:12.918Z [INFO]  http: TLS handshake error from 10.11.7.91:49652: EOF
Sep 06 18:40:15 ip-10-11-10-243 vault[17422]: 2022-09-06T18:40:15.181Z [INFO]  http: TLS handshake error from 10.11.59.81:20852: EOF
Sep 06 18:40:16 ip-10-11-10-243 vault[17422]: 2022-09-06T18:40:16.701Z [INFO]  core: stored unseal keys supported, attempting fetch
Sep 06 18:40:16 ip-10-11-10-243 vault[17422]: 2022-09-06T18:40:16.701Z [WARN]  failed to unseal core: error="stored unseal keys are supported, but none were found"
Sep 06 18:40:21 ip-10-11-10-243 vault[17422]: 2022-09-06T18:40:21.702Z [INFO]  core: stored unseal keys supported, attempting fetch
Sep 06 18:40:21 ip-10-11-10-243 vault[17422]: 2022-09-06T18:40:21.702Z [WARN]  failed to unseal core: error="stored unseal keys are supported, but none were found"
Sep 06 18:40:22 ip-10-11-10-243 vault[17422]: 2022-09-06T18:40:22.070Z [INFO]  http: TLS handshake error from 10.11.7.91:28835: EOF
Sep 06 18:40:22 ip-10-11-10-243 vault[17422]: 2022-09-06T18:40:22.918Z [INFO]  http: TLS handshake error from 10.11.7.91:4524: EOF
Sep 06 18:40:25 ip-10-11-10-243 vault[17422]: 2022-09-06T18:40:25.182Z [INFO]  http: TLS handshake error from 10.11.59.81:46091: EOF
Sep 06 18:40:26 ip-10-11-10-243 vault[17422]: 2022-09-06T18:40:26.703Z [INFO]  core: stored unseal keys supported, attempting fetch
Sep 06 18:40:26 ip-10-11-10-243 vault[17422]: 2022-09-06T18:40:26.703Z [WARN]  failed to unseal core: error="stored unseal keys are supported, but none were found"
Sep 06 18:40:31 ip-10-11-10-243 vault[17422]: 2022-09-06T18:40:31.704Z [INFO]  core: stored unseal keys supported, attempting fetch
Sep 06 18:40:31 ip-10-11-10-243 vault[17422]: 2022-09-06T18:40:31.705Z [WARN]  failed to unseal core: error="stored unseal keys are supported, but none were found"
Sep 06 18:40:32 ip-10-11-10-243 vault[17422]: 2022-09-06T18:40:32.070Z [INFO]  http: TLS handshake error from 10.11.7.91:57071: EOF
Sep 06 18:40:32 ip-10-11-10-243 vault[17422]: 2022-09-06T18:40:32.918Z [INFO]  http: TLS handshake error from 10.11.7.91:18522: EOF
Sep 06 18:40:35 ip-10-11-10-243 vault[17422]: 2022-09-06T18:40:35.182Z [INFO]  http: TLS handshake error from 10.11.59.81:35948: EOF
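
To show how regularly the new node retries the unseal, here is a rough Python sketch that measures the gap between the "failed to unseal core" warnings. The sample lines are copied from the log above; the function name retry_gaps and the timestamp regex are my own and only assume the log format shown in this post.

```python
import re
from datetime import datetime

# Four warning lines copied verbatim from the full log above.
UNSEAL_WARNINGS = [
    'Sep 06 18:40:06 ip-10-11-10-243 vault[17422]: 2022-09-06T18:40:06.699Z [WARN]  failed to unseal core: error="stored unseal keys are supported, but none were found"',
    'Sep 06 18:40:11 ip-10-11-10-243 vault[17422]: 2022-09-06T18:40:11.700Z [WARN]  failed to unseal core: error="stored unseal keys are supported, but none were found"',
    'Sep 06 18:40:16 ip-10-11-10-243 vault[17422]: 2022-09-06T18:40:16.701Z [WARN]  failed to unseal core: error="stored unseal keys are supported, but none were found"',
    'Sep 06 18:40:21 ip-10-11-10-243 vault[17422]: 2022-09-06T18:40:21.702Z [WARN]  failed to unseal core: error="stored unseal keys are supported, but none were found"',
]

# Matches the ISO timestamp that Vault itself writes (not the journald prefix).
TIMESTAMP_RE = re.compile(r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+)Z")


def retry_gaps(lines):
    """Return the seconds elapsed between consecutive unseal failures."""
    stamps = []
    for line in lines:
        if "failed to unseal core" not in line:
            continue
        match = TIMESTAMP_RE.search(line)
        if match:
            stamps.append(datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S.%f"))
    return [(later - earlier).total_seconds() for earlier, later in zip(stamps, stamps[1:])]


if __name__ == "__main__":
    for gap in retry_gaps(UNSEAL_WARNINGS):
        print(f"{gap:.3f}s between unseal attempts")
```

For these sample lines the gaps come out at roughly five seconds each, so the node keeps retrying the stored-key unseal on a fixed interval and never gets past it.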

Update:

  • After replacing the instance with a new one again, I found interesting logs: based on what I see below, I assume it attempts the raft challenge against itself,
    although I believe it needs to challenge the cluster leader's IP address. Can anyone point out what's wrong with my configuration?
Sep 06 20:20:41 ip-10-11-13-5 vault[1762]: 2022-09-06T20:20:41.152Z [INFO]  core: attempting to join possible raft leader node: leader_addr=https://10.11.36.88:8200
Sep 06 20:20:41 ip-10-11-13-5 vault[1762]: 2022-09-06T20:20:41.153Z [INFO]  core: attempting to join possible raft leader node: leader_addr=https://10.11.13.5:8200
Sep 06 20:20:41 ip-10-11-13-5 vault[1762]: 2022-09-06T20:20:41.153Z [INFO]  core: attempting to join possible raft leader node: leader_addr=https://10.11.17.229:8200
Sep 06 20:20:41 ip-10-11-13-5 vault[1762]: 2022-09-06T20:20:41.159Z [ERROR] core: failed to get raft challenge: leader_addr=https://10.11.13.5:8200
Sep 06 20:20:41 ip-10-11-13-5 vault[1762]:   error=
Sep 06 20:20:41 ip-10-11-13-5 vault[1762]:   | error during raft bootstrap init call: Error making API request.
Sep 06 20:20:41 ip-10-11-13-5 vault[1762]:   |
Sep 06 20:20:41 ip-10-11-13-5 vault[1762]:   | URL: PUT https://10.11.13.5:8200/v1/sys/storage/raft/bootstrap/challenge
Sep 06 20:20:41 ip-10-11-13-5 vault[1762]:   | Code: 503. Errors:
Sep 06 20:20:41 ip-10-11-13-5 vault[1762]:   |
Sep 06 20:20:41 ip-10-11-13-5 vault[1762]:   | * Vault is sealed
Sep 06 20:20:41 ip-10-11-13-5 vault[1762]:   
Sep 06 20:20:41 ip-10-11-13-5 vault[1762]: 2022-09-06T20:20:41.182Z [INFO]  http: TLS handshake error from 10.11.7.91:57983: EOF
Sep 06 20:20:43 ip-10-11-13-5 vault[1762]: 2022-09-06T20:20:43.582Z [INFO]  core: stored unseal keys supported, attempting fetch
Sep 06 20:20:43 ip-10-11-13-5 vault[1762]: 2022-09-06T20:20:43.583Z [WARN]  failed to unseal core: error="stored unseal keys are supported, but none were found"
Sep 06 20:20:45 ip-10-11-13-5 vault[1762]: 2022-09-06T20:20:45.617Z [INFO]  http: TLS handshake error from 10.11.7.91:60729: EOF
Sep 06 20:20:48 ip-10-11-13-5 vault[1762]: 2022-09-06T20:20:48.584Z [INFO]  core: stored unseal keys supported, attempting fetch
Sep 06 20:20:48 ip-10-11-13-5 vault[1762]: 2022-09-06T20:20:48.584Z [WARN]  failed to unseal core: error="stored unseal keys are supported, but none were found"
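
To make the "joins itself" observation easier to check, here is a small Python sketch that pulls the join candidates out of the log lines above and flags any that match the node's own address. It assumes the journald hostname (ip-10-11-13-5) encodes the node's private IP, which is the default EC2 naming but is still an assumption on my side. The function names join_candidates and self_targets are just illustrative.

```python
import re

# The three join-attempt lines copied verbatim from the update above.
JOIN_LINES = [
    "Sep 06 20:20:41 ip-10-11-13-5 vault[1762]: 2022-09-06T20:20:41.152Z [INFO]  core: attempting to join possible raft leader node: leader_addr=https://10.11.36.88:8200",
    "Sep 06 20:20:41 ip-10-11-13-5 vault[1762]: 2022-09-06T20:20:41.153Z [INFO]  core: attempting to join possible raft leader node: leader_addr=https://10.11.13.5:8200",
    "Sep 06 20:20:41 ip-10-11-13-5 vault[1762]: 2022-09-06T20:20:41.153Z [INFO]  core: attempting to join possible raft leader node: leader_addr=https://10.11.17.229:8200",
]

# Groups 1-4: octets from the hostname (assumed to be the node's private IP).
# Group 5: the candidate leader IP the node is trying to join.
JOIN_RE = re.compile(
    r"\bip-(\d+)-(\d+)-(\d+)-(\d+)\b.*"
    r"attempting to join possible raft leader node: leader_addr=https://([\d.]+):\d+"
)


def join_candidates(lines):
    """Return (own_ip, [candidate_ip, ...]) parsed from join-attempt lines."""
    own_ip = None
    candidates = []
    for line in lines:
        match = JOIN_RE.search(line)
        if not match:
            continue
        own_ip = ".".join(match.group(1, 2, 3, 4))
        candidates.append(match.group(5))
    return own_ip, candidates


def self_targets(lines):
    """Return the candidates that are the node's own address."""
    own_ip, candidates = join_candidates(lines)
    return [ip for ip in candidates if ip == own_ip]


if __name__ == "__main__":
    own_ip, candidates = join_candidates(JOIN_LINES)
    print("own address:", own_ip)
    print("candidates: ", candidates)
    print("self joins: ", self_targets(JOIN_LINES))
```

On these three lines the only self-targeted candidate is 10.11.13.5, which is the same address that returns the 503 "Vault is sealed" error in the log above.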