Vault Cluster status changes

rok · April 15, 2024, 6:25am

Hi all,

I have a Vault cluster on AWS (KMS auto-unseal) with 3 nodes and MySQL as the storage backend. Ever since I set up Zabbix monitoring for the active node, I can see that the status changes all the time:

[WARN]  core: leadership lost, stopping active operation
[INFO]  core: pre-seal teardown starting
[INFO]  rollback: stopping rollback manager
[INFO]  core: pre-seal teardown complete

I have a pretty straightforward config:

disable_cache           = true
disable_mlock           = true
ui                      = true

# MySQL backend config
storage "mysql" {

  # MYSQL Connection parameters
  address = "MY DB ADDRESS"
  username = "username"
  password = "pass"
  database = "db"

  max_idle_connections = "0"

  max_connection_lifetime = "0"

  ha_enabled = "true"

}

# Vault server listen configuration
listener "tcp" {
  address         = "0.0.0.0:8200"
  cluster_address = "0.0.0.0:8201"
  tls_cert_file   = "secret"
  tls_key_file    = "secret"
  tls_disable     = false
}

# the address to advertise for HA purpose
api_addr="https://address:8200"
cluster_addr="https://address:8201"

cluster_name="clusterNAME"

seal "awskms" {
  kms_key_id = "MYID"
  region     = "MY REGION"
}

The whole Vault is behind AWS LB NLB. Am I missing something, should node statuses be regularly changing?
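
For monitoring which node is active, one option is to poll each node's `/v1/sys/health` endpoint directly instead of going through the NLB. Below is a minimal Python sketch, assuming the endpoint's default query parameters and the status codes Vault documents for it. The function names `classify_health` and `node_role` are illustrative, not part of any Vault client library.

```python
import urllib.error
import urllib.request

# Status codes documented for GET /v1/sys/health with default parameters.
HEALTH_STATES = {
    200: "active",
    429: "standby",
    472: "dr-secondary",
    473: "performance-standby",
    501: "uninitialized",
    503: "sealed",
}


def classify_health(status_code: int) -> str:
    """Map a /v1/sys/health HTTP status code to a node role."""
    return HEALTH_STATES.get(status_code, "unknown")


def node_role(base_url: str, timeout: float = 3.0) -> str:
    """Ask a single node (not the load balancer) for its current role."""
    url = base_url.rstrip("/") + "/v1/sys/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return classify_health(resp.status)
    except urllib.error.HTTPError as err:
        # Standby and sealed nodes answer with non-2xx codes, which urllib raises.
        return classify_health(err.code)
    except OSError:
        # Covers connection failures, DNS errors, TLS errors, and timeouts.
        return "unreachable"
```

Calling `node_role("https://<node-address>:8200")` once per node on each polling interval would show whether the active role is really moving between nodes, and how often.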

jlj7 · April 15, 2024, 9:50am

Hi rok, I think this, from the docs, might be relevant:

If you would like to not have frequent changes in your elected leader you can increase interactive_timeout and wait_timeout MySQL config to much higher than default which is set at 8 hours.
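
For reference, the 8-hour default corresponds to 28800 seconds for both variables. Here is a small Python sketch that builds the `SET GLOBAL` statements for a higher value. The helper name `timeout_statements` and the 24-hour figure in the usage note are assumptions for illustration, not recommendations from the docs.

```python
DEFAULT_TIMEOUT_SECONDS = 8 * 60 * 60  # MySQL default for both variables: 28800


def timeout_statements(hours: int) -> list[str]:
    """Build SET GLOBAL statements that raise both idle timeouts to `hours`."""
    seconds = hours * 60 * 60
    if seconds <= DEFAULT_TIMEOUT_SECONDS:
        raise ValueError("pick a value above the 8-hour default")
    return [
        f"SET GLOBAL {name} = {seconds};"
        for name in ("interactive_timeout", "wait_timeout")
    ]
```

For example, `timeout_statements(24)` yields statements that set both variables to 86400 seconds. A `SET GLOBAL` change only applies to connections opened afterwards and does not survive a server restart unless the value is also placed in the server configuration.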

rok · April 15, 2024, 3:29pm

Thank you for the prompt response and for the pointer to the MySQL connection settings. Adjusting these timeouts could indeed help, but I'm cautious about setting them too high.

With that in mind, I have a follow-up question: will the Vault API_ADDR be inaccessible during these leadership changes? If so, this could affect our production environment, and I want to make sure we're prepared for any downtime.

jlj7 · April 15, 2024, 4:27pm

Sensible concerns! Sadly, yes, the cluster will be unavailable during elections. It's a community-supported backend, so I guess it comes with a bit of a YMMV caveat.

Also, keep in mind that HashiCorp explicitly recommends using the Integrated Storage backend with production clusters. Again, probably not the answer you were hoping for. Good luck!
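
Since requests can fail while an election is in progress, clients may want to retry with backoff rather than fail outright. Here is a minimal Python sketch, assuming the caller wraps its own Vault request in a function that raises on failure. `VaultUnavailable` and `call_with_retry` are hypothetical names, not part of any Vault SDK.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class VaultUnavailable(Exception):
    """Hypothetical error the caller raises when no active node answers."""


def call_with_retry(
    request: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `request` with exponential backoff while Vault is unavailable."""
    for attempt in range(attempts):
        try:
            return request()
        except VaultUnavailable:
            if attempt == attempts - 1:
                raise  # out of attempts, surface the last error
            sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, ...
    raise ValueError("attempts must be at least 1")
```

With the defaults, a caller waits 7.5 seconds in total across four sleeps before giving up. Whether that covers a leadership change in this setup would need to be measured.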

rok · April 15, 2024, 5:17pm

I ran a stress test of 4k requests from Linux against the Vault API_ADDR on AWS, and the Network Load Balancer seems fine.

I also tested Raft before, but for this scenario, I think MySQL storage will be fine.

Any other tips?

Thanks!
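
A stress test like the one described above is more informative if it tallies response codes, so that failures during a leadership change show up in the result. Here is a rough Python sketch, assuming `send` is a caller-supplied function that performs one request and returns its HTTP status code. The name `tally_requests` and the worker count are illustrative.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def tally_requests(
    send: Callable[[], int], total: int = 4000, workers: int = 32
) -> Counter:
    """Call `send` `total` times across a thread pool and count the results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(send) for _ in range(total)]
        # result() re-raises any exception thrown inside `send`.
        return Counter(f.result() for f in futures)
```

Running this while a leadership change happens, rather than during a quiet period, would show how many requests actually fail during an election.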
