Vault Cluster status changes

Hi all,

I have a Vault cluster on AWS (KMS auto-unseal) with 3 nodes and a MySQL storage backend. I recently set up Zabbix monitoring for the active node, and I can see that the active status changes all the time:

[WARN]  core: leadership lost, stopping active operation
[INFO]  core: pre-seal teardown starting
[INFO]  rollback: stopping rollback manager
[INFO]  core: pre-seal teardown complete

I have a pretty straightforward config:

disable_cache           = true
disable_mlock           = true
ui                      = true

# MySQL backend config
storage "mysql" {

  # MYSQL Connection parameters
  address = "MY DB ADDRESS"
  username = "username"
  password = "pass"
  database = "db"

  max_idle_connections = "0"

  max_connection_lifetime = "0"

  ha_enabled = "true"

}

# Vault server listen configuration
listener "tcp" {
  address       = "0.0.0.0:8200"
  cluster_addr  = "0.0.0.0:8201"
  tls_cert_file        = "secret"
  tls_key_file         = "secret"
  tls_disable   = false
}

# the address to advertise for HA purpose
api_addr="https://address:8200"
cluster_addr="https://address:8201"

cluster_name="clusterNAME"

seal "awskms" {
  kms_key_id = "MYID"
  region     = "MY REGION"
}

The whole cluster sits behind an AWS Network Load Balancer (NLB). Am I missing something, or should node statuses be changing this regularly?
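
For anyone reproducing the monitoring side, a per-node check could look roughly like the sketch below. It assumes the default status codes documented for Vault's /v1/sys/health endpoint (200 active, 429 standby, 503 sealed, and so on). `classify_health` and `probe` are hypothetical helper names, and the node URL passed in is a placeholder. Certificate verification uses Python's defaults, so a node with an untrusted certificate would show up as "unreachable".

```python
import urllib.error
import urllib.request

# Default status codes documented for GET /v1/sys/health.
HEALTH_STATES = {
    200: "active",
    429: "standby",
    472: "dr-secondary",
    473: "performance-standby",
    501: "uninitialized",
    503: "sealed",
}


def classify_health(status_code: int) -> str:
    """Map a /v1/sys/health status code to a node role."""
    return HEALTH_STATES.get(status_code, "unknown")


def probe(node_url: str, timeout: float = 3.0) -> str:
    """Query one node directly (not through the NLB) and return its role."""
    url = f"{node_url}/v1/sys/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return classify_health(resp.status)
    except urllib.error.HTTPError as err:
        # urllib raises for 4xx/5xx, which is how standby and sealed nodes answer.
        return classify_health(err.code)
    except OSError:
        # Connection refused, timeout, TLS failure, DNS error.
        return "unreachable"
```

Polling each node separately, rather than the load balancer address, makes it easier to see which node held the active role before and after a change.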

Hi rok, I think this, from the docs, might be relevant:

If you would like to not have frequent changes in your elected leader you can increase interactive_timeout and wait_timeout MySQL config to much higher than default which is set at 8 hours.

Thank you for the prompt response and for the pointer to the MySQL connection settings. Raising these timeouts could indeed help, but I’m cautious about setting them too high.

With that in mind, I have a follow-up question: will the Vault API_ADDR be inaccessible during these leadership changes? If so, this could affect our production environment, and I want to make sure we’re prepared for any downtime.

Sensible concerns! Sadly, yes, the cluster will be unavailable during elections. The MySQL backend is community-supported, so I guess it comes with a bit of a YMMV caveat. :confused:
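
If clients need to ride out those election windows, one common approach is to retry with exponential backoff. The sketch below is only an illustration under assumptions: `backoff_delays` and `call_with_retry` are hypothetical helpers, the timing values are arbitrary, and which errors count as transient (for example connection failures or 5xx responses) is left to a caller-supplied predicate because it depends on your client.

```python
import time
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def backoff_delays(base: float = 0.5, factor: float = 2.0,
                   cap: float = 8.0, attempts: int = 5) -> Iterator[float]:
    """Yield `attempts` delays that grow geometrically and never exceed `cap`."""
    delay = base
    for _ in range(attempts):
        yield min(delay, cap)
        delay *= factor


def call_with_retry(op: Callable[[], T],
                    is_transient: Callable[[Exception], bool],
                    delays: Iterable[float],
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Run `op`, retrying after each delay while failures look transient."""
    for delay in delays:
        try:
            return op()
        except Exception as err:
            if not is_transient(err):
                raise  # permanent failure: do not retry
            sleep(delay)
    # Final attempt after the last delay; any error propagates to the caller.
    return op()
```

With the defaults above a client waits at most 0.5 + 1 + 2 + 4 + 8 seconds in total before giving up, which you would want to size against how long elections actually take in your environment.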

Also, keep in mind that HashiCorp explicitly recommends the Integrated Storage (Raft) backend for production clusters. Again, probably not the answer you were hoping for. :confused: Good luck!

I ran a stress test of 4k requests from a Linux host against the Vault API_ADDR on AWS, and the Network Load Balancer seems fine.
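
A short burst of requests can miss an election window, so it may also be worth sampling the leader over a longer period. This is a rough sketch, assuming the unauthenticated /v1/sys/leader endpoint and its `leader_address` field; `current_leader` and `count_leader_changes` are hypothetical helper names.

```python
import json
import urllib.request
from typing import Iterable, Optional


def current_leader(api_addr: str, timeout: float = 3.0) -> Optional[str]:
    """Return the leader address reported by /v1/sys/leader, or None."""
    url = f"{api_addr}/v1/sys/leader"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.load(resp).get("leader_address") or None
    except (OSError, ValueError):
        # Unreachable, HTTP error, or a body that is not valid JSON.
        return None


def count_leader_changes(leaders: Iterable[Optional[str]]) -> int:
    """Count how often the reported leader switches to a different address."""
    changes = 0
    previous = None
    for leader in leaders:
        if not leader:
            continue  # no leader reported: election in progress or probe failed
        if previous is not None and leader != previous:
            changes += 1
        previous = leader
    return changes
```

Collecting `current_leader(...)` once every few seconds for an hour and feeding the samples to `count_leader_changes` gives a number to compare before and after any MySQL timeout change.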

I also tested raft before, but for this scenario, I think MySQL storage will be fine.

Any other tips? :smiley:

Thanks!