Using a Load Balancer for Vault

We are running a Vault cluster (3 nodes in Oracle Cloud Infrastructure) behind a load balancer (Oracle Load Balancer).

We are seeing timeouts when trying to connect to the UI. Today we did some load testing to see what's happening and observed a few things:

  1. The load balancer health check appears to have failed, and the LB failed over to a different node (which was not the leader).

  2. After 7 minutes, the LB failed back to the original node (which was the leader), and during this period we saw multiple connection timeouts.

Can someone help with this issue? Is there LB tuning that needs to be done, or is it because the LB has some limitations, as I see in the Vault documentation?

We are using Oracle Object Storage as our storage backend.

$cat /etc/vault.d/vault.hcl
cluster_name = "vault"
default_lease_ttl = "5m"

telemetry {
  prometheus_retention_time = "24h"
  disable_hostname = true
}

listener "tcp" {
  address       = "0.0.0.0:8200"
  #tls_cert_file = "/etc/vault.d/vault.crt"
  tls_disable   = "true"
  #tls_key_file  = "/etc/vault.d/key.pem"
  telemetry {
    unauthenticated_metrics_access = true
  }
}
log_level = "INFO"
max_lease_ttl = "30m"

seal "ocikms" {
  auth_type_api_key   = "false"
  crypto_endpoint     = "<endpoint>"
  key_id              = "<OCID>"
  management_endpoint = "<mgmt endpoint>"
}

storage "oci" {
  auth_type_api_key = "false"
  bucket_name       = "vault"
  ha_enabled        = "true"
  lock_bucket_name  = "vault_lock"
  namespace_name    = "<namespace>"
  redirect_addr     = "https://<vault_lb_url>:8200"
  api_addr          = "https://<vault_lb_url>:8200"
  cluster_addr      = "http://<vault_lb_IP>:8201"
}

ui = "true"

The LB health check uses HTTP on port 8200 and expects status code 200 from this path:

/v1/sys/health

Any help on how to fix this is highly appreciated.

I believe your api_addr and cluster_addr parameters are in the wrong place in your config file. They belong outside the storage block, at the top level, like your ui, cluster_name, and default_lease_ttl parameters.

api_addr and cluster_addr should point to a specific Vault instance, typically the node's own IP address or DNS name, rather than the load balancer. Based on what I'm reading in the docs, you can set them to http://{{ GetPrivateIP }}:8200 and http://{{ GetPrivateIP }}:8201 respectively.
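A minimal sketch of what that could look like, based on the config posted above. Only the two address settings move; the rest of the storage block stays as it was. It assumes the address that {{ GetPrivateIP }} resolves to on each node is one the other nodes can reach, so the same file can be deployed to all three nodes.

```hcl
# Top-level settings, at the same level as ui and cluster_name (not inside storage).
# {{ GetPrivateIP }} is a go-sockaddr template that Vault resolves at startup
# to this node's private IP address.
api_addr     = "http://{{ GetPrivateIP }}:8200"  # http because the listener has tls_disable = "true"
cluster_addr = "https://{{ GetPrivateIP }}:8201" # Vault ignores this scheme; cluster traffic always uses TLS

storage "oci" {
  # auth_type_api_key, bucket_name, ha_enabled, lock_bucket_name,
  # namespace_name, etc. stay as before, but api_addr and
  # cluster_addr are no longer set here.
}
```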

As I understand it, if you're disabling TLS on the listener, api_addr should use http:// as well. That said, it's best practice to terminate TLS at the Vault nodes and have the load balancer pass the traffic through.
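For the TLS pass-through approach, a sketch of the listener with TLS enabled, reusing the certificate and key paths that are commented out in the original config (assumed to exist on every node). With TLS on, api_addr switches to https, the LB listener would need to pass TCP traffic through, and the LB health check on /v1/sys/health would need to use HTTPS instead of HTTP. Because the LB no longer presents its own certificate, the node certificate would also need to cover the name clients use to reach the LB.

```hcl
listener "tcp" {
  address = "0.0.0.0:8200"
  # tls_disable is removed, so TLS is enabled on this listener.
  tls_cert_file = "/etc/vault.d/vault.crt"
  tls_key_file  = "/etc/vault.d/key.pem"
  telemetry {
    unauthenticated_metrics_access = true
  }
}

# With TLS on the listener, the advertised API address uses https.
api_addr     = "https://{{ GetPrivateIP }}:8200"
cluster_addr = "https://{{ GetPrivateIP }}:8201"
```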

If you want to see the true source IP of your clients, you'll need to configure either x_forwarded_for_authorized_addrs or proxy_protocol_authorized_addrs (and possibly related parameters) in your listener block, depending on what your load balancer supports.
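A sketch of the X-Forwarded-For variant, assuming the OCI load balancer listener runs in HTTP mode and adds an X-Forwarded-For header. 10.0.1.0/24 is a hypothetical placeholder for the load balancer's subnet. The PROXY protocol alternative is shown commented out; use one mechanism or the other, depending on what the LB actually sends.

```hcl
listener "tcp" {
  address     = "0.0.0.0:8200"
  tls_disable = "true"

  # Trust X-Forwarded-For only on connections coming from the LB subnet.
  # 10.0.1.0/24 is a placeholder: replace it with your load balancer's subnet.
  x_forwarded_for_authorized_addrs = "10.0.1.0/24"

  # Alternative if the LB sends PROXY protocol headers instead:
  # proxy_protocol_behavior         = "allow_authorized"
  # proxy_protocol_authorized_addrs = "10.0.1.0/24"

  telemetry {
    unauthenticated_metrics_access = true
  }
}
```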