Core.perf-standby INFO messages creating instability

Hi all,

I’ve been playing around with the Enterprise configuration option, retry_join_as_non_voter, and I’m wondering whether it’s caused the cluster’s logs to be flooded with the following messages, ultimately crashing the active node:

2024-11-07T17:04:10.551005+00:00 vault-node0 vault[1452397]: 2024-11-07T17:04:10.522Z [INFO]  core.perf-standby: serving replication for secondary: alpn=perf_standby_v1 serverName=717410d2-a056-dc42-c435-20282775185e secondary ID=ba70c97f-6d71-d541-9456-70a1c14fb7b4
2024-11-07T17:04:10.766490+00:00 vault-node0 vault[1452397]: 2024-11-07T17:04:10.760Z [INFO]  core.perf-standby: serving replication for secondary: alpn=perf_standby_v1 serverName=39d8a759-cce7-8de9-42e4-804f33dbc456 secondary ID=8e21e94e-6f93-2e70-33df-17039a5e8d2e
2024-11-07T17:04:11.303256+00:00 vault-node0 vault[1452397]: 2024-11-07T17:04:11.293Z [INFO]  core.perf-standby: serving replication for secondary: alpn=perf_standby_v1 serverName=3ba0387f-d2a9-6d0b-092d-a47089b21205 secondary ID=7db1b60d-1b5a-7682-fde4-6383fa8c8a12
2024-11-07T17:04:11.432696+00:00 vault-node0 vault[1452397]: 2024-11-07T17:04:11.425Z [INFO]  core.perf-standby: serving replication for secondary: alpn=perf_standby_v1 serverName=f75dda27-2329-5e2a-e0fa-e2a0807a87b5 secondary ID=5511565f-fb4e-5e02-f7da-b38b510f612c
2024-11-07T17:04:11.791355+00:00 vault-node0 vault[1452397]: 2024-11-07T17:04:11.769Z [INFO]  core.perf-standby: serving replication for secondary: alpn=perf_standby_v1 serverName=67382254-3554-0340-f3fa-1d37b59b3b0e secondary ID=31bdd5bc-3445-7e0c-c893-81ec8fbe957a
2024-11-07T17:04:12.051365+00:00 vault-node0 vault[1452397]: 2024-11-07T17:04:12.044Z [INFO]  core.perf-standby: serving replication for secondary: alpn=perf_standby_v1 serverName=79b3b274-aeb1-2a34-d79a-3508a096489c secondary ID=f7d0f0d3-18bb-7306-985b-f1523b0290fc
2024-11-07T17:04:12.314396+00:00 vault-node0 vault[1452397]: 2024-11-07T17:04:12.304Z [INFO]  core.perf-standby: serving replication for secondary: alpn=perf_standby_v1 serverName=727b255c-e9ff-af8b-80dc-3e054763e37c secondary ID=d699820b-0e29-b55e-342c-02ac8eecc8f5
2024-11-07T17:04:12.690426+00:00 vault-node0 vault[1452397]: 2024-11-07T17:04:12.683Z [INFO]  core.perf-standby: serving replication for secondary: alpn=perf_standby_v1 serverName=734dae88-dbc8-4dcc-eaff-7f7de70bf1af secondary ID=aa45daa9-4d08-62dd-ee43-33f1978dc714
2024-11-07T17:04:13.066362+00:00 vault-node0 vault[1452397]: 2024-11-07T17:04:13.059Z [INFO]  core.perf-standby: serving replication for secondary: alpn=perf_standby_v1 serverName=1ecdf165-f11d-1afb-a7d8-533193e1159a secondary ID=cca54455-f5ac-b141-a5b1-937b099d5384
2024-11-07T17:04:13.551326+00:00 vault-node0 vault[1452397]: 2024-11-07T17:04:13.540Z [INFO]  core.perf-standby: serving replication for secondary: alpn=perf_standby_v1 serverName=57d690ca-5638-8135-38df-317c72404c71 secondary ID=d42f67e5-4ff2-d284-6d22-8c2592b8024b
2024-11-07T17:04:13.873553+00:00 vault-node0 vault[1452397]: 2024-11-07T17:04:13.867Z [INFO]  core.perf-standby: serving replication for secondary: alpn=perf_standby_v1 serverName=6e59503e-8f83-874e-6b25-d9eb89b83693 secondary ID=691d1ef8-c044-987b-2df5-dec62163cf42
2024-11-07T17:04:14.406136+00:00 vault-node0 vault[1452397]: 2024-11-07T17:04:14.399Z [INFO]  core.perf-standby: serving replication for secondary: alpn=perf_standby_v1 serverName=838df42c-2777-d2c8-c01e-ae40a1e69eb3 secondary ID=cf5c444b-6778-bf57-f889-409271a6fa06
2024-11-07T17:04:14.660425+00:00 vault-node0 vault[1452397]: 2024-11-07T17:04:14.650Z [INFO]  core.perf-standby: serving replication for secondary: alpn=perf_standby_v1 serverName=6f57cd1b-33d3-f24d-4868-8dc11a347c2c secondary ID=499aa0a1-dc25-f6b2-c3d7-1ef070f9d686
2024-11-07T17:04:15.491755+00:00 vault-node0 vault[1452397]: 2024-11-07T17:04:15.484Z [INFO]  core.perf-standby: serving replication for secondary: alpn=perf_standby_v1 serverName=26f22d27-aaaa-e24d-8110-af2dd0aa22ca secondary ID=a4795140-4e00-717f-6d8e-ed44793b00a6
2024-11-07T17:04:15.888549+00:00 vault-node0 vault[1452397]: 2024-11-07T17:04:15.878Z [INFO]  core.perf-standby: serving replication for secondary: alpn=perf_standby_v1 serverName=87d3d52e-7718-9782-a2f0-fb4e5ffebe13 secondary ID=7467ab2b-fa3c-5e73-742c-08e362a35dac
2024-11-07T17:04:16.816675+00:00 vault-node0 vault[1452397]: 2024-11-07T17:04:16.808Z [INFO]  core.perf-standby: serving replication for secondary: alpn=perf_standby_v1 serverName=1197c284-50c9-1915-3258-50153e6ceb10 secondary ID=eef67d2c-5395-b9f9-c0fc-9167de06ab1d
2024-11-07T17:04:17.450301+00:00 vault-node0 vault[1452397]: 2024-11-07T17:04:17.439Z [INFO]  core.perf-standby: serving replication for secondary: alpn=perf_standby_v1 serverName=ed3595de-98f1-0637-811a-35c7adc2e611 secondary ID=e72c9aa9-49bf-7af8-bef4-1c60930cb9dc
2024-11-07T17:04:18.274435+00:00 vault-node0 vault[1452397]: 2024-11-07T17:04:18.266Z [INFO]  core.perf-standby: serving replication for secondary: alpn=perf_standby_v1 serverName=a3bd1d7d-94ad-20f0-7c5f-a57cd079755b secondary ID=c27dc145-0e95-0fe2-7f70-7013f87fd33f
2024-11-07T17:04:18.934572+00:00 vault-node0 vault[1452397]: 2024-11-07T17:04:18.898Z [INFO]  core.perf-standby: serving replication for secondary: alpn=perf_standby_v1 serverName=00b24ed7-146e-c4ee-48a9-569341f90c4d secondary ID=f96794f3-3e34-3bd5-aa43-691d1bd701a1
2024-11-07T17:04:19.984935+00:00 vault-node0 vault[1452397]: 2024-11-07T17:04:19.977Z [INFO]  core.perf-standby: serving replication for secondary: alpn=perf_standby_v1 serverName=511aaeb3-3d42-b0e9-0478-3bcb16ba8721 secondary ID=3d5d168a-c75d-c7d1-8419-c8fa491eb82b
2024-11-07T17:04:21.486855+00:00 vault-node0 vault[1452397]: 2024-11-07T17:04:21.470Z [INFO]  core.perf-standby: serving replication for secondary: alpn=perf_standby_v1 serverName=7fc0db7c-ef7e-26a4-3b9b-94b9f1e8b409 secondary ID=d475fbbc-c92b-e5ce-7d57-4ff9b2f9426b
2024-11-07T17:09:07.458483+00:00 vault-node0 vault[1452397]: 2024-11-07T17:08:48.600Z [INFO]  core.perf-standby: serving replication for secondary: alpn=perf_standby_v1 serverName=7b1c9ba1-d3d3-0a58-27d2-c99190787f5b secondary ID=dc4d74e3-8f9f-4b0b-0d3e-c4c986fc5b28
2024-11-07T17:10:05.260791+00:00 vault-node0 vault[1452397]: 2024-11-07T17:09:40.630Z [INFO]  core.perf-standby: timeout waiting for SETTINGS frames from 192.168.64.6:56594

Spec:

  • Four Vault v1.18.1+ent nodes, each running Ubuntu 24.04.1 LTS (noble) under Multipass
  • One node (vault-node3) has retry_join_as_non_voter = true; the other three have essentially identical configurations, as follows (a sketch of vault-node3's variant follows the excerpt):
ui = true

#mlock = true
cluster_addr  = "https://192.168.64.8:8201"
api_addr      = "https://192.168.64.8:8200"
disable_mlock = true

storage "raft" {
  path    = "/opt/vault/data"
  node_id = "<current node name>"

  retry_join {
    leader_tls_servername = "node0..."
    leader_ca_cert_file = "/opt/vault/tls/<CA>.crt"
    leader_client_cert_file = "/opt/vault/tls/<current node>.crt"
    leader_client_key_file = "/opt/vault/tls/<current node>.key"
    leader_api_addr = "https://192.168.64.3:8200"
  }
  retry_join {
    leader_tls_servername = "node1..."
    leader_ca_cert_file = "/opt/vault/tls/<CA>.crt"
    leader_client_cert_file = "/opt/vault/tls/<current node>.crt"
    leader_client_key_file = "/opt/vault/tls/<current node>.key"
    leader_api_addr = "https://192.168.64.6:8200"
  }
  retry_join {
    leader_tls_servername = "node2..."
    leader_ca_cert_file = "/opt/vault/tls/<CA>.crt"
    leader_client_cert_file = "/opt/vault/tls/<current node>.crt"
    leader_client_key_file = "/opt/vault/tls/<current node>.key"
    leader_api_addr = "https://192.168.64.5:8200"
  }
}

# HTTPS listener
listener "tcp" {
  address       = "0.0.0.0:8200"
  tls_cert_file = "/opt/vault/tls/<current node>.crt"
  tls_key_file  = "/opt/vault/tls/<current node>.key"
}
...
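
For reference, vault-node3's configuration is the same apart from the non-voter option. My reading of the Enterprise docs is that the flag sits at the top level of the raft stanza, alongside the retry_join blocks, so roughly (same placeholders as above):

storage "raft" {
  path    = "/opt/vault/data"
  node_id = "<current node name>"

  # Enterprise-only option; per my reading of the docs it lives here at
  # the top level and applies to every retry_join block below. Treat the
  # placement as my assumption rather than something verified.
  retry_join_as_non_voter = true

  retry_join {
    # ... same three retry_join stanzas as above ...
  }
}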

The supposed non-voter (vault-node3) always joined as a voter. I hadn't configured any replication, so I was a bit mystified by the log entries above. I suspect I've invoked the Performance Standby Node feature incorrectly (I know four nodes isn't a recommended cluster size for maintaining quorum and running elections, for example), but even commenting out the ...non-voter line in its config and having that node rejoin the cluster didn't stop the flood of messages shown above.
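
In case it helps with diagnosis: voter status is visible in the Voter column of vault operator raft list-peers, and on Enterprise builds vault status on each node should report whether it is a Performance Standby Node. One blunt workaround I haven't tried yet would be disabling the feature outright with the top-level Enterprise config option; this is my understanding of it rather than a verified fix:

# Top-level server config (Enterprise). As I understand it, this stops
# standbys from acting as performance standbys, which should quiet the
# core.perf-standby traffic if that is really what's flooding the logs.
disable_performance_standby = true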

Anyone have a similar experience or insights?