I am using the Helm chart to deploy 3 pods to an on-prem k8s cluster. I want to use Raft HA, so I am using the following config:
ui = true

listener "tcp" {
  address            = "[::]:8200"
  cluster_address    = "[::]:8201"
  tls_cert_file      = "/vault/userconfig/vault/vault.pem"
  tls_key_file       = "/vault/userconfig/vault/vault-key.pem"
  tls_client_ca_file = "/vault/userconfig/vault/vault-ca.pem"
}

storage "raft" {
  path = "/vault/data"
}

service_registration "kubernetes" {}
I execute the following commands on the 1st pod and everything seems to work correctly (I am kubetailing all the logs to watch for any errors):
vault operator init
vault operator unseal
vault operator raft list-peers
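(For context, I run those against the pods with kubectl exec; a rough sketch, assuming the chart's default vault-0 pod name and repeating unseal with enough key shares:)

kubectl exec -ti vault-0 -- vault operator init
kubectl exec -ti vault-0 -- vault operator unseal          # repeat until Sealed=false
kubectl exec -ti vault-0 -- vault operator raft list-peers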
list-peers shows the pod I am connected to as leader… all good so far.
Node                                   Address                        State     Voter
----                                   -------                        -----     -----
1b50fad0-ccad-8330-672c-62cb8e0d63fe   vault-0.vault-internal:8201    leader    true
Then I connect to the next pod and enter:
vault operator init
vault operator raft join https://vault-0.vault-internal:8200
vault operator unseal
…and raft join outputs:
Key       Value
---       -----
Joined    true
But when I then connect back to the leader and check the peers again, I still see only the leader and not a 2nd node as I would expect:
/ $ vault operator raft list-peers
Node                                   Address                        State     Voter
----                                   -------                        -----     -----
1b50fad0-ccad-8330-672c-62cb8e0d63fe   vault-0.vault-internal:8201    leader    true
The logs don't show any errors and raft join reports "Joined=true", but it still seems like the join has not worked.
Is there any other way I can troubleshoot this?
Does anybody see an error in my config?
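So far the only checks I have tried beyond list-peers are along these lines (a sketch; sys/storage/raft/configuration is raft's own view of the membership and needs a valid token):

# seal/HA status of each pod
kubectl exec -ti vault-0 -- vault status
kubectl exec -ti vault-1 -- vault status

# raft's view of the cluster membership
kubectl exec -ti vault-0 -- vault read sys/storage/raft/configuration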
Thanks!
nhw76 (May 11, 2020, 10:12pm):
I don't know why it's reporting success with the raft join, but I suspect part of your problem is the init on the second pod. You only init the initial node in the cluster; the subsequent nodes join the cluster and use the same seal (the same unseal keys) as the initial leader.
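As a sketch, the joining pods would only run something like this (assuming vault-1/vault-2 pod names, and unsealing with the key shares from the original init on vault-0):

kubectl exec -ti vault-1 -- vault operator raft join https://vault-0.vault-internal:8200
kubectl exec -ti vault-1 -- vault operator unseal   # same key shares as vault-0
kubectl exec -ti vault-2 -- vault operator raft join https://vault-0.vault-internal:8200
kubectl exec -ti vault-2 -- vault operator unseal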
ttinkr (May 12, 2020, 11:31am):
Thanks @nhw76!
Now I only did an init and unseal on the first node and then a raft join on the 2nd node. After this I still don't see the node in raft list-peers, but once I do an unseal on the 2nd node I finally see it listed as a follower. So far so good.
…but
When I check the logs I see the following INFO/WARN/ERROR messages coming from the leader:
[vault-0] 2020-05-12T11:24:32.373Z [INFO] storage.raft: updating configuration: command=AddStaging server-id=801e9c2c-e2a2-f650-1e8d-c3aed9b361f6 server-addr=vault-1.vault-internal:8201 servers="[{Suffrage:Voter ID:f21d4e79-2597-ebe0-23ee-e8d629f5c327 Address:vault-0.vault-internal:8201} {Suffrage:Voter ID:801e9c2c-e2a2-f650-1e8d-c3aed9b361f6 Address:vault-1.vault-internal:8201}]"
[vault-0] 2020-05-12T11:24:32.378Z [INFO] storage.raft: added peer, starting replication: peer=801e9c2c-e2a2-f650-1e8d-c3aed9b361f6
[vault-0] 2020-05-12T11:24:32.380Z [ERROR] storage.raft: failed to appendEntries to: peer="{Voter 801e9c2c-e2a2-f650-1e8d-c3aed9b361f6 vault-1.vault-internal:8201}" error="dial tcp 192.168.2.139:8201: connect: connection refused"
[vault-0] 2020-05-12T11:24:32.380Z [INFO] system: follower node answered the raft bootstrap challenge: follower_server_id=801e9c2c-e2a2-f650-1e8d-c3aed9b361f6
[vault-0] 2020-05-12T11:24:32.616Z [WARN] storage.raft: appendEntries rejected, sending older logs: peer="{Voter 801e9c2c-e2a2-f650-1e8d-c3aed9b361f6 vault-1.vault-internal:8201}" next=2
[vault-0] 2020-05-12T11:24:32.623Z [INFO] storage.raft: pipelining replication: peer="{Voter 801e9c2c-e2a2-f650-1e8d-c3aed9b361f6 vault-1.vault-internal:8201}"
So on the CLI it seems that the cluster is working, but those events still make me nervous. Any ideas?
Thank you!
nhw76 (May 12, 2020, 11:37am):
In my experience, there's a bit of noise immediately after the cluster join while the new follower proves it has unsealed so that log replication can start, but it settles down quickly.
I think that looks OK.
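Once the followers are unsealed and the noise stops, a quick check like this should show every node as a voter (sketch):

kubectl exec -ti vault-0 -- vault operator raft list-peers
# expect one leader plus followers, all with Voter=true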
ttinkr (May 12, 2020, 11:38am):
Great! Thanks so much for the help! Appreciate it!
Good shout on "settling down": it totally worked. I had to give it a minute and then all the logs went quiet. I added a KV secret on the leader node, logged in to the follower nodes, and the value was replicated almost immediately.
Config used:
/etc/vault.d/vault.hcl
storage "raft" {
  path    = "/data/vault.d/raft/"
  node_id = "node1.domain.com"
}

listener "tcp" {
  address         = "0.0.0.0:8200"
  cluster_address = "0.0.0.0:8201"
  tls_disable     = false
  tls_cert_file   = "/etc/vault.d/certs/vaultepcom.pem"
  tls_key_file    = "/etc/vault.d/certs/vaultepcom.key"
}

api_addr     = "https://node1.domain.com:8200"
cluster_addr = "https://node1.domain.com:8201"
ui           = true
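Note that node_id, api_addr and cluster_addr are per-node, so each host gets its own copy of this file. A rough sketch of how that could be stamped out per host (the hostname -f substitution is my own placeholder):

NODE="$(hostname -f)"   # e.g. node2.domain.com
cat > /etc/vault.d/vault.hcl <<EOF
storage "raft" {
  path    = "/data/vault.d/raft/"
  node_id = "${NODE}"
}
listener "tcp" {
  address         = "0.0.0.0:8200"
  cluster_address = "0.0.0.0:8201"
  tls_disable     = false
  tls_cert_file   = "/etc/vault.d/certs/vaultepcom.pem"
  tls_key_file    = "/etc/vault.d/certs/vaultepcom.key"
}
api_addr     = "https://${NODE}:8200"
cluster_addr = "https://${NODE}:8201"
ui           = true
EOF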
/etc/systemd/system/vault.service
[Unit]
Description=a tool for managing secrets
Documentation=https://vaultproject.io/docs/
After=network.target
ConditionFileNotEmpty=/etc/vault.d/vault.hcl
[Service]
User=vault
Group=vault
ExecStart=/usr/local/sbin/vault server -config=/etc/vault.d/vault.hcl
ExecReload=/usr/local/bin/kill --signal HUP $MAINPID
CapabilityBoundingSet=CAP_SYSLOG CAP_IPC_LOCK
Capabilities=CAP_IPC_LOCK+ep
SecureBits=keep-caps
NoNewPrivileges=yes
KillSignal=SIGINT
[Install]
WantedBy=multi-user.target
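To bring a new node up and into the cluster, roughly (sketch; node2.domain.com is my placeholder):

sudo systemctl daemon-reload
sudo systemctl enable --now vault

export VAULT_ADDR=https://node2.domain.com:8200
vault operator raft join https://node1.domain.com:8200
vault operator unseal   # same key shares as node1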