Greetings earthlings,
I am struggling to Terraform a Nomad cluster in my AWS environment: the Nomad Server HTTP Check in Consul won't pass, no matter what I do.
Here are the cornerstones of it all:
- relying on Consul agent for cluster formation and service discovery
- terraformed via the official Gruntwork-maintained modules (somewhat adapted)
- custom Packer-built AMI based on Amazon Linux 2
- Nomad 0.12.4
- Consul 1.8.3
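For context, the check that keeps failing is (as far as I can tell) the HTTP check Nomad registers in Consul, which targets Nomad's agent health endpoint. A sanity check I would expect to work (the IP is one of my servers, 4646 is Nomad's default HTTP port):

```shell
# The "Nomad Server HTTP Check" resolves against this Nomad API endpoint:
curl -s "http://172.18.131.96:4646/v1/agent/health?type=server"
```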
The Nomad config:
```hcl
region               = "europe"
datacenter           = "aws15-${stage}"
data_dir             = "/opt/nomad/data/"
disable_update_check = true

# bind_addr = "$PRIVATE_IP"

addresses {
  http = "0.0.0.0"
}

advertise {
  http = "{{ GetInterfaceIP \"eth0\" }}"
  rpc  = "{{ GetInterfaceIP \"eth0\" }}"
  serf = "{{ GetInterfaceIP \"eth0\" }}"
}

server {
  enabled          = true
  bootstrap_expect = 3
  encrypt          = "REDACTED"
}

plugin "raw_exec" {
  config {
    enabled = true
  }
}

leave_on_terminate = true
leave_on_interrupt = true
```
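One knob that might matter here, given that `http` is bound to `0.0.0.0` while the advertise address is eth0's IP: Nomad's `consul` stanza can point the registered health checks at the advertise address instead of the bind address. Sketched below from the docs, not something I have verified in my setup:

```hcl
# Hypothetical addition to the Nomad config above: register Consul
# health checks against the advertise address instead of the bind
# address (relevant when http is bound to 0.0.0.0).
consul {
  checks_use_advertise = true
}
```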
And the Consul client config:
```hcl
datacenter = "aws15-mgmt"

ports {
  grpc = 8502
}

data_dir             = "/opt/consul/data"
disable_update_check = true
leave_on_terminate   = true

### Cloud autodiscovery section ###
retry_join = [
  "provider=aws tag_key=consul-servers tag_value=consul-mgmt addr_type=private_v4 region=eu-central-1"
]

### Encryption ###
encrypt                 = "REDACTED"
encrypt_verify_incoming = false
encrypt_verify_outgoing = true
verify_incoming         = false
verify_outgoing         = true
verify_server_hostname  = true
ca_file                 = "consul-agent-ca.pem"

auto_encrypt {
  tls = true
}

### Disable server mode ###
server        = false
raft_protocol = 3

### Enable central configuration ###
enable_central_service_config = true
```
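For debugging the check itself, commands along these lines show its registration and status on a node (assuming Nomad's default service name `nomad` and Consul's default HTTP port):

```shell
# Confirm the agent joined the cluster, then inspect the checks
# registered for the "nomad" service via the local Consul agent:
consul members
curl -s "http://127.0.0.1:8500/v1/health/checks/nomad"
```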
This is how it looks in the Consul Server GUI:
Here’s some logging output from one node (`nomad monitor`):

```
2020-09-10T16:09:04.556+0200 [INFO] nomad: successfully contacted Nomad servers: num_servers=2
2020-09-10T16:09:45.936+0200 [ERROR] worker: failed to dequeue evaluation: error="rpc error: eval broker disabled"
2020-09-10T16:09:45.936+0200 [ERROR] worker: failed to dequeue evaluation: error="rpc error: eval broker disabled"
2020-09-10T16:09:47.043+0200 [WARN] nomad.raft: rejecting vote request since we have a leader: from=172.18.131.221:4647 leader=172.18.131.201:4647
2020-09-10T16:09:47.223+0200 [INFO] nomad: serf: EventMemberLeave: mgmt-nomad-t2c.europe 172.18.131.201
2020-09-10T16:09:47.223+0200 [INFO] nomad: removing server: server="mgmt-nomad-t2c.europe (Addr: 172.18.131.201:4647) (DC: aws15-mgmt)"
2020-09-10T16:09:48.417+0200 [WARN] nomad.raft: rejecting vote request since we have a leader: from=172.18.131.221:4647 leader=172.18.131.201:4647
2020-09-10T16:09:48.719+0200 [WARN] nomad.raft: heartbeat timeout reached, starting election: last-leader=172.18.131.201:4647
2020-09-10T16:09:48.719+0200 [INFO] nomad.raft: entering candidate state: node="Node at 172.18.131.96:4647 [Candidate]" term=30
2020-09-10T16:09:48.723+0200 [ERROR] worker: failed to dequeue evaluation: error="rpc error: No cluster leader"
2020-09-10T16:09:48.723+0200 [ERROR] worker: failed to dequeue evaluation: error="rpc error: No cluster leader"
2020-09-10T16:09:48.724+0200 [INFO] nomad.raft: entering follower state: follower="Node at 172.18.131.96:4647 [Follower]" leader=
2020-09-10T16:09:50.757+0200 [ERROR] worker: failed to dequeue evaluation: error="rpc error: eval broker disabled"
2020-09-10T16:09:50.758+0200 [ERROR] worker: failed to dequeue evaluation: error="rpc error: eval broker disabled"
2020-09-10T16:09:51.001+0200 [INFO] nomad: serf: EventMemberJoin: mgmt-nomad-t2c.europe 172.18.131.201
2020-09-10T16:09:51.001+0200 [INFO] nomad: adding server: server="mgmt-nomad-t2c.europe (Addr: 172.18.131.201:4647) (DC: aws15-mgmt)"
2020-09-10T16:09:51.975+0200 [INFO] nomad: serf: EventMemberFailed: mgmt-nomad-7tn.europe 172.18.131.221
2020-09-10T16:09:51.975+0200 [INFO] nomad: removing server: server="mgmt-nomad-7tn.europe (Addr: 172.18.131.221:4647) (DC: aws15-mgmt)"
2020-09-10T16:09:52.667+0200 [WARN] nomad.raft: rejecting vote request since we have a leader: from=172.18.131.201:4647 leader=172.18.131.221:4647
2020-09-10T16:09:53.541+0200 [ERROR] worker: failed to dequeue evaluation: error="rpc error: No cluster leader"
2020-09-10T16:09:53.541+0200 [ERROR] worker: failed to dequeue evaluation: error="rpc error: No cluster leader"
2020-09-10T16:09:53.580+0200 [WARN] nomad.raft: heartbeat timeout reached, starting election: last-leader=172.18.131.221:4647
2020-09-10T16:09:53.580+0200 [INFO] nomad.raft: entering candidate state: node="Node at 172.18.131.96:4647 [Candidate]" term=33
2020-09-10T16:09:53.584+0200 [INFO] nomad.raft: election won: tally=1
2020-09-10T16:09:53.584+0200 [INFO] nomad.raft: entering leader state: leader="Node at 172.18.131.96:4647 [Leader]"
2020-09-10T16:09:53.584+0200 [INFO] nomad: cluster leadership acquired
2020-09-10T16:09:53.587+0200 [INFO] nomad.raft: updating configuration: command=AddStaging server-id=172.18.131.201:4647 server-addr=172.18.131.201:4647 servers="[{Suffrage:Voter ID:172.18.131.96:4647 Address:172.18.131.96:4647} {Suffrage:Voter ID:172.18.131.201:4647 Address:172.18.131.201:4647}]"
2020-09-10T16:09:53.588+0200 [INFO] nomad.raft: added peer, starting replication: peer=172.18.131.201:4647
2020-09-10T16:09:53.592+0200 [WARN] nomad.raft: appendEntries rejected, sending older logs: peer="{Voter 172.18.131.201:4647 172.18.131.201:4647}" next=83
2020-09-10T16:09:53.595+0200 [INFO] nomad.raft: pipelining replication: peer="{Voter 172.18.131.201:4647 172.18.131.201:4647}"
2020-09-10T16:09:55.823+0200 [INFO] nomad: serf: EventMemberJoin: mgmt-nomad-7tn.europe 172.18.131.221
2020-09-10T16:09:55.823+0200 [INFO] nomad: adding server: server="mgmt-nomad-7tn.europe (Addr: 172.18.131.221:4647) (DC: aws15-mgmt)"
2020-09-10T16:09:55.823+0200 [INFO] nomad.raft: updating configuration: command=AddStaging server-id=172.18.131.221:4647 server-addr=172.18.131.221:4647 servers="[{Suffrage:Voter ID:172.18.131.96:4647 Address:172.18.131.96:4647} {Suffrage:Voter ID:172.18.131.201:4647 Address:172.18.131.201:4647} {Suffrage:Voter ID:172.18.131.221:4647 Address:172.18.131.221:4647}]"
2020-09-10T16:09:55.825+0200 [INFO] nomad.raft: added peer, starting replication: peer=172.18.131.221:4647
2020-09-10T16:09:55.825+0200 [ERROR] nomad.raft: failed to appendEntries to: peer="{Voter 172.18.131.221:4647 172.18.131.221:4647}" error=EOF
2020-09-10T16:09:56.296+0200 [WARN] nomad.raft: appendEntries rejected, sending older logs: peer="{Voter 172.18.131.221:4647 172.18.131.221:4647}" next=86
2020-09-10T16:09:56.299+0200 [INFO] nomad.raft: pipelining replication: peer="{Voter 172.18.131.221:4647 172.18.131.221:4647}"
2020-09-10T16:10:01.669+0200 [INFO] nomad: server starting leave
2020-09-10T16:10:01.669+0200 [INFO] nomad.raft: updating configuration: command=RemoveServer server-id=172.18.131.96:4647 server-addr= servers="[{Suffrage:Voter ID:172.18.131.201:4647 Address:172.18.131.201:4647} {Suffrage:Voter ID:172.18.131.221:4647 Address:172.18.131.221:4647}]"
2020-09-10T16:10:01.673+0200 [INFO] nomad.raft: removed ourself, transitioning to follower
2020-09-10T16:10:01.673+0200 [INFO] nomad.raft: entering follower state: follower="Node at 172.18.131.96:4647 [Follower]" leader=
2020-09-10T16:10:01.674+0200 [INFO] nomad.raft: aborting pipeline replication: peer="{Voter 172.18.131.201:4647 172.18.131.201:4647}"
2020-09-10T16:10:01.675+0200 [INFO] nomad: cluster leadership lost
2020-09-10T16:10:01.675+0200 [ERROR] worker: failed to dequeue evaluation: error="eval broker disabled"
2020-09-10T16:10:01.675+0200 [INFO] nomad.raft: aborting pipeline replication: peer="{Voter 172.18.131.221:4647 172.18.131.221:4647}"
2020-09-10T16:10:02.466+0200 [INFO] nomad: serf: EventMemberLeave: mgmt-nomad-7jt.europe 172.18.131.96
2020-09-10T16:10:02.466+0200 [INFO] nomad: removing server: server="mgmt-nomad-7jt.europe (Addr: 172.18.131.96:4647) (DC: aws15-mgmt)"
2020-09-10T16:10:03.261+0200 [WARN] nomad.raft: not part of stable configuration, aborting election
Any help is appreciated!
Cheers,
Ralph