Hello:
I had a working Nomad/Consul test cluster where both applications were installed directly on the Nomad client nodes.
I’m now experimenting with running Consul as an exec job, so that it is deployed via a Nomad job spec instead of being installed into the OS.
I have three Nomad client nodes, and each node's network stack looks like this:
$ ip -br a
lo               UNKNOWN        127.0.0.1/8 ::1/128
enp1s0           UP
enp2s0           DOWN
vlan.254@enp1s0  UP
vlan.108@enp1s0  UP             192.168.108.xx/25
vlan.100@enp1s0  UP             192.168.100.xx/25
And this Nomad jobspec:
job "consul" {
  datacenters = ["homelab"]
  type        = "system"

  group "consul" {
    task "servers" {
      driver = "exec"

      config {
        command = "consul"
        args = [
          "agent",
          "-datacenter", "homelab",
          "-bind", "{{ GetInterfaceIP \"vlan.100\" }}",
          "--bootstrap-expect", "3",
          "-client", "0.0.0.0",
          "-data-dir", "/opt/consul",
          "-encrypt", "my-redacted-consul-key",
          "-retry-join", "192.168.100.13",
          "-retry-join", "192.168.100.14",
          "-retry-join", "192.168.100.15",
          "-server",
          "-ui",
        ]
      }

      artifact {
        source = "https://releases.hashicorp.com/consul/1.19.1/consul_1.19.1_linux_amd64.zip"
      }
    }
  }
}
The job runs successfully in Nomad, but Consul is not working: I cannot reach the UI, and the task logs in Nomad show the following:
2024-07-17T22:31:47.439-0400 [WARN] agent.server.raft: no known peers, aborting election
2024-07-17T22:31:48.071-0400 [WARN] agent.leaf-certs: handling error in Manager.Notify: error="No cluster leader" index=1
2024-07-17T22:31:48.071-0400 [ERROR] agent.server.cert-manager: failed to handle cache update event: error="leaf cert watch returned an error: No cluster leader"
2024-07-17T22:31:50.765-0400 [ERROR] agent.anti_entropy: failed to sync remote state: error="No cluster leader"
2024-07-17T22:31:58.246-0400 [WARN] agent.leaf-certs: handling error in Manager.Notify: error="No cluster leader" index=1
2024-07-17T22:32:02.598-0400 [WARN] agent: Syncing node info failed.: error="No cluster leader"
2024-07-17T22:32:02.598-0400 [ERROR] agent: failed to sync changes: error="No cluster leader"
2024-07-17T22:32:05.629-0400 [WARN] agent.leaf-certs: handling error in Manager.Notify: error="No cluster leader" index=1
2024-07-17T22:32:13.516-0400 [WARN] agent: Syncing node info failed.: error="No cluster leader"
2024-07-17T22:32:13.516-0400 [ERROR] agent: failed to sync changes: error="No cluster leader"
2024-07-17T22:32:16.403-0400 [ERROR] agent: Coordinate update error: error="No cluster leader"
2024-07-17T22:32:17.194-0400 [WARN] agent.leaf-certs: handling error in Manager.Notify: error="No cluster leader" index=1
2024-07-17T22:32:27.743-0400 [ERROR] agent.anti_entropy: failed to sync remote state: error="No cluster leader"
2024-07-17T22:32:30.166-0400 [WARN] agent.leaf-certs: handling error in Manager.Notify: error="No cluster leader" index=1
The arguments passed to Consul in the job are similar to those I had in the configuration file when Consul was installed directly on the Nomad clients.
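For reference, the flags in the job map roughly onto the agent configuration below. This is a sketch reconstructed from the job's args, not the exact file from the earlier install; it assumes the standard Consul agent option names that correspond to each flag.

```hcl
# Sketch only: reconstructed from the job's args, not the original file.
datacenter       = "homelab"
server           = true
bootstrap_expect = 3

# Same go-sockaddr template as the -bind flag in the job.
bind_addr   = "{{ GetInterfaceIP \"vlan.100\" }}"
client_addr = "0.0.0.0"

data_dir = "/opt/consul"
encrypt  = "my-redacted-consul-key"

# One entry per -retry-join flag.
retry_join = ["192.168.100.13", "192.168.100.14", "192.168.100.15"]

# Config-file form of the -ui flag.
ui_config {
  enabled = true
}
```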
I am unsure why the Consul cluster does not form.
Is this a valid way of deploying Consul, and if so, where am I going wrong?
Thanks for your input!