Hi,
Here is a dirty solution, inspired by this issue on GitHub: "option to restore eligibility after drain_on_shutdown" (Issue #17093, hashicorp/nomad).
In /etc/nomad/nomad.hcl, I added:
leave_on_terminate = true
leave_on_interrupt = true
[...]
client {
  enabled = true
  servers = ["127.0.0.1:4647"]
  server_join {
    retry_join = ["127.0.0.1"]
    retry_max = 3
    retry_interval = "15s"
  }
  drain_on_shutdown {
    deadline = "1m"
    force = true
    ignore_system_jobs = true
  }
  [...]
}
leave_on_terminate and leave_on_interrupt are set to true so that Nomad leaves gracefully and stops its jobs on any such signal, for example when the host reboots or Nomad restarts.
But with this configuration, the node is no longer eligible after a reboot. So I created this systemd service:
[Unit]
Description=Nomad auto Eligibility node service
After=nomad.service

[Service]
Type=oneshot
Restart=on-failure
ExecStartPre=/bin/bash -c "/usr/bin/sleep 60"
ExecStart=/usr/local/bin/ansible-playbook -i localhost, nomad_autoeligibility.yml
User=root
Group=root

[Install]
WantedBy=multi-user.target
My playbook has a task that looks something like this:
- name: "Nomad | Set Eligibility of node"
  ansible.builtin.shell: nomad node eligibility -enable -self
But you could use a simple bash script instead of the playbook.
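As a rough sketch of that bash alternative (not what I run myself): the script below runs the same `nomad node eligibility -enable -self` command as the playbook task, but retries a few times rather than relying only on a fixed sleep. The variable names `NOMAD_BIN`, `MAX_ATTEMPTS` and `RETRY_DELAY`, their defaults, and the `run` argument are assumptions made for illustration, and it assumes the `nomad` CLI can reach the local agent without extra setup (address, ACL token).

```shell
#!/usr/bin/env bash
# Sketch: re-enable scheduling eligibility for the local Nomad node.
# All names and defaults below are illustrative assumptions.
set -u

NOMAD_BIN="${NOMAD_BIN:-nomad}"        # path or name of the nomad CLI
MAX_ATTEMPTS="${MAX_ATTEMPTS:-12}"     # how many times to try
RETRY_DELAY="${RETRY_DELAY:-5}"        # seconds to wait between tries

enable_eligibility() {
  local attempt
  for (( attempt = 1; attempt <= MAX_ATTEMPTS; attempt++ )); do
    # Same command as the Ansible task in the playbook.
    if "$NOMAD_BIN" node eligibility -enable -self; then
      echo "node marked eligible (attempt ${attempt})"
      return 0
    fi
    echo "attempt ${attempt}/${MAX_ATTEMPTS} failed" >&2
    # Wait before the next try, but not after the last one.
    if (( attempt < MAX_ATTEMPTS )); then
      sleep "$RETRY_DELAY"
    fi
  done
  echo "giving up after ${MAX_ATTEMPTS} attempts" >&2
  return 1
}

# Only act when called with the "run" argument, so the file
# can also be sourced without side effects.
if [[ "${1:-}" == "run" ]]; then
  enable_eligibility
fi
```

Saved somewhere like /usr/local/bin/nomad-autoeligibility.sh (a hypothetical path) and made executable, it could replace the ansible-playbook command in ExecStart, called with the `run` argument. Whether to keep the 60-second ExecStartPre sleep as well is a judgment call; the retry loop only covers the case where the agent is not ready yet.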
As a last step, I added this to the [Unit] section of nomad.service:
Wants=network-online.target nomad-autoeligibility.service
To finish, run systemctl daemon-reload, and voilà!
When I reboot the host, Nomad stops all jobs and drains the node. After the reboot, one minute after Nomad starts, the node is eligible again and the jobs restart.
If someone has a better idea, I will be happy to try it.
Thanks