Nomad 1.4 Autopilot with only a single server

I’m running Nomad 1.4.1 with a single server on one VM and a single client on another. I know a multi-server quorum is the recommended setup, but I only host small personal projects and have no high-availability needs. Nomad + Traefik gives me a low-resource deployment infrastructure.

Looking at the journald logs, I see this repeatedly for the Nomad server:

 2022-10-14T01:25:42.904Z [ERROR] nomad: failed to reconcile member: member="{hub-20220830.global 192.0.2.1 4648 map[bootstrap:1 build:1.4.1 dc:linode expect:1 id:a136885a-848f-65a4-e9cf-af3add9ee834 port:4647 raft_vsn:3 region:global role:nomad rpc_addr:192.0.2.1 vsn:1] alive 1 5 2 2 5 4}" error="error removing server with duplicate ID \"a136885a-848f-65a4-e9cf-af3add9ee834\": need at least one voter in configuration: {[]}"
Oct 14 01:25:42 hub-20220830 nomad[2609001]:     2022-10-14T01:25:42.905Z [ERROR] nomad: failed to reconcile: error="error removing server with duplicate ID \"a136885a-848f-65a4-e9cf-af3add9ee834\": need at least one voter in configuration: {[]}"
Oct 14 01:25:43 hub-20220830 nomad[2609001]:     2022-10-14T01:25:43.136Z [ERROR] nomad.autopilot: Failed to reconcile current state with the desired state
Oct 14 01:25:53 hub-20220830 nomad[2609001]:     2022-10-14T01:25:53.136Z [ERROR] nomad.autopilot: Failed to reconcile current state with the desired state
Oct 14 01:26:03 hub-20220830 nomad[2609001]:     2022-10-14T01:26:03.136Z [ERROR] nomad.autopilot: Failed to reconcile current state with the desired state
Oct 14 01:26:13 hub-20220830 nomad[2609001]:     2022-10-14T01:26:13.136Z [ERROR] nomad.autopilot: Failed to reconcile current state with the desired state
Oct 14 01:26:23 hub-20220830 nomad[2609001]:     2022-10-14T01:26:23.135Z [ERROR] nomad.autopilot: Failed to reconcile current state with the desired state

I’m not seeing any abnormal behavior otherwise, and the server and client do their job of allocating Docker containers for jobs.

Do these error messages indicate a real problem, or are they just a side effect of running the Nomad server in single-node mode?
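
One detail in the first log line may be worth noting: the ID named in the "duplicate ID" error is the same as the id: field of the member being reconciled, so the server appears to be trying to remove itself. That would fit the "need at least one voter" complaint and the empty {[]} configuration, though this is only a reading of the log text. A minimal Python sketch of that check, using the line copied from the output above (the function name removal_targets_self is purely illustrative):

```python
import re

# First log line from the journald output above, copied verbatim.
LOG_LINE = (
    r'2022-10-14T01:25:42.904Z [ERROR] nomad: failed to reconcile member: '
    r'member="{hub-20220830.global 192.0.2.1 4648 map[bootstrap:1 build:1.4.1 '
    r'dc:linode expect:1 id:a136885a-848f-65a4-e9cf-af3add9ee834 port:4647 '
    r'raft_vsn:3 region:global role:nomad rpc_addr:192.0.2.1 vsn:1] '
    r'alive 1 5 2 2 5 4}" error="error removing server with duplicate ID '
    r'\"a136885a-848f-65a4-e9cf-af3add9ee834\": '
    r'need at least one voter in configuration: {[]}"'
)

# The member's own ID, as it appears inside the map[...] tags.
MEMBER_ID = re.compile(r'\bid:([0-9a-f-]{36})\b')
# The ID the server is trying to remove (wrapped in escaped quotes in the log).
DUPLICATE_ID = re.compile(r'duplicate ID \\"([0-9a-f-]{36})\\"')


def removal_targets_self(line):
    """Return True if the 'duplicate ID' equals the reporting member's own ID."""
    member = MEMBER_ID.search(line)
    duplicate = DUPLICATE_ID.search(line)
    return bool(member and duplicate and member.group(1) == duplicate.group(1))


print(removal_targets_self(LOG_LINE))
```

The same check can be run over a full journald export by calling the function on each line.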

Hi @larrymyers,

This log message is coming from the raft-autopilot library as it attempts to reconcile its state. Unfortunately the library is not logging the error context, and therefore it’s a little tricky to understand exactly what is happening. Is this a fresh server or is there any other information which might help figure this out?
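
One piece of information that might help is the server's current Raft configuration, which the CLI prints with `nomad operator raft list-peers`. The sketch below reads the same data over the HTTP API and counts voters. It assumes the API is reachable at Nomad's default address (http://127.0.0.1:4646) and that the response carries a "Servers" list with "Node", "ID" and "Voter" fields; the helper names are hypothetical, and the token argument only matters if ACLs are enabled.

```python
import json
import urllib.request

# Assumption: the server's HTTP API listens on Nomad's default address.
NOMAD_ADDR = "http://127.0.0.1:4646"


def fetch_raft_configuration(addr=NOMAD_ADDR, token=None):
    """GET /v1/operator/raft/configuration and return the decoded JSON."""
    req = urllib.request.Request(addr + "/v1/operator/raft/configuration")
    if token:
        # Only needed when ACLs are enabled on the cluster.
        req.add_header("X-Nomad-Token", token)
    with urllib.request.urlopen(req, timeout=5) as resp:
        return json.load(resp)


def summarize_servers(config):
    """Return (node, id, voter) tuples for each server in the configuration."""
    return [
        (s.get("Node"), s.get("ID"), bool(s.get("Voter")))
        for s in config.get("Servers") or []
    ]


def voter_count(config):
    """Count the servers that are marked as voters."""
    return sum(1 for _, _, voter in summarize_servers(config) if voter)


# Example usage against a running server (not executed here):
#   config = fetch_raft_configuration()
#   print(summarize_servers(config), voter_count(config))
```

On a healthy single-server setup one would expect exactly one entry, marked as a voter; anything else (an extra stale entry, or an ID that differs from the one in the log) would be useful to include in a report.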

I have raised this PR to correctly log the error context. If/when this is approved and merged, I will look into upgrading the dependency within Nomad.

Thanks,
jrasell and the Nomad team

@jrasell it’s not a fresh server. I believe I did the initial install with Nomad 1.2, so the data directory has been carried through the 1.3 and 1.4 upgrades. Once your PR makes it into a release, I can upgrade and see whether the error messages give a better understanding of what is going on.

Hello, I think I am running into the same issue on a small Vagrant machine that is a single-node Nomad 1.4.3 + Consul 1.12.0 cluster. The log messages I am seeing look like this:

Dec 10 20:11:52 localdev hab[655]: nomad.default(O):     2022-12-10T20:11:52.447-0800 [ERROR] nomad: failed to reconcile member: member="{localdev.localdev 172.16.177.181 4648 map[bootstrap:1 build:1.4.3 dc:localdev expect:1 id:27186dce-aed3-d333-9f5b-53ac55851c17 port:4647 raft_vsn:3 region:localdev revision:f464aca721d222ae9c1f3df643b3c3aaa20e2da7 role:nomad rpc_addr:172.16.177.181 vsn:1] alive 1 5 2 2 5 4}" error="error removing server with duplicate ID \"27186dce-aed3-d333-9f5b-53ac55851c17\": need at least one voter in configuration: {[]}"
Dec 10 20:11:52 localdev hab[655]: nomad.default(O):     2022-12-10T20:11:52.447-0800 [ERROR] nomad: failed to reconcile: error="error removing server with duplicate ID \"27186dce-aed3-d333-9f5b-53ac55851c17\": need at least one voter in configuration: {[]}"
Dec 10 20:11:52 localdev hab[655]: nomad.default(O):     2022-12-10T20:11:52.474-0800 [ERROR] nomad.autopilot: Failed to reconcile current state with the desired state

The system also seems to spike CPU briefly around that time, but it’s difficult to tell whether that’s Nomad or something else.

I am happy to help provide other logs as needed.