Nomad 1.4.3 client health checks on server process

Hi!

I'm running a 3-server Nomad cluster and I've just upgraded from 1.3.1 to 1.4.3.

I immediately noticed a lot of failed client health checks in the Nomad logs:

http: request failed: method=GET path=/v1/agent/health?type=client error="{\"client\":{\"ok\":false,\"message\":\"client not enabled\"}}" code=500

I've also noticed this failed check pop up in the Consul client (which is worrisome because I use it for auto-joining):

HTTP GET http://0.0.0.0:4646/v1/agent/health?type=client: 500 Internal Server Error Output: {"client":{"ok":false,"message":"client not enabled"}}

Is this a bug, or is the expectation now that the server configuration contains the client stanza?

Hi @mircea-c,

The log line from the Nomad logs indicates the Nomad server is being queried for its client status, and is therefore returning an error to the caller indicating it is not running in client mode. The Consul logs seem to indicate that Consul is the caller.
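As a rough illustration of what that check is doing, here is a minimal Python sketch. It builds the agent health URL for a given subsystem and decodes the response body from the failing check quoted in this thread. The helper names `health_url` and `parse_health` and the address `http://127.0.0.1:4646` are assumptions made for the example, not part of Nomad.

```python
import json
from urllib.parse import urlencode


def health_url(base_url, agent_type):
    """Build the agent health URL for one subsystem ("client" or "server")."""
    return f"{base_url}/v1/agent/health?{urlencode({'type': agent_type})}"


def parse_health(body):
    """Map each subsystem in a health response body to (ok, message)."""
    data = json.loads(body)
    return {name: (info["ok"], info["message"]) for name, info in data.items()}


# Response body copied from the failing check in this thread.
body = '{"client":{"ok":false,"message":"client not enabled"}}'

# Assumed local agent address; adjust to the agent's HTTP bind address.
print(health_url("http://127.0.0.1:4646", "server"))
print(parse_health(body))
```

On a server-only agent, requesting `type=server` instead of `type=client` is the query that matches the subsystem that is actually running.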

now that the server configuration contains the client stanza

Could you add a little more context to this please?

Thanks,
jrasell and the Nomad team

Hi @jrasell,

Thanks for taking the time to respond.

Could you add a little more context to this please?

There's a client stanza in the Nomad agent configuration file. The recommended best practice is not to have both client and server active on the same agent.

Given that Consul seems to be querying the client status, I was wondering if this recommendation has changed.

These queries are really flooding both the Consul and Nomad logs, and I'd really like this to stop. Is there a way to configure Consul not to query for the client state?

Hi @mircea-c,

There’s a client stanza in the nomad agent configuration file

Unless the Nomad agent has client.enabled = true, the agent will not instantiate the client subsystem and therefore will not register itself within Consul as a client agent.

I wonder if there is a rogue process, or an orphaned service registration from a previous Nomad configuration, which has persisted within Consul?

Thanks,
jrasell and the Nomad team

I checked the running processes and there was only one nomad running.

To be sure, I rebooted the machine. Once it was back up and running, same issue.

How would I get rid of an orphaned service registration in Consul?
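One possible approach, sketched under assumptions rather than taken from this thread: the Consul agent HTTP API can list the services registered on the local agent (GET /v1/agent/services) and deregister one by ID (PUT /v1/agent/service/deregister/<service_id>). The Python sketch below assumes Consul is reachable at `http://127.0.0.1:8500`, that ACLs are not enforced, and that Nomad uses its default client service name `nomad-client`. The helper names are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote

CONSUL = "http://127.0.0.1:8500"  # assumed local Consul agent address


def matching_service_ids(services, service_name):
    """Return IDs from a /v1/agent/services response with the given service name."""
    return sorted(
        sid for sid, svc in services.items() if svc.get("Service") == service_name
    )


def deregister_request(base_url, service_id):
    """Build (but do not send) the PUT request that deregisters one service ID."""
    url = f"{base_url}/v1/agent/service/deregister/{quote(service_id, safe='')}"
    return urllib.request.Request(url, method="PUT")


def list_services(base_url):
    """Fetch the services registered on the local Consul agent."""
    with urllib.request.urlopen(f"{base_url}/v1/agent/services") as resp:
        return json.load(resp)


def remove_orphans(base_url, service_name):
    """Deregister every local service with this name; returns the IDs removed."""
    ids = matching_service_ids(list_services(base_url), service_name)
    for sid in ids:
        urllib.request.urlopen(deregister_request(base_url, sid)).close()
    return ids
```

Calling `remove_orphans(CONSUL, "nomad-client")` on the affected machine would remove matching registrations together with their checks. If a running Nomad agent is still creating the registration, it will likely reappear, which would point at Nomad rather than at leftover state in Consul. With ACLs enabled, the requests would also need a token header.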