Nomad constantly deregisters Consul services

I have set up a Nomad cluster along with a Consul instance so that the jobs can register services to connect to.

However, the services keep getting synced and deregistered. Here is what I have from the Consul logs:

    2021-01-26T14:49:59.174Z [INFO]  agent: Synced check: check=_nomad-check-dc23801467b8a65a4fd82311c2606724a180065c
    2021-01-26T14:50:00.072Z [INFO]  agent: Synced check: check=_nomad-check-1783c554d9ee0a25d52532f4178c392e931e4bb1
    2021-01-26T14:50:04.511Z [INFO]  agent: Synced service: service=_nomad-task-e8d2b77b-3bf5-96c1-8323-63b6151e2cf3-lb0-lb0-admin-admin
    2021-01-26T14:50:09.962Z [INFO]  agent: Deregistered service: service=_nomad-task-e8d2b77b-3bf5-96c1-8323-63b6151e2cf3-lb0-lb0-admin-admin
    2021-01-26T14:50:34.554Z [INFO]  agent: Synced service: service=_nomad-task-e8d2b77b-3bf5-96c1-8323-63b6151e2cf3-lb0-lb0-admin-admin
    2021-01-26T14:50:39.984Z [INFO]  agent: Deregistered service: service=_nomad-task-e8d2b77b-3bf5-96c1-8323-63b6151e2cf3-lb0-lb0-admin-admin
    2021-01-26T14:51:04.589Z [INFO]  agent: Synced service: service=_nomad-task-e8d2b77b-3bf5-96c1-8323-63b6151e2cf3-lb0-lb0-admin-admin
    2021-01-26T14:51:10.009Z [INFO]  agent: Deregistered service: service=_nomad-task-e8d2b77b-3bf5-96c1-8323-63b6151e2cf3-lb0-lb0-admin-admin

Nothing in the Nomad logs shows why this happens.

Any idea what could cause this issue?

Nomad v1.0.2
Consul v1.9.1

Hi @spack971, are you able to confirm whether it is Nomad actually triggering deregistrations? You should be able to see metrics for client.consul.service_deregistrations (and client.consul.service_registrations) from this client accumulating if that is the case. That consistent ~5 second gap before the deregistrations is curious - Nomad’s reconciliation loop for Consul services runs every 30 seconds.


Here is what I have:

    [2021-01-28 11:31:50 +0100 CET][C] 'nomad.client.consul.service_deregistrations.xxx': Count: 1 Sum: 1.000 LastUpdated: 2021-01-28 11:31:56.838423778 +0100 CET m=+69783.568045089
    [2021-01-28 11:31:50 +0100 CET][C] 'nomad.client.consul.service_registrations.xxx': Count: 3 Sum: 3.000 LastUpdated: 2021-01-28 11:31:57.003773987 +0100 CET m=+69783.733395294
    [2021-01-28 11:32:20 +0100 CET][C] 'nomad.client.consul.service_deregistrations.xxx': Count: 1 Sum: 1.000 LastUpdated: 2021-01-28 11:32:27.048990484 +0100 CET m=+69813.778611788
    [2021-01-28 11:32:20 +0100 CET][C] 'nomad.client.consul.service_registrations.xxx': Count: 3 Sum: 3.000 LastUpdated: 2021-01-28 11:32:27.235769776 +0100 CET m=+69813.965391079

Hi,

I have two nodes in my test cluster:

  • Node A, client and server.
  • Node B, client and server.

Both nodes are started with the same configuration. However, when I look at the logs at TRACE level, I see the following:

Node A:

    2021-01-28T15:58:55.519+0100 [DEBUG] consul.sync: sync complete: registered_services=3 deregistered_services=1 registered_checks=0 deregistered_checks=0

Node B:

    2021-01-28T15:58:59.037+0100 [DEBUG] consul.sync: sync complete: registered_services=1 deregistered_services=3 registered_checks=0 deregistered_checks=0

Indeed, Node A has 3 jobs running while Node B has 1. It seems each node is reverting the changes made by the other.

Name               Address       Port  Status  Leader  Protocol  Build  Datacenter  Region
NodeA              198.51.100.1  4648  alive   false   2         1.0.2  us1         us
NodeB              198.51.100.2  4648  alive   true    2         1.0.2  us1         us

So did I miss something in my configuration? How can I prevent this?


This behavior is actually described in the documentation. I just overlooked it:

An important requirement is that each Nomad agent talks to a unique Consul agent. Nomad agents should be configured to talk to Consul agents and not Consul servers. If you are observing flapping services, you may have multiple Nomad agents talking to the same Consul agent. As such avoid configuring Nomad to talk to Consul via DNS such as consul.service.consul


Hello, I need help. Whenever I run a job in Nomad, it registers in Consul, but within a few seconds it gets deregistered. Can anyone please provide a solution?

Thank you! This actually solves the problem. Each Nomad node must have its own Consul agent set in its config file.
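
For anyone landing here later, this is a minimal sketch of what that can look like in each Nomad agent's configuration file. It assumes a Consul client agent runs on the same host as the Nomad agent and listens on its default HTTP port; that setup is an assumption for illustration and is not taken from the original poster's configuration.

```hcl
# Nomad agent configuration, repeated on every node (sketch).
# Assumption: a Consul client agent runs on this same host on the default HTTP port.
consul {
  # Point at the node-local Consul agent. Do not point several Nomad agents
  # at one shared Consul agent, at a Consul server, or at a DNS name such as
  # consul.service.consul, since that leads to the flapping described above.
  address = "127.0.0.1:8500"
}
```

With every Nomad agent pointed at its own local Consul agent, each agent should only reconcile the services it registered itself, so the two nodes no longer undo each other's registrations.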