Nomad not deregistering services from Consul after they move to another node

Hi,

We are using Consul to register Nomad services via the service and check stanzas.

After a service fails and moves nodes, the old (and failed) check ID is still showing in Consul.
I’m not sure whether this would help, but I read about the DeregisterCriticalServiceAfter parameter in the Consul check documentation.
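
For reference, this is roughly how I understand that parameter to look in a plain Consul service definition registered directly with the Consul agent, outside of Nomad. It is only a sketch: the check URL and the values are placeholders I made up for illustration, not something from our setup.

    # Consul agent service definition (not a Nomad jobspec)
    service {
      name = "test"
      port = 80

      check {
        name     = "alive"
        http     = "http://localhost:80/"  # placeholder URL, assumed for illustration
        interval = "10s"
        timeout  = "2s"

        # Ask Consul to deregister the service once this check
        # has been critical for longer than this duration.
        deregister_critical_service_after = "1m"
      }
    }

As far as I can tell from the Consul documentation, the minimum value is one minute and the cleanup runs periodically, so deregistration would not be instant.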

I also tried my luck adding it to the check stanza, but unfortunately, while Consul supports it, Nomad does not:

    service {
      name = "test"
      port = 80

      check {
        name     = "alive"
        port     = "http"
        type     = "http"
        path     = "/"
        interval = "10s"
        timeout  = "2s"
        deregister_critical_service_after = "1m"
      }
    }

Thanks!

Hey @Dgotlieb, sorry for the late response on this. This fell through the cracks and I’m just now catching it.

This probably wouldn’t be super difficult to add technically, but I think it leads the user down a bad path. If the Consul service gets deregistered after failing, the Nomad jobspec is no longer representative of the reality in Consul, and becomes less “declarative”.

I think you’ve correctly identified an underlying bug which should probably be reported on its own: Nomad is not properly cleaning up old service health checks when a service moves to another node. If this were fixed, am I right in thinking that you wouldn’t need the DeregisterCriticalServiceAfter support at all? Do you have any more info on how often this is happening, and/or repro steps? If so, it would be great if you could open a GH issue, or I can open one for you with any info you provide.

I’m thinking that ideally the underlying issue in Nomad is fixed and then the jobspec can always represent reality.

Thanks!