Services from failed allocations aren't deregistered

valodzka · August 18, 2021, 2:10pm

I’m playing with a simple job to see how Nomad handles failures (using https://github.com/ThomasObenaus/dummy-services/tree/master/fail_service). The cluster contains 3 servers (each running Consul and Nomad as both server and client), but the job is constrained to run on one specific server, with 1 instance. Nomad handles failures as expected by restarting the job. However, I noticed that the services aren’t deregistered from Consul and their number grows over time (see screenshot).

The job is fairly simple (see below). My assumption is that services should be removed when an allocation is stopped. Am I wrong? Is this a bug in Nomad?


job "fail" {
  datacenters = ["dc"]

  constraint {
    attribute = "${attr.unique.consul.name}"
    operator = "regexp"
    value = "^(hjk)$"
  }

  update {
    healthy_deadline = "30s"
    progress_deadline = "40s"
    min_healthy_time = "0s"
  }


  group "fail" {
    count = "1"

    network {
      port "http" {
        to = 8080
      }
    }
    service {
      port = "http"
      check {
        port     = "http"
        type     = "http"
        path     = "/health"
        method   = "GET"
        interval = "10s"
        timeout  = "2s"
        check_restart {
          limit = 1
        }
      }
    }

    task "fail" {
      driver = "docker"
      config {
        image = "thobe/fail_service:latest"
        ports = ["http"]
      }


      env {
        HEALTHY_FOR   = 60
        UNHEALTHY_FOR = -1
      }
    }
  }
}
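One way to confirm that registrations really are piling up is to group the entries Consul reports by allocation. The following is only a minimal sketch: it assumes Nomad-managed service IDs have the form `_nomad-task-<allocation UUID>-...`, and that the input is the parsed JSON object returned by the Consul agent endpoint `GET /v1/agent/services` (a map keyed by service ID). The function names `group_by_allocation` and `stale_service_ids` are hypothetical helpers, not part of any Nomad or Consul tooling.

```python
import re
from collections import defaultdict

# Assumption: Nomad-managed service IDs look like
# "_nomad-task-<allocation UUID>-<group/task>-<service>-<port label>".
NOMAD_ID = re.compile(
    r"^_nomad-task-"
    r"([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})-"
)


def group_by_allocation(services):
    """Map allocation ID -> list of Consul service IDs registered for it.

    `services` is the parsed JSON object from GET /v1/agent/services,
    i.e. a dict keyed by service ID. Entries that do not match the
    assumed Nomad ID layout are ignored.
    """
    grouped = defaultdict(list)
    for service_id in services:
        match = NOMAD_ID.match(service_id)
        if match:
            grouped[match.group(1)].append(service_id)
    return dict(grouped)


def stale_service_ids(services, live_alloc_ids):
    """Return the service IDs whose allocation is not in `live_alloc_ids`."""
    live = set(live_alloc_ids)
    return sorted(
        service_id
        for alloc_id, ids in group_by_allocation(services).items()
        if alloc_id not in live
        for service_id in ids
    )
```

The service map can be fetched from the local agent (for example with `curl http://127.0.0.1:8500/v1/agent/services`), and the full allocation IDs can be taken from `nomad job status -verbose fail`. If `stale_service_ids` keeps returning more entries after each restart, the registrations are outliving their allocations.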

DerekStrickland · August 24, 2021, 3:37pm

Hi @valodzka,

Thanks for using Nomad!

So I’ve tried for a bit here on my end, but I’m unable to reproduce your issue. What versions of Nomad & Consul are you using? Do you mind sharing your server config for both Nomad & Consul? Please make sure to remove any secrets if you are using any.

Thanks,

Derek

valodzka · August 25, 2021, 8:48am

nomad 1.1.3, consul v1.10.1

Consul config:

datacenter = "..."
node_name = "hjk"
encrypt = "..."
advertise_addr = "..."
disable_remote_exec = false
disable_update_check = true
ui_config {
  enabled = true
}
client_addr = "0.0.0.0"

server = true
bootstrap_expect = 3

retry_join = ["...", "..."]

dns_config {
  enable_truncate = true
}

telemetry {
  dogstatsd_addr = "localhost:8125"
  disable_hostname = true
  disable_compat_1.9 = true
}

I provided more info here: "Services from failed services isn't deregistred/blinking" (Issue #11057, hashicorp/nomad on GitHub).

DerekStrickland · August 25, 2021, 8:34pm

Thanks for opening that issue! I see that you are already involved in troubleshooting over there, so I’ll move my input to that forum.
