Services from failed allocations aren't deregistered

I’m playing with a simple job to see how Nomad handles failures (using https://github.com/ThomasObenaus/dummy-services/tree/master/fail_service). The cluster contains 3 servers (each running Consul and Nomad as both server and client), but the job is constrained to run on one specific server, with 1 instance. Nomad handles failures as expected by restarting the job. But I noticed that the services in Consul aren’t deregistered and accumulate over time, see screenshot:

The job is fairly simple (see below). My assumption is that services should be removed when an allocation stops. Am I wrong? Is this a bug in Nomad?


job "fail" {
  datacenters = ["dc"]

  constraint {
    attribute = "${attr.unique.consul.name}"
    operator = "regexp"
    value = "^(hjk)$"
  }

  update {
    healthy_deadline = "30s"
    progress_deadline = "40s"
    min_healthy_time = "0s"
  }


  group "fail" {
    count = "1"

    network {
      port "http" {
        to = 8080
      }
    }
    service {
      port = "http"
      check {
        port     = "http"
        type     = "http"
        path     = "/health"
        method   = "GET"
        interval = "10s"
        timeout  = "2s"
        check_restart {
          limit = 1
        }
      }
    }

    task "fail" {
      driver = "docker"
      config {
        image = "thobe/fail_service:latest"
        ports = ["http"]
      }


      env {
        HEALTHY_FOR   = 60
        UNHEALTHY_FOR = -1
      }
    }
  }
}
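
In case it helps anyone confirm the same symptom without the UI, here is a minimal sketch that groups the registrations on a Consul agent by service name and reports names with more registrations than expected. It assumes the local agent's HTTP API is reachable at http://127.0.0.1:8500 without an ACL token; the helper names and the sample IDs are made up for illustration and are not taken from the cluster above.

```python
import json
from collections import defaultdict
from urllib.request import urlopen


def group_by_service(agent_services):
    """Map each service name to the sorted list of its registered service IDs.

    `agent_services` is the JSON object returned by Consul's
    /v1/agent/services endpoint: a dict keyed by service ID whose values
    carry a "Service" field holding the service name.
    """
    grouped = defaultdict(list)
    for service_id, entry in agent_services.items():
        grouped[entry.get("Service", "")].append(service_id)
    return {name: sorted(ids) for name, ids in grouped.items()}


def duplicated(grouped, expected=1):
    """Keep only service names that have more than `expected` registrations."""
    return {name: ids for name, ids in grouped.items() if len(ids) > expected}


def fetch_agent_services(base_url="http://127.0.0.1:8500"):
    """Fetch the services registered with the local Consul agent.

    Assumption: no ACL token is required; adjust base_url for your setup.
    """
    with urlopen(f"{base_url}/v1/agent/services") as resp:
        return json.load(resp)


# Hypothetical sample shaped like an agent response (IDs are invented).
SAMPLE = {
    "example-id-alloc-a": {"ID": "example-id-alloc-a", "Service": "fail-fail", "Port": 25001},
    "example-id-alloc-b": {"ID": "example-id-alloc-b", "Service": "fail-fail", "Port": 25002},
    "example-id-other": {"ID": "example-id-other", "Service": "other", "Port": 9000},
}
```

Calling `duplicated(group_by_service(fetch_agent_services()))` on the node running the job a few times while the allocation restarts should show whether the list of IDs for the service keeps growing. With `count = 1`, more than one ID for the same service name would match what I see in the screenshot.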

Hi @valodzka,

Thanks for using Nomad!

So I’ve tried for a bit here on my end, but I’m unable to reproduce your issue. What versions of Nomad & Consul are you using? Do you mind sharing your server config for both Nomad & Consul? Please make sure to remove any secrets if you are using any.

Thanks,

Derek

Nomad 1.1.3, Consul v1.10.1

Consul config:

datacenter = "..."
node_name = "hjk"
encrypt = "..."
advertise_addr = "..."
disable_remote_exec = false
disable_update_check = true
ui_config {
  enabled = true
}
client_addr = "0.0.0.0"

server = true
bootstrap_expect = 3

retry_join = ["...", "..."]

dns_config {
  enable_truncate = true
}

telemetry {
  dogstatsd_addr = "localhost:8125"
  disable_hostname = true
  disable_compat_1.9 = true
}

I provided more info in hashicorp/nomad issue #11057 on GitHub, "Services from failed services isn't deregistred/blinking".

Thanks for opening that issue! I see that you are already involved in troubleshooting over there, so I’ll move my input to that forum.