We use Consul with Nomad in our app cluster.
When I stop the Consul client (not the server) and Nomad on one of the machines, Nomad moves the application to another node, which is perfectly fine. But for some reason I can still see the service registered in Consul.
I can see that the Consul node health check is failing with "Agent not live or unreachable", but the node is not deregistered. Any idea why?
We use Traefik with its Consul Catalog integration, and Traefik keeps load-balancing to the application instance that is already down. This happens because the Consul Catalog lists two instances of the service: the old one, which is already down, and the new one.
So the question is: why don't Nomad or Consul deregister the service, and why is it still available in the Consul Catalog? This is a real-world scenario: a machine can go down without any notice, and in that case the node is not able to gracefully leave the cluster.
You can deregister it via the HTTP API:
$ curl --request PUT http://127.0.0.1:8500/v1/agent/check/deregister/my-check-id
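As a minimal sketch, the same call can be built with Python's standard library. `my-check-id` is the placeholder from the command above; the function name and default address are hypothetical. Note that `/v1/agent/...` endpoints act on the agent you send the request to, so the check has to be registered on that agent.

```python
import urllib.parse
import urllib.request


def deregister_check_request(check_id, consul_addr="http://127.0.0.1:8500"):
    """Build (but do not send) a PUT to /v1/agent/check/deregister/<check_id>.

    Hypothetical helper; consul_addr is assumed to point at a reachable Consul agent.
    """
    # Quote the ID so characters such as "/" cannot alter the URL path.
    quoted = urllib.parse.quote(check_id, safe="")
    url = f"{consul_addr}/v1/agent/check/deregister/{quoted}"
    return urllib.request.Request(url, method="PUT")


req = deregister_check_request("my-check-id")

# To actually send it against a live agent:
# with urllib.request.urlopen(req) as resp:
#     print(resp.status)
```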
For more details, see: https://www.consul.io/api/agent/check.html#deregister-check
Hope it helps.
What is the point of doing this manually? We use Consul for service discovery, so it should be able to remove the service automatically based on its health check status.
Hi @maxio89,
Nodes are not immediately removed from Consul when they become unhealthy. They are periodically cleaned up through a process called reaping (see: Consul Agent: Lifecycle).
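A toy model of the reaping rule, assuming the 72-hour default described in Consul's agent lifecycle documentation (the agent setting `reconnect_timeout` is assumed to be the relevant knob). The node names and timestamps are made up, and this is not Consul's implementation; it only illustrates why a failed node, and every service registered on it, can stay in the catalog for days.

```python
from datetime import datetime, timedelta

# Assumed default reap window (Consul documents 72h); not read from a real cluster.
REAP_AFTER = timedelta(hours=72)


def nodes_to_reap(failed_since, now, reap_after=REAP_AFTER):
    """Return the failed nodes that have been down longer than reap_after.

    failed_since maps a (hypothetical) node name to the time it was marked failed.
    Until a node is reaped, its services stay registered in the catalog.
    """
    return sorted(
        node for node, since in failed_since.items()
        if now - since > reap_after
    )


now = datetime(2020, 1, 10, 12, 0)
failed = {
    "node-a": now - timedelta(hours=2),   # stopped recently: still in the catalog
    "node-b": now - timedelta(hours=80),  # past the window: eligible for reaping
}
print(nodes_to_reap(failed, now))  # ['node-b']
```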
If a node becomes unhealthy, the services registered on that node will also be marked as unhealthy (see Services: Checks), and will be omitted from the results of a DNS query or from the /health/service/:service HTTP API endpoint.
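To illustrate that filtering, here is a minimal sketch over a trimmed, hypothetical response body from `GET /v1/health/service/web` (the service name `web`, node names, and service IDs are made up; real entries carry many more fields). An instance is kept only if every check attached to it, including the node-level `serfHealth` check, is `passing`. Adding the `?passing` query parameter to that endpoint asks Consul to apply the same kind of filter on the server.

```python
def passing_instances(entries):
    """Keep health entries whose checks are all in the "passing" state.

    In this sketch an entry with no checks at all counts as passing.
    """
    return [
        entry for entry in entries
        if all(check.get("Status") == "passing" for check in entry.get("Checks", []))
    ]


# Trimmed, hypothetical /v1/health/service/web response.
entries = [
    {
        "Node": {"Node": "node-a"},
        "Service": {"ID": "web-old"},
        "Checks": [
            {"CheckID": "serfHealth", "Status": "critical",
             "Output": "Agent not live or unreachable"},
        ],
    },
    {
        "Node": {"Node": "node-b"},
        "Service": {"ID": "web-new"},
        "Checks": [{"CheckID": "serfHealth", "Status": "passing"}],
    },
]

print([e["Service"]["ID"] for e in passing_instances(entries)])  # ['web-new']
```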
I took a quick look at the behavior of the Consul Catalog integration in Traefik 2.x. It seems that Traefik queries for all services along with their associated health status. I assume it filters unhealthy services out of those results, but I can't tell for sure.
What version of Traefik are you using in your environment? Does Traefik show these services as being healthy or unhealthy when it imports them from Consul?
We're using 2.1.1. It shows them as healthy, and that's the problem: it also routes traffic to unhealthy instances. I found a workaround by configuring additional health checks in Traefik, but it shouldn't have to work that way. I'm going to check the results of the Consul Catalog API to see whether the problem lies on the Traefik side or the Consul side.
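One way to run that check is to diff two Consul endpoints. A minimal sketch, assuming `catalog` is the decoded body of `GET /v1/catalog/service/web` (which lists every registered instance regardless of health) and `healthy` is the decoded body of `GET /v1/health/service/web?passing`; the service name, helper names, and sample data are hypothetical. An instance that appears in the first set but not the second is one Consul already reports as unhealthy, so if Traefik still routes to it, that would point to the Traefik side.

```python
import json
import urllib.request


def fetch_json(path, consul_addr="http://127.0.0.1:8500"):
    """GET a Consul HTTP API path and decode the JSON body (needs a live agent)."""
    with urllib.request.urlopen(f"{consul_addr}{path}") as resp:
        return json.load(resp)


def unhealthy_instances(catalog, healthy):
    """Return (node, service ID) pairs registered in the catalog but not passing."""
    registered = {(s["Node"], s["ServiceID"]) for s in catalog}
    passing = {(e["Node"]["Node"], e["Service"]["ID"]) for e in healthy}
    return registered - passing


# Against a live cluster:
# catalog = fetch_json("/v1/catalog/service/web")
# healthy = fetch_json("/v1/health/service/web?passing")

# Offline sample data (trimmed, hypothetical):
catalog = [
    {"Node": "node-a", "ServiceID": "web-old"},
    {"Node": "node-b", "ServiceID": "web-new"},
]
healthy = [{"Node": {"Node": "node-b"}, "Service": {"ID": "web-new"}}]

print(unhealthy_instances(catalog, healthy))  # {('node-a', 'web-old')}
```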