I’m trying to monitor an HTTP endpoint on one of several Consul server instances behind a load balancer. The load balancer runs health checks to determine whether or not to send traffic to each instance. I haven’t been able to find a health endpoint for Consul itself, only the health endpoints for registered nodes and services. In the service manifest I need a health endpoint that the health check can request and get a 200 OK back from.
It is not “/health”, “/health/node”, “/health/node/self”, or anything else I could find there.
I am currently using a random-ish API path that gives me a 200, but it’s not really correct: “/v1/health/state/passing”.
Is there actually a health check endpoint for Consul itself?
Can you give us some more context on what you are attempting to achieve with the load balancer fronting the Consul servers?
In the usual deployment topology, all of the running Consul agents form a gossip cluster, and applications rely on the client agent running on the local node to correctly forward traffic to the cluster’s leader. If I could understand your need for a load balancer, that would help in figuring out a good way forward.
To answer the question about a health check endpoint, perhaps the most useful one would be /v1/status/leader, which at least tells you that a leader has been elected amongst the Consul servers and that the cluster is working. You would need to check that a valid, non-empty response was returned, though.
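For illustration, here’s a minimal sketch of such a check in Python (assuming the HTTP API is reachable on 127.0.0.1:8500; adjust for your setup). It treats the node as healthy only if /v1/status/leader returns a 200 with a non-empty leader address:

```python
import json
import sys
import urllib.error
import urllib.request

# Assumed agent address; adjust host/port for your deployment.
CONSUL_ADDR = "http://127.0.0.1:8500"

def leader_elected() -> bool:
    """Return True only if the cluster reports a non-empty leader address."""
    try:
        with urllib.request.urlopen(f"{CONSUL_ADDR}/v1/status/leader", timeout=2) as resp:
            if resp.status != 200:
                return False
            # /v1/status/leader returns a JSON string such as "10.0.1.5:8300",
            # or "" when no leader has been elected.
            return bool(json.loads(resp.read()))
    except (urllib.error.URLError, ValueError):
        return False

if __name__ == "__main__":
    sys.exit(0 if leader_elected() else 1)
```

The body check is the important part: a plain load balancer HTTP check only sees the status code, so a script like this is only useful where your probe mechanism can run it (e.g. as an exit-code-based check).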
I’m merely trying to access the web UI of the Consul servers, which are running in AWS ECS on Fargate instances. The target group for the load balancer needs an endpoint to health check against.
Unfortunately, Consul does not have an API endpoint specifically designed for this the way Vault does (Vault’s /v1/sys/health).
As you mentioned, a “random-ish” API path is probably the best workaround, as long as the UI is enabled on all of the Consul servers; that way the load balancer will at least send traffic to a currently running instance. I could see an argument for /v1/status/leader being the leading candidate for that “random-ish” path, but at that point I don’t think it matters too much.
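For what it’s worth, here’s a hedged sketch (Python with boto3) of pointing an ALB target group’s health check at that path. The target group ARN is a placeholder, and note that the load balancer only evaluates the status code, not the response body:

```python
import boto3

# Placeholder ARN; substitute the target group fronting your Consul servers.
TARGET_GROUP_ARN = (
    "arn:aws:elasticloadbalancing:us-east-1:123456789012:"
    "targetgroup/consul-ui/0123456789abcdef"
)

elbv2 = boto3.client("elbv2")

# Point the health check at the Consul HTTP API. A 200 here only proves
# the HTTP API is serving; it says nothing about leader election, since
# the load balancer cannot inspect the response body.
elbv2.modify_target_group(
    TargetGroupArn=TARGET_GROUP_ARN,
    HealthCheckProtocol="HTTP",
    HealthCheckPort="8500",
    HealthCheckPath="/v1/status/leader",
    Matcher={"HttpCode": "200"},
)
```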
This issue seems to reference a similar enhancement; if you’re interested, you could +1 the feature request.
I’m not sure that issue is similar at all, since it refers to whether or not a specific node from the nodes API is up. This question is specifically about whether this instance of a Consul UI server itself is up.
I’ve got a similar scenario: a Consul cluster behind an AWS ELB. I’m currently using HTTP:8500/ui/ for the ELB health check, but since upgrading Consul to 1.7.4 (AFAICT) and now 1.8.2, I’m seeing random 504 errors from multiple clients talking to the Consul server via a CNAME to the ELB, which led me to question what the best health check URL should be.