Intermittent 503 Liveness and Readiness Probe Health Check

Hi All,
My setup: consul-k8s/mesh: 1.17.x
My app health check endpoint: /api/myapp/v1/health
Every couple of minutes or so I got this error:
Readiness probe failed: HTTP probe failed with statuscode: 503
Liveness probe failed: HTTP probe failed with statuscode: 503

I am confident that my app health endpoint is working and returns status very quickly in term of milliseconds.
I was wondering what caused such a failure and if anyone has experienced the same issue & what the solution was.
Thanks
~K
I turned on Envoy debug and here is what I am seeing in the log:

[debug] envoy.router(26) [Tags: "ConnectionId":"5063","StreamId":"12730984532044615132"] cluster 'local_app' match for URL '/api/myapp/v1/health'

[debug] envoy.router(25) [Tags: "ConnectionId":"5062","StreamId":"4838852820816179654"] pool ready

[debug] envoy.router(26) [Tags: "ConnectionId":"5063","StreamId":"12730984532044615132"] router decoding headers:

':authority', '10.x.x.x:20400'

':path', '/api/myapp/v1/health'

':method', 'GET'

':scheme', 'http'

'user-agent', 'kube-probe/1.28+'

'accept', '*/*'

'x-forwarded-proto', 'http'

'x-request-id', '98b5578e-3a12-4eff-8bbc-e4759fd02e46'

'x-envoy-expected-rq-timeout-ms', '15000'

[debug] envoy.router(26) [Tags: "ConnectionId":"5063","StreamId":"12730984532044615132"] pool ready

[debug] envoy.router(26) [Tags: "ConnectionId":"5063","StreamId":"12730984532044615132"] upstream reset: reset reason: connection termination, transport failure reason:

[debug] envoy.http(26) [Tags: "ConnectionId":"5063","StreamId":"12730984532044615132"] Sending local reply with details upstream_reset_before_response_started{connection_termination}

[debug] envoy.http(26) [Tags: "ConnectionId":"5063","StreamId":"12730984532044615132"] closing connection due to connection close header

[debug] envoy.http(26) [Tags: "ConnectionId":"5063","StreamId":"12730984532044615132"] encoding headers via codec (end_stream=false):

':status', '503'

'content-length', '95'

'content-type', 'text/plain'

'date', 'Wed, 10 Jul 2024 18:22:35 GMT'

'server', 'envoy'

'connection', 'close'

The issue I hit is mentioned here (exact same issue):

Hi All,
I’ve been scratching my head for awhile on this intermittent health probes 503 via consul mesh and just came up with a solution. It would be great if consul experts can give me an idea if this solution is causing any side effect down the road.
Here’s what I did:

  1. Let the kubelet hits the POD’s health probes (liveness and readiness) endpoint directly instead of hitting the Envoy (redirection by mutation):

consul.hashicorp.com/transparent-proxy-overwrite-probes: ‘false’

  1. My health probes exposed via port 8086 so I had to bypass the proxy for incoming request (from kubelet)
    consul.hashicorp.com/transparent-proxy-exclude-inbound-ports: ‘8086’

~K

Hi All,

While my hack solution did its job but it does not address the root cause. I finally found what the root cause is & how to fix it. The source of 503 coming from the local app that terminates the connection after a period of inactivity. Envoy has idle timeout > local App and thus it has no idea. If your app is NodeJs then increase the idle timeout (default to 5s) or lower the periodSeconds of your health probes.

Hope it helps anyone out there hitting the same issue.

~K