Delay SIGTERM of envoy sidecar proxy

Hi guys,

We are using the consul helm chart and are facing issues when doing rolling updates of our nginx gateway. As far as I can see, the problem is that the load balancer still sends requests to the pod even after termination has started (see Lost requests when doing a rolling update · Issue #43576 · kubernetes/kubernetes · GitHub). For the nginx container I was able to solve this by adding a preStop command like this:

                command: ["/bin/sh", "-c", "sleep 5 && /usr/sbin/nginx -s quit"]

The problem now is that we are still getting 502 errors from nginx: requests to upstream services, which are sent through the envoy sidecar proxy, fail because envoy is already shutting down.

Now the question is: how can the SIGTERM of the envoy sidecar be delayed when using the consul helm chart? I only found the envoyExtraArgs option, but there seems to be no suitable option listed in Command line options — envoy 1.18.0-dev-fce386 documentation.

Istio seems to have the same issue, and there are some possible workarounds for it.

Best regards,

Hi Nico,
This definitely seems like something we should fix. Do you mind opening up an issue in the consul-helm repo?

Hi @lkysow,

I am still trying to fully understand this. Currently, envoy has a preStop hook which deregisters the service, which sounds good to me. But this finishes basically instantly; afterwards k8s sends a SIGTERM to envoy, and I guess at that point it will reject any further incoming requests.

I feel like envoy should keep handling outbound requests of the proxied service as long as the service is still running, e.g. if the service does some cleanup or still needs to send data when terminating. I found this example: "while [ $(netstat -plunt | grep tcp | grep -v envoy | wc -l | xargs) -ne 0 ]; do sleep 1; done", which basically delays the shutdown of envoy until there are no more TCP listeners other than envoy's own. This seems a bit hacky though.
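For readability, here is the same loop written out as a small script. It is only a sketch: the function names and the timeout are hypothetical additions of mine, not anything provided by the consul helm chart or by envoy, and I have not checked whether the chart allows setting such a hook on the sidecar. It also assumes netstat is available in the sidecar image.

```shell
#!/bin/sh
# Sketch only: function names and the timeout are hypothetical additions.

# Count listening sockets on TCP that are not owned by envoy.
# Same pipeline as the one-liner above; tr strips any padding from wc.
count_non_envoy_listeners() {
  netstat -plunt 2>/dev/null | grep tcp | grep -v envoy | wc -l | tr -d ' '
}

# Block until no non-envoy TCP listeners remain, or until the
# (hypothetical) timeout in seconds has passed. Defaults to 30.
wait_for_app_listeners() {
  max_wait="${1:-30}"
  waited=0
  while [ "$(count_non_envoy_listeners)" -ne 0 ] && [ "$waited" -lt "$max_wait" ]; do
    sleep 1
    waited=$((waited + 1))
  done
}
```

A preStop hook would then call something like "wait_for_app_listeners 20" before letting envoy terminate. The timeout is there so the hook cannot hang forever if the application never closes its listeners; it needs to stay below the pod's terminationGracePeriodSeconds (30 seconds by default in Kubernetes), otherwise the kubelet kills the container anyway.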

I will do some more testing to verify the behavior and then create an issue on GitHub.

Yeah I think this seems like the best behaviour. Sounds good.

@lkysow I created an issue on the consul-helm repo, see Envoy sidecar shutting down to early causes requests to fail · Issue #866 · hashicorp/consul-helm · GitHub