Delay SIGTERM of envoy sidecar proxy

nflaig · March 14, 2021, 2:31pm

Hi guys,

We are using the consul helm chart and are facing issues when doing rolling updates of our nginx gateway. As far as I can see, the problem is that the load balancer still sends requests to the pod even after termination has started (Lost requests when doing a rolling update · Issue #43576 · kubernetes/kubernetes · GitHub). For the nginx container I was able to solve this issue by adding a preStop command like this:

          lifecycle:
            preStop:
              exec:
                command: ["/bin/sh", "-c", "sleep 5 && /usr/sbin/nginx -s quit"]

The problem now is that we are still getting 502 errors from nginx: requests to upstream services, which are sent through the envoy sidecar proxy, fail because the proxy is already shutting down.

Now the question is: how can the SIGTERM of the envoy sidecar be delayed when using the consul helm chart? I only found the envoyExtraArgs option, but there seems to be no suitable option in Command line options — envoy 1.18.0-dev-fce386 documentation.

Istio seems to have the same issue, and there are some possible workarounds.

Best regards,
Nico

lkysow · March 14, 2021, 6:51pm

Hi Nico,
This definitely seems like something we should fix. Do you mind opening up an issue in the consul-helm repo?

nflaig · March 14, 2021, 7:42pm

Hi @lkysow,

I am still trying to fully understand this. Currently, envoy has a preStop hook that deregisters the service, which sounds good to me, but it finishes almost instantly. Afterwards k8s sends a SIGTERM to envoy, and I guess at that point it will reject any further incoming requests.

I feel like envoy should handle outbound requests of the proxied service as long as the service is still running, e.g. if the service does some cleanup or still needs to send data when terminating. I found this example "while [ $(netstat -plunt | grep tcp | grep -v envoy | wc -l | xargs) -ne 0 ]; do sleep 1; done", which basically delays the shutdown of envoy until there are no more TCP listeners other than envoy's own. This seems a bit hacky though.
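For readability, the same one-liner can be written as a small script with an upper bound on the wait. This is only a sketch: the function names and the 30-second default are made up for illustration, it assumes netstat is available in the sidecar image, and it has to finish within the pod's terminationGracePeriodSeconds, otherwise k8s kills the container anyway.

```shell
#!/bin/sh
# Sketch of a preStop wait loop for the envoy sidecar.
# Function names and the default timeout are hypothetical, not part of consul-helm.

# Count TCP listeners that do not belong to envoy (same pipeline as the one-liner).
count_app_listeners() {
  netstat -plunt 2>/dev/null | grep tcp | grep -v envoy | wc -l | xargs
}

# Poll once per second until no non-envoy listeners remain
# or max_wait seconds have passed, whichever comes first.
wait_for_app_shutdown() {
  max_wait="${1:-30}"
  elapsed=0
  while [ "$(count_app_listeners)" -ne 0 ] && [ "$elapsed" -lt "$max_wait" ]; do
    sleep 1
    elapsed=$((elapsed + 1))
  done
}
```

A preStop hook would then call something like "wait_for_app_shutdown 20" after sourcing the script. The timeout is the only real change compared to the one-liner; it avoids blocking forever if a listener never goes away.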

I will do some more testing to verify the behavior and then create an issue on GitHub.

lkysow · March 15, 2021, 6:26pm

Yeah I think this seems like the best behaviour. Sounds good.

nflaig · March 17, 2021, 9:01pm

@lkysow I created an issue on the consul-helm repo, see Envoy sidecar shutting down to early causes requests to fail · Issue #866 · hashicorp/consul-helm · GitHub
