Consul service mesh, clarification around request timeout

dbd · January 26, 2024, 1:42pm

Hi.
I’m running a Nomad/Consul/Vault cluster for a number of services, with service mesh used for almost all inter-service communication. Things mostly work, but I have some problems with HTTP services going through the mesh (with Protocol = “http” in service-defaults) when requests take longer than the default limit of 15s.

My HTTP client keeps getting “upstream request timeout” errors on some queries. Looking at the docs, there are a lot of places where a request timeout can be configured, but it’s never clear which setting affects which timeout, as there are several hops involved:

Initial client to local envoy
Envoy to envoy
Envoy to final server

The available settings

local_request_timeout_ms set either in a proxy-defaults config, like

Kind = "proxy-defaults"
Name = "global"
Config {
  local_request_timeout_ms = 0
}

or in the job

      connect {
        sidecar_service {
          proxy {
            config {
              local_request_timeout_ms = 0
            }
          }
        }
      }

RequestTimeout in a service-resolver configuration, like documented here
RequestTimeout in a service-router configuration, like documented here

Can anyone clarify how and where to configure a longer request timeout? Specifically, which of those settings should be applied, and should they go on the source service or the destination service?

Ranjandas · January 29, 2024, 12:11am

Hi @dbd,

To answer this question, let’s take two services, frontend and backend. To avoid getting upstream request timeout, you will have to apply the timeout configurations in two places.

Envoy to Envoy: This can be configured using a service-router or service-resolver for the upstream service. So in our example, we should apply it for the backend service. The simplest option is to have a service-resolver as shown below.
```
Kind           = "service-resolver"
Name           = "backend"
RequestTimeout = "17s" # This will be applied to the envoy of the downstream service.
```
After you apply this, you will find that the timeout gets applied to the Envoy proxy of the downstream service of backend (in this case, frontend).
Envoy to the final server (local_app): In addition to the above, it is important to adjust local_request_timeout_ms, as this is where the delay is coming from. Applying it configures the Envoy of the upstream service (backend) to wait longer for the application.
```
Kind      = "service-defaults"
Name      = "backend"
Protocol  = "http"
LocalRequestTimeoutMs = 17000 # 17 seconds timeout, in milliseconds
```
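If you would rather keep this setting in the Nomad job, as in your second snippet, the equivalent should be the sidecar proxy config of the backend job. This is only a rough sketch: the service block layout and the port value are assumptions for illustration, so adapt them to your job.

```hcl
# Fragment of the Nomad job for the backend service (assumed layout).
service {
  name = "backend"
  port = "8080" # hypothetical port, use your own

  connect {
    sidecar_service {
      proxy {
        config {
          # How long (in ms) backend's Envoy waits for the local application.
          local_request_timeout_ms = 17000
        }
      }
    }
  }
}
```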

Note that the effective request timeout is the smaller of these two values, so both have to be raised.
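If only some requests are slow, a service-router can be used instead of the service-resolver to raise the timeout for those paths only. The sketch below assumes the slow endpoints live under a hypothetical /reports prefix and uses 60s as an arbitrary example value.

```hcl
Kind = "service-router"
Name = "backend"
Routes = [
  {
    Match {
      HTTP {
        PathPrefix = "/reports" # hypothetical path, adjust to your API
      }
    }
    Destination {
      # Applies only to requests matching the prefix above.
      RequestTimeout = "60s"
    }
  }
]
```

The local request timeout on backend still has to be at least as large, otherwise the backend's Envoy will cut the request first.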

I hope this helps.

dbd · January 29, 2024, 8:06am

Thanks, so everything has to be configured on the backend service. I’ll take another look at what I did, because I think I did what you explain but I still get upstream request timeout errors with requests in the ~15s range.

dbd · January 30, 2024, 8:16am

Everything is working as expected, thanks for your clarifications. My problem was with web service calls to a second backend, for which no service-resolver was configured, so those calls were still using the default 15s timeout. Once I identified and fixed that, I got the desired behavior.
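For anyone landing here with the same symptom, the missing piece was roughly the following. The service name second-backend is a placeholder for illustration, and the value simply mirrors the one used above.

```hcl
Kind           = "service-resolver"
Name           = "second-backend" # placeholder name for the second upstream
RequestTimeout = "17s"            # same value as for the first backend
```

Every upstream in the call chain needs its own entry, since any hop left at the default will time out at 15s.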

system · May 21, 2024, 4:34pm

This topic was automatically closed 62 days after the last reply. New replies are no longer allowed.
