Consul service mesh, clarification around request timeout

Hi.
I’m running a Nomad/Consul/Vault cluster for a number of services, with the service mesh used for almost all inter-service communication. Things mostly work, but I have problems with HTTP services reached through the mesh (with Protocol = “http” in service-defaults) when requests take longer than the default 15s limit.

My HTTP client keeps getting an “upstream request timeout” error on some queries. The docs mention many places where a request timeout can be configured, but it’s never clear which setting affects which timeout, since several hops are involved:

  • Initial client to local Envoy
  • Envoy to Envoy
  • Envoy to final server

The available settings are:

  • local_request_timeout_ms, set either in a proxy-defaults config entry, like
Kind = "proxy-defaults"
Name = "global"
Config {
  local_request_timeout_ms = 0
}

or in the Nomad job:

      connect {
        sidecar_service {
          proxy {
            config {
              local_request_timeout_ms = 0
            }
          }
        }
      }
  • RequestTimeout in a service-resolver configuration, as described in the service-resolver documentation
  • RequestTimeout in a service-router configuration, as described in the service-router documentation

Can anyone clarify how and where to configure a longer request timeout? Which of these settings should be applied, and should it go on the source service or on the destination service?

Hi @dbd,

To answer this question, let’s take two services, frontend and backend. To avoid the “upstream request timeout” error, you have to configure the timeout in two places.

  1. Envoy to Envoy: This can be configured using a service-router or service-resolver for the upstream service. So in our example, we should apply it for the backend service. The simplest option is to have a service-resolver as shown below.

    Kind           = "service-resolver"
    Name           = "backend"
    RequestTimeout = "17s" # This will be applied to the envoy of the downstream service.
    

    After you apply this, you will find that the timeout is applied to the Envoy proxy of backend's downstream service (in this case, frontend).

  2. Envoy to the final server (local_app): In addition to the above, the local request timeout (local_request_timeout_ms in the proxy config, or LocalRequestTimeoutMs in service-defaults as below) must be raised, since this is where the delay comes from. Applying this makes the upstream service's (backend) Envoy wait longer for the application.

    Kind                  = "service-defaults"
    Name                  = "backend"
    Protocol              = "http"
    LocalRequestTimeoutMs = 17000 # 17-second timeout
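Step 1 mentions that a service-router can be used instead of a service-resolver for the Envoy-to-Envoy timeout. A minimal sketch, assuming only some paths of backend are slow: the /reports prefix and the 60s value are hypothetical, and a router requires backend's protocol to be HTTP, which the service-defaults entry in step 2 sets. Requests that don't match the route would fall through to the default route, so they would keep the resolver's timeout (or the 15s default). The local timeout from step 2 still applies on the backend side.

```hcl
# Hypothetical service-router: raise the timeout only for a slow path prefix.
Kind = "service-router"
Name = "backend"

Routes = [
  {
    Match {
      HTTP {
        # Hypothetical prefix; replace with the slow endpoints of your service.
        PathPrefix = "/reports"
      }
    }

    Destination {
      Service        = "backend"
      RequestTimeout = "60s" # enforced by the downstream (frontend) Envoy
    }
  },
]
```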
    

Note that the effective request timeout is the smaller of these two values, so both need to be raised.
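Because the effective limit is the smaller of the two settings, one way to avoid surprises is to raise both to the same value. A sketch assuming a 60s limit for backend (the value is illustrative). These are two separate config entries, so they would normally live in two files (the file names below are hypothetical), each applied with consul config write.

```hcl
# File 1 (hypothetical name: backend-resolver.hcl)
# Envoy-to-Envoy timeout, enforced by the downstream (frontend) sidecar.
Kind           = "service-resolver"
Name           = "backend"
RequestTimeout = "60s"

# File 2 (hypothetical name: backend-defaults.hcl)
# Envoy-to-application timeout, enforced by backend's own sidecar.
# Keep this in sync with RequestTimeout above (60s = 60000 ms).
Kind                  = "service-defaults"
Name                  = "backend"
Protocol              = "http"
LocalRequestTimeoutMs = 60000
```

If only one of the two is raised, requests would still be cut off at the other one's limit (15s by default).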

I hope this helps.


Thanks, so everything has to be configured for the backend service. I’ll take another look at my setup, because I think I did what you explained, but I still get “upstream request timeout” errors on requests in the ~15s range.

Everything is working as expected, thanks for the clarification. My problem was with web service calls to a second backend that had no service-resolver configured, so it was still using the default 15s timeout. Once I identified and fixed that, I got the desired behavior.
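The fix described above amounts to one more resolver entry for the second upstream. A minimal sketch, where backend-2 is a hypothetical name standing in for the real service and 60s is an illustrative value:

```hcl
# Hypothetical service name; use the actual name of the second backend.
Kind           = "service-resolver"
Name           = "backend-2"
RequestTimeout = "60s"

# If backend-2's own service-defaults (or proxy config) does not already
# raise LocalRequestTimeoutMs / local_request_timeout_ms, that limit would
# still cap requests on the backend-2 side.
```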
