Optimizing blocking requests

We’re using Consul and Nomad extensively at Fly.io, in particular Consul’s blocking queries.

To keep our proxy’s state synchronized, we need to run ~50 concurrent blocking queries at all times. This puts some stress on both Consul and our own proxy.

We’ve implemented a client that follows the guidelines in the blocking queries documentation (https://www.consul.io/api/features/blocking.html). However, it’s unrealistic for us to use a global rate limiter for requests to Consul.

We’re trying to reduce the number of requests we make. Here’s a rundown of the requests we currently run:

  • 3 x blocking queries on KV prefixes (some of which currently return 2-3 MB of JSON)
  • 1 x blocking query on the health check list endpoint for the "any" state (/v1/health/state/any)
  • n x blocking queries to get the services on each node (that’s currently 45+, but will grow much more!)

I’ve added a rate limiter per blocking query, but that’s not enough: these queries update very often as Nomad allocs are spun up and down.

Is there an endpoint I missed that returns all service instances across all nodes? The catalog only returns service names, which isn’t helpful in our case: we need the whole service object.

Our current thinking is that we’ll want to ingest this data on a centralized server and fan it out through a queue. We were hoping we wouldn’t have to, since it seemed like an extraneous layer when Consul clients already provide similar functionality.
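To make the centralized option concrete, here is a minimal in-process sketch of the fan-out side: one watcher process runs the Consul queries and publishes each new snapshot, and consumers subscribe instead of querying Consul themselves. The `Hub` type is a hypothetical stand-in for the queue (a real deployment would fan out over the network); each subscriber keeps only the latest snapshot, so a slow consumer skips intermediate versions instead of blocking the publisher.

```go
package main

import "sync"

// Hub fans out snapshots from a single producer to many subscribers.
type Hub struct {
	mu   sync.Mutex
	subs map[chan []byte]struct{}
}

func NewHub() *Hub { return &Hub{subs: make(map[chan []byte]struct{})} }

// Subscribe returns a channel buffered to hold one (the latest) snapshot.
func (h *Hub) Subscribe() chan []byte {
	ch := make(chan []byte, 1)
	h.mu.Lock()
	h.subs[ch] = struct{}{}
	h.mu.Unlock()
	return ch
}

func (h *Hub) Unsubscribe(ch chan []byte) {
	h.mu.Lock()
	delete(h.subs, ch)
	h.mu.Unlock()
}

// Publish replaces any unread snapshot with the new one. Only Publish sends,
// and it holds the lock, so the send after draining never blocks.
func (h *Hub) Publish(snapshot []byte) {
	h.mu.Lock()
	defer h.mu.Unlock()
	for ch := range h.subs {
		select {
		case <-ch: // drop the stale, unread snapshot
		default:
		}
		ch <- snapshot
	}
}
```

With this shape, the number of blocking queries against Consul stays fixed no matter how many proxies consume the data.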