I have a question about limiting how often we communicate with the Nomad server when issuing queries against the daemons running on the Nomad client machines. When using an HTTP client against the local Nomad node, I can expect a subset of RPC calls not to leave localhost (as defined here: nomad/rpc.go at main · hashicorp/nomad · GitHub). This lets me get information about a particular allocation present on the node (by its ID), such as its resource usage.
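For reference, the node-local path I'm describing looks roughly like this. This is a minimal Python sketch; it assumes a client agent listening on 127.0.0.1:4646 and an allocation ID obtained some other way (which is exactly the gap in my question):

```python
# Minimal sketch: query per-allocation resource stats from the local Nomad agent.
# Assumes a Nomad client agent on 127.0.0.1:4646 and an already-known alloc ID.
import json
import urllib.request

NOMAD_ADDR = "http://127.0.0.1:4646"

def alloc_stats_url(alloc_id: str) -> str:
    # /v1/client/allocation/:alloc_id/stats is served by the client agent
    # itself, so this request never has to leave the node.
    return f"{NOMAD_ADDR}/v1/client/allocation/{alloc_id}/stats"

def fetch_alloc_stats(alloc_id: str) -> dict:
    # Returns the agent's JSON stats payload (CPU, memory, per-task stats).
    with urllib.request.urlopen(alloc_stats_url(alloc_id)) as resp:
        return json.load(resp)
```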
However, I believe this set of calls does not let me discover which allocations are known to the local node at the time of the query (or at least I can’t find a way to do so). Discovery is forced through the Nomad server: a follower if we allow stale data, or the leader if we do not, as defined in the routing policy: nomad/rpc.go at 3f67b5b8ebd78673e2f431f7822f60af53a6efea · hashicorp/nomad · GitHub.
My question is thus: is there a way to discover the allocations known to the local node without the request leaving the node itself, i.e. without contacting either the leader or a follower of the Nomad server cluster?
Taking the Nomad servers out of the equation makes node-local components more resilient and avoids overloading the servers themselves. For instance:
- Node-bound components can easily DDoS the servers if their polling synchronizes (every node issuing a query for its local node ID at the same time)
- Server-bound queries simply fail during a network partition
- Node-local components may be partitioned from the servers but not from other components (such as logging or metrics ingestion nodes)
This applies to any operation that is downstream of the Nomad client API (such as exporting metrics).
— EDIT —
This data is obtainable by leveraging the metrics HTTP endpoint, since each alloc_id has metrics attached to it, but parsing that JSON seems like a rather… incorrect way of obtaining this data.
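For completeness, the workaround looks roughly like this in Python. The payload shape assumed here (go-metrics JSON with an "alloc_id" entry in each metric's Labels map) matches what the agent's /v1/metrics endpoint emits today, but it is an observation, not a documented contract, which is why it feels fragile:

```python
# Sketch of the /v1/metrics workaround: scrape the set of allocation IDs
# out of the agent's telemetry JSON. The payload shape (metric sections
# containing Labels with an "alloc_id" key) is an assumption based on
# observed output, not a stable API.
import json

def alloc_ids_from_metrics(metrics_json: str) -> set:
    data = json.loads(metrics_json)
    ids = set()
    # go-metrics groups emitted metrics into these top-level sections.
    for section in ("Gauges", "Counters", "Samples"):
        for metric in data.get(section, []):
            alloc_id = (metric.get("Labels") or {}).get("alloc_id")
            if alloc_id:
                ids.add(alloc_id)
    return ids

# Trimmed, hypothetical sample payload for illustration:
sample = json.dumps({
    "Gauges": [
        {"Name": "nomad.client.allocs.cpu.total_percent",
         "Value": 1.5,
         "Labels": {"alloc_id": "5a3bb52e", "task": "web"}},
        {"Name": "nomad.client.uptime", "Value": 100.0, "Labels": {}},
    ],
    "Counters": [],
    "Samples": [],
})
print(alloc_ids_from_metrics(sample))  # → {'5a3bb52e'}
```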