Nomad Client allocation discovery


I have a question about limiting how often queries issued against the daemons on Nomad Client machines end up reaching the Nomad Server. When using a localhost HTTP client to talk to the local Nomad node, I can expect a subset of RPC calls never to leave the local host (as defined here: nomad/rpc.go at main · hashicorp/nomad · GitHub). This lets me get information about a particular allocation present on the Nomad node (looked up by its ID), such as its resource usage.
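For illustration, a minimal sketch of such a node-local query, assuming the agent listens on Nomad's default HTTP address (`127.0.0.1:4646`) and that no ACL token is required. The helper names are hypothetical; the path is the client allocation stats endpoint of the Nomad HTTP API.

```python
import json
import urllib.request
from urllib.parse import quote

# Assumption: the local agent uses Nomad's default HTTP address.
LOCAL_AGENT = "http://127.0.0.1:4646"


def alloc_stats_url(alloc_id: str, base: str = LOCAL_AGENT) -> str:
    """Build the URL of the client-served stats endpoint for one allocation."""
    return f"{base.rstrip('/')}/v1/client/allocation/{quote(alloc_id, safe='')}/stats"


def fetch_alloc_stats(alloc_id: str, base: str = LOCAL_AGENT, timeout: float = 2.0) -> dict:
    """GET the resource-usage stats of an allocation from the local agent."""
    with urllib.request.urlopen(alloc_stats_url(alloc_id, base), timeout=timeout) as resp:
        return json.load(resp)
```

Note that the allocation ID has to be known up front, which is exactly the gap described next.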

However, I believe that this set of calls does not let me discover which allocations are known to the local node at the time of the query (or at least I can’t find a way to do so). Discovery is therefore forced to go through the Nomad Server: either a replica if we allow stale data, or the master if we do not, as defined in the routing policy (nomad/rpc.go at 3f67b5b8ebd78673e2f431f7822f60af53a6efea · hashicorp/nomad · GitHub).
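As a point of comparison, the server-routed discovery looks roughly like the sketch below. It assumes the node ID is already known, that the agent is on the default address, and that ACLs are not in play; the function names are hypothetical. The `stale` query parameter is what permits an answer from a server that is not the current leader.

```python
import json
import urllib.request
from urllib.parse import quote, urlencode


def node_allocations_url(node_id: str, base: str = "http://127.0.0.1:4646",
                         allow_stale: bool = True) -> str:
    """Build the URL that lists allocations for a node (answered by a server)."""
    url = f"{base.rstrip('/')}/v1/node/{quote(node_id, safe='')}/allocations"
    if allow_stale:
        # Opt in to stale reads so any server may respond.
        url += "?" + urlencode({"stale": "true"})
    return url


def list_node_allocation_ids(node_id: str, base: str = "http://127.0.0.1:4646",
                             allow_stale: bool = True, timeout: float = 2.0) -> list:
    """Return the IDs of the allocations the servers report for this node."""
    url = node_allocations_url(node_id, base, allow_stale)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return [alloc["ID"] for alloc in json.load(resp)]
```

Even though the request is sent to the local agent, it is forwarded to a server, so it fails when the node cannot reach one.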

My question is thus: is there a way to discover the allocations known to the local node without leaving the node itself (that is, without going to either the master or a replica of the Nomad server)?

Why would I want to do this?

Removing the Nomad Scheduler from the equation allows us to be more resilient and avoid overloading the scheduler itself. For instance:

  1. Node-bound components can easily DDoS the scheduler if their queries become synchronized (every node querying for its own node ID at the same time).
  2. Scheduler-bound queries simply do not work during a network partition.
  3. Node-local components may be partitioned from the scheduler, but not from other components (such as logging or metrics ingestion nodes).
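To make the synchronization concern in point 1 concrete: if every node polls on the same fixed interval, the queries can line up and arrive at the servers as a single burst. A common mitigation, which is not something proposed in this thread and only softens the problem rather than removing the server dependency, is to randomize each node's polling delay. A minimal sketch with hypothetical names and values:

```python
import random


def jittered_delay(base_seconds: float, jitter_fraction: float = 0.5, rng=random) -> float:
    """Return a polling delay spread uniformly around base_seconds.

    With jitter_fraction=0.5 and base_seconds=30, the delay falls in [15, 45],
    so nodes that started in lockstep drift apart over time.
    """
    spread = base_seconds * jitter_fraction
    return base_seconds + rng.uniform(-spread, spread)
```

A poller would sleep for `jittered_delay(30)` seconds between queries instead of a fixed 30.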

This applies to any operation which is downstream from the Nomad Client API (such as exporting metrics).

Much appreciated!

— EDIT —

This data is obtainable from the metrics HTTP endpoint, since each alloc_id has metrics attached to it, but parsing that JSON seems like a rather… incorrect way of obtaining this data.
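A rough sketch of that workaround, assuming the payload is the already-parsed JSON from the agent's `/v1/metrics` endpoint, that allocation metrics are being published, and that each metric entry carries its labels in a `Labels` map containing `alloc_id`. The function name is hypothetical.

```python
# Sections of the metrics payload that hold labeled entries.
METRIC_SECTIONS = ("Gauges", "Counters", "Samples")


def alloc_ids_from_metrics(metrics: dict) -> set:
    """Collect the distinct alloc_id label values found in a metrics payload."""
    ids = set()
    for section in METRIC_SECTIONS:
        # A section may be missing or null, so fall back to an empty list.
        for entry in metrics.get(section) or []:
            alloc_id = (entry.get("Labels") or {}).get("alloc_id")
            if alloc_id:
                ids.add(alloc_id)
    return ids
```

This only finds allocations that currently emit metrics, which is part of why it feels like the wrong tool for discovery.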

Hi @stswidwinski, this is an interesting idea. There is currently no way to list the allocations on a client by querying the client directly; all such queries must be answered by a server. Servers in this case are the source of truth.

That being said, each client does track which allocations it currently has, and that state is persisted, so this would be technically possible.

Node-local components may be partitioned from the scheduler

This is an immediate “problem” I thought of. As you state, this is acceptable if the listing is used for certain activities; however, we as the maintainers cannot assume this would be the only way it would be used. I do think there is value in discussing this further, though, and in getting more visibility on the request. Could you please raise this as a feature request against the Nomad repository? This would allow more community members and engineers to see the request and add their thoughts.

jrasell and the Nomad team

Done! Allow listing of client-local allocations / tasks / task groups. · Issue #14605 · hashicorp/nomad · GitHub.

Thanks for the response and I’ll stay in touch! :slight_smile:
