Nomad Client allocation discovery

stswidwinski · September 15, 2022, 5:12pm

Hi,

I have a question about limiting how often the Nomad Server is contacted when querying the daemons running on Nomad Client machines. When using a localhost HTTP client to talk to the local Nomad node, I can expect a subset of RPC calls not to leave localhost (as defined here: nomad/rpc.go at main · hashicorp/nomad · GitHub). This lets me get information about a particular allocation on the node (by its ID), such as its resource usage.
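
For illustration, a node-local lookup of one allocation's resource usage might look like the sketch below. It assumes the agent listens on the default address `http://127.0.0.1:4646`, that ACLs are disabled (no token header is sent), and that the client-side allocation stats endpoint `/v1/client/allocation/<alloc_id>/stats` is the one being used; the helper names are hypothetical.

```python
import json
import urllib.request

# Assumption: the local Nomad agent uses the default HTTP address.
NOMAD_ADDR = "http://127.0.0.1:4646"


def alloc_stats_url(alloc_id: str, base: str = NOMAD_ADDR) -> str:
    """Build the URL of the client-side stats endpoint for one allocation."""
    return f"{base.rstrip('/')}/v1/client/allocation/{alloc_id}/stats"


def fetch_alloc_stats(alloc_id: str, base: str = NOMAD_ADDR,
                      timeout: float = 5.0) -> dict:
    """Ask the local agent for the resource usage of a known allocation.

    The allocation ID must already be known; this call does not discover it.
    """
    with urllib.request.urlopen(alloc_stats_url(alloc_id, base),
                                timeout=timeout) as resp:
        return json.load(resp)
```

Note that the allocation ID has to come from somewhere else first, which is exactly the gap described next.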

However, I believe this set of calls does not let me discover which allocations are known to the local node at the time of the query (or at least I can’t find a way to do so). That forces discovery to go through the Nomad Server: either a follower (replica) if we allow stale data, or the leader if we do not, as defined in the routing policy: nomad/rpc.go at 3f67b5b8ebd78673e2f431f7822f60af53a6efea · hashicorp/nomad · GitHub.

My question is thus: is there a way to discover the allocations known to the local node without leaving the node itself (that is, without going to either the leader or a replica of the Nomad server)?

Why would I want to do this?

Removing the Nomad Scheduler from the equation allows us to be more resilient and avoid overloading the scheduler itself. For instance:

- Node-bound components can easily DDoS a scheduler if their queries become synchronized (each node issuing a query for its own node ID at the same time).
- Scheduler-bound queries simply do not work during a network partition.
- Node-local components may be partitioned from the scheduler, but not from other components (such as logging or metrics ingestion nodes).

This applies to any operation which is downstream from the Nomad Client API (such as exporting metrics).

Much appreciated!

— EDIT —

This data is obtainable via the metrics HTTP endpoint, since each alloc_id has metrics attached to it, but parsing that JSON seems like a rather… incorrect way of obtaining this data.
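
A minimal sketch of that workaround, for illustration only. It assumes the agent's `/v1/metrics` endpoint returns JSON with `Gauges`, `Counters`, and `Samples` lists whose entries carry a `Labels` map, and that per-allocation metrics are labelled with `alloc_id`. The default address and the function names are hypothetical, and the result only reflects whatever the agent currently reports metrics for.

```python
import json
import urllib.request

# Assumption: the local Nomad agent uses the default HTTP address.
NOMAD_ADDR = "http://127.0.0.1:4646"


def alloc_ids_from_metrics(payload: dict) -> set:
    """Collect the distinct alloc_id label values found in a metrics payload."""
    ids = set()
    for section in ("Gauges", "Counters", "Samples"):
        # A section may be missing or null, so fall back to an empty list.
        for metric in payload.get(section) or []:
            labels = metric.get("Labels") or {}
            alloc_id = labels.get("alloc_id")
            if alloc_id:
                ids.add(alloc_id)
    return ids


def fetch_local_alloc_ids(base: str = NOMAD_ADDR, timeout: float = 5.0) -> set:
    """Query the local agent's metrics endpoint and extract allocation IDs."""
    url = f"{base.rstrip('/')}/v1/metrics"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return alloc_ids_from_metrics(json.load(resp))
```

This stays on the node, but it depends on metric labels rather than on an API meant for listing allocations, which is why it feels like the wrong tool.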

jrasell · September 16, 2022, 7:36am

Hi @stswidwinski, this is an interesting idea. There is currently no way to list the allocations on a client by querying the client directly; all such queries must be answered by a server. Servers in this case are the source of truth.

That being said, each client does track which allocations it currently has, and this is persisted state, so this would be technically possible.

> Node-local components may be partitioned from the scheduler

This is an immediate “problem” I thought of. As you state, this is acceptable if the listing is used for certain activities; however, we as the maintainers cannot assume that this is the only way it would be used. I do think there is value in discussing this further, though, and in getting more visibility on the request. Could you please raise this as a feature request against the Nomad repository? This would allow more community members and engineers to see the request and add their thoughts.

Thanks,
jrasell and the Nomad team

stswidwinski · September 16, 2022, 11:48am

Done! Allow listing of client-local allocations / tasks / task groups. · Issue #14605 · hashicorp/nomad · GitHub.

Thanks for the response and I’ll stay in touch!
