Use-case for distributed software within company owned devices in the field

jordan-lumley · October 12, 2021, 3:17pm

Hello all,

Question about the use of Nomad. I work at a company built around Point-of-Sale, and we need to distribute Docker containers, among other software, to company-owned devices deployed in the field. Can Nomad be a tool to accomplish this?

brucellino1 · October 12, 2021, 3:23pm

For your application to run on the device, it needs at the very least a Nomad agent running. So, your device needs a processor architecture that Nomad supports.

Besides that, you will need to consider network connectivity. The agents report the status of their jobs to the cluster servers and are tracked via periodic heartbeats, so they will need to be well-connected.

From the sound of it, you want semi-autonomous agents, working in arbitrary environments. The Nomad team can comment more in depth, but I don’t think it would be my first choice for this case.

Curious to hear the details

jordan-lumley · October 12, 2021, 4:05pm

Thanks for the response!

I have been digging through this obsessively for a week now, trying to figure out a path to daylight without just writing my own solution, but it keeps circling back to writing my own.

Curious on what other choices are out there if you know of any?

jordan-lumley · October 12, 2021, 8:13pm

Starting to think HashiCorp is not what I am looking for… It looks fantastic for deploying to in-house servers or cloud servers, but not to agents/clients in the field. I don’t know how something like this doesn’t exist yet?

mnomitch · October 12, 2021, 9:39pm

Hey @jordan-lumley!

While it isn’t the majority of Nomad use, we definitely see people using Nomad to achieve something similar to what you mentioned. And it seems to generally go well.

“Edge” deployments where the client is located far away from the Nomad server cluster, and where connectivity might be lost (then found) is something we’ll likely be spending more time on in the future.

Off the top of my head, I can mention some things to do/not to do:

First, I would start with a single Nomad server cluster talking to distant clients. While the servers should have a low latency/consistent connection, the clients can operate pretty well with high latency and inconsistent connections. A single server cluster can scale to thousands of clients. People sometimes jump to federated servers immediately, but I wouldn’t worry about that yet.
Consul usually needs clients to gossip between themselves. This can be an issue for “edge” deployments as not all the clients can connect. If possible, I would avoid Consul for this reason. This will make service discovery trickier, but hopefully you don’t need a full Service Mesh for this use case.
I would think about the stop_after_client_disconnect value you want for your use case (see the group stanza in the Nomad job specification docs).
Wherever else you can retry connections or tune for longer heartbeats, you should. (Sorry, I’m forgetting exactly where now. It might be heartbeat_grace, in the server stanza of the Nomad agent configuration docs.)
Worth noting that if the Nomad client does disconnect, it can keep running its workloads, but when it reconnects, it will usually restart its workloads. If the workloads restart quickly, this is no big deal; if a workload is slow to start, you may have to work around this. Gracefully handling reconnection to the cluster is something we’ll be handling in a future version of Nomad. Ideally this isn’t a deal breaker though.
Consider using “datacenters” heavily to segment client nodes. This can make scheduling on particular machines a lot easier. Depending on your use case you might not need this though.
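
To make the settings above concrete, here is a minimal sketch of a job file, not a recommended configuration. It assumes one Nomad datacenter per store; the job name, datacenter name, image, and the 24h value are all hypothetical placeholders, and the parameters are as documented around the time of this thread, so check the docs for your Nomad version.

```hcl
# Hypothetical job: one POS container pinned to the clients of a single store.
job "pos-terminal" {
  # Only clients whose agent config sets datacenter = "store-0042" are eligible.
  datacenters = ["store-0042"]
  type        = "service"

  group "app" {
    # If the client loses contact with the servers for longer than this,
    # the client stops the group's allocations. The value is an assumption.
    stop_after_client_disconnect = "24h"

    task "pos" {
      driver = "docker"

      config {
        # Placeholder image reference.
        image = "registry.example.com/pos-app:1.0.0"
      }
    }
  }
}
```

The heartbeat tuning mentioned above would live in the server agent configuration. Assuming heartbeat_grace is the right knob, a fragment might look like this (the 1m value is only an illustration):

```hcl
# Server agent configuration fragment.
server {
  enabled = true

  # Extra time allowed beyond the heartbeat TTL before a client is
  # marked as down. Longer values tolerate flakier links.
  heartbeat_grace = "1m"
}
```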

If anything else pops up as a potential blocker, please let me know! We’re very interested in serving use cases like this, so that feedback would be very valuable. Feel free to reach out to mnomitch(at)hashicorp(dot)com with feedback.

jordan-lumley · October 12, 2021, 11:04pm

Thanks so much for this information!! I very much enjoy the HashiStack. I think what you guys are doing is pretty awesome; I just wanted to hear it from the source to make sure I wasn’t retrofitting something where it shouldn’t be retrofitted.

mnomitch · December 13, 2021, 10:54pm

@jordan-lumley, just curious, any luck?

jordan-lumley · December 13, 2021, 11:09pm

Hey @mnomitch, sorry, but no luck. I tried to make it work but ended up stumbling upon balenaOS and ran with that. It covered the full breadth of what we needed. I have still been keeping an eye on Nomad, though!