"Zombie" allocations in Nomad native service discovery

I’m running a simple two-node cluster while I evaluate Nomad (1.7.2), and I’m having trouble with “zombie” allocations in Nomad’s native service discovery. I’m running Traefik as my load balancer, and I noticed that requests to one service would randomly either work or return a “Bad Gateway” error. It turns out that, even though this service is supposed to have only one allocation, Traefik was load balancing between two. In the “Services” tab of the UI, I can see multiple allocations registered for the service even though there should be only one. Clicking on any of them other than the active one throws an error.

The IPs listed for the “zombie” allocations are on another node or on another network interface, which may be relevant. I tried running nomad system gc on both nodes, but no luck. I also rebooted both nodes, with the same result.

Has anyone else run into this issue? Is it a bug with Nomad’s native service discovery or something with my configuration? I was excited to not need to deploy Consul, since it makes the deployment simpler, but would using Consul here fix the issue? Thanks!

Update: I found this bug report, which I think describes the same issue: “Services not unregistered”, Issue #16616 in hashicorp/nomad on GitHub.

This workaround, while obviously not ideal, does work for now: icyleaf/nomad-invalid-services-cleaner on GitHub (“Auto clean all invalid zombie nomad service(s)”).
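
For anyone curious what a cleanup like this involves, here is a rough sketch of the idea in Python. It is not the code from the linked repository, just my understanding of the approach: list the registrations for a service, look up each registration's allocation, and delete the registrations whose allocation is gone or no longer running. It assumes the agent is reachable at http://127.0.0.1:4646, the default namespace, and no ACL token. The helper names (find_stale_registrations, clean_service, and so on) are made up for illustration.

```python
import json
import urllib.error
import urllib.request
from urllib.parse import quote

# Assumption: a local agent on the default port, no ACLs, default namespace.
NOMAD_ADDR = "http://127.0.0.1:4646"


def find_stale_registrations(registrations, alloc_status):
    """Return the registrations whose allocation is missing or not running.

    registrations: list of dicts with at least "ID" and "AllocID".
    alloc_status:  dict mapping AllocID to its client status string,
                   or to None when the allocation no longer exists.
    """
    stale = []
    for reg in registrations:
        # A missing key is treated the same as a missing allocation.
        if alloc_status.get(reg["AllocID"]) != "running":
            stale.append(reg)
    return stale


def _request(method, path):
    """Send one HTTP request to the agent and decode a JSON body if present."""
    req = urllib.request.Request(NOMAD_ADDR + path, method=method)
    with urllib.request.urlopen(req) as resp:
        body = resp.read()
    return json.loads(body) if body.strip() else None


def fetch_alloc_status(alloc_id):
    """Return the allocation's ClientStatus, or None if the agent answers 404."""
    try:
        return _request("GET", f"/v1/allocation/{quote(alloc_id)}")["ClientStatus"]
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None
        raise


def clean_service(service_name, dry_run=True):
    """Report stale registrations for one service; delete them unless dry_run."""
    name = quote(service_name, safe="")
    registrations = _request("GET", f"/v1/service/{name}") or []
    statuses = {r["AllocID"]: fetch_alloc_status(r["AllocID"]) for r in registrations}
    for reg in find_stale_registrations(registrations, statuses):
        print(f"stale: {reg['ID']} at {reg.get('Address')}:{reg.get('Port')}")
        if not dry_run:
            _request("DELETE", f"/v1/service/{name}/{quote(reg['ID'], safe='')}")
```

Calling clean_service("my-service") only prints what it would remove; passing dry_run=False actually deletes. The same thing can be done by hand with nomad service info and nomad service delete. Note that this treats anything other than "running" as stale, which may be too aggressive during a deployment, so I would start with the dry run.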