How do I get Nomad+Consul to drop expired services?

Hey there! I’ve got a Nomad + Consul cluster running. It works fine for the most part, but I have a strange issue where Nomad doesn’t seem to be deregistering services that no longer exist.

For example: I did a smoke test of the system by running an example Redis job. That was registered in Consul as expected, showing that the keys for the ACL subsystem seem to have the right permissions. But when I remove the job, the service and associated instance stick around in Consul without being deregistered.

In another case, I’m running the countdash example and have modified several things about it, but the old service instances remain registered in Consul, so the dashboard flakes out whenever it gets routed to an instance that no longer responds.

I’m really not sure what to make of this, and nothing I’ve tried so far (using the CLI and API to remove the services manually) has helped.

What could be going on here?

Of course, writing out this post somehow clarified something for me, and I’ve fixed it.

Problem was that my anonymous Consul role only had read permissions by default, and the agent was trying to use it to remove services since I hadn’t specified another token in the acl.tokens.agent configuration. Once I set up a proper ACL for the agent the services went away.
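For anyone landing here later, a minimal sketch of where that setting lives in the Consul agent configuration. The token value is a placeholder, and the `enabled` and `default_policy` lines are assumptions about a typical ACL-enabled setup rather than a copy of my actual config. The token itself has to be created separately, with a policy that grants more than the read-only access the anonymous token had.

```hcl
# Consul agent configuration (sketch; values are placeholders)
acl {
  enabled        = true   # assumption: ACLs are already enabled on the cluster
  default_policy = "deny" # assumption: a typical setting, not taken from the original post

  tokens {
    # Token the agent uses for its own internal operations.
    # When this is unset, the agent falls back to the default token,
    # and to the anonymous token if that is unset as well.
    agent = "<agent-token-secret-id>"
  }
}
```

After changing this, the agent needs to pick up the new configuration before the stale services get cleaned up.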

I found out that this was the problem by searching around here and looking in my Consul logs. Once I knew where to look, Consul told me pretty quickly that it was continuously failing. :sweat_smile:

Gotta get some log slurping and alerts set up on this cluster!
