Consul service deregistration upon job reallocation

I have run into behavior that I did not expect, and I’m wondering whether something I am doing wrong is preventing services from being deregistered correctly.

My setup is essentially a Consul cluster (3 GCE VMs), a Nomad cluster (3 GCE VMs), and 3 Nomad clients (3 GCE VMs).

Step 1: Bring cluster components up, all service checks healthy.
Step 2: Run countdash Nomad-Connect demo from HashiCorp Learn.
Step 3: Verify count-dashboard/count-api services healthy in Consul.
Step 4: Manually stop 2 Nomad clients. The resulting job evaluation places replacement allocations on the remaining Nomad client.

What happened: Consul shows 2 instances of count-dashboard and 2 instances of count-api. One pair is unhealthy and orphaned (left over from the lost allocations); the other pair is healthy on the remaining node.

What I expected: 1 instance pair of count-dashboard and count-api, with service checks on lost allocations deregistered.

I read another thread about this, and it seems people there were manually deregistering services from the catalog. That is not something I would expect to have to do by hand.
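For context, the manual cleanup described in that thread can be sketched roughly as below. This is only an illustration, not a recommended fix: it assumes a Consul agent reachable at http://127.0.0.1:8500 with no ACL token required, and the helper names (critical_service_instances, deregister_payloads, and so on) are hypothetical. It uses the Consul HTTP endpoints /v1/health/state/critical and /v1/catalog/deregister, and by default it only prints what it would deregister.

```python
import json
import urllib.request

# Assumption: a local Consul agent without ACLs; adjust for your cluster.
CONSUL_ADDR = "http://127.0.0.1:8500"


def critical_service_instances(checks):
    """Return sorted, unique (Node, ServiceID) pairs for critical service checks.

    Node-level checks (for example serfHealth) carry an empty ServiceID
    and are skipped, so only service registrations are considered.
    """
    pairs = set()
    for check in checks:
        if check.get("Status") == "critical" and check.get("ServiceID"):
            pairs.add((check["Node"], check["ServiceID"]))
    return sorted(pairs)


def deregister_payloads(checks):
    """Build request bodies for PUT /v1/catalog/deregister."""
    return [
        {"Node": node, "ServiceID": service_id}
        for node, service_id in critical_service_instances(checks)
    ]


def fetch_critical_checks(addr=CONSUL_ADDR):
    """Fetch every check currently in the 'critical' state."""
    with urllib.request.urlopen(f"{addr}/v1/health/state/critical") as resp:
        return json.load(resp)


def deregister(payload, addr=CONSUL_ADDR):
    """Remove one service instance from the catalog; returns the HTTP status."""
    req = urllib.request.Request(
        f"{addr}/v1/catalog/deregister",
        data=json.dumps(payload).encode(),
        method="PUT",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


if __name__ == "__main__":
    # Dry run: list the candidates instead of calling deregister().
    for payload in deregister_payloads(fetch_critical_checks()):
        print("would deregister:", payload)
```

Note that if the Consul agent that originally registered a service is still running, its anti-entropy sync may re-register the service after a catalog-level deregistration, so this kind of cleanup is at best a workaround.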

Could someone point me in the right direction?

Thanks

Hi @AdrienneCohea,

This is something I’d like to test drive. Since you are playing around with countdash, do you have a public repo with the reproduction config?

Sorry, my configuration is a bit too complicated to give a quick reproduction (it’s Terraform states across four repositories). I’ll see if I can reproduce it again on my own and eventually get to a simpler reproduction.

Is it okay to let the thread go stale until I do?