Nomad 0.11.1 Client Error: node secret ID does not match

All:

I’m trying to set up a 3-node Consul/Nomad cluster using LXD containers: node 1 is a Consul and Nomad server, and nodes 2 and 3 are Consul and Nomad clients.

Node 2 registers, but I can’t get node 3 to register. The message in the DEBUG log is:

[ERROR] client: error registering: error="rpc error: node secret ID does not match. Not registering node."

To the best of my knowledge, nodes 2 and 3 are configured identically (except for IP addresses). ACLs are not turned on, nor is Vault integration.

I have deleted and re-initialized the cluster several times with no change in behavior.

I did see a post suggesting shutting down the node and then hitting the GC API endpoint, but that did not work.

Consul shows all 3 nodes as registered.

Would appreciate any help!

-steve

ADDENDUM:

I modified client.hcl on the client reporting the problem, changing no_host_uuid from false to true. This appears to have resolved the "node secret ID does not match" error.
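For reference, here is a minimal sketch of the relevant part of client.hcl after the change. Only the no_host_uuid line reflects what I actually changed; the enabled line is an assumption about an otherwise ordinary client stanza, and the rest of the file is omitted.

```hcl
# Sketch of the client stanza in client.hcl (other settings omitted).
client {
  # Assumed: this agent runs in client mode.
  enabled = true

  # true: Nomad generates a random node ID instead of deriving it from
  # the host's system UUID. Containers on the same physical host can
  # report the same system UUID, so deriving the node ID from it may
  # produce two clients with the same node ID.
  no_host_uuid = true
}
```

The Nomad agent on that client has to be restarted for the change to take effect.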

But I find it bizarre that this setting is needed on one container, but not the other.

Any insight would be much appreciated - I dislike “magic” settings.

-steve

Hi @snesbittsea! I suspect the problem is that the fingerprint of the client is overlapping with the other client running on the same host, and that’s causing a collision in the host ID.

The underlying issue here is most likely that you’re running the clients as LXD containers. The Nomad client needs access to privileged APIs on the host (e.g., CAP_SYS_ADMIN). While it’s probably possible to get it to work(-ish) with clever enough configuration, running a Nomad client as a container isn’t a supported configuration.

Thanks for the info. This is strictly a dev cluster.

So far things are working with the no_host_uuid option. I will also see what happens when I change the LXD containers to unprivileged.

I guess I could go with VMs, but I just find them too heavyweight.

-steve

For what it’s worth, you should find that running the servers in containers works out just fine, as they don’t need to talk to the host OS.

“I will also see what happens when I change the LXD containers to unprivileged.”

Just out of curiosity, what task drivers are you using that you can run them from within LXD?

I’m using the docker driver. I was able to get Nomad to place an instance of Traefik 2.2 on a Nomad client.

-steve