Every times I reboot my cluster nomad/consul/vault. I have this error on job, especially these whose used consul connect.
failed to setup alloc: pre-run hook "network" failed: failed to configure networking for alloc: failed to configure network: failed to set bridge addr: could not set bridge's mac: invalid argument
or
failed to setup alloc: pre-run hook "network" failed: failed to configure networking for alloc: failed to configure network: failed to allocate for range 0: 172.26.64.101 has been allocated to 987e3755-f6e9-2c97-9310-106d433e9182, duplicate allocation is not allowed
Sometimes I need to restart nomad service or sometimes waiting. But waiting 30 minutes or 1 hour and suddenly after many auto-restart of job it works!
I didn’t find something relevant on logs.
Can I have just an explanation about this to find maybe a solution. Thanks!
Could you share your jobspec file and any network configuration from your server/client hcl files? Both of these errors appear to be coming from the call to containerd/go-cni/cni.Setup. I might be able to spot something if I can see your config.
This value in the server.hcl file stands out to me because it matches the time interval you are witnessing. I’m wondering if this is having an unexpected effect. This line checks the oldThreshold variable which is computed a few lines earlier.
Could you try setting that value a bit lower and seeing if it has an impact? I’m not suggesting this is a fix, but I’m hoping since you have this environment up and running, it will be faster/easier for you to test the theory.