Cluster is unhealthy after weekly patching/reboot

My Nomad cluster comes up fine after weekly patching and reboot, but Consul does not. All the nodes come up within 2-3 minutes, but Consul goes into a bad state and I have to restart it manually …

Observation: my Nomad cluster works fine and jobs run fine, but I cannot get into the Consul UI, and when I run "systemctl status consul" it shows errors. Only a restart of Consul on all 3 nodes makes it normal again.
Here is the error. I think the Consul agents shut themselves down because there is no cluster leader?

}. Err: connection error: desc = "transport: Error while dialing dial tcp <nil>->xxxxx:8303: operation was canceled"
2023-05-10T22:38:25.889-0400 [WARN]  agent: [core][Channel #1 SubChannel #6588] grpc: addrConn.createTransport failed to connect to {
  "Addr": "xxxx:8303",
  "ServerName": "xxxxxx",
  "Attributes": null,
  "BalancerAttributes": null,
  "Type": 0,
  "Metadata": null
}. Err: connection error: desc = "transport: Error while dialing dial tcp <nil>->xxxx:8303: operation was canceled"
2023-05-10T22:40:39.292-0400 [INFO]  agent: Deregistered service: service=_nomad-server-jox3pd5xjcasphhcxmrqk2tk5ymoieo7
2023-05-10T22:40:39.364-0400 [INFO]  agent: Caught: signal=interrupt
2023-05-10T22:40:39.364-0400 [INFO]  agent: Graceful shutdown disabled. Exiting
2023-05-10T22:40:39.364-0400 [INFO]  agent: Requesting shutdown
2023-05-10T22:40:39.385-0400 [INFO]  agent.server: shutting down server
2023-05-10T22:40:39.385-0400 [WARN]  agent.server.serf.lan: serf: Shutdown without a Leave
2023-05-10T22:40:39.388-0400 [WARN]  agent.server.serf.wan: serf: Shutdown without a Leave
2023-05-10T22:40:39.389-0400 [INFO]  agent.router.manager: shutting down
2023-05-10T22:40:39.389-0400 [INFO]  agent.router.manager: shutting down
2023-05-10T22:40:39.403-0400 [INFO]  agent: consul server down
2023-05-10T22:40:39.403-0400 [INFO]  agent: shutdown complete
2023-05-10T22:40:39.403-0400 [INFO]  agent: Stopping server: protocol=DNS address=0.0.0.0:8600 network=tcp
2023-05-10T22:40:39.403-0400 [WARN]  agent.cache: handling error in Cache.Notify: cache-type=connect-ca-root error="rpc error making call: EOF" index=9
2023-05-10T22:40:39.404-0400 [INFO]  agent: Stopping server: protocol=DNS address=0.0.0.0:8600 network=udp
2023-05-10T22:40:39.404-0400 [INFO]  agent: Stopping server: address=[::]:8501 network=tcp protocol=https
2023-05-10T22:40:39.404-0400 [WARN]  agent: Deregistering service failed.: service=_nomad-server-sqq5ryvwb6hlkab3ymogypudt7ckxbge error="No cluster leader"
2023-05-10T22:40:39.404-0400 [ERROR] agent: failed to sync changes: error="No cluster leader"
2023-05-10T22:40:39.404-0400 [INFO]  agent: Stopping server: address=127.0.0.1:8500 network=tcp protocol=http
2023-05-10T22:40:39.406-0400 [INFO]  agent: Waiting for endpoints to shut down
2023-05-10T22:40:39.406-0400 [INFO]  agent: Endpoints down
2023-05-10T22:40:39.406-0400 [INFO]  agent: Exit code: code=1

Any advice?

There is not enough information here to give much advice.

The logging you have shown looks to me like nothing more than the Consul agent being shut down for the reboot, and then never being restarted at all afterwards.
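
If that is what happened, a quick check on one of the nodes would look something like this (a sketch; it assumes the unit is literally named consul, so adjust to your setup):

  # Is the unit configured to start at boot?
  systemctl is-enabled consul

  # Did systemd attempt to start it during the current boot?
  journalctl -u consul -b --no-pager | head -n 20

If is-enabled reports disabled, the agent was only ever started by hand, and every reboot will leave it down until someone restarts it.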

No, they don’t do that. Losing the cluster leader does not make a Consul server exit on its own. The cause of the shutdown above is this externally delivered signal:

2023-05-10T22:40:39.364-0400 [INFO]  agent: Caught: signal=interrupt

Thanks. After looking deeper into the issue, I see that while Consul retries several times and then quits, the bare-metal host is still running some network startup scripts. I am going to add a dependency to the systemd unit files so that Nomad and Consul wait until the network and everything else comes up before they start.
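
For anyone landing here later, a minimal sketch of that dependency as a systemd drop-in. The unit name consul.service and the drop-in path are assumptions; adjust them to your packaging, and apply the same drop-in to nomad.service:

  # /etc/systemd/system/consul.service.d/wait-for-network.conf
  # Delay startup until the network is actually online,
  # not merely configured.
  [Unit]
  Wants=network-online.target
  After=network-online.target

After creating the drop-in, run systemctl daemon-reload. Note that network-online.target only waits for anything if the matching wait-online service for your network stack (systemd-networkd-wait-online.service or NetworkManager-wait-online.service) is enabled; otherwise the target is reached immediately and the drop-in changes nothing.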