Nomad install: Error connecting to server: context deadline exceeded

Hey everybody,

So today I tried to install Waypoint on my existing Nomad cluster by following the available tutorial.
But every time I try, the installation gets stuck at the same point.

Here is my installation process:

[chris@desktop]$ waypoint install -plain -platform=nomad -accept-tos -nomad-dc=nl -nomad-host-volume=wp-server-vol -nomad-runner-host-volume=wp-runner-vol -nomad-consul-datacenter=nl -nomad-consul-domain=vanmeer.eu
-> Initializing Nomad client...
-> Checking for existing Waypoint server...
-> Installing Waypoint server to Nomad
-> Waiting for allocation to be scheduled
-> Nomad allocation pending...
-> Nomad allocation created
-> Waiting for allocation "f72700e8-bc7c-14ce-5206-4860a305482d" to start
(.. omitting repeating text ..)
-> Nomad allocation running
-> Ensuring allocation "f72700e8-bc7c-14ce-5206-4860a305482d" has properly started up...
-> Nomad allocation running
-> Ensuring allocation "f72700e8-bc7c-14ce-5206-4860a305482d" has properly started up...
-> Nomad allocation running
-> Waypoint server ready
The CLI has been configured to automatically install a Consul service for
the Waypoint service backend and ui service in Nomad.
-> Connecting to: waypoint-ui.service.nl.vanmeer.eu:9702
-> Attempting to make connection to server...
-> Error connecting to server: error connecting to server: context deadline exceeded

The Waypoint server has been deployed, but due to this error we were
unable to automatically configure the local CLI or the Waypoint server
advertise address. You must do this manually using "waypoint context"
and "waypoint server config-set".
-> Retry connecting to server ... 0/12 retries: error connecting to server: context deadline exceeded
-> Error connecting to server: error connecting to server: context deadline exceeded

(.. omitting repeating text ..)

The Waypoint server has been deployed, but due to this error we were
unable to automatically configure the local CLI or the Waypoint server
advertise address. You must do this manually using "waypoint context"
and "waypoint server config-set".
-> Retry connecting to server ... 11/12 retries: error connecting to server: context deadline exceeded
-> Error connecting to server: error connecting to server: context deadline exceeded

The Waypoint server has been deployed, but due to this error we were
unable to automatically configure the local CLI or the Waypoint server
advertise address. You must do this manually using "waypoint context"
and "waypoint server config-set".
-> Failed to connect to Waypoint server after max retry attempts of 12
! Error connecting to server: error connecting to server: context deadline exceeded
  
  The Waypoint server has been deployed, but due to this error we were
  unable to automatically configure the local CLI or the Waypoint server
  advertise address. You must do this manually using "waypoint context"
  and "waypoint server config-set".

The Nomad job is healthy and I can see the service registration in Consul, but I can’t figure out what is going wrong.

I can resolve and connect to the address listed above, both from my desktop (where I run the install command) and from the Nomad client that holds the allocation for this job (a machine called docker3):

[chris@desktop]$ dig waypoint-ui.service.nl.vanmeer.eu SRV

; <<>> DiG 9.16.23-RH <<>> waypoint-ui.service.nl.vanmeer.eu SRV
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 10481
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 3

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 65494
;; QUESTION SECTION:
;waypoint-ui.service.nl.vanmeer.eu. IN  SRV

;; ANSWER SECTION:
waypoint-ui.service.nl.vanmeer.eu. 0 IN SRV 1 1 9702 0a00fa35.addr.nl.vanmeer.eu.

;; ADDITIONAL SECTION:
0a00fa35.addr.nl.vanmeer.eu. 0 IN A 10.0.250.53
docker3.node.nl.vanmeer.eu. 0 IN TXT  "consul-network-segment="

;; Query time: 2 msec
;; SERVER: 127.0.0.53#53(127.0.0.53)
;; WHEN: Wed Feb 22 10:04:50 CET 2023
;; MSG SIZE  rcvd: 208
[chris@docker3]$ dig waypoint-ui.service.nl.vanmeer.eu SRV

; <<>> DiG 9.16.23-RH <<>> waypoint-ui.service.nl.vanmeer.eu SRV
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 3684
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 3

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 65494
;; QUESTION SECTION:
;waypoint-ui.service.nl.vanmeer.eu. IN  SRV

;; ANSWER SECTION:
waypoint-ui.service.nl.vanmeer.eu. 0 IN SRV 1 1 9702 0a00fa35.addr.nl.vanmeer.eu.

;; ADDITIONAL SECTION:
0a00fa35.addr.nl.vanmeer.eu. 0 IN A 10.0.250.53
docker3.node.nl.vanmeer.eu. 0 IN TXT  "consul-network-segment="

;; Query time: 2 msec
;; SERVER: 127.0.0.53#53(127.0.0.53)
;; WHEN: Wed Feb 22 10:04:25 CET 2023
;; MSG SIZE  rcvd: 208
[chris@desktop]$ curl -s -k https://waypoint-ui.service.nl.vanmeer.eu:9702 | head
<!DOCTYPE html>
<html>
  <head>
    <meta charset="utf-8" />
    <meta http-equiv="X-UA-Compatible" content="IE=edge" />
    <title>Waypoint</title>
    <meta name="description" content="" />
    <meta name="viewport" content="width=device-width, initial-scale=1" />
[chris@desktop]$ nomad version && waypoint version
Nomad v1.4.4 (7f29429be12098e0f3a09df959d9272aa0654cba)
CLI: v0.10.5 (ece0f7541)

What am I missing here?
By the way, both Nomad and Consul have mTLS enabled, and during the install I have the NOMAD_TOKEN, CONSUL_HTTP_TOKEN and NOMAD_ADDR environment variables set (the latter with an https scheme).

Hi @c.v.meer! This should be fixed with our latest 0.11.0 release. Please update your Waypoint version and give this another shot. Please also note that there are some special update steps for this version if you use self-managed Waypoint and persistent storage.

If it still doesn’t work, then feel free to open up an issue on our repo!


Thanks @xiaolin-ninja! I will upgrade and give it another go.
Could you tell me where I can find the documentation on self-managed Waypoint and persistent storage? I don’t see it referenced in the changelog or on the docs site.

My bad, I found them.

I upgraded: the waypoint-server part now runs smoothly, but the waypoint-static-runner fails with the same error.

Opened up an issue.