Hello,
we are testing Waypoint to understand if it could be the right tool for us to manage the deployment of some simple docker applications to several remote servers.
goal
We are trying to use remote runners on the on-prem remote servers and a single central waypoint server installation (with a public IP), reachable by all the runners.
We managed to get the components running and connected, but I am clearly doing something wrong at the configuration level and I hope someone can point me in the right direction.
I am not able to assign a project to the correct runner.
setup
To test this we are using two simple waypoint.hcl projects that each run nginx with no additional configuration, stored in the same git repo in different folders.
I’ll describe here the components, steps, and configurations that we used.
I’m sorry: it’s going to be a long post.
Components involved:
- WS: the central waypoint server, with no runners
- R0: remote server 0 with remote runner, should run job0
- R1: remote server 1 with remote runner, should run job1
- waypoint.hcl file for job0 (job1 is the same)
project = "job0"
app "web" {
  runner {
    enabled = true
  }
  build {
    use "docker-pull" {
      image = "nginx"
      tag = "latest"
    }
  }
  deploy {
    use "docker" {
    }
  }
}
server installation
We installed Waypoint server on WS following the server installation documentation page and it works fine.
We also applied the following conf to it
waypoint server config-set -advertise-addr=11.22.33.44:9701 -advertise-tls-skip-verify=true
Setting and verifying the context on a local CLI also works fine (adding the -server-tls-skip-verify option).
remote runners installation
We installed waypoint and started the runners from the terminal on R0 and R1, doing the following:
on R0 and R1
- export WAYPOINT_SERVER_TOKEN=12345654321
- export WAYPOINT_SERVER_TLS_SKIP_VERIFY=1
- export WAYPOINT_SERVER_TLS=11.22.33.44:9701
on R0
- waypoint runner agent -id job0
on R1
- waypoint runner agent -id job1
Adopt the runners on waypoint server WS:
- waypoint runner adopt job0
- waypoint runner adopt job1
Now the runners log "runner registered with server and ready" and "waiting for job assignment".
project and runner profiles creation
We now create two projects, job0 and job1, each with an associated runner profile.
All these steps are performed from a local CLI with the correct context.
- create the runner profiles for job0 and job1
waypoint runner profile set -name job0 -target-runner-id=job0 -plugin-type=docker
waypoint runner profile set -name job1 -target-runner-id=job1 -plugin-type=docker
$ waypoint runner profile list
Runner profiles
NAME | PLUGIN TYPE | OCI URL | TARGET RUNNER | DEFAULT
---------+-------------+-------------------------------+---------------+----------
docker | docker | hashicorp/waypoint-odr:latest | * | yes
job0 | docker | hashicorp/waypoint-odr:latest | job0 |
job1 | docker | hashicorp/waypoint-odr:latest | job1 |
- create the job0 and job1 projects with the runner profile
waypoint project apply \
  -data-source=git \
  -git-auth-type=basic \
  -git-username="a@b.c" \
  -git-password='secret' \
  -git-url=https://gitlab.com/aa/bb/cc.git \
  -git-path=job0/ \
  -poll \
  -app-status-poll \
  -runner-profile=job0 \
  job0
waypoint project apply \
  -data-source=git \
  -git-auth-type=basic \
  -git-username="a@b.c" \
  -git-password='secret' \
  -git-url=https://gitlab.com/aa/bb/cc.git \
  -git-path=job1/ \
  -poll \
  -app-status-poll \
  -runner-profile=job1 \
  job1
We can see the runner profiles correctly using the CLI, and the projects are shown with the correct settings in the UI (although there seems to be no visual reference to the runner profiles).
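For anyone who wants to check the profile-to-runner mapping from a script rather than by eye, here is a minimal sketch. It assumes the output of waypoint runner profile list keeps the pipe-separated layout shown in this post; profile_target is a hypothetical helper name, not part of the Waypoint CLI.

```shell
# Hypothetical helper (not a Waypoint command): print the TARGET RUNNER
# column for the profile named $1. Reads the table printed by
# `waypoint runner profile list` on stdin; assumes pipe-separated columns
# in the order NAME | PLUGIN TYPE | OCI URL | TARGET RUNNER | DEFAULT.
profile_target() {
  awk -F'|' -v name="$1" '
    NF >= 4 {
      # Trim leading and trailing whitespace from every field.
      for (i = 1; i <= NF; i++) gsub(/^[ \t]+|[ \t]+$/, "", $i)
      if ($1 == name) print $4
    }'
}
```

It would be used as waypoint runner profile list | profile_target job0, which should print the runner ID the profile points at. This only confirms what the profile says, not which runner actually picks up a job.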
test
Now that everything is in place we want to run job0 and job1 from the cli and we expect:
- the jobs to be run remotely
- job0 to be deployed on R0, using runner with id job0
- job1 to be deployed on R1, using runner with id job1
what we get:
- the job is assigned to a random runner with profile “docker”
Performing operation on "docker" with runner profile "docker"
- running waypoint up multiple times spawns multiple instances of the app on the runners
- running waypoint destroy kills only some apps (the one on the server with the runner that got the job), and it leaves orphan containers
It looks like the project and runner association is not working, and that the app lifecycle gets messed up with this configuration.
What am I missing?
Am I using it in a way it is not supposed to be used, or am I just doing something wrong?
extra
While testing different configurations for the runners I think I found a reproducible way to kill the server (similar to issue 3051 maybe?).
- set up the env described above
- forget runner job1
waypoint runner forget job
- Ctrl-C the job1 runner process (here are the logs, you can spot my ^C at the beginning of the second line)
2022-05-18T11:52:09.232Z [WARN] waypoint.runner.agent.runner: server down before accepting a job, will reconnect
^C2022-05-18T11:52:20.847Z [INFO] waypoint.runner.agent: quit request received, gracefully stopping runner
2022-05-18T11:52:20.847Z [ERROR] waypoint.runner.agent: error running job: err="rpc error: code = Internal desc = early exit while waiting for reconnect"
2022-05-18T11:52:20.848Z [WARN] waypoint.runner.agent.runner.config.watcher: exiting due to context ended
2022-05-18T11:52:20.848Z [WARN] waypoint.runner.agent.runner.config_recv: EOF or cancellation received, graceful close of runner config stream
- the server is dead with the following logs
2022-05-18T11:52:19.468Z [INFO] waypoint.server.grpc: /hashicorp.waypoint.Waypoint/GetVersionInfo response: error=<nil> duration=454.272µs
2022-05-18T11:52:19.514Z [INFO] waypoint.server.grpc: /hashicorp.waypoint.Waypoint/ListRunners request
2022-05-18T11:52:19.515Z [INFO] waypoint.server.grpc: /hashicorp.waypoint.Waypoint/ListRunners response: error=<nil> duration=1.008027ms
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x40 pc=0x24d7745]
goroutine 4747 [running]:
github.com/hashicorp/waypoint/internal/server/boltdbstate.(*State).runnerOffline(0xc00003e640, 0xc000ba55b0, 0xc000ba55e0, {0xc0017ab48c, 0x4})
	/tmp/wp-src/internal/server/boltdbstate/runner.go:322 +0x85
github.com/hashicorp/waypoint/internal/server/boltdbstate.(*State).RunnerOffline.func1(0xc000ba5618)
	/tmp/wp-src/internal/server/boltdbstate/runner.go:148 +0x2f
go.etcd.io/bbolt.(*DB).Update(0xc000666640, 0xc000ba56a8)
	/go/pkg/mod/go.etcd.io/bbolt@v1.3.6/db.go:741 +0x82
The server needs to be reinstalled with the waypoint install command. It detects the previous installation, but it leaves you with “2 default runner profiles”, which are not (yet?) manageable from the CLI:
$ waypoint runner profile list
Runner profiles
NAME | PLUGIN TYPE | OCI URL | TARGET RUNNER | DEFAULT
---------+-------------+-------------------------------+---------------+----------
docker | docker | hashicorp/waypoint-odr:latest | * | yes
job0 | docker | hashicorp/waypoint-odr:latest | job0 |
job1 | docker | hashicorp/waypoint-odr:latest | job1 |
docker | docker | hashicorp/waypoint-odr:latest | * | yes
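To spot this duplicated-default state without reading the table by hand, a minimal sketch follows. It assumes the list output keeps the pipe-separated layout shown above, with DEFAULT as the fifth column; count_default_profiles is a hypothetical helper name, not a Waypoint command.

```shell
# Hypothetical helper (not a Waypoint command): count the rows whose
# DEFAULT column is "yes" in `waypoint runner profile list` output
# read from stdin. Assumes DEFAULT is the fifth pipe-separated column.
count_default_profiles() {
  awk -F'|' '
    NF >= 5 {
      d = $5
      gsub(/[ \t\r]/, "", d)   # strip padding around the cell value
      if (d == "yes") n++
    }
    END { print n + 0 }'       # n + 0 prints 0 when no row matched
}
```

It would be used as waypoint runner profile list | count_default_profiles; any result greater than 1 indicates the state described here.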