Cannot configure project with specific remote runner

chebelom · May 18, 2022, 2:28pm

Hello,
we are testing Waypoint to understand if it could be the right tool for us to manage the deployment of some simple docker applications to several remote servers.

goal

We are trying to use remote runners on the on-prem remote servers and a single central Waypoint server installation (with a public IP), reachable by all the runners.

We managed to get the components running and connected, but I am clearly doing something wrong at the configuration level and I hope someone can point me in the right direction.
I am not able to assign a project to the correct runner.

setup

To test this we are using two simple waypoint.hcl projects, each running nginx with no additional configuration, stored in the same git repo in different folders.

I’ll describe here the components, steps, and configurations that we used.
I’m sorry: it’s going to be a long post.

Components involved:

WS: the central waypoint server, with no runners
R0: remote server 0 with remote runner, should run job0
R1: remote server 1 with remote runner, should run job1

waypoint.hcl file for job0 (job1 is the same)

project = "job0"

app "web" {
    runner {
        enabled = true
    }

    build {
        use "docker-pull" {
            image = "nginx"
            tag   = "latest"
        }
    }

    deploy {
        use "docker" {
        }
    }
}

server installation

We installed the Waypoint server on WS following the server installation documentation page and it works fine.
We also applied the following configuration to it:
waypoint server config-set -advertise-addr=11.22.33.44:9701 -advertise-tls-skip-verify=true

Setting and verifying the context on a local CLI also works fine (after adding the -server-tls-skip-verify option).

remote runners installation

We installed Waypoint and started the runners from the terminal on R0 and R1, doing the following:

on R0 and R1

export WAYPOINT_SERVER_TOKEN=12345654321
export WAYPOINT_SERVER_TLS_SKIP_VERIFY=1
export WAYPOINT_SERVER_ADDR=11.22.33.44:9701

on R0

waypoint runner agent -id job0

on R1

waypoint runner agent -id job1
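
Since a missing or misspelled variable is easy to overlook here, a small guard before starting the agent might help. This is only a sketch: require_env is a hypothetical helper (not part of the Waypoint CLI), and it assumes the server address is passed in WAYPOINT_SERVER_ADDR and the token in WAYPOINT_SERVER_TOKEN.

```shell
#!/bin/sh
# require_env is a hypothetical helper, not part of the Waypoint CLI.
# It returns non-zero if any of the named variables is unset or empty.
require_env() {
  missing=0
  for var in "$@"; do
    # Indirect lookup of the variable whose name is stored in $var.
    eval "val=\${$var:-}"
    if [ -z "$val" ]; then
      echo "missing environment variable: $var" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage on R0 (assumed variable names, adjust to your setup):
#   require_env WAYPOINT_SERVER_ADDR WAYPOINT_SERVER_TOKEN \
#     && waypoint runner agent -id job0
```

With this in place the agent only starts when every listed variable has a value.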

Adopt the runners on waypoint server WS:

waypoint runner adopt job0
waypoint runner adopt job1

Now the runners are registered with the server, ready and waiting for job assignment.

project and runner profiles creation

We now create two projects, job0 and job1, each with an associated runner profile.
All these steps are performed from a local CLI with the correct context.

create the runner profiles for job0 and job1
waypoint runner profile set -name job0 -target-runner-id=job0 -plugin-type=docker
waypoint runner profile set -name job1 -target-runner-id=job1 -plugin-type=docker

$ waypoint runner profile list
Runner profiles
   NAME  | PLUGIN TYPE |            OCI URL            | TARGET RUNNER | DEFAULT
---------+-------------+-------------------------------+---------------+----------
  docker | docker      | hashicorp/waypoint-odr:latest | *             | yes
  job0   | docker      | hashicorp/waypoint-odr:latest | job0          |
  job1   | docker      | hashicorp/waypoint-odr:latest | job1          |

create the job0 and job1 projects with the runner profile

waypoint project apply \
-data-source=git \
-git-auth-type=basic \
-git-username="a@b.c" \
-git-password='secret' \
-git-url=https://gitlab.com/aa/bb/cc.git \
-git-path=job0/ \
-poll \
-app-status-poll \
-runner-profile=job0 \
job0

waypoint project apply \
-data-source=git \
-git-auth-type=basic \
-git-username="a@b.c" \
-git-password='secret' \
-git-url=https://gitlab.com/aa/bb/cc.git \
-git-path=job1/ \
-poll \
-app-status-poll \
-runner-profile=job1 \
job1

We can see the runner profiles correctly using the CLI, and the projects are shown with the correct settings in the UI (although there seems to be no visual reference to the runner profiles).

test

Now that everything is in place we want to run job0 and job1 from the cli and we expect:

the jobs to be run remotely
job0 to be deployed on R0, using runner with id job0
job1 to be deployed on R1, using runner with id job1

what we get:

the job is assigned to a random runner with profile “docker”
```
Performing operation on "docker" with runner profile "docker"
```
running waypoint up multiple times spawns multiple instances of the app on the runners
running waypoint destroy kills only some apps (the ones on the server whose runner got the job), and it leaves orphan containers
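
To check which profile a run actually picked up, something like the following could be used. It is only a sketch: profile_from_output is a hypothetical helper, and it assumes the CLI keeps printing the "with runner profile" line in exactly the format shown above.

```shell
#!/bin/sh
# profile_from_output is a hypothetical helper, not part of the Waypoint CLI.
# It reads CLI output on stdin and prints the first runner profile name found
# in a line of the form: ... with runner profile "NAME"
profile_from_output() {
  sed -n 's/.*with runner profile "\([^"]*\)".*/\1/p' | head -n 1
}

# Usage (assumes the output format does not change between releases):
#   used="$(waypoint up 2>&1 | profile_from_output)"
#   [ "$used" = "job0" ] || echo "unexpected runner profile: $used" >&2
```

In our case this prints docker for both projects, where we expected job0 and job1.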

It looks like the project and runner association is not working, and that the app lifecycle gets messed up with this configuration.

What am I missing?
Am I using it in a way it is not supposed to be used, or am I just doing something wrong?

extra

While testing different configurations for the runners I think I found a reproducible way to kill the server (similar to issue 3051 maybe?).

set up the env described above
forget runner job1
waypoint runner forget job1

Ctrl-C the job1 runner process (here are the logs, you can spot my ^C at the beginning of the second line)

  2022-05-18T11:52:09.232Z [WARN]  waypoint.runner.agent.runner: server down before accepting a job, will reconnect
  ^C2022-05-18T11:52:20.847Z [INFO]  waypoint.runner.agent: quit request received, gracefully stopping runner
  2022-05-18T11:52:20.847Z [ERROR] waypoint.runner.agent: error running job: err="rpc error: code = Internal desc = early exit while waiting for reconnect"
  2022-05-18T11:52:20.848Z [WARN]  waypoint.runner.agent.runner.config.watcher: exiting due to context ended
  2022-05-18T11:52:20.848Z [WARN]  waypoint.runner.agent.runner.config_recv: EOF or cancellation received, graceful close of runner config stream

the server is dead with the following logs

  2022-05-18T11:52:19.468Z [INFO]  waypoint.server.grpc: /hashicorp.waypoint.Waypoint/GetVersionInfo response: error=<nil> duration=454.272µs
  2022-05-18T11:52:19.514Z [INFO]  waypoint.server.grpc: /hashicorp.waypoint.Waypoint/ListRunners request
  2022-05-18T11:52:19.515Z [INFO]  waypoint.server.grpc: /hashicorp.waypoint.Waypoint/ListRunners response: error=<nil> duration=1.008027ms
  panic: runtime error: invalid memory address or nil pointer dereference
  [signal SIGSEGV: segmentation violation code=0x1 addr=0x40 pc=0x24d7745]

  goroutine 4747 [running]:
  github.com/hashicorp/waypoint/internal/server/boltdbstate.(*State).runnerOffline(0xc00003e640, 0xc000ba55b0, 0xc000ba55e0, {0xc0017ab48c, 0x4})
          /tmp/wp-src/internal/server/boltdbstate/runner.go:322 +0x85
  github.com/hashicorp/waypoint/internal/server/boltdbstate.(*State).RunnerOffline.func1(0xc000ba5618)
          /tmp/wp-src/internal/server/boltdbstate/runner.go:148 +0x2f
  go.etcd.io/bbolt.(*DB).Update(0xc000666640, 0xc000ba56a8)
          /go/pkg/mod/go.etcd.io/bbolt@v1.3.6/db.go:741 +0x82

The server then needs to be reinstalled with the waypoint install command. It detects the previous installation, but it leaves you with “2 default runner profiles”, which are not (yet?) manageable from the CLI:

$ waypoint runner profile list  
Runner profiles  
   NAME  | PLUGIN TYPE |            OCI URL            | TARGET RUNNER | DEFAULT
---------+-------------+-------------------------------+---------------+----------
  docker | docker      | hashicorp/waypoint-odr:latest | *             | yes
  job0   | docker      | hashicorp/waypoint-odr:latest | job0          |
  job1   | docker      | hashicorp/waypoint-odr:latest | job1          |
  docker | docker      | hashicorp/waypoint-odr:latest | *             | yes

izaaklauer · May 20, 2022, 1:49pm

Hi chebelom,

Thank you for the very detailed report! I think you ran into this bug: Unused/misleading `project apply` flag `-runner-profile` · Issue #3275 · hashicorp/waypoint · GitHub. The -runner-profile flag has been removed in the most recent release.

My sincere apologies here - the real method of profile targeting (in the hcl) has yet to be fully documented. It’s on the short list. In the meantime though, here’s what you want:

project = "job0"

app "web" {
    runner {
        enabled = true
        profile = "job0"
    }

    build {
        use "docker-pull" {
            image = "nginx"
            tag   = "latest"
        }
    }

    deploy {
        use "docker" {
        }
    }
}
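
Since job1 needs the same file with its own profile, here is a rough sketch for stamping out both. write_waypoint_hcl is a hypothetical helper (not a Waypoint command), and it assumes each runner profile is named after its project and that the profile name is written as a quoted string.

```shell
#!/bin/sh
# write_waypoint_hcl is a hypothetical helper, not part of the Waypoint CLI.
# It writes DIR/waypoint.hcl for project NAME, assuming the runner profile
# has the same name as the project.
write_waypoint_hcl() {
  name="$1"   # project name, also used as the runner profile name
  dir="$2"    # target folder inside the repo, e.g. job0
  mkdir -p "$dir"
  cat > "$dir/waypoint.hcl" <<EOF
project = "$name"

app "web" {
    runner {
        enabled = true
        profile = "$name"
    }

    build {
        use "docker-pull" {
            image = "nginx"
            tag   = "latest"
        }
    }

    deploy {
        use "docker" {
        }
    }
}
EOF
}

# Usage (run from the repository root):
#   write_waypoint_hcl job0 job0
#   write_waypoint_hcl job1 job1
```

Commit the generated files so the git data source picks them up on the next poll.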

Let me know if that doesn’t work!

Izaak
