Cannot configure project with specific remote runner

Hello,
We are testing Waypoint to see whether it is the right tool for managing the deployment of some simple Docker applications to several remote servers.

goal

We are trying to use remote runners on the on-prem remote servers together with a single central Waypoint server installation (with a public IP) that all the runners can reach.

We managed to get the components running and connected, but I am clearly doing something wrong at the configuration level, and I hope someone can point me in the right direction.
I am not able to assign a project to the correct runner.

setup

To test this we are using two simple waypoint.hcl projects, each running nginx with no additional configuration, stored in the same git repo in different folders.

I’ll describe here the components, steps, and configurations that we used.
I’m sorry: it’s going to be a long post.

Components involved:

  • WS: the central waypoint server, with no runners

  • R0: remote server 0 with remote runner, should run job0

  • R1: remote server 1 with remote runner, should run job1

  • waypoint.hcl file for job0 (job1 is the same)

    project = "job0"
    
    app "web" {
        runner {
            enabled = true
        }
    
        build {
            use "docker-pull" {
                image = "nginx"
                tag   = "latest"
            }
        }
    
        deploy {
            use "docker" {
            }
        }
    }
    

server installation

We installed Waypoint server on WS following the server installation documentation page and it works fine.
We also applied the following conf to it
waypoint server config-set -advertise-addr=11.22.33.44:9701 -advertise-tls-skip-verify=true

Setting and verifying the context on a local CLI also works fine (with the -server-tls-skip-verify option added).

remote runners installation

We installed Waypoint and started the runners from the terminal on R0 and R1, as follows:

on R0 and R1

  • export WAYPOINT_SERVER_TOKEN=12345654321
  • export WAYPOINT_SERVER_TLS_SKIP_VERIFY=1
  • export WAYPOINT_SERVER_TLS=11.22.33.44:9701

on R0

  • waypoint runner agent -id job0

on R1

  • waypoint runner agent -id job1

Adopt the runners on waypoint server WS:

  • waypoint runner adopt job0
  • waypoint runner adopt job1

Now the runners are registered with the server and waiting for job assignment.

project and runner profiles creation

We now create two projects, job0 and job1, each with an associated runner profile.
All these steps are performed from a local CLI with the correct context.

  • create the runner profiles for job0 and job1
    waypoint runner profile set -name job0 -target-runner-id=job0 -plugin-type=docker
    waypoint runner profile set -name job1 -target-runner-id=job1 -plugin-type=docker

    $ waypoint runner profile list
    Runner profiles
       NAME  | PLUGIN TYPE |            OCI URL            | TARGET RUNNER | DEFAULT
    ---------+-------------+-------------------------------+---------------+----------
      docker | docker      | hashicorp/waypoint-odr:latest | *             | yes
      job0   | docker      | hashicorp/waypoint-odr:latest | job0          |
      job1   | docker      | hashicorp/waypoint-odr:latest | job1          |
    
    
  • create the job0 and job1 projects with the runner profile

    waypoint project apply \
    -data-source=git \
    -git-auth-type=basic \
    -git-username="a@b.c" \
    -git-password='secret' \
    -git-url=https://gitlab.com/aa/bb/cc.git \
    -git-path=job0/ \
    -poll \
    -app-status-poll \
    -runner-profile=job0 \
    job0
    
    waypoint project apply \
    -data-source=git \
    -git-auth-type=basic \
    -git-username="a@b.c" \
    -git-password='secret' \
    -git-url=https://gitlab.com/aa/bb/cc.git \
    -git-path=job1/ \
    -poll \
    -app-status-poll \
    -runner-profile=job1 \
    job1
    

We can see the runner profiles correctly using the CLI, and the projects are shown with the correct settings in the UI (although there seems to be no visual reference to the runner profiles).
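
The profile-to-runner mapping can also be checked from a script rather than by eye. This is only a minimal sketch: the helper name profile_target is made up for illustration, and it assumes the pipe-separated table layout that waypoint runner profile list printed in the listing above, which may differ between versions.

```shell
#!/bin/sh
# profile_target NAME
# Hypothetical helper (not part of Waypoint). Reads the table printed by
# `waypoint runner profile list` on stdin and prints the TARGET RUNNER
# column for the profile called NAME. Assumes the column order
# NAME | PLUGIN TYPE | OCI URL | TARGET RUNNER | DEFAULT.
profile_target() {
    awk -F'|' -v name="$1" '
        NF >= 4 {
            n = $1; t = $4
            gsub(/^[ \t]+|[ \t]+$/, "", n)   # trim the NAME cell
            gsub(/^[ \t]+|[ \t]+$/, "", t)   # trim the TARGET RUNNER cell
            if (n == name) { print t; exit }
        }
    '
}

# Usage (needs a configured Waypoint context):
#   waypoint runner profile list | profile_target job0
```

For the listing above this would report job0 for the job0 profile and * for the default docker profile.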

test

Now that everything is in place, we want to run job0 and job1 from the CLI, and we expect:

  • the jobs to be run remotely
  • job0 to be deployed on R0, using runner with id job0
  • job1 to be deployed on R1, using runner with id job1

what we get:

  • the job is assigned to a random runner with profile “docker”
    Performing operation on "docker" with runner profile "docker"
    
  • running waypoint up multiple times spawns multiple instances of the app on the runners
  • running waypoint destroy kills only some of the apps (those on the server whose runner got the job) and leaves orphan containers behind

It looks like the association between project and runner is not working, and that the app lifecycle gets messed up with this configuration.
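
To make the mismatch easy to spot on every run, the profile name can be pulled out of the operation output and compared with the expected one. This is a rough sketch: used_profile and expect_profile are hypothetical helper names, and the pattern assumes the exact "Performing operation ... with runner profile" line quoted above.

```shell
#!/bin/sh
# used_profile
# Hypothetical helper: reads operation output on stdin and prints the
# profile named in the first line of the form
#   Performing operation on "docker" with runner profile "docker"
used_profile() {
    sed -n 's/.*with runner profile "\([^"]*\)".*/\1/p' | head -n 1
}

# expect_profile NAME
# Returns non-zero unless the output on stdin names profile NAME.
expect_profile() {
    actual=$(used_profile)
    if [ "$actual" != "$1" ]; then
        echo "expected runner profile \"$1\", got \"$actual\"" >&2
        return 1
    fi
}

# Usage (needs a configured Waypoint context; the file name is arbitrary):
#   waypoint up 2>&1 | tee up.log
#   expect_profile job0 < up.log
```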

What am I missing?
Am I using it in a way it is not supposed to be used, or am I just doing something wrong?

extra

While testing different configurations for the runners I think I found a reproducible way to kill the server (similar to issue 3051 maybe?).

  • set up the env described above
  • forget runner job1
    waypoint runner forget job1
  • Ctrl-C the job1 runner process (here are the logs, you can spot my ^C at the beginning of the second line)
      2022-05-18T11:52:09.232Z [WARN]  waypoint.runner.agent.runner: server down before accepting a job, will reconnect
      ^C2022-05-18T11:52:20.847Z [INFO]  waypoint.runner.agent: quit request received, gracefully stopping runner
      2022-05-18T11:52:20.847Z [ERROR] waypoint.runner.agent: error running job: err="rpc error: code = Internal desc = early exit while waiting for reconnect"
      2022-05-18T11:52:20.848Z [WARN]  waypoint.runner.agent.runner.config.watcher: exiting due to context ended
      2022-05-18T11:52:20.848Z [WARN]  waypoint.runner.agent.runner.config_recv: EOF or cancellation received, graceful close of runner config stream
    
  • the server dies with the following logs
      2022-05-18T11:52:19.468Z [INFO]  waypoint.server.grpc: /hashicorp.waypoint.Waypoint/GetVersionInfo response: error=<nil> duration=454.272µs
      2022-05-18T11:52:19.514Z [INFO]  waypoint.server.grpc: /hashicorp.waypoint.Waypoint/ListRunners request
      2022-05-18T11:52:19.515Z [INFO]  waypoint.server.grpc: /hashicorp.waypoint.Waypoint/ListRunners response: error=<nil> duration=1.008027ms
      panic: runtime error: invalid memory address or nil pointer dereference
      [signal SIGSEGV: segmentation violation code=0x1 addr=0x40 pc=0x24d7745]
    
      goroutine 4747 [running]:
      github.com/hashicorp/waypoint/internal/server/boltdbstate.(*State).runnerOffline(0xc00003e640, 0xc000ba55b0, 0xc000ba55e0, {0xc0017ab48c, 0x4})
              /tmp/wp-src/internal/server/boltdbstate/runner.go:322 +0x85
      github.com/hashicorp/waypoint/internal/server/boltdbstate.(*State).RunnerOffline.func1(0xc000ba5618)
              /tmp/wp-src/internal/server/boltdbstate/runner.go:148 +0x2f
      go.etcd.io/bbolt.(*DB).Update(0xc000666640, 0xc000ba56a8)
              /go/pkg/mod/go.etcd.io/bbolt@v1.3.6/db.go:741 +0x82
    

The server then needs to be reinstalled with the waypoint install command. It detects the previous installation, but it leaves you with “2 default runner profiles”, which are not (yet?) manageable from the CLI:

$ waypoint runner profile list  
Runner profiles  
   NAME  | PLUGIN TYPE |            OCI URL            | TARGET RUNNER | DEFAULT
---------+-------------+-------------------------------+---------------+----------
  docker | docker      | hashicorp/waypoint-odr:latest | *             | yes
  job0   | docker      | hashicorp/waypoint-odr:latest | job0          |
  job1   | docker      | hashicorp/waypoint-odr:latest | job1          |
  docker | docker      | hashicorp/waypoint-odr:latest | *             | yes
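
A quick way to detect this state after a reinstall is to count the rows flagged as default. This is a minimal sketch with a hypothetical helper name, count_default_profiles, and it again assumes the table layout shown in the listing above.

```shell
#!/bin/sh
# count_default_profiles
# Hypothetical helper: reads the output of `waypoint runner profile list`
# on stdin and prints how many rows have "yes" in the DEFAULT column
# (assumed to be the fifth pipe-separated column).
count_default_profiles() {
    awk -F'|' '
        NF >= 5 {
            d = $5
            gsub(/^[ \t]+|[ \t]+$/, "", d)   # trim the DEFAULT cell
            if (d == "yes") n++
        }
        END { print n + 0 }                  # prints 0 when nothing matched
    '
}

# Usage (needs a configured Waypoint context):
#   waypoint runner profile list | count_default_profiles
```

Anything greater than 1 means the duplicated default profile described above is present.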

Hi chebelom,

Thank you for the very detailed report! I think you ran into this bug: Unused/misleading `project apply` flag `-runner-profile` · Issue #3275 · hashicorp/waypoint · GitHub. The -runner-profile flag has been removed in the most recent release.

My sincere apologies here - the real method of profile targeting (in the hcl) has yet to be fully documented. It’s on the short list. In the meantime though, here’s what you want:

project = "job0"

app "web" {
    runner {
        enabled = true
        profile = "job0"
    }

    build {
        use "docker-pull" {
            image = "nginx"
            tag   = "latest"
        }
    }

    deploy {
        use "docker" {
        }
    }
}
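
Since job1 is described as identical apart from its name, both files could be generated from one template. This is only a sketch: write_waypoint_hcl is a hypothetical helper, and it assumes each runner profile has the same name as its project, as in the setup above.

```shell
#!/bin/sh
# write_waypoint_hcl NAME DIR
# Hypothetical helper: writes DIR/waypoint.hcl for project NAME, targeting
# the runner profile that is also called NAME (an assumption taken from
# the setup described in this thread).
write_waypoint_hcl() {
    name=$1
    dir=$2
    mkdir -p "$dir"
    cat > "$dir/waypoint.hcl" <<EOF
project = "$name"

app "web" {
    runner {
        enabled = true
        profile = "$name"
    }

    build {
        use "docker-pull" {
            image = "nginx"
            tag   = "latest"
        }
    }

    deploy {
        use "docker" {
        }
    }
}
EOF
}

# Usage, run from the root of the git repo:
#   write_waypoint_hcl job0 job0
#   write_waypoint_hcl job1 job1
```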

Let me know if that doesn’t work!

Izaak
