HCP Waypoint: Unable to install Runner on Nomad Client

Dear all,

I’m new to this, so please excuse any obvious questions.

I’m trying to set up HCP Waypoint and am following the steps (Create Project, GitHub, Connect Remotely). I successfully installed Waypoint on the EC2 instance that is running the Nomad client and created the context.

Trying to install the runner:
waypoint runner install -platform=nomad -server-addr=api.hashicorp.cloud:443 -nomad-runner-image=hashicorp/waypoint -nomad-host-volume=wp-runner-vol
! Error installing runner into "": unsupported platform

The instance is able to connect to api.hashicorp.cloud:443, and the nomad-host-volume is configured on the Nomad client; the directory exists.

The error message seems to indicate that something is wrong with the -platform option, but I checked the documentation and the value is simply nomad. I tried this on my local machine as well, with exactly the same result.

Greatly appreciate any hints on how I can make this work.

Cheers
Philipp

Hey @nasenblick , thanks for reaching out! Would you mind trying to re-run this command with the flag -vv and sharing the logs? Also, what version of Waypoint are you using?

Hi @paladin-devops,

Thanks for the swift reply. Here you go:

waypoint runner install   -platform=nomad   -server-addr=api.hashicorp.cloud:443   -nomad-runner-image=hashicorp/waypoint   -nomad-host-volume=wp-runner-vol -vv
2023-06-08T20:49:57.516Z [INFO]  waypoint: waypoint version: full_string="v0.11.1 (1822efe)" version=v0.11.1 prerelease="" metadata="" revision=1822efe
2023-06-08T20:49:57.516Z [DEBUG] waypoint: home configuration directory: path=/home/ubuntu/.config/waypoint
2023-06-08T20:49:57.517Z [INFO]  waypoint.server: attempting to source credentials and connect
2023-06-08T20:49:57.759Z [INFO]  waypoint.serverclient: utilizing credentials fetched via oauth for server auth: oauth-url=https://auth.hashicorp.com/oauth/token oauth-client-id=nXpDZx9SwAobsfkzXA0xrtikqa1n1JJT
2023-06-08T20:49:57.759Z [DEBUG] waypoint.serverclient: connection information: address=api.hashicorp.cloud:443 tls=true tls_skip_verify=false send_auth=true has_token=true
2023-06-08T20:49:57.766Z [DEBUG] waypoint.server: connection established with sourced credentials
2023-06-08T20:49:57.784Z [INFO]  waypoint: server version info: version="hcp v0.12.0" api_min=1 api_current=1 entrypoint_min=1 entrypoint_current=1
2023-06-08T20:49:57.784Z [INFO]  waypoint: negotiated api version: version=1
! Error installing runner into "": unsupported platform

@paladin-devops

Thanks for looking into this. FWIW: I didn’t create the storage directory and configuration via terraform apply but added them manually and reloaded Nomad afterwards. Please let me know if this is an issue and I’ll destroy and recreate.

@nasenblick this issue was recently fixed in this PR! The fix will be released in our next patch release, v0.11.2. Until then, I recommend using Waypoint v0.11.0 to perform the Nomad runner installation.

Thank you for reporting this and trying out HCP Waypoint!
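For anyone pinning the version as recommended above, here is a minimal Python sketch for fetching a specific Waypoint release. It assumes the usual releases.hashicorp.com layout (waypoint/<version>/waypoint_<version>_<os>_<arch>.zip) and that the zip holds a single waypoint binary at its root; both are assumptions worth checking against the releases page, and the function names are hypothetical.

```python
import io
import os
import stat
import urllib.request
import zipfile

# Assumed URL layout: <base>/<version>/waypoint_<version>_<os>_<arch>.zip
RELEASES_BASE = "https://releases.hashicorp.com/waypoint"


def release_url(version: str, os_name: str = "linux", arch: str = "amd64") -> str:
    """Return the zip URL for a Waypoint release; a leading 'v' is accepted."""
    v = version.lstrip("v")
    return f"{RELEASES_BASE}/{v}/waypoint_{v}_{os_name}_{arch}.zip"


def install_waypoint(version: str, dest_dir: str = ".") -> str:
    """Download the release zip, extract the binary and mark it executable."""
    with urllib.request.urlopen(release_url(version)) as resp:
        data = resp.read()
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        # Assumption: the binary sits at the root of the zip as "waypoint".
        path = archive.extract("waypoint", dest_dir)
    # zipfile does not restore Unix permission bits, so add the executable bits by hand.
    os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return path
```

Calling install_waypoint("0.11.0", "/usr/local/bin") needs write access to that directory; a user-owned directory on your PATH avoids sudo.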

Thanks @paladin-devops for the guidance. Trying 0.11.0 I get a bit further but then…

waypoint runner install   -platform=nomad   -server-addr=api.hashicorp.cloud:443   -nomad-runner-image=hashicorp/waypoint   -nomad-host-volume=wp-runner-vol -vv
2023-06-09T09:19:43.977Z [INFO]  waypoint: waypoint version: full_string="v0.11.0 (e92d6fbe0)" version=v0.11.0 prerelease="" metadata="" revision=e92d6fbe0
2023-06-09T09:19:43.978Z [DEBUG] waypoint: home configuration directory: path=/home/ubuntu/.config/waypoint
2023-06-09T09:19:43.978Z [INFO]  waypoint.server: attempting to source credentials and connect
2023-06-09T09:19:44.242Z [INFO]  waypoint.serverclient: utilizing credentials fetched via oauth for server auth: oauth-url=https://auth.hashicorp.com/oauth/token oauth-client-id=nXpDZx9SwAobsfkzXA0xrtikqa1n1JJT
2023-06-09T09:19:44.242Z [DEBUG] waypoint.serverclient: connection information: address=api.hashicorp.cloud:443 tls=true tls_skip_verify=false send_auth=true has_token=true
2023-06-09T09:19:44.251Z [DEBUG] waypoint.server: connection established with sourced credentials
2023-06-09T09:19:44.271Z [INFO]  waypoint: server version info: version="hcp v0.12.0" api_min=1 api_current=1 entrypoint_min=1 entrypoint_current=1
2023-06-09T09:19:44.271Z [INFO]  waypoint: negotiated api version: version=1
✓ Finished connecting to: api.hashicorp.cloud:443
❌ Installing runner...
✓ Initializing Nomad client...
❌ Nomad allocation created
! Error installing runner: no allocations found after evaluation completed
Please run the following to clean up the resources from the unsuccessful runner installation,
specifying additional platform flags as needed:

waypoint runner uninstall -platform=nomad -id=01H2FQ8R556B6H6VWSH5TCCGZY

I checked both Nomad server and client logs but didn’t find any errors.

nomad status:

ID                                          Type     Priority  Status   Submit Date
waypoint-01H2FQYNE0DDWS2RFVAVQAA2PP-runner  service  50        pending  2023-06-09T09:31:42Z

Is this issue related?

@nasenblick since the allocation is still in a pending state, something must be preventing it from coming online. Could you try running nomad job status -verbose <your_runner_job_id> to see if there’s more information available for troubleshooting?

That doesn’t seem to be possible:

nomad job status 01H2GRHJC0KY8KBCHPW0ZS5T37 -verbose
This command takes either no arguments or one: <job>

But here’s some more information from the UI:

waypoint-runner 1 unplaced
Resources exhausted on 1 node
Dimension memory exhausted on 1 node

Reserved CPU: 400MHz
Reserved Memory: 1,200 MiB
Reserved Disk: 300 MiB

ID        Priority  Created                Triggered By   Status    Placement Failures
19d85f46  50        Jun 09 21:01:16 +0200  job-register   complete  True
cdbbcf8d  50        Jun 09 21:01:16 +0200  queued-allocs  blocked   N/A - In Progress

And some more:

{
  "Stop": false,
  "Region": "global",
  "Namespace": "default",
  "ID": "waypoint-01H2GRHJC0KY8KBCHPW0ZS5T37-runner",
  "ParentID": "",
  "Name": "waypoint-01H2GRHJC0KY8KBCHPW0ZS5T37-runner",
  "Type": "service",
  "Priority": 50,
  "AllAtOnce": false,
  "Datacenters": [
    "dc1"
  ],
  "Constraints": null,
  "Affinities": null,
  "Spreads": null,
  "TaskGroups": [
    {
      "Name": "waypoint-runner",
      "Count": 1,
      "Update": {
        "Stagger": 30000000000,
        "MaxParallel": 1,
        "HealthCheck": "checks",
        "MinHealthyTime": 10000000000,
        "HealthyDeadline": 300000000000,
        "ProgressDeadline": 600000000000,
        "AutoRevert": false,
        "AutoPromote": false,
        "Canary": 0
      },
      "Migrate": {
        "MaxParallel": 1,
        "HealthCheck": "checks",
        "MinHealthyTime": 10000000000,
        "HealthyDeadline": 300000000000
      },
      "Constraints": null,
      "Scaling": null,
      "RestartPolicy": {
        "Attempts": 2,
        "Interval": 1800000000000,
        "Delay": 15000000000,
        "Mode": "fail"
      },
      "Tasks": [
        {
          "Name": "pre_task",
          "Driver": "docker",
          "User": "",
          "Config": {
            "command": "sh",
            "image": "busybox:latest",
            "args": [
              "-c",
              "chown -R 100:1000 /data/"
            ]
          },
          "Env": null,
          "Services": null,
          "Vault": null,
          "Templates": null,
          "Constraints": null,
          "Affinities": null,
          "Resources": {
            "CPU": 200,
            "Cores": 0,
            "MemoryMB": 600,
            "MemoryMaxMB": 0,
            "DiskMB": 0,
            "IOPS": 0,
            "Networks": null,
            "Devices": null
          },
          "RestartPolicy": {
            "Attempts": 2,
            "Interval": 1800000000000,
            "Delay": 15000000000,
            "Mode": "fail"
          },
          "DispatchPayload": null,
          "Lifecycle": {
            "Hook": "prestart",
            "Sidecar": false
          },
          "Meta": null,
          "KillTimeout": 5000000000,
          "LogConfig": {
            "MaxFiles": 10,
            "MaxFileSizeMB": 10
          },
          "Artifacts": null,
          "Leader": false,
          "ShutdownDelay": 0,
          "VolumeMounts": [
            {
              "Volume": "waypoint-runner",
              "Destination": "/data",
              "ReadOnly": false,
              "PropagationMode": "private"
            }
          ],
          "ScalingPolicies": null,
          "KillSignal": "",
          "Kind": "",
          "CSIPluginConfig": null,
          "Identity": null
        },
        {
          "Name": "runner",
          "Driver": "docker",
          "User": "",
          "Config": {
            "image": "hashicorp/waypoint",
            "args": [
              "runner",
              "agent",
              "-id=01H2GRHJC0KY8KBCHPW0ZS5T37",
              "-state-dir=/data/runner",
              "-cookie=9a3c87ec-55c3-44b5-addc-f63320e3bc95",
              "-vv"
            ],
            "auth_soft_fail": false
          },
          "Env": {
            "WAYPOINT_SERVER_TLS": "true",
            "WAYPOINT_SERVER_TLS_SKIP_VERIFY": "false",
            "NOMAD_ADDR": "http://localhost:4646",
            "WAYPOINT_SERVER_ADDR": "api.hashicorp.cloud:443"
          },
          "Services": null,
          "Vault": null,
          "Templates": null,
          "Constraints": null,
          "Affinities": null,
          "Resources": {
            "CPU": 200,
            "Cores": 0,
            "MemoryMB": 600,
            "MemoryMaxMB": 0,
            "DiskMB": 0,
            "IOPS": 0,
            "Networks": null,
            "Devices": null
          },
          "RestartPolicy": {
            "Attempts": 2,
            "Interval": 1800000000000,
            "Delay": 15000000000,
            "Mode": "fail"
          },
          "DispatchPayload": null,
          "Lifecycle": null,
          "Meta": null,
          "KillTimeout": 5000000000,
          "LogConfig": {
            "MaxFiles": 10,
            "MaxFileSizeMB": 10
          },
          "Artifacts": null,
          "Leader": false,
          "ShutdownDelay": 0,
          "VolumeMounts": [
            {
              "Volume": "waypoint-runner",
              "Destination": "/data",
              "ReadOnly": false,
              "PropagationMode": "private"
            }
          ],
          "ScalingPolicies": null,
          "KillSignal": "",
          "Kind": "",
          "CSIPluginConfig": null,
          "Identity": null
        }
      ],
      "EphemeralDisk": {
        "Sticky": false,
        "SizeMB": 300,
        "Migrate": false
      },
      "Meta": null,
      "ReschedulePolicy": {
        "Attempts": 0,
        "Interval": 0,
        "Delay": 30000000000,
        "DelayFunction": "exponential",
        "MaxDelay": 3600000000000,
        "Unlimited": true
      },
      "Affinities": null,
      "Spreads": null,
      "Networks": [
        {
          "Mode": "host",
          "Device": "",
          "CIDR": "",
          "IP": "",
          "Hostname": "",
          "MBits": 0,
          "DNS": null,
          "ReservedPorts": null,
          "DynamicPorts": null
        }
      ],
      "Consul": {
        "Namespace": ""
      },
      "Services": null,
      "Volumes": {
        "waypoint-runner": {
          "Name": "",
          "Type": "host",
          "Source": "wp-runner-vol",
          "ReadOnly": false,
          "AccessMode": "",
          "AttachmentMode": "",
          "MountOptions": null,
          "PerAlloc": false
        }
      },
      "ShutdownDelay": null,
      "StopAfterClientDisconnect": null,
      "MaxClientDisconnect": null
    }
  ],
  "Update": {
    "Stagger": 30000000000,
    "MaxParallel": 1,
    "HealthCheck": "",
    "MinHealthyTime": 0,
    "HealthyDeadline": 0,
    "ProgressDeadline": 0,
    "AutoRevert": false,
    "AutoPromote": false,
    "Canary": 0
  },
  "Multiregion": null,
  "Periodic": null,
  "ParameterizedJob": null,
  "Dispatched": false,
  "DispatchIdempotencyToken": "",
  "Payload": null,
  "Meta": null,
  "ConsulToken": "",
  "ConsulNamespace": "",
  "VaultToken": "",
  "VaultNamespace": "",
  "NomadTokenID": "a2b9aaf5-2f1f-98d2-6a26-2da0b0174ab0",
  "Status": "pending",
  "StatusDescription": "",
  "Stable": false,
  "Version": 0,
  "SubmitTime": 1686337276497813000,
  "CreateIndex": 238,
  "ModifyIndex": 238,
  "JobModifyIndex": 238
}

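Ah, I think the command you ran, nomad job status 01H2GRHJC0KY8KBCHPW0ZS5T37 -verbose, failed for two reasons. Nomad’s CLI only parses flags that come before positional arguments, so the trailing -verbose was counted as a second job argument (hence the "either no arguments or one" error), and that job ID is also incorrect. The actual Nomad job ID is waypoint-01H2GRHJC0KY8KBCHPW0ZS5T37-runner, so you’d need to run:

nomad job status -verbose waypoint-01H2GRHJC0KY8KBCHPW0ZS5T37-runner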
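A small Python sketch of deriving the Nomad job ID from a runner ID. It relies on the waypoint-<runner id>-runner naming seen in this thread (an observed pattern, not a documented guarantee), and it places -verbose before the job ID because Nomad’s CLI, like Go’s standard flag parsing, stops reading flags at the first positional argument. The helper names are hypothetical.

```python
import shlex


def nomad_job_id(runner_id: str) -> str:
    # Naming pattern observed in this thread: waypoint-<runner id>-runner
    return f"waypoint-{runner_id}-runner"


def job_status_cmd(runner_id: str) -> list[str]:
    # Flags first: a trailing -verbose would be treated as a second <job> argument.
    return ["nomad", "job", "status", "-verbose", nomad_job_id(runner_id)]


print(shlex.join(job_status_cmd("01H2GRHJC0KY8KBCHPW0ZS5T37")))
# prints: nomad job status -verbose waypoint-01H2GRHJC0KY8KBCHPW0ZS5T37-runner
```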

That said, I think the issue here is that the Waypoint runner’s Nomad allocation can’t be placed because of a memory constraint. The job requests 1200 MiB, and the Nomad client where you set up the host volume appears not to have that much memory available, judging by this line from your last message: Dimension memory exhausted on 1 node.
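To check the arithmetic, the following Python sketch totals the resources requested in the runner job JSON posted above. The two tasks ask for 200 MHz and 600 MB each, and the group reserves 300 MB of ephemeral disk, which lines up with the Reserved CPU and Memory figures from the UI. It is a naive sum that does not try to model any special scheduler handling of lifecycle (prestart) tasks, and the function name is hypothetical.

```python
import json


def requested_resources(job: dict) -> dict:
    """Naively total CPU (MHz), memory (MB) and ephemeral disk (MB) for a Nomad job."""
    # Unwrap a top-level "Job" key, which I believe `nomad job inspect` output has.
    job = job.get("Job", job)
    totals = {"cpu_mhz": 0, "memory_mb": 0, "disk_mb": 0}
    for group in job["TaskGroups"]:
        count = group.get("Count", 1)
        for task in group["Tasks"]:
            res = task["Resources"]
            totals["cpu_mhz"] += count * res["CPU"]
            totals["memory_mb"] += count * res["MemoryMB"]
        totals["disk_mb"] += count * group["EphemeralDisk"]["SizeMB"]
    return totals


# Trimmed-down copy of the runner job from this thread.
runner_job = json.loads("""
{"TaskGroups": [{"Count": 1,
  "EphemeralDisk": {"SizeMB": 300},
  "Tasks": [{"Name": "pre_task", "Resources": {"CPU": 200, "MemoryMB": 600}},
            {"Name": "runner",   "Resources": {"CPU": 200, "MemoryMB": 600}}]}]}
""")
print(requested_resources(runner_job))  # {'cpu_mhz': 400, 'memory_mb': 1200, 'disk_mb': 300}
```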

I think the solution may be to configure the host volume on a different client with more resources, to increase the resources available to this Nomad client (which means giving the host more memory), or to free up resources on this Nomad client.

Additionally, waypoint runner install currently has no flag for customizing the CPU and memory resources of Nomad runners. Such a flag would let you lower the resources requested by the runner. It should be available, and we’ll get this fixed!

Yep, that was the problem. I was using AWS t3.nano instances; with t2.micro everything works as expected. Thanks for the solution, and sorry about the wrong job ID.

No problem, I’m glad you got this resolved! In our next release, we will also have the CLI flags available to configure the CPU and memory of the runner using waypoint runner install, so it can be decreased if needed.

Thanks for trying Waypoint!

Me again. Apparently, fixing the instance size worked, but only for managing Nomad jobs from the local CLI through a tunnel to the server. I still cannot install Waypoint runners on my Nomad cluster:

$ sudo waypoint runner install   -platform=nomad   -server-addr=api.hashicorp.cloud:443   -nomad-runner-image=hashicorp/waypoint   -nomad-host-volume=wp-runner-vol
✓ Finished connecting to: api.hashicorp.cloud:443
❌ Installing runner...
✓ Initializing Nomad client...
❌ Installing the Waypoint runner
! Error installing runner: Unexpected response code: 403 (Permission denied)
Please run the following to clean up the resources from the unsuccessful runner installation,
specifying additional platform flags as needed:

waypoint runner uninstall -platform=nomad -id=01H2X5JRQY1616NNTNC38GYM52 <additional_platform_flags>

I have set up Nomad for evaluation purposes, i.e. installed and bootstrapped only, without ACLs. I tried setting the Nomad token as an environment variable or appending it to the above command, without luck. It currently runs as root. I installed Waypoint from the zip, version 0.11.0.

I’d like to include Waypoint in the demo/evaluation I’m doing. I’m happy to use an older version, but I’m not sure whether it would work with my setup.

I was on the wrong machine (Nomad Server), works as intended on the client machine.

waypoint runner install \
  -platform=nomad \
  -server-addr=api.hashicorp.cloud:443 \
  -nomad-runner-image=hashicorp/waypoint \
  -nomad-host-volume=wp-runner-vol
✓ Finished connecting to: api.hashicorp.cloud:443
✓ Runner "01H2X6R08QH9YJX701QCQ9NWV3" installed successfully to nomad
✓ Runner profile "nomad-01H2X6R08QH9YJX701QCQ9NWV3" created successfully.
✓ Initializing Nomad client...
✓ Waypoint runner installed
✓ Runner "01H2X6R08QH9YJX701QCQ9NWV3" adopted successfully.

That’s great news! I’m glad you figured it out :smiley: In the future, so you don’t need to be on the machine where Nomad is running, you can set the env var NOMAD_ADDR, and waypoint runner install will use that address when submitting the job to Nomad.
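As a sketch of that tip, the Python below runs the same install command with NOMAD_ADDR set only in the child process environment. The address in the usage note is a hypothetical placeholder, the helper names are hypothetical, and the host volume still has to be configured on whichever client Nomad places the runner.

```python
import os
import subprocess


def runner_install_cmd(host_volume: str) -> list[str]:
    # Same flags as the successful install in this thread.
    return [
        "waypoint", "runner", "install",
        "-platform=nomad",
        "-server-addr=api.hashicorp.cloud:443",
        "-nomad-runner-image=hashicorp/waypoint",
        f"-nomad-host-volume={host_volume}",
    ]


def env_with_nomad_addr(nomad_addr: str) -> dict:
    # Copy the current environment and point the Nomad API client at nomad_addr.
    env = dict(os.environ)
    env["NOMAD_ADDR"] = nomad_addr
    return env


def install_runner(nomad_addr: str, host_volume: str = "wp-runner-vol") -> None:
    # Raises CalledProcessError if the install command exits non-zero.
    subprocess.run(runner_install_cmd(host_volume), env=env_with_nomad_addr(nomad_addr), check=True)
```

For example, install_runner("http://10.0.1.20:4646") would submit the runner job to the Nomad API at that (hypothetical) address from any machine with the waypoint CLI and a valid context.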

Thanks, that’s good to know. Besides installing the runner, I managed to deploy to my Nomad cluster, and I can see the app & deployment in the HCP Waypoint UI. However, the app is only reachable on a private IP, which I look up by running:

nomad alloc status \
  $(nomad job status waypoint-app-01h2z86h95smzfs7nwhgw9dwxx | \
  grep -i allocation -A 10 | \
  grep -i running | \
  awk '{print $1}') | 
  grep -i waypoint | awk '{print "http://"$3}' 
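The same private-address lookup can be sketched against the Nomad HTTP API instead of grep/awk. This assumes ACLs are disabled (as in this setup), that /v1/job/<id>/allocations and /v1/allocation/<id> return the allocation JSON, and that the port label is waypoint (inferred from the grep above). The AllocatedResources.Shared.Ports layout is my understanding of Nomad 1.x and is worth checking against nomad alloc status -json on your cluster. Like the pipeline, it only finds the private address; it does not expose the app publicly.

```python
import json
import urllib.request


def get_json(nomad_addr: str, path: str):
    with urllib.request.urlopen(f"{nomad_addr}{path}") as resp:
        return json.load(resp)


def running_alloc_ids(allocs: list) -> list:
    # Equivalent of `grep -i running | awk '{print $1}'` on the job status table.
    return [a["ID"] for a in allocs if a.get("ClientStatus") == "running"]


def port_urls(alloc: dict, label: str = "waypoint") -> list:
    # Assumed layout: AllocatedResources.Shared.Ports[] entries with Label/Value/HostIP.
    shared = (alloc.get("AllocatedResources") or {}).get("Shared") or {}
    return [f"http://{p['HostIP']}:{p['Value']}"
            for p in shared.get("Ports") or [] if p.get("Label") == label]


def app_urls(nomad_addr: str, job_id: str, label: str = "waypoint") -> list:
    urls = []
    for alloc_id in running_alloc_ids(get_json(nomad_addr, f"/v1/job/{job_id}/allocations")):
        urls += port_urls(get_json(nomad_addr, f"/v1/allocation/{alloc_id}"), label)
    return urls
```

Usage would look like app_urls("http://localhost:4646", "waypoint-app-01h2z86h95smzfs7nwhgw9dwxx").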

I’d like to make it available on the cluster’s public IP but I haven’t had any luck so far. Here’s what I tried:

  1. Adding a release stanza to my waypoint.hcl:
variable "registry_username" {
  type = string
  default = ""
  env = ["REGISTRY_USERNAME"]
}

variable "registry_password" {
  type = string
  sensitive = true
  default = ""
  env = ["REGISTRY_PASSWORD"]
}

project = "nomad-nodejs"

app "nomad-nodejs-web" {
  build {
    use "pack" {}
    registry {
      use "docker" {
        image = "${var.registry_username}/nomad-nodejs-web"
        tag   = "1"
        local = false
        auth {
          username = var.registry_username
          password = var.registry_password
        }
      }
    }
  }

  deploy {
    use "nomad" {
      datacenter = "dc1"
      namespace  = "default"
      service_provider = "nomad"
    }
  }
}

release {
  use "nomad" {
    service {
      name = "nomad-nodejs-web"
      port = 80
    }

    ingress {
      use "http" {
        route {
          path     = "/"
          hostname = "nasenblick.com"
        }
      }
    }
  }
}

I tried several variations but keep getting errors from waypoint init.

  2. Setting a deployment URL:
server {
  url {
    enabled = true
  }
}

This doesn’t seem to work, and I’m not sure whether that’s the purpose of the url stanza.

I’m currently running a few other jobs from the Nomad tutorial (PytechCo Simulator) on the cluster and can access it on port 5000.

Greatly appreciate your feedback.

Cheers
Philipp

I’m glad that you got your app deployed! I can explain a few things which you posted about.

The waypoint.hcl does indeed have a release stanza, which you have added here; however, no release plugin supports the nomad deployment plugin. There is the nomad-jobspec-canary release plugin, but it is compatible only with the nomad-jobspec deployment plugin, while you’re using the nomad plugin.

To identify plugins that are compatible with each other, check the Interface section of a given plugin component’s documentation, which lists the component’s inputs and outputs, if applicable. For example, the nomad-jobspec-canary plugin component docs indicate that it requires a jobspec.Deployment as input, which is what the nomad-jobspec deployment plugin produces.

Regarding the url stanza, it tells Waypoint whether or not you want the Horizon URL service to generate a URL for your application. However, this is currently not supported by HCP Waypoint and applies only to Waypoint OSS. It’s a way to opt out of getting a deployment URL for your app if your OSS Waypoint server has that setting enabled, which is the default.

To open up your application to external ingress, you could use Consul. The Nomad plugin registers a service by default, and you can explicitly set the provider to Consul using the service_provider parameter. However, that only registers your application as a service for service discovery. The Nomad plugin currently doesn’t set any configuration to register the service with Consul service mesh, where you might use an ingress gateway to reach it. You may wish to consider writing a Nomad job specification for that and using the nomad-jobspec plugin, or opening a feature request on the hashicorp/waypoint repository for such an improvement.

Here are some additional helpful resources on Nomad ingress, as well as some for Consul, should you choose to use it! :smile:

Thanks much for the detailed explanations and for pointing me in the right direction. It seems there’s quite a lot more to understand to make this work! :+1:

I had to recreate my environment and am facing a different challenge when deploying with Waypoint. After creating the context and installing the runner on a Nomad client node, waypoint init works fine, but waypoint up gives me the following message:

>> Operation is queued waiting for job "01H3APN9N4F9MPA7962MTAECW4". Waiting for runner assignment...

On Nomad I can see the waypoint-runner with status “running” but nothing seems to happen.

waypoint.hcl:

variable "registry_username" {
  type = string
  default = "nasenblick"
  env = ["REGISTRY_USERNAME"]
}

variable "registry_password" {
  type = string
  sensitive = true
  default = "...."
  env = ["REGISTRY_PASSWORD"]
}

project = "waypoint-test"

app "waypoint-app" {
  build {
    use "pack" {}
    registry {
      use "docker" {
        image = "${var.registry_username}/nomad-nodejs-web"
        tag   = "1"
        local = false
        auth {
          username = var.registry_username
          password = var.registry_password
        }
      }
    }
  }

  deploy {
    use "nomad" {
      datacenter = "dc1"
      namespace  = "default"
      service_provider = "nomad"
    }
  }
}

I’m using the waypoint-example app from this tutorial. I don’t see any jobs besides the runner in the Nomad UI. So I’m wondering what potential root causes might be. Appreciate your help.

The job 01H3APN9N4F9MPA7962MTAECW4 is actually a Waypoint job that is attempting to execute the "up" operation you triggered with waypoint up. You can view details about this job using waypoint job inspect 01H3APN9N4F9MPA7962MTAECW4. You can view its output using waypoint job get-stream 01H3APN9N4F9MPA7962MTAECW4; however, that won’t show anything, since you mentioned the job is still queued. There’s more information on jobs here!

I’d recommend checking the Nomad logs for the Waypoint runner. It may need to be restarted if the job stream was interrupted. I also recommend running waypoint runner profile list and verifying that the runner profile used for your Waypoint job is configured to target the runner you have running in Nomad. We have documentation on runner targeting here!
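Pulling those suggestions together, here is a small Python sketch that lists the troubleshooting commands in order for a job stuck at "Waiting for runner assignment". It reuses the waypoint-<runner id>-runner Nomad job naming seen earlier in this thread (an observed pattern, not a documented guarantee) and the runner task name from the posted job spec; RUNNER_ID is a placeholder for your actual runner ID, and the function name is hypothetical.

```python
import shlex


def diagnostic_commands(waypoint_job_id: str, runner_id: str) -> list:
    """Commands to run, in order, when a Waypoint job is waiting for runner assignment."""
    nomad_job = f"waypoint-{runner_id}-runner"  # naming pattern seen earlier in this thread
    return [
        ["waypoint", "job", "inspect", waypoint_job_id],                     # job state and details
        ["waypoint", "runner", "profile", "list"],                           # which runner each profile targets
        ["nomad", "alloc", "logs", "-job", nomad_job, "runner"],             # runner task stdout
        ["nomad", "alloc", "logs", "-stderr", "-job", nomad_job, "runner"],  # runner task stderr
    ]


for cmd in diagnostic_commands("01H3APN9N4F9MPA7962MTAECW4", "RUNNER_ID"):
    print(shlex.join(cmd))
```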