Waypoint deploy exits with non-descriptive error - ! error reading from server: EOF

Hi,

I’m having an issue deploying Nomad jobs that have the scheduler type ‘batch’. We’re using Nomad periodic jobs to run scheduled tasks, and we successfully deployed this to our QA and staging environments about 2 months ago. We then tried deploying to our production environment about 1 month ago and got the error from the title: ! error reading from server: EOF. No other message is printed, so it’s been hard to track down the issue.

I’ve tried simplifying both the waypoint.hcl and the Nomad config, but it still fails. Not sure if it’s useful to note, but we can deploy directly to Nomad; this only fails with waypoint up or waypoint deploy.

Let me know if I can provide more information. I didn’t include either the waypoint.hcl or the Nomad spec, since even a barebones example file doesn’t work. I’m looking for a solution or a nudge in the right direction, since the error printed doesn’t give me much (I’ve tried -vvv too).

Check the Nomad client logs on the node where the Waypoint runner (and its task jobs) is running, and make sure nothing was OOM killed. That was my case (I also had the ! error reading from server: EOF), so it was a memory issue in the runner. The solution was to increase the memory for the runner job.
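As a rough sketch of that check from the command line: the helper below just scans whatever text you pipe into it for an OOM kill. The function name mentions_oom, the job name waypoint-runner and the search pattern are my own assumptions, and the exact wording Nomad prints may differ in your version.

```shell
# Hypothetical helper: exits 0 if the text on stdin mentions an OOM kill.
# The pattern is a guess at common wordings, not an official Nomad format.
mentions_oom() {
  grep -qiE 'oom[ -]?kill|out of memory'
}

# Assumed usage ("waypoint-runner" and <alloc_id> are placeholders):
#   nomad job status waypoint-runner      # find the runner's allocation IDs
#   nomad alloc status <alloc_id> | mentions_oom && echo "runner was OOM killed"
```

The same helper can be pointed at the Nomad client log file or journal output if the allocation has already been garbage collected.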

You can simply install with the following parameters:

waypoint server install -platform nomad \
-nomad-host "http://$(hostname -i):4646" \
-nomad-host-volume waypoint \
-nomad-runner-host-volume waypoint-runner \
-nomad-runner-memory 800 \
-accept-tos \
-vvv

In my case it worked to set the runner memory to 800 MB (the default is 600). But I suppose it will depend on the builds your runner does.

You can probably also configure a runner profile, passing the following configuration with the -plugin-config parameter, so you don’t need to reinstall the runner:

{
        "datacenter": "dc1",
        "namespace": "default",
        "nomad_host": "$NOMAD_ADDR",
        "region": "global",
        "resources_cpu": "200",
        "resources_memory": "800"
}
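A minimal sketch of how that could look, assuming the waypoint runner profile set command with its -name, -plugin-type and -plugin-config flags behaves as I remember it. The profile name nomad-more-memory and the file location are made up for illustration, and I have not verified this against every Waypoint version.

```shell
# Write the plugin configuration shown above to a file.
# Assumption: the shell expands $NOMAD_ADDR here, so it must already be exported.
profile_config="${TMPDIR:-/tmp}/waypoint-runner-profile.json"
cat > "$profile_config" <<EOF
{
  "datacenter": "dc1",
  "namespace": "default",
  "nomad_host": "$NOMAD_ADDR",
  "region": "global",
  "resources_cpu": "200",
  "resources_memory": "800"
}
EOF

# Apply it as a runner profile; "nomad-more-memory" is a hypothetical name.
# The guard skips this step on machines without the waypoint CLI.
if command -v waypoint >/dev/null 2>&1; then
  waypoint runner profile set \
    -name=nomad-more-memory \
    -plugin-type=nomad \
    -plugin-config="$profile_config"
fi
```

Adjust datacenter, namespace and region to match your cluster before applying it.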

I know this is an old thread, but I was hitting the same error over the last week. This solved it in my case.

Hope it helps.

@dcanadillas1, if you encounter this again, would you mind adding the output from waypoint job get-stream <id>? The Waypoint docs cover that command, and there is additional documentation on watchJob. This is definitely a hard-to-catch issue with memory on the runners. Thanks for posting your solution!