Waypoint ui fails with "! input/output error"

Hello,

I am getting ! input/output error when running waypoint ui -authenticate, see full trace log below (I have removed the public IP from the logs). Up until today the command was working properly and had accessed the UI with success.

Waypoint v0.6.3, installed on a v1.22 K8s cluster.

2022-01-09T00:47:14.479+0200 [INFO]  waypoint: waypoint version: full_string="v0.6.3 (bd303e12)" version=v0.6.3 prerelease="" metadata="" revision=bd303e12
2022-01-09T00:47:14.479+0200 [TRACE] waypoint: starting interrupt listener for context cancellation
2022-01-09T00:47:14.479+0200 [TRACE] waypoint: interrupt listener goroutine started
2022-01-09T00:47:14.479+0200 [DEBUG] waypoint: home configuration directory: path=/home/havoc24k/.config/waypoint
2022-01-09T00:47:14.479+0200 [TRACE] waypoint: no API client provided, initializing connection if possible
2022-01-09T00:47:14.479+0200 [TRACE] waypoint.server: WithLocal set, server credentials optional
2022-01-09T00:47:14.479+0200 [INFO]  waypoint.server: attempting to source credentials and connect
2022-01-09T00:47:14.479+0200 [DEBUG] waypoint.serverclient: connection information: address=REDACTED:9701 tls=true tls_skip_verify=true send_auth=true has_token=true
2022-01-09T00:47:14.986+0200 [DEBUG] waypoint.server: connection established with sourced credentials
2022-01-09T00:47:14.986+0200 [TRACE] waypoint: requesting version info from server
2022-01-09T00:47:15.156+0200 [INFO]  waypoint: server version info: version=v0.6.3 api_min=1 api_current=1 entrypoint_min=1 entrypoint_current=1
2022-01-09T00:47:15.157+0200 [INFO]  waypoint: negotiated api version: version=1
2022-01-09T00:47:15.157+0200 [DEBUG] waypoint: starting runner to process local jobs
2022-01-09T00:47:15.157+0200 [DEBUG] waypoint.runner: Created runner: id=01FRXXZRENKE613CFPXAF7JGHD
2022-01-09T00:47:15.157+0200 [DEBUG] waypoint.runner: registering runner
2022-01-09T00:47:15.157+0200 [DEBUG] waypoint.runner: runner registered, waiting for first config processing
2022-01-09T00:47:15.321+0200 [INFO]  waypoint.runner.config_recv: new configuration received
2022-01-09T00:47:15.321+0200 [TRACE] waypoint.runner.watch_config.watchloop: got source config update but ignoring since there is no diff
2022-01-09T00:47:15.321+0200 [DEBUG] waypoint.runner.watch_config.watchloop: new config variables received, scheduling refresh
2022-01-09T00:47:15.321+0200 [TRACE] waypoint.runner.watch_config.watchloop: calculating changes between old and new config
2022-01-09T00:47:15.822+0200 [TRACE] waypoint.runner.watch_config.watchloop: refreshing app configuration
2022-01-09T00:47:15.822+0200 [DEBUG] waypoint.runner.watch_config.watchloop: new configuration computed
2022-01-09T00:47:15.822+0200 [TRACE] waypoint.runner.watch_config: received new app config
2022-01-09T00:47:15.822+0200 [INFO]  waypoint.runner: runner registered with server and ready
2022-01-09T00:47:15.822+0200 [DEBUG] waypoint.runner: opening job stream
2022-01-09T00:47:15.822+0200 [TRACE] waypoint.runner: sending job request
2022-01-09T00:47:15.823+0200 [INFO]  waypoint.runner: waiting for job assignment

» Creating invite token
This invite token will be exchanged for an authentication 
token that your browser stores.2022-01-09T00:47:15.983+0200 [WARN]  waypoint.runner.watch_config: exiting due to context ended
2022-01-09T00:47:15.983+0200 [TRACE] waypoint.runner.watch_config: exiting goroutine
2022-01-09T00:47:15.983+0200 [TRACE] waypoint.runner.config_recv: exiting receive goroutine

! input/output error
2022-01-09T00:47:15.983+0200 [TRACE] waypoint: stopping signal listeners and cancelling the context

Thank you.

I tried the following and still getting the same error

  1. Removed waypoint and reinstalled through the apt package.
  2. Downloaded the .zip with the executable from the site.

Still getting the same issue.

I also downloaded the source code and tried to run the tests. The test below fails. I don’t know if there is any relation.

[INFO] freeport: detected ephemeral port range of [32768, 60999]
[INFO] freeport: reducing max blocks from 30 to 15 to avoid the ephemeral port range
[WARN] freeport: leaked port 26502 due to theft; removing from circulation
[INFO] freeport: resetting the freeport package state
--- FAIL: TestTakeReturn (1.44s)
    freeport_test.go:211: err: listen tcp 127.0.0.1:27527: socket: too many open files

I resorted in deleting ~/.config/waypoint and it fixed the issue.

(At this point this is more like a debugging journal, If I am misusing/abusing the platform please inform me.)

The issue appears again, this time when I run Waypoint through Gitlab CI/CD.

waypoint:
  image: docker:latest
  only:
    - main
  stage: build
  services:
    - docker:dind
  
  variables:
    WAYPOINT_VERSION: "0.6.3"
    WAYPOINT_SERVER_ADDR: "[DEPRECATED]:9701"
    WAYPOINT_SERVER_TOKEN: "[DEPRECATED]"
    WAYPOINT_SERVER_TLS: "1"
    WAYPOINT_SERVER_TLS_SKIP_VERIFY: "1"
  script:
    - wget -q -O /tmp/waypoint.zip https://releases.hashicorp.com/waypoint/${WAYPOINT_VERSION}/waypoint_${WAYPOINT_VERSION}_linux_amd64.zip
    - unzip -d /usr/local/bin /tmp/waypoint.zip
    - rm -rf /tmp/waypoint*
    - cd spring/
    - waypoint init
    - waypoint build
    - waypoint deploy
    - waypoint release

Could this be somehow related to the Waypoint server? All proper ports are open and waypoint context verify returns a success.

2022-01-10T13:43:57.346+0200 [INFO]  waypoint: waypoint version: full_string="v0.6.3 (bd303e12)" version=v0.6.3 prerelease="" metadata="" revision=bd303e12
2022-01-10T13:43:57.346+0200 [TRACE] waypoint: starting interrupt listener for context cancellation
2022-01-10T13:43:57.347+0200 [TRACE] waypoint: interrupt listener goroutine started
2022-01-10T13:43:57.347+0200 [DEBUG] waypoint: home configuration directory: path=/home/havoc24k/.config/waypoint
⠸ Connecting with context "waypoint-v2"...2022-01-10T13:43:57.841+0200 [TRACE] waypoint: requesting version info from server
⠼ Verifying connection is valid for context "waypoint-v2"...2022-01-10T13:43:57.993+0200 [INFO]  waypoint: server version info: version=v0.6.3 api_min=1 api_current=1 entrypoint_min=1 entrypoint_current=1
2022-01-10T13:43:57.993+0200 [INFO]  waypoint: negotiated api version: version=1
✓ Context "waypoint-v2" connected successfully.
2022-01-10T13:43:57.993+0200 [TRACE] waypoint: stopping signal listeners and cancelling the context

Hello,

I ends out that it was server related after all. I restarted the K8s control plane service and ingress and waypoint is working fine. It seems that the actions that transmit a lot of tracing/logs back to the waypoint service were causing the nginx ingress to fail in some cases and i guess that’s what ! input/output error was.

If someone from the team can verify that in case the logging service does not get a response back the process fails it would be great.

Thank you for all of this troubleshooting! Can you provide a few more details to help me recreate this?
It sounds like you’ve got a k8s cluster with nginx ingress set up. Any other configurations here?
When you mention “the actions that transmit a lot of tracing/logs back to the waypoint service” what specifically were you doing to repeat the behavior of seeing the error message? Per what you shared from the GitLab CI attempt, it looks like just running build/deploy/release repeats this error?
Where is your k8s cluster running? AWS?
Thanks!

1 Like

Hello @krantzinator

Well I can try and see if I can replicate the issue but from my current understanding the issue was not Waypoint CLI related.

The K8s cluster is running on a Jelastic PaaS provider https://jelastic.com/paas-cloud-hosting. Version 1.22.

See service breakdown below

NAMESPACE               NAME                                                    TYPE           CLUSTER-IP       EXTERNAL-IP      PORT(S)                                  AGE
cert-manager            cert-manager                                            ClusterIP      10.244.111.73    <none>           9402/TCP                                 13d
cert-manager            cert-manager-nginx-ingress-nginx-controller             LoadBalancer   10.244.101.183   [DEPRECATED]     80:31466/TCP,443:30460/TCP               13d
cert-manager            cert-manager-nginx-ingress-nginx-controller-admission   ClusterIP      10.244.99.143    <none>           443/TCP                                  13d
cert-manager            cert-manager-webhook                                    ClusterIP      10.244.45.53     <none>           443/TCP                                  13d
default                 kubernetes                                              ClusterIP      10.244.0.1       <none>           443/TCP                                  13d
ingress-nginx           ingress-nginx-controller                                ClusterIP      10.244.126.252   <none>           80/TCP,443/TCP                           13d
ingress-nginx           ingress-nginx-controller-admission                      ClusterIP      10.244.125.161   <none>           443/TCP                                  13d
kube-system             kube-dns                                                ClusterIP      10.244.0.10      <none>           53/UDP,53/TCP,9153/TCP                   13d
kube-system             metrics-server                                          ClusterIP      10.244.121.71    <none>           443/TCP                                  13d
kubernetes-monitoring   monitoring-grafana                                      ClusterIP      10.244.188.142   <none>           80/TCP                                   13d
kubernetes-monitoring   monitoring-prometheus-alertmanager                      ClusterIP      10.244.177.42    <none>           80/TCP                                   13d
kubernetes-monitoring   monitoring-prometheus-kube-state-metrics                ClusterIP      10.244.159.134   <none>           8080/TCP                                 13d
kubernetes-monitoring   monitoring-prometheus-node-exporter                     ClusterIP      None             <none>           9100/TCP                                 13d
kubernetes-monitoring   monitoring-prometheus-pushgateway                       ClusterIP      10.244.43.31     <none>           9091/TCP                                 13d
kubernetes-monitoring   monitoring-prometheus-server                            ClusterIP      10.244.61.8      <none>           80/TCP                                   13d
kubernetes-skooner      kubernetes-skooner                                      NodePort       10.244.216.162   <none>           80:30777/TCP                             13d
observability           jaeger-agent                                            ClusterIP      None             <none>           5775/UDP,5778/TCP,6831/UDP,6832/UDP      13d
observability           jaeger-collector                                        ClusterIP      10.244.100.20    <none>           9411/TCP,14250/TCP,14267/TCP,14268/TCP   13d
observability           jaeger-collector-headless                               ClusterIP      None             <none>           9411/TCP,14250/TCP,14267/TCP,14268/TCP   13d
observability           jaeger-operator-metrics                                 ClusterIP      10.244.115.86    <none>           8383/TCP,8686/TCP                        13d
observability           jaeger-query                                            ClusterIP      10.244.143.198   <none>           16686/TCP,16685/TCP                      13d
waypoint                waypoint                                                LoadBalancer   10.244.200.157   [DEPRECATED]     9701:31377/TCP,9702:31242/TCP            33h

My comment about “the actions that transmit a lot of tracing/logs back to the waypoint service” refer to the fact that from what I saw on the trace logs Waypoint is streaming log information constantly to the grpc Waypoint interface, which makes total sense since the Web UI is being showing the same logs.

My assumption is that when my k8s cluster was in heavy strain, with high IOPS and resource utilization the grpc streaming might be have been blocked/delayed and that caused the Waypoint CLI to stop the process and showing the ! input/output error.

I haven’t been able to find time yet to inspect the Waypoint source code to see if I can find something.

Per what you shared from the GitLab CI attempt, it looks like just running build/deploy/release repeats this error

This is another interesting point there. Not all commands produced this error. For instance waypoint init was not producing any error.

$ waypoint init
-> Validating configuration file...
-> Configuration file appears valid
-> Validating server credentials...
-> Connection to Waypoint server was successful
-> Checking if project "[DEPRECATED]" is registered...
-> Project "[DEPRECATED]" and all apps are registered with the server.
Project initialized!
You may now call 'waypoint up' to deploy your project or
commands such as 'waypoint build' to perform steps individually.

Whereas waypoint build was failing immediately after

$ waypoint build
» Building spring-boot-api...
! input/output error

Maybe the Waypoint CLI is more strict in terms of log streaming for some commands than others.

I will be more than happy to provide any other information you might need.

Thank you.

Hi @Havoc24k

Can you run waypoint build -vv, which will provide a bunch of debug out to help us pinpoint things. We’re wondering if that error is maybe because there is an issue with waypoint spawning the plugin locally.

Additionally, if you could provide the server and runner logs from right before you see the input/output error.

Thanks!

Hello,

Right now I am not able to replicate the issue since the Cluster has been stabilized and is no longer having abnormally heavy IOPS. I will try to create some artificial throttling on my connection and test it or just try to recreate the high load conditions.