Issues with the liveness probe

I don’t see a lot of information about the liveness probe for Waypoint. I suspect the issues I’m running into are not actually Waypoint issues, but maybe someone can point me in the right direction.
I’m using Longhorn for my storage on a Rancher cluster, if that matters. The Waypoint pod gets spun up just fine, but I get a liveness probe error. I tried simply removing the liveness probe, but because it’s a StatefulSet, I couldn’t do that.
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling default-scheduler 0/3 nodes are available: 3 pod has unbound immediate PersistentVolumeClaims.
Warning FailedScheduling default-scheduler 0/3 nodes are available: 3 pod has unbound immediate PersistentVolumeClaims.
Normal Scheduled default-scheduler Successfully assigned default/waypoint-server-0 to longhorn3
Normal SuccessfulAttachVolume 84s attachdetach-controller AttachVolume.Attach succeeded for volume “pvc-76893e36-15cc-41f8-9b36-e64b1d1ea969”
Warning FailedMount 80s (x3 over 82s) kubelet, longhorn3 MountVolume.SetUp failed for volume “pvc-76893e36-15cc-41f8-9b36-e64b1d1ea969” : rpc error: code = InvalidArgument desc = There is no block device frontend for volume pvc-76893e36-15cc-41f8-9b36-e64b1d1ea969
Warning BackOff 52s (x2 over 53s) kubelet, longhorn3 Back-off restarting failed container
Normal Pulling 32s (x4 over 77s) kubelet, longhorn3 Pulling image “hashicorp/waypoint:latest”
Warning Unhealthy 32s (x3 over 72s) kubelet, longhorn3 Liveness probe failed: Get dial tcp connect: connection refused
Normal Killing 32s kubelet, longhorn3 Container server failed liveness probe, will be restarted
Normal Pulled 31s (x4 over 76s) kubelet, longhorn3 Successfully pulled image “hashicorp/waypoint:latest”
Normal Created 31s (x4 over 76s) kubelet, longhorn3 Created container server
Normal Started 31s (x4 over 75s) kubelet, longhorn3 Started container server

This seems to be pointing at the problem

Can you run other types of StatefulSets? If you run waypoint install -platform=kubernetes -accept-tos -show-yaml it will show you the StatefulSet information used by Waypoint, which is very standard and requests a 1Gb ReadWriteOnce PVC:

If you change the StatefulSet to a Deployment I suspect this will work for you. The Waypoint Server will reset its data.db file every time the pod for the Deployment restarts, though. But if you’re OK with that for development purposes, then it may work for you.

  1. kubectl apply -f DEPLOYMENTFROMGIST where deployment gist is here.
  2. waypoint server bootstrap -server-addr=[::]:9701 -server-tls-skip-verify
  3. waypoint context verify should work and you can use Waypoint as normal other than all state will be lost on pod restart.

That error is just something that looks like an error in Longhorn, but apparently isn’t a real error. So I’m not too worried about that one.

I get the same “Liveness probe failed: Get dial tcp connect: connection refused” when I run kubectl apply -f waypoint.yaml, where waypoint.yaml is the Deployment from the gist.
I didn’t even get a chance to run waypoint context verify.

I was able to use the Deployment style without either of the liveness probes. See this gist for the end result:

My k8s is the one included with Docker for Mac locally. There are also learn guides that use GKE, AKS, and EKS so there should be good coverage on using different k8s versions. I suspect it’s an issue with the Kubernetes installation you’re using.


We don’t suggest you run the server as a Deployment, as it won’t retain its data unless you manually wire up a PVC, in which case the StatefulSet should work fine.

You’ll need to look at the logs for the waypoint-server-0 pod, probably using -p to look at the previous container. Either the process is not starting, or something is up with your Kubernetes install and it is not providing the proper ports.
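
For anyone landing here with the same symptoms, the checks suggested above might be sketched roughly like this. The function name collect_waypoint_diagnostics is hypothetical; the pod name (waypoint-server-0), namespace (default), and container name (server) are assumptions taken from the events earlier in the thread, so adjust them for your cluster.

```shell
# Hypothetical helper that gathers the usual diagnostics for a crash-looping pod.
# Defaults are assumptions based on the events posted earlier in this thread.
collect_waypoint_diagnostics() {
  pod="${1:-waypoint-server-0}"
  namespace="${2:-default}"

  # Logs from the previous container instance (the one the kubelet killed).
  kubectl logs "$pod" -c server -n "$namespace" --previous

  # Probe configuration, restart count, and recent events for the pod.
  kubectl describe pod "$pod" -n "$namespace"

  # Shows whether the PersistentVolumeClaims in the namespace are Bound.
  kubectl get pvc -n "$namespace"
}
```

Calling collect_waypoint_diagnostics with no arguments uses the defaults; pass a pod name and namespace to override them. If the previous-container logs are empty, the process most likely never started, which points back at the volume mount rather than at the probe itself.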


@evanphx and @jbayer I believe you’re correct that it’s an issue with my Kubernetes cluster.