Issues with the liveliness probe

chadsly · October 23, 2020, 1:50pm

I don’t see much information about Waypoint’s liveness probe. I suspect the issues I’m running into aren’t actually Waypoint issues, but maybe someone can point me in the right direction.
I’m using Longhorn for storage on a Rancher cluster, if that matters. The Waypoint pod gets spun up just fine, but I get a liveness probe error. I tried simply removing the liveness probe, but because it’s a StatefulSet, I couldn’t do that.
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling default-scheduler 0/3 nodes are available: 3 pod has unbound immediate PersistentVolumeClaims.
Warning FailedScheduling default-scheduler 0/3 nodes are available: 3 pod has unbound immediate PersistentVolumeClaims.
Normal Scheduled default-scheduler Successfully assigned default/waypoint-server-0 to longhorn3
Normal SuccessfulAttachVolume 84s attachdetach-controller AttachVolume.Attach succeeded for volume "pvc-76893e36-15cc-41f8-9b36-e64b1d1ea969"
Warning FailedMount 80s (x3 over 82s) kubelet, longhorn3 MountVolume.SetUp failed for volume "pvc-76893e36-15cc-41f8-9b36-e64b1d1ea969" : rpc error: code = InvalidArgument desc = There is no block device frontend for volume pvc-76893e36-15cc-41f8-9b36-e64b1d1ea969
Warning BackOff 52s (x2 over 53s) kubelet, longhorn3 Back-off restarting failed container
Normal Pulling 32s (x4 over 77s) kubelet, longhorn3 Pulling image "hashicorp/waypoint:latest"
Warning Unhealthy 32s (x3 over 72s) kubelet, longhorn3 Liveness probe failed: Get https://10.42.0.20:9702/: dial tcp 10.42.0.20:9702: connect: connection refused
Normal Killing 32s kubelet, longhorn3 Container server failed liveness probe, will be restarted
Normal Pulled 31s (x4 over 76s) kubelet, longhorn3 Successfully pulled image "hashicorp/waypoint:latest"
Normal Created 31s (x4 over 76s) kubelet, longhorn3 Created container server
Normal Started 31s (x4 over 75s) kubelet, longhorn3 Started container server

jbayer · October 23, 2020, 6:22pm

This seems to be pointing at the problem

Can you run other types of StatefulSets? If you run waypoint install -platform=kubernetes -accept-tos -show-yaml, it will show you the StatefulSet definition used by Waypoint, which is very standard and requests a 1Gb ReadWriteOnce PVC: Waypoint Install on Kubernetes YAML · GitHub
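
To see whether the claim is actually binding on your cluster, something along these lines may help. This is only a rough sketch: the check_storage function name is made up for illustration, the StatefulSet name waypoint-server is inferred from the waypoint-server-0 pod in the events, and everything is assumed to live in the default namespace.

```shell
# Hypothetical helper (the name is an assumption, not part of Waypoint).
# Lists the objects involved in the "unbound PersistentVolumeClaims" warning.
check_storage() {
  # STATUS should read "Bound" once the provisioner has created the volume
  kubectl get pvc
  # Confirms a storage class exists for the claim to use
  kubectl get storageclass
  # StatefulSet name inferred from the pod name waypoint-server-0
  kubectl get statefulset waypoint-server
}
```

Run check_storage after the install; a claim stuck in Pending would point at the storage layer rather than at Waypoint.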

jbayer · October 23, 2020, 6:35pm

If you change the StatefulSet to a Deployment, I suspect this will work for you. The Waypoint Server will reset its data.db file every time the Deployment’s pod restarts, though. If you’re OK with that for development purposes, this may work for you.

kubectl apply -f DEPLOYMENTFROMGIST where deployment gist is here.
waypoint server bootstrap -server-addr=[::]:9701 -server-tls-skip-verify
waypoint context verify should work and you can use Waypoint as normal other than all state will be lost on pod restart.

chadsly · October 23, 2020, 7:04pm

That message just looks like an error in Longhorn, but apparently it isn’t a real error, so I’m not too worried about that one. no block device frontend for volume error · Issue #1739 · longhorn/longhorn · GitHub

chadsly · October 23, 2020, 7:16pm

I get the same “Liveness probe failed: Get https://10.42.0.55:9702/: dial tcp 10.42.0.55:9702: connect: connection refused” after running kubectl apply -f waypoint.yaml, where waypoint.yaml is the Deployment from the gist.
I didn’t even get a chance to run the waypoint context step.

jbayer · October 23, 2020, 7:32pm

I was able to use the Deployment style without either of the liveness probes. See this gist for the end result: https://gist.github.com/jbayer/a54082b401f818ab8f1d3c2c36c9561d

My k8s is the one included with Docker for Mac locally. There are also learn guides that use GKE, AKS, and EKS so there should be good coverage on using different k8s versions. I suspect it’s an issue with the Kubernetes installation you’re using.

evanphx · October 23, 2020, 7:40pm

We don’t suggest running the server as a Deployment, as it won’t retain its data unless you manually wire up a PVC, in which case the StatefulSet should work fine.

You’ll need to look at the logs for the waypoint-server-0 pod, probably using -p to look at the previous container. The process is not starting or something is up with your Kubernetes install not providing the proper ports.
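
As a rough sketch of that log check: the function name below is made up for illustration, and the pod name waypoint-server-0 and container name server are taken from the events posted earlier in the thread.

```shell
# Hypothetical helper (the name is an assumption, not part of Waypoint).
# Prints logs from the previous, already-terminated container instance.
waypoint_previous_logs() {
  # --previous is the long form of -p; the pod name defaults to the one in the events
  kubectl logs "${1:-waypoint-server-0}" -c server --previous
}
```

Calling waypoint_previous_logs with no arguments targets waypoint-server-0; pass a different pod name as the first argument if yours differs.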

chadsly · October 23, 2020, 7:45pm

@evanphx and @jbayer I believe you’re correct that it’s an issue with my kubernetes cluster.
