Failed to resolve consul-server-0.consul-server: lookup consul-server-0.consul-server on 30.43.0.10:53: no such host

Hi,
I have a Kubernetes cluster on which I have deployed the Consul server, but it isn’t creating all the pod replicas: only one pod comes up, and it keeps fluctuating between CrashLoopBackOff and Running.

```
develops-consul-server-0   0/1   Running   1   12m
```

This is the pod; its description shows the following events:

```
Events:
  Type     Reason     Age                 From               Message
  ----     ------     ----                ----               -------
  Normal   Scheduled  52m                 default-scheduler  Successfully assigned onap/develops-consul-server-0 to onap-k8s-7
  Normal   Killing    39m (x2 over 45m)   kubelet            Container consul-server failed liveness probe, will be restarted
  Normal   Pulled     39m (x3 over 52m)   kubelet            Container image "nexus3.onap.org:10001/onap/oom/consul:2.1.0" already present on machine
  Normal   Created    39m (x3 over 52m)   kubelet            Created container consul-server
  Normal   Started    39m (x3 over 52m)   kubelet            Started container consul-server
  Warning  Unhealthy  33m (x8 over 46m)   kubelet            Liveness probe failed: dial tcp 30.42.9.29:8301: connect: connection refused
  Warning  Unhealthy  87s (x22 over 46m)  kubelet            Readiness probe failed: dial tcp 30.42.9.29:8301: connect: connection refused
```

I have attached the log file of the pod.
Has anyone faced this sort of error before? Could someone help me figure out how to fix it?

log.txt (13.7 KB)

There’s quite a lot going on here.

First, you’re launching Consul with -bind 127.0.0.1, thereby preventing one Consul pod from communicating with another.

This is compounded by the Kubernetes liveness/readiness probes, which test whether that Consul listener is reachable on the pod IP (30.42.9.29:8301 in your events) - but you’ve bound it to 127.0.0.1, so it never is, and your pod is never healthy.
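I don’t know exactly how the ONAP chart wires this up, but as a minimal sketch - assuming you can override the container command and probes, and borrowing the consul-server service name implied by your lookup error - the relevant part of the StatefulSet could look like:

```yaml
containers:
  - name: consul-server
    image: nexus3.onap.org:10001/onap/oom/consul:2.1.0
    env:
      # Downward API: expose this pod's own IP to the container
      - name: POD_IP
        valueFrom:
          fieldRef:
            fieldPath: status.podIP
    command:
      - consul
      - agent
      - -server
      - -data-dir=/consul/data
      - -bind=$(POD_IP)      # gossip/RPC on the pod IP, not 127.0.0.1
      - -client=0.0.0.0      # HTTP/DNS reachable from outside the container
      - -retry-join=consul-server-0.consul-server
    readinessProbe:
      tcpSocket:
        port: 8301           # the serf LAN port your probes are already dialing
      initialDelaySeconds: 10
      periodSeconds: 10
    livenessProbe:
      tcpSocket:
        port: 8301
      initialDelaySeconds: 30
      periodSeconds: 10
```

Kubernetes expands `$(POD_IP)` in the command from the env var, which is what lets `-bind` pick up the pod’s own IP instead of localhost.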

Since your pod is never healthy, it never becomes visible in Kubernetes DNS either - which is exactly the consul-server-0.consul-server lookup failure in your title.
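There’s also a chicken-and-egg problem lurking here: servers need DNS to join each other, but DNS records normally only appear once a pod is Ready. The usual way around it is to publish not-ready addresses on the headless Service backing the StatefulSet - a sketch, assuming a Service named consul-server:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: consul-server             # must match the StatefulSet's serviceName
spec:
  clusterIP: None                 # headless: gives pods stable DNS names
                                  # like consul-server-0.consul-server
  publishNotReadyAddresses: true  # publish DNS records even while pods are
                                  # unready, so servers can find each other
                                  # during bootstrap
  selector:
    app: consul-server
  ports:
    - name: serflan-tcp
      port: 8301
```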

Then, there’s the issue that not all your replicas are even trying to start … bear in mind that with a StatefulSet’s default OrderedReady pod management, pod N+1 isn’t created until pod N is Ready, so a perpetually-unready consul-server-0 would by itself stop the others from appearing. It could also be that your Kubernetes cluster has resource issues and can’t fit them all - you’d have to check Kubernetes status in other places to debug that; a few starting points are below.
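If it does come down to scheduling or capacity, these are the usual places to look (assuming the StatefulSet is named develops-consul-server, matching your pod name, in the onap namespace from your events):

```sh
# How many replicas does the StatefulSet want, and how many exist?
kubectl -n onap get statefulset develops-consul-server

# StatefulSet and cluster events usually name the blocker explicitly
kubectl -n onap describe statefulset develops-consul-server
kubectl -n onap get events --sort-by=.lastTimestamp

# Check whether the nodes actually have room for more replicas
kubectl describe nodes | grep -A 5 'Allocated resources'
```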

Lastly, you’re running Consul 1.0 … that’s extremely out of date - as in 11 feature releases behind. No one should be running such an obsolete version these days.