Client pod restart

Is there a configuration that would allow the Consul client agent pod to be restarted while retaining its service registrations? This works fine when the client container restarts, but I can’t seem to find a way to make it work across a pod delete/restart.

Our customer has a specific test case that requires the client pod to be deleted/restarted and retain data. I see that there is a consul-connect-lifecycle-sidecar that performs this operation on an upgrade, but we’re not using that sidecar, just a simple Consul deployment.

I was hoping the client could simply rejoin the cluster, and resync its registration data from the cluster.
Is that not possible?

Hi @merge0303, thank you for asking this question!
Can you clarify a little bit about your configuration and deployment? Are you running this in k8s with a consul-k8s deployment, or something else?

Normally the lifecycle-sidecar is responsible for re-registering agent configurations. The reason for this is that there is no persistent storage location for the DaemonSet’s pods, so their state cannot be saved.
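
For illustration, the pattern is roughly the one below: periodically re-apply the service registration against the local client agent, so a freshly started agent (with an empty data_dir) picks the definition back up. This is only a sketch using the Go API client, not the actual consul-connect-lifecycle-sidecar code, and the service name/port are made-up examples.

```go
package main

import (
	"log"
	"time"

	"github.com/hashicorp/consul/api"
)

func main() {
	// Talk to the local client agent (CONSUL_HTTP_ADDR, or 127.0.0.1:8500 by default).
	client, err := api.NewClient(api.DefaultConfig())
	if err != nil {
		log.Fatal(err)
	}

	// Made-up example service definition.
	svc := &api.AgentServiceRegistration{
		ID:   "web-1",
		Name: "web",
		Port: 8080,
	}

	for {
		// Registration is idempotent: re-applying it is harmless if the agent
		// already knows the service, and restores it if the agent came back empty.
		if err := client.Agent().ServiceRegister(svc); err != nil {
			log.Printf("re-register failed: %v", err)
		}
		time.Sleep(10 * time.Second)
	}
}
```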

Hi @kschoche, this is a K8s cluster with Consul used purely as a standalone service-name database. Service discovery and registration is not integrated with K8s at all. The Consul server is deployed as a StatefulSet (size 3), and the client DaemonSet is just a k8s pod on each node. Out-of-the-box vanilla.

In the container restart scenario, the service registrations are preserved. Is this because the Consul client uses its “data_dir” to persist the service registrations across container restarts? Obviously any pod-local storage is lost on a pod restart.
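
For what it’s worth, this is roughly how I’ve been checking what the local agent still knows after a restart (a small sketch with the Go API client, talking to the default local agent address):

```go
package main

import (
	"fmt"
	"log"

	"github.com/hashicorp/consul/api"
)

func main() {
	// Local client agent, 127.0.0.1:8500 by default.
	client, err := api.NewClient(api.DefaultConfig())
	if err != nil {
		log.Fatal(err)
	}

	// List the services the local agent currently has registered, e.g. to
	// compare before and after a container or pod restart.
	services, err := client.Agent().Services()
	if err != nil {
		log.Fatal(err)
	}
	for id, s := range services {
		fmt.Printf("%s -> name=%s port=%d\n", id, s.Service, s.Port)
	}
}
```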

I was hopeful that some combination of command-line switches on the client would create a scenario where the remainder of the cluster accepts a previous member back and then backfills that client with its prior service registration data.

Is that possible at all? Otherwise there’s no way to support pod-delete test cases without some sort of sidecar logic in the user’s application pods.
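
Something along these lines is what I had in mind, purely as an illustration (sketched with the Go API client; it isn’t a built-in Consul feature, the catalog may not keep these entries once the new agent’s anti-entropy sync runs, and check definitions aren’t part of the catalog response):

```go
package main

import (
	"log"

	"github.com/hashicorp/consul/api"
)

func main() {
	client, err := api.NewClient(api.DefaultConfig())
	if err != nil {
		log.Fatal(err)
	}

	nodeName, err := client.Agent().NodeName()
	if err != nil {
		log.Fatal(err)
	}

	// What the servers' catalog still has recorded for this node.
	node, _, err := client.Catalog().Node(nodeName, nil)
	if err != nil || node == nil {
		log.Fatalf("catalog lookup failed: %v", err)
	}

	for _, s := range node.Services {
		// NOTE: check definitions (HTTP/TTL, intervals, etc.) are not included
		// in this catalog response, so they would have to be re-created separately.
		reg := &api.AgentServiceRegistration{
			ID:      s.ID,
			Name:    s.Service,
			Tags:    s.Tags,
			Port:    s.Port,
			Address: s.Address,
			Meta:    s.Meta,
		}
		if err := client.Agent().ServiceRegister(reg); err != nil {
			log.Printf("re-register %s failed: %v", s.ID, err)
		}
	}
}
```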

As a follow-on question:
When the Consul client pod is deleted, are there any command-line args to ask the remainder of the Consul cluster to be patient before declaring all of that agent’s services “critical”? Something like “FailuresBeforeCritical”, but for a missing DaemonSet pod? I can see how such a thing could potentially wreak havoc on quickly and deterministically identifying a Consul client failure, though. And honestly, I have not yet tested this scenario with a large FailuresBeforeCritical for each service, but I suspect it would have no effect when the Consul client pod itself is deleted.
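
For concreteness, this is the kind of check definition I mean (a sketch via the Go API client; the service name, port, and health endpoint are made up):

```go
package main

import (
	"log"

	"github.com/hashicorp/consul/api"
)

func main() {
	client, err := api.NewClient(api.DefaultConfig())
	if err != nil {
		log.Fatal(err)
	}

	// Example registration. FailuresBeforeCritical only delays the critical
	// transition for ordinary check failures; it presumably does not cover the
	// case where the client agent itself disappears.
	svc := &api.AgentServiceRegistration{
		Name: "web",
		Port: 8080,
		Check: &api.AgentServiceCheck{
			HTTP:                   "http://127.0.0.1:8080/health",
			Interval:               "10s",
			Timeout:                "2s",
			FailuresBeforeCritical: 3, // tolerate a few failed checks before flipping to critical
		},
	}
	if err := client.Agent().ServiceRegister(svc); err != nil {
		log.Fatal(err)
	}
}
```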

Hi @merge0303, we have not seen this outside of running the consul-k8s binary standalone. Could you help us understand why you’ve decided to use consul-k8s standalone, as running in this manner is not officially supported today? There could perhaps be some use cases we are not doing a good job of addressing via the Consul Helm chart, so I’m curious.

@david-yu Sure. We use Consul as the foundation of an asynchronous service discovery mechanism. A proprietary sidecar allows our applications to register a service-name instance in Consul and to “subscribe” to any arbitrary service name in Consul. When our sidecar sees an instance of a subscribed service name make a “critical” to “passing” transition in the client health API, it notifies the application with the pertinent data via an HTTP webhook.

In this way, any pod can dynamically learn of specific application pod types (and sub-types) as they come and go from the cluster, using just a couple of simple HTTP APIs exposed by our sidecar.
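
A rough sketch of that subscribe-and-notify loop, for context (this is not our actual sidecar; it uses the Go API client with a blocking health query, and the service name and webhook URL are made-up examples):

```go
package main

import (
	"bytes"
	"encoding/json"
	"log"
	"net/http"
	"time"

	"github.com/hashicorp/consul/api"
)

func main() {
	const (
		watchedService = "web"                          // example subscription
		webhookURL     = "http://127.0.0.1:9000/notify" // example application endpoint
	)

	client, err := api.NewClient(api.DefaultConfig())
	if err != nil {
		log.Fatal(err)
	}

	lastStatus := map[string]string{} // instance ID -> last aggregated status
	var waitIndex uint64

	for {
		// Long-poll: this call blocks until the health view of the service changes
		// (or the wait time elapses).
		entries, meta, err := client.Health().Service(watchedService, "", false, &api.QueryOptions{
			WaitIndex: waitIndex,
			WaitTime:  5 * time.Minute,
		})
		if err != nil {
			log.Printf("watch error: %v", err)
			time.Sleep(2 * time.Second)
			continue
		}
		waitIndex = meta.LastIndex

		for _, e := range entries {
			id := e.Service.ID
			status := e.Checks.AggregatedStatus()
			// Notify on a (re)appearance: anything other than passing -> passing.
			if status == api.HealthPassing && lastStatus[id] != api.HealthPassing {
				payload, _ := json.Marshal(map[string]interface{}{
					"service": e.Service.Service,
					"id":      id,
					"address": e.Service.Address,
					"port":    e.Service.Port,
				})
				resp, postErr := http.Post(webhookURL, "application/json", bytes.NewReader(payload))
				if postErr != nil {
					log.Printf("webhook failed: %v", postErr)
				} else {
					resp.Body.Close()
				}
			}
			lastStatus[id] = status
		}
	}
}
```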

The KV store is pretty awesome too.

There are advantages and disadvantages to allowing apps to know this kind of data in any cloud-based architecture. Of course we chose Consul due to both its resiliency (server StatefulSet) and scalability (client DaemonSet). Our application solution requires both of these features.