When deploying Consul servers within Kubernetes, it’s possible to use a StatefulSet to give the pods stable identity in DNS, and associate external storage with each pod. I’m wondering whether the files Consul writes to the path specified by the data_dir configuration field need to survive a crash of the server container, and whether they should outlast a given server pod to be used by a replacement pod later.
If the files are just a local optimization, we can store them in an “emptyDir” Kubernetes volume, and let them die with the pod. If instead the files are critical to Consul’s operation and we need to make them available for replacement servers to adopt, we can store them in a bound PersistentVolume backed by, say, AWS EBS volumes. That does mean, though, that the storage is zonally confined, mandating that a replacement for each Consul server come back within the same availability zone as its predecessor.
I see raft control files in that data_dir directory, and other things I don’t understand yet. Are they precious? Please advise.
Hi Steven,
It is possible to run Consul servers without durable storage, but it is dangerous. The Consul servers can be thought of as the database for Consul. They replicate the data across each server, so as long as you have quorum, your data won’t be lost. But if you lose enough servers that there’s no quorum, you’re in trouble and could lose all your data. You’d then need to restore from backup.
Some more notes from asking the team:
If one server crashes and loses its data, the other servers will continue to function and when the crashed server restarts, it will get sent the data it lost. So on the face of it, as long as you don’t lose more than one server (assuming you have 3) then you don’t need durable storage.
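To make the arithmetic behind "don't lose more than one server (assuming you have 3)" concrete, here is a small sketch in Python. It assumes the usual Raft majority rule, where quorum is more than half of the servers; the function names are made up for illustration and are not part of Consul.

```python
def quorum_size(servers: int) -> int:
    """Smallest majority of a cluster with the given number of servers."""
    if servers < 1:
        raise ValueError("a cluster needs at least one server")
    return servers // 2 + 1


def failure_tolerance(servers: int) -> int:
    """How many servers can be lost while a majority is still available."""
    return servers - quorum_size(servers)


if __name__ == "__main__":
    for n in (1, 3, 5, 7):
        print(f"{n} servers: quorum {quorum_size(n)}, "
              f"can lose {failure_tolerance(n)}")
```

With 3 servers the quorum is 2, so only one server can be lost at a time; with 5 servers, two can be lost.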
However, if you lose more than one server, or there are network partitions, then the guarantees of Raft (the consensus algorithm) no longer hold. It’s possible for a server restarting without its data, combined with a network partition, to cause a split brain, lose committed writes, or otherwise violate linearizability.
To your question about zones: yes, the server will need to come back up in the same zone. Given the durability requirements outlined above, though, it’s best to spread your servers out by zone anyway.
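As a rough way to check a zone layout, the sketch below (Python, same majority rule as Raft) tests whether the cluster would still have quorum if any single zone went away entirely. The zone names and the function name are hypothetical, and it only models whole-zone loss, not partitions between zones.

```python
from collections import Counter


def survives_any_zone_loss(server_zones: list[str]) -> bool:
    """True if a majority of servers remains after losing any one zone.

    server_zones holds one entry per server, naming the zone it runs in.
    """
    total = len(server_zones)
    if total == 0:
        return False
    quorum = total // 2 + 1
    per_zone = Counter(server_zones)
    # Losing a zone removes every server placed in it.
    return all(total - count >= quorum for count in per_zone.values())


if __name__ == "__main__":
    # Hypothetical zone names, one server per zone.
    print(survives_any_zone_loss(["zone-a", "zone-b", "zone-c"]))
    # Two of three servers share a zone, so losing it breaks quorum.
    print(survives_any_zone_loss(["zone-a", "zone-a", "zone-b"]))
```

Three servers in three separate zones tolerate the loss of any one zone, whereas placing two of the three in the same zone does not.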
A follow-up question, then: Say that a container running a Consul server in a Kubernetes pod crashes, or gets evicted, and we get a replacement pod scheduled either on the same machine, but with a different pod IP address, or on a different machine with a different host address too. If we mount the PersistentVolume abandoned by the first server into this replacement Consul server, will the new server be able to use the files left behind by the first server? In other words, is there any server identity embedded in the file content (such as the server’s IP address) that ties the files to the first server?