Is it possible to change the "primary datacenter" for Consul clusters?

If we connect multiple Consul clusters across several datacenters by specifying the primary_datacenter field in their configuration, what happens if we need to take that primary cluster out of commission and promote an existing “subservient” cluster in its place? If clusters B and C each respect cluster A as primary, but we decide that we want B to become the primary, or wish to introduce a new D cluster as the primary, can Consul accommodate this kind of pivoting?

What procedure would we use to adjust the configuration across several servers and lots of agents in each Consul cluster? Related to that, can a “subservient” cluster (e.g. B above, deferring to A as primary) handle becoming the primary itself?
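To make the configuration side of the question concrete: whatever procedure is used, every server in every cluster would have to end up naming the same primary. Below is a minimal Python sketch of a check for that, assuming each server's agent configuration has already been parsed into a dict. The server names and the `find_primary_disagreements` helper are hypothetical; only the `datacenter` and `primary_datacenter` keys come from Consul's configuration.

```python
from collections import Counter


def find_primary_disagreements(server_configs):
    """Return (majority_primary, dissenters) for a set of parsed agent configs.

    server_configs maps a (hypothetical) server name to its config dict.
    dissenters maps each server whose primary_datacenter differs from the
    majority value, or is missing, to the value it declares (None if missing).
    """
    declared = {
        name: cfg.get("primary_datacenter")
        for name, cfg in server_configs.items()
    }
    counts = Counter(value for value in declared.values() if value is not None)
    if not counts:
        # Nobody declares a primary at all; report every server.
        return None, declared
    majority, _ = counts.most_common(1)[0]
    dissenters = {n: v for n, v in declared.items() if v != majority}
    return majority, dissenters


# Illustrative data only: cluster C has been pointed at B while A and B still name A.
configs = {
    "a-server-1": {"datacenter": "a", "primary_datacenter": "a"},
    "b-server-1": {"datacenter": "b", "primary_datacenter": "a"},
    "c-server-1": {"datacenter": "c", "primary_datacenter": "b"},
}
majority, dissenters = find_primary_disagreements(configs)
print(majority, dissenters)
```

A check like this only confirms that the configuration files agree; it says nothing about whether the data held by the old primary has been moved.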

Hello @seh,

Thanks for posting, and apologies for the long wait for a response. We recently had another topic on the Consul primary DC, "Does/should the primary datacenter ever change?", where @mkeeler posted the following (post #5):

Not only should you not need to account for the primary DC changing, but many features of Consul rely on having a stable primary DC. For example, Consul’s ACL system will perform all writes (except for DC-local tokens) within the primary DC and then replicate the data out to all the other datacenters. The primary DC is the source of truth for ACL policies, roles, etc. If you were to manually change your primary datacenter configuration after setting all of this up, lots of things would stop working. It might in theory be possible to change your primary DC, but this would require a lot of coordination and planning and is not something you should ever hope to do.

In short: when choosing a primary DC, be sure to select a location that is prioritized for stability and control, and you should likely never need to swap primary DCs.
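As a rough illustration of the ACL replication dependency described above, a secondary datacenter reports its replication state through the `/v1/acl/replication` HTTP endpoint. The Python sketch below only interprets an already-fetched response. The sample payload is made up for illustration, and `replication_problems` is a hypothetical helper, not part of any Consul client library.

```python
import json


def replication_problems(status, expected_primary):
    """List reasons the ACL replication status of a secondary looks unhealthy.

    status is the parsed JSON body of GET /v1/acl/replication.
    expected_primary is the datacenter we believe is the primary.
    """
    problems = []
    if not status.get("Enabled"):
        problems.append("ACL replication is not enabled")
    if not status.get("Running"):
        problems.append("ACL replication is not running")
    source = status.get("SourceDatacenter")
    if source != expected_primary:
        problems.append(
            f"replicating from {source!r}, expected {expected_primary!r}"
        )
    return problems


# Made-up response body, shaped like the endpoint's documented fields.
sample = """
{
  "Enabled": true,
  "Running": true,
  "SourceDatacenter": "a",
  "ReplicationType": "tokens",
  "ReplicatedIndex": 1976
}
"""
status = json.loads(sample)
print(replication_problems(status, "a"))
print(replication_problems(status, "b"))
```

The second call shows what a secondary would look like if the expected primary were changed to B while it was still replicating from A.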

I hope this helps, and thank you again for your patience as we work to catch up with all our replies!

Thank you, @jsosulska. I take that advice to be, “Don’t even try it. We don’t prevent it, but make no guarantee that you’ll be able to get it to work.”

It surprises me, though, that this isn’t a situation that arises more frequently. Of course we’d prefer stability, but circumstances beyond our control could mandate a shift like this: natural disaster, relocation of core business functions, extended cloud provider outage (unlikely as it may be), cost incentives to relocate workload. Telling people to “choose wisely once up front” doesn’t seem fair.

Hey @seh,

Thanks for the response! Just noticed you were in Pittsburgh, and I wanted to invite you to join our HashiCorp User Group in Pittsburgh - this could make for an interesting conversation when we all meet in person and kick off our first HUG in April/May: Pittsburgh HashiCorp User Group | Meetup

I’d like to reiterate something you said:

Of course we’d prefer stability, but circumstances beyond our control could mandate a shift like this: natural disaster, relocation of core business functions, extended cloud provider outage (unlikely as it may be), cost incentives to relocate workload.

Consul wants stability as well. Consul provides several features and functionality to assist in disaster scenarios, including snapshotting to help restore server state in case of failures. These tools can work in conjunction to help when business continuity fails, but there is not a quick flag to automatically fail over an entire data center.
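For anyone wanting to script the snapshotting mentioned above, here is a minimal Python sketch that builds a `consul snapshot save` command line for a given datacenter. The backup directory, the file naming scheme, and the `snapshot_save_command` helper are assumptions for illustration; running the command requires a reachable Consul agent and, if ACLs are enabled, a suitable token.

```python
from datetime import datetime, timezone


def snapshot_save_command(datacenter, when, directory="/var/backups/consul"):
    """Build the argv for saving a snapshot of one datacenter's server state.

    The directory and the timestamped file name are illustrative choices.
    """
    filename = f"{directory}/{datacenter}-{when:%Y%m%dT%H%M%SZ}.snap"
    return ["consul", "snapshot", "save", f"-datacenter={datacenter}", filename]


cmd = snapshot_save_command(
    "a", datetime(2020, 3, 1, 12, 0, 0, tzinfo=timezone.utc)
)
print(" ".join(cmd))

# To actually take the snapshot (needs a reachable agent):
# import subprocess
# subprocess.run(cmd, check=True)
```

A snapshot restores the state of the datacenter it was taken from; as noted above, it is not by itself a way to fail over to a different primary.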

In our Enterprise offering, we provide “Redundancy Zones”, which are covered in the Consul Enterprise documentation. Redundancy Zones provide multi-AZ redundancy by striping servers across availability zones within a region, which can lower the impact of losing an AZ. Again, this isn’t scoped to region failures, but it does provide an additional layer of protection against failure.

So, in summary, there isn’t a quick failover at the DC/region level. However, this gives me a great idea for a Learn Guide or blog post that we could create around “What to expect when your primary datacenter fails unexpectedly.” When we make the content, we can update this thread. Thank you for the idea :)

I’d love to hear any additional questions you may have, and hope to see you in Pittsburgh!

Best,
Jono

Thank you for the invitation, Jonathan. I joined the Meetup group. Now let’s see if we’re allowed to congregate by then.

Let’s hope @seh!

Wishing you and yours well!

My apologies for usurping this thread - we are currently facing this precise scenario of having to change the primary datacenter across our clusters, as our current primary is being decommissioned.

@jsosulska are you able to share the current guidance on how to change the primary datacenter within a group of clusters? In our case the primary datacenter is going away and will not be replaced. We have dozens of other clusters, one of which we’d like to promote to the new primary. This overall migration is a significant amount of effort for us, so the approach doesn’t need to be either quick or easy; we would like to optimise for safety over simplicity.

As an observation, while decommissioning datacenters doesn’t happen particularly frequently for us, in our case there’s a direct correlation between “primary datacenter”, “first datacenter” and so “oldest datacenter”. For that reason I’d expect there to be a wider need within the community over time for a blessed pattern for changing the primary datacenter as folks’ infrastructure ages.

Many thanks

Hi All!

I’ve created an issue to track this for discussion. Please upvote the issue to raise visibility, as well as describe your use cases as a comment.

Hope this helps!
Jono.

Has this gone anywhere? I saw the ticket was created, but it didn’t seem to garner much interest from the developers.

In my case I have an on-premises datacenter and a few cloud datacenters joined together. As we work to decommission our on-premises equipment, I’d like to promote one of my cloud-based datacenters to be the new primary. This doesn’t seem possible at first glance. I’m curious how others have addressed this.