Errors after changing DC

Hello,

I changed the primary datacenter in the Consul configuration.
Everything seems to work correctly, but this warning appears repeatedly in the Consul logs:

2020/05/11 09:35:11 [WARN] consul.rpc: RPC request for unknown DC "dc"

How can I clean up this DC for good?
Thanks.
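
For anyone else chasing this warning, it can help to see which datacenter names are still being requested and how often. Here is a minimal Python sketch that tallies them from log lines. It assumes the log format matches the line quoted above. The function name `count_unknown_dc` and the extra sample lines are made up for illustration and are not part of Consul.

```python
import re
from collections import Counter

# Matches the warning format quoted in this thread, e.g.
# [WARN] consul.rpc: RPC request for unknown DC "dc"
UNKNOWN_DC = re.compile(r'RPC request for unknown DC "([^"]*)"')


def count_unknown_dc(lines):
    """Tally how many times each unknown datacenter name was requested."""
    counts = Counter()
    for line in lines:
        match = UNKNOWN_DC.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Illustrative sample: only the first line is taken from this thread.
    sample = [
        '2020/05/11 09:35:11 [WARN] consul.rpc: RPC request for unknown DC "dc"',
        '2020/05/11 09:35:12 [INFO] agent: Synced node info',
        '2020/05/11 09:35:41 [WARN] consul.rpc: RPC request for unknown DC "dc"',
    ]
    for name, hits in count_unknown_dc(sample).most_common():
        print(f"{name}: {hits}")
```

To run it against a real log, pass an open file instead of the sample list, for example `count_unknown_dc(open("consul.log"))`, where `consul.log` is a placeholder path.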

Hi @smutel

Thanks for posting :slight_smile:

Is this for a single datacenter configuration, or are you running multiple datacenters federated together?
Can you please provide some more information on the process you took to change the main DC?

Thanks!

This is a single datacenter configuration.
I changed the primary_datacenter and datacenter parameters in the config.

Hi @smutel,

I tried this out with a dev-mode agent. I didn’t see anything in the logs when I changed the primary DC and ran a consul reload. That said, dev mode doesn’t reflect a production configuration very well, so I’ll spin up a few nodes and play with the config over the next couple of days.

I have this GitHub issue to track a request to rotate primary DCs: https://github.com/hashicorp/consul/issues/7817. Currently, changing a primary DC has a blast radius, and I am trying to collect data on what happens in different clusters when it is changed. Please :+1: the issue if you’d like to follow it.

Can you post your server config file, as well as the command you are using to run the servers?

If anyone else reading this wants to try, feel free to post what you did and the results :slight_smile: .

How much of the primary DC state is persisted in backups? My use case is reasonably small: I only have a couple of clusters and could feasibly take the whole lot down over a weekend to migrate.

I’ll do my own testing, but if I restored a backup of the existing primary to a new cluster, would that be safe? Then I could effectively reset the others, join them to the new primary, and I think everything would still work…

We also have 2 datacenters and are contractually required to rotate between them once per year. We also cannot use public cloud due to contracts, so creating a third datacenter in AWS isn’t an option.

Currently we are able to fail over everything except Consul, which would be a major problem if, say, the first datacenter were lost in a fire.

This is a feature we need.

According to the replication documentation, setting up replication will destroy all tokens in the secondary site.
If all the ACLs are managed with Terraform, it theoretically wouldn’t be too hard to re-run terraform apply to recreate the tokens after the switch.

Also, it may be possible to create a snapshot right before the migration and use it to restore
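
A rough sketch of the snapshot idea, wrapping the `consul snapshot save` and `consul snapshot restore` CLI commands from Python. The helper names and the file name `pre-migration.snap` are hypothetical. It assumes the `consul` binary is on the PATH and that the agent address and any ACL token are supplied through the usual environment variables (`CONSUL_HTTP_ADDR`, `CONSUL_HTTP_TOKEN`). Whether a snapshot restored into a new primary brings over everything needed is exactly the open question in this thread, so treat this as something to try on a test cluster first.

```python
import subprocess


def snapshot_save_cmd(path):
    """Build the CLI command that saves a snapshot to `path`."""
    return ["consul", "snapshot", "save", path]


def snapshot_restore_cmd(path):
    """Build the CLI command that restores the snapshot stored at `path`."""
    return ["consul", "snapshot", "restore", path]


def run(cmd):
    """Run a command and raise CalledProcessError if it exits non-zero."""
    subprocess.run(cmd, check=True)


# Example (not executed here, since it needs a reachable Consul agent):
#   run(snapshot_save_cmd("pre-migration.snap"))     # on the old primary
#   run(snapshot_restore_cmd("pre-migration.snap"))  # against the new cluster
```

Keeping the command construction separate from `run` makes it easy to print the commands for a dry run before touching a live cluster.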