Error connecting 2 DCs

I am trying to connect two GKE clusters so that the Consul service mesh can communicate across them.
I am following the HashiCorp Learn tutorial "Secure Service Mesh Communication Across Kubernetes Clusters".

However, when I create dc1, I get the error:
Error: UPGRADE FAILED: post-upgrade hooks failed: timed out waiting for the condition

Below is the output of 'kubectl get pods':

NAME                                                          READY   STATUS                  RESTARTS   AGE
consul-7lqgv                                                  1/1     Running                 0          11m
consul-8bn4n                                                  0/1     Running                 0          7m38s
consul-connect-injector-webhook-deployment-74c8758896-b97fd   1/1     Running                 0          7m44s
consul-jnx4f                                                  1/1     Running                 0          12m
consul-mesh-gateway-59cf966bc4-ksklp                          2/2     Running                 0          10m
consul-mesh-gateway-59cf966bc4-qvzx6                          2/2     Running                 0          10m
consul-mesh-gateway-7fb76849dc-pzrtw                          0/2     Init:CrashLoopBackOff   6          7m44s
consul-server-0                                               1/1     Running                 0          14m
consul-server-1                                               1/1     Running                 0          14m
consul-server-2                                               0/1     Running                 0          7m31s

Please advise.

Can you share the logs and the kubectl describe output for the Consul server that is not yet ready (consul-server-2)? It needs to be ready before everything else can start successfully.

Hi @lkysow, I got past this step.
For some reason, if I deploy the entire configuration with --wait, it times out.
So I first deployed the servers with federation disabled, and once the server pods were up and running, I upgraded the release with federation and gossip encryption enabled.
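A minimal sketch of that staged rollout, assuming the first install layers a small override file on top of the full values with a second -f flag. The file name stage1-overrides.yaml and the release name consul are hypothetical. Helm gives precedence to the right-most -f file, so these keys only apply to the first install. Blanking the gossip secret reference is an assumption about how the chart decides whether gossip encryption is configured; check the chart's values reference for your version.

```yaml
# stage1-overrides.yaml (hypothetical name), used only for the first install:
#   helm install consul hashicorp/consul -f values.yaml -f stage1-overrides.yaml --wait
# The right-most -f file takes precedence, so these keys override values.yaml.
global:
  federation:
    # Bring the servers up without federation first.
    enabled: false
    # Nothing to export to secondary datacenters yet.
    createFederationSecret: false
  gossipEncryption:
    # Assumption: empty values leave gossip encryption unconfigured.
    secretName: ""
    secretKey: ""
```

Once the consul-server pods report Ready, running helm upgrade consul hashicorp/consul -f values.yaml (without the override file) turns federation and gossip encryption on, which matches the order described above.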

Now that I have created the two datacenters, dc1 and dc2, dc2 is not able to join dc1.

Below is the log output from the Consul server in dc2:

2021-06-17T06:26:31.923Z [WARN] agent: BootstrapExpect is set to 1; this is the same as Bootstrap mode.
2021-06-17T06:26:31.923Z [WARN] agent: bootstrap = true: do not enable unless necessary
2021-06-17T06:26:32.127Z [WARN] agent.auto_config: BootstrapExpect is set to 1; this is the same as Bootstrap mode.
2021-06-17T06:26:32.127Z [WARN] agent.auto_config: bootstrap = true: do not enable unless necessary
2021-06-17T06:26:32.224Z [INFO] agent.server.gateway_locator: will dial the primary datacenter using our local mesh gateways if possible
2021-06-17T06:26:32.256Z [INFO] agent.server.raft: initial configuration: index=12 servers="[{Suffrage:Voter ID:9a1199a6-4f73-797e-7f3d-11aa6e6c2d58 Address:10.8.6.35:8300}]"
2021-06-17T06:26:32.256Z [INFO] agent.server.raft: entering follower state: follower="Node at 10.8.6.43:8300 [Follower]" leader=
2021-06-17T06:26:32.323Z [INFO] agent.server.serf.wan: serf: EventMemberJoin: consul-server-0.dc2 10.8.6.43
2021-06-17T06:26:32.323Z [INFO] agent.server.serf.wan: serf: Attempting re-join to previously known node: consul-server-0.dc1: 10.8.6.33:8302
2021-06-17T06:26:32.323Z [WARN] agent.server.serf.wan: serf: Failed to re-join any previously known node
2021-06-17T06:26:32.324Z [INFO] agent.server.serf.lan: serf: EventMemberJoin: consul-server-0 10.8.6.43
2021-06-17T06:26:32.324Z [INFO] agent.router: Initializing LAN area manager
2021-06-17T06:26:32.324Z [INFO] agent.server.serf.lan: serf: Attempting re-join to previously known node: gke-test-cluster-pool-3-7fc2780c-vm5f: 10.8.6.37:8301
2021-06-17T06:26:32.324Z [INFO] agent: Started DNS server: address=0.0.0.0:8600 network=udp
2021-06-17T06:26:32.324Z [INFO] agent.server: Adding LAN server: server="consul-server-0 (Addr: tcp/10.8.6.43:8300) (DC: dc2)"
2021-06-17T06:26:32.324Z [INFO] agent.server: Handled event for server in area: event=member-join server=consul-server-0.dc2 area=wan
2021-06-17T06:26:32.325Z [INFO] agent: Started DNS server: address=0.0.0.0:8600 network=tcp
2021-06-17T06:26:32.325Z [INFO] agent.server.serf.lan: serf: Attempting re-join to previously known node: gke-test-cluster-pool-3-7fc2780c-grt7: 10.8.4.18:8301
2021-06-17T06:26:32.328Z [INFO] agent: Starting server: address=[::]:8501 network=tcp protocol=https
2021-06-17T06:26:32.328Z [WARN] agent: DEPRECATED Backwards compatibility with pre-1.9 metrics enabled. These metrics will be removed in a future version of Consul. Set telemetry { disable_compat_1.9 = true } to disable them.
2021-06-17T06:26:32.328Z [WARN] agent.server.serf.lan: serf: Failed to re-join any previously known node
2021-06-17T06:26:32.328Z [INFO] agent: started state syncer
==> Consul agent running!
2021-06-17T06:26:32.328Z [INFO] agent: Refreshing mesh gateways is supported for the following discovery methods: discovery_methods="aliyun aws azure digitalocean gce k8s linode mdns os packet scaleway softlayer tencentcloud triton vsphere"
2021-06-17T06:26:32.328Z [INFO] agent: Refreshing mesh gateways...
2021-06-17T06:26:32.328Z [INFO] agent.server.gateway_locator: updated fallback list of primary mesh gateways: mesh_gateways=[34.93.181.62:443]
2021-06-17T06:26:32.328Z [INFO] agent: Refreshing mesh gateways completed
2021-06-17T06:26:32.329Z [INFO] agent: Retry join is supported for the following discovery methods: cluster=WAN discovery_methods="aliyun aws azure digitalocean gce k8s linode mdns os packet scaleway softlayer tencentcloud triton vsphere"
2021-06-17T06:26:32.329Z [INFO] agent: Joining cluster...: cluster=WAN
2021-06-17T06:26:32.329Z [INFO] agent: (WAN) joining: wan_addresses=[*.dc1/192.0.2.2]
2021-06-17T06:26:32.329Z [WARN] agent: (WAN) couldn't join: number_of_nodes=0 error="1 error occurred:
* Failed to join 192.0.2.2: Remote DC has no server currently reachable

"
2021-06-17T06:26:32.329Z [WARN] agent: Join cluster failed, will retry: cluster=WAN retry_interval=30s error=
2021-06-17T06:26:32.328Z [INFO] agent: Retry join is supported for the following discovery methods: cluster=LAN discovery_methods="aliyun aws azure digitalocean gce k8s linode mdns os packet scaleway softlayer tencentcloud triton vsphere"
2021-06-17T06:26:32.329Z [INFO] agent: Joining cluster...: cluster=LAN
2021-06-17T06:26:32.329Z [INFO] agent: (LAN) joining: lan_addresses=[consul-server-0.consul-server.default.svc:8301]
2021-06-17T06:26:32.422Z [INFO] agent: (LAN) joined: number_of_nodes=1
2021-06-17T06:26:32.422Z [INFO] agent: Join cluster completed. Synced with initial agents: cluster=LAN num_agents=1
2021-06-17T06:26:37.953Z [WARN] agent.server.raft: heartbeat timeout reached, starting election: last-leader=
2021-06-17T06:26:37.953Z [INFO] agent.server.raft: entering candidate state: node="Node at 10.8.6.43:8300 [Candidate]" term=3
2021-06-17T06:26:37.958Z [INFO] agent.server.raft: election won: tally=1
2021-06-17T06:26:37.958Z [INFO] agent.server.raft: entering leader state: leader="Node at 10.8.6.43:8300 [Leader]"
2021-06-17T06:26:37.958Z [INFO] agent.server: cluster leadership acquired
2021-06-17T06:26:37.958Z [INFO] agent.server: New leader elected: payload=consul-server-0
2021-06-17T06:26:38.227Z [INFO] agent: Synced node info
2021-06-17T06:26:38.227Z [ERROR] agent.server.autopilot: Error when computing next state: error="cannot detect the current leader server id from its address: 10.8.6.43:8300"
2021-06-17T06:26:38.227Z [INFO] agent.leader: started routine: routine="config entry replication"
2021-06-17T06:26:38.227Z [INFO] agent.leader: started routine: routine="federation state replication"
2021-06-17T06:26:38.227Z [INFO] agent.leader: started routine: routine="federation state anti-entropy"
2021-06-17T06:26:38.227Z [WARN] agent.server.connect: primary datacenter is configured but unreachable - deferring initialization of the secondary datacenter CA
2021-06-17T06:26:38.227Z [INFO] agent.leader: started routine: routine="secondary CA roots watch"
2021-06-17T06:26:38.227Z [INFO] agent.leader: started routine: routine="intermediate cert renew watch"
2021-06-17T06:26:38.227Z [INFO] agent.leader: started routine: routine="CA root pruning"
2021-06-17T06:26:38.227Z [INFO] agent.server.raft: updating configuration: command=AddStaging server-id=9a1199a6-4f73-797e-7f3d-11aa6e6c2d58 server-addr=10.8.6.43:8300 servers="[{Suffrage:Voter ID:9a1199a6-4f73-797e-7f3d-11aa6e6c2d58 Address:10.8.6.43:8300}]"
2021-06-17T06:26:38.228Z [WARN] agent.server.rpc: RPC request for DC is currently failing as no path was found: datacenter=dc1 method=ConfigEntry.ListAll
2021-06-17T06:26:38.228Z [WARN] agent.server.replication.config_entry: replication error (will retry if still leader): error="failed to retrieve remote config entries: No path to datacenter"
2021-06-17T06:26:38.228Z [WARN] agent.server.rpc: RPC request for DC is currently failing as no path was found: datacenter=dc1 method=FederationState.List
2021-06-17T06:26:38.228Z [WARN] agent.server.replication.federation_state: replication error (will retry if still leader): error="failed to retrieve federation states: No path to datacenter"
2021-06-17T06:26:38.228Z [WARN] agent.server.rpc: RPC request for DC is currently failing as no path was found: datacenter=dc1 method=ConnectCA.Roots
2021-06-17T06:26:38.228Z [ERROR] agent.server.connect: CA root replication failed, will retry: routine="secondary CA roots watch" error="Error retrieving the primary datacenter's roots: No path to datacenter"
2021-06-17T06:26:38.229Z [INFO] agent.server: deregistering member: member=gke-test-cluster-pool-3-7fc2780c-grt7 reason=reaped
2021-06-17T06:26:38.231Z [INFO] agent.server: deregistering member: member=gke-test-cluster-pool-3-7fc2780c-vm5f reason=reaped
2021-06-17T06:26:38.231Z [WARN] agent.server.rpc: RPC request for DC is currently failing as no path was found: datacenter=dc1 method=FederationState.Apply
2021-06-17T06:26:38.232Z [ERROR] agent.server: error performing anti-entropy sync of federation state: error="error performing federation state anti-entropy sync: No path to datacenter"
2021-06-17T06:26:39.229Z [WARN] agent.server.rpc: RPC request for DC is currently failing as no path was found: datacenter=dc1 method=ConfigEntry.ListAll
2021-06-17T06:26:39.229Z [WARN] agent.server.replication.config_entry: replication error (will retry if still leader): error="failed to retrieve remote config entries: No path to datacenter"
2021-06-17T06:26:39.301Z [WARN] agent.server.rpc: RPC request for DC is currently failing as no path was found: datacenter=dc1 method=FederationState.List
2021-06-17T06:26:39.301Z [WARN] agent.server.replication.federation_state: replication error (will retry if still leader): error="failed to retrieve federation states: No path to datacenter"
2021-06-17T06:26:40.228Z [WARN] agent.server.rpc: RPC request for DC is currently failing as no path was found: datacenter=dc1 method=ConnectCA.Roots
2021-06-17T06:26:40.228Z [ERROR] agent.server.connect: CA root replication failed, will retry: routine="secondary CA roots watch" error="Error retrieving the primary datacenter's roots: No path to datacenter"
2021-06-17T06:26:40.232Z [WARN] agent.server.rpc: RPC request for DC is currently failing as no path was found: datacenter=dc1 method=FederationState.Apply
2021-06-17T06:26:40.232Z [ERROR] agent.server: error performing anti-entropy sync of federation state: error="error performing federation state anti-entropy sync: No path to datacenter"
2021-06-17T06:26:41.259Z [WARN] agent.server.rpc: RPC request for DC is currently failing as no path was found: datacenter=dc1 method=ConfigEntry.ListAll
2021-06-17T06:26:41.259Z [WARN] agent.server.replication.config_entry: replication error (will retry if still leader): error="failed to retrieve remote config entries: No path to datacenter"
2021-06-17T06:26:41.453Z [WARN] agent.server.rpc: RPC request for DC is currently failing as no path was found: datacenter=dc1 method=FederationState.List
2021-06-17T06:26:41.453Z [INFO] agent.server.gateway_locator: will dial the primary datacenter through its mesh gateways
2021-06-17T06:26:41.453Z [WARN] agent.server.replication.federation_state: replication error (will retry if still leader): error="failed to retrieve federation states: No path to datacenter"
2021-06-17T06:26:42.401Z [INFO] agent: Newer Consul version available: new_version=1.9.6 current_version=1.9.4
2021-06-17T06:26:42.795Z [INFO] agent.server.serf.lan: serf: EventMemberJoin: gke-test-cluster-pool-3-7fc2780c-grt7 10.8.4.20
2021-06-17T06:26:42.795Z [INFO] agent.server: member joined, marking health alive: member=gke-test-cluster-pool-3-7fc2780c-grt7
2021-06-17T06:26:44.228Z [WARN] agent.server.rpc: RPC request for DC is currently failing as no path was found: datacenter=dc1 method=ConnectCA.Roots
2021-06-17T06:26:44.228Z [ERROR] agent.server.connect: CA root replication failed, will retry: routine="secondary CA roots watch" error="Error retrieving the primary datacenter's roots: No path to datacenter"
2021-06-17T06:26:44.232Z [WARN] agent.server.rpc: RPC request for DC is currently failing as no path was found: datacenter=dc1 method=FederationState.Apply
2021-06-17T06:26:44.232Z [ERROR] agent.server: error performing anti-entropy sync of federation state: error="error performing federation state anti-entropy sync: No path to datacenter"
2021-06-17T06:26:45.350Z [WARN] agent.server.rpc: RPC request for DC is currently failing as no path was found: datacenter=dc1 method=ConfigEntry.ListAll
2021-06-17T06:26:45.350Z [WARN] agent.server.replication.config_entry: replication error (will retry if still leader): error="failed to retrieve remote config entries: No path to datacenter"
2021-06-17T06:26:45.731Z [WARN] agent.server.rpc: RPC request for DC is currently failing as no path was found: datacenter=dc1 method=FederationState.List
2021-06-17T06:26:45.731Z [WARN] agent.server.replication.federation_state: replication error (will retry if still leader): error="failed to retrieve federation states: No path to datacenter"
2021-06-17T06:26:52.228Z [WARN] agent.server.rpc: RPC request for DC is currently failing as no path was found: datacenter=dc1 method=ConnectCA.Roots
2021-06-17T06:26:52.228Z [ERROR] agent.server.connect: CA root replication failed, will retry: routine="secondary CA roots watch" error="Error retrieving the primary datacenter's roots: No path to datacenter"
2021-06-17T06:26:52.232Z [WARN] agent.server.rpc: RPC request for DC is currently failing as no path was found: datacenter=dc1 method=FederationState.Apply
2021-06-17T06:26:52.232Z [ERROR] agent.server: error performing anti-entropy sync of federation state: error="error performing federation state anti-entropy sync: No path to datacenter"
2021-06-17T06:26:53.978Z [WARN] agent.server.rpc: RPC request for DC is currently failing as no path was found: datacenter=dc1 method=FederationState.List
2021-06-17T06:26:53.978Z [WARN] agent.server.replication.federation_state: replication error (will retry if still leader): error="failed to retrieve federation states: No path to datacenter"
2021-06-17T06:26:54.006Z [WARN] agent.server.rpc: RPC request for DC is currently failing as no path was found: datacenter=dc1 method=ConfigEntry.ListAll
2021-06-17T06:26:54.006Z [WARN] agent.server.replication.config_entry: replication error (will retry if still leader): error="failed to retrieve remote config entries: No path to datacenter"
2021-06-17T06:26:55.271Z [INFO] agent.server.serf.lan: serf: EventMemberJoin: gke-test-cluster-pool-3-7fc2780c-vm5f 10.8.6.44
2021-06-17T06:26:55.271Z [INFO] agent.server: member joined, marking health alive: member=gke-test-cluster-pool-3-7fc2780c-vm5f
2021-06-17T06:26:55.764Z [INFO] agent.server.serf.lan: serf: EventMemberJoin: gke-test-cluster-pool-3-7fc2780c-699n 10.8.5.19
2021-06-17T06:26:55.764Z [INFO] agent.server: member joined, marking health alive: member=gke-test-cluster-pool-3-7fc2780c-699n
2021-06-17T06:27:02.329Z [INFO] agent: (WAN) joining: wan_addresses=[*.dc1/192.0.2.2]
2021-06-17T06:27:02.922Z [WARN] agent: (WAN) couldn't join: number_of_nodes=0 error="1 error occurred:
* Failed to join 192.0.2.2: x509: certificate signed by unknown authority (possibly because of "x509: ECDSA verification failure" while trying to verify candidate authority certificate "Consul Agent CA")

"
2021-06-17T06:27:02.922Z [WARN] agent: Join cluster failed, will retry: cluster=WAN retry_interval=30s error=
2021-06-17T06:27:08.229Z [WARN] agent.server.rpc: RPC request for DC is currently failing as no path was found: datacenter=dc1 method=ConnectCA.Roots
2021-06-17T06:27:08.229Z [ERROR] agent.server.connect: CA root replication failed, will retry: routine="secondary CA roots watch" error="Error retrieving the primary datacenter's roots: No path to datacenter"
2021-06-17T06:27:08.233Z [WARN] agent.server.rpc: RPC request for DC is currently failing as no path was found: datacenter=dc1 method=FederationState.Apply
2021-06-17T06:27:08.233Z [ERROR] agent.server: error performing anti-entropy sync of federation state: error="error performing federation state anti-entropy sync: No path to datacenter"
2021-06-17T06:27:10.503Z [WARN] agent.server.rpc: RPC request for DC is currently failing as no path was found: datacenter=dc1 method=FederationState.List
2021-06-17T06:27:10.503Z [WARN] agent.server.replication.federation_state: replication error (will retry if still leader): error="failed to retrieve federation states: No path to datacenter"
2021-06-17T06:27:10.607Z [WARN] agent.server.rpc: RPC request for DC is currently failing as no path was found: datacenter=dc1 method=ConfigEntry.ListAll
2021-06-17T06:27:10.607Z [WARN] agent.server.replication.config_entry: replication error (will retry if still leader): error="failed to retrieve remote config entries: No path to datacenter"
2021-06-17T06:27:32.922Z [INFO] agent: (WAN) joining: wan_addresses=[*.dc1/192.0.2.2]
2021-06-17T06:27:33.425Z [WARN] agent: (WAN) couldn't join: number_of_nodes=0 error="1 error occurred:
* Failed to join 192.0.2.2: x509: certificate signed by unknown authority (possibly because of "x509: ECDSA verification failure" while trying to verify candidate authority certificate "Consul Agent CA")

Here is my values.yaml for dc2:

global:
  name: consul
  datacenter: dc2
  federation:
    enabled: true
    createFederationSecret: true
  tls:
    enabled: true
    caCert:
      secretName: consul-federation
      secretKey: caCert
    caKey:
      secretName: consul-federation
      secretKey: caKey
  gossipEncryption:
    secretName: consul-federation
    secretKey: gossipEncryptionKey
server:
  replicas: 1
  extraVolumes:
    - type: secret
      name: consul-federation
      items:
        - key: serverConfigJSON
          path: config.json
      load: true
ui:
  service:
    type: 'LoadBalancer'
  enabled: true
meshGateway:
  enabled: true
  replicas: 1
connectInject:
  enabled: true
controller:
  enabled: true
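For comparison, a sketch of the primary (dc1) side, assuming the tutorial's layout: dc1 enables federation and sets createFederationSecret, so the chart exports the consul-federation secret from which the dc2 values above read caCert, caKey, gossipEncryptionKey and serverConfigJSON. The gossip secret name and key below are hypothetical placeholders for whatever secret holds dc1's gossip key. The chart's values reference describes createFederationSecret as a primary-datacenter setting, so this is where it would normally be set.

```yaml
# Sketch of a primary (dc1) values file; keys mirror the dc2 file above.
global:
  name: consul
  datacenter: dc1
  tls:
    enabled: true
  federation:
    enabled: true
    # Export the CA cert/key, gossip key and server config into the
    # consul-federation secret that secondary datacenters import.
    createFederationSecret: true
  gossipEncryption:
    # Hypothetical secret holding dc1's gossip encryption key.
    secretName: consul-gossip-encryption-key
    secretKey: key
meshGateway:
  enabled: true
connectInject:
  enabled: true
controller:
  enabled: true
```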

When I check the ProxyDefaults status on dc2, the SYNCED column is empty:

kubectl get proxydefaults global
NAME     SYNCED   LAST SYNCED   AGE
global                          56s

I was able to establish the connection between the two DCs.
However, I get an error when I try to create the proxy defaults:
error: unable to recognize "proxy-default.yaml": no matches for kind "ProxyDefaults" in version "consul.hashicorp.com/v1alpha1"

Please advise.

@lkysow, could you please help with the error above?
Below is the output of 'helm search repo hashicorp/consul':

NAME                CHART VERSION   APP VERSION   DESCRIPTION
hashicorp/consul    0.31.1          1.9.4         Official HashiCorp Consul Chart

Hi, sorry for the delay.

Can you run kubectl get crd?
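The CRD check matters because kubectl reports "no matches for kind" when the cluster it is currently pointed at does not serve that kind in that API version. That usually means the ProxyDefaults CRD is not installed there, or that kubectl is targeting a different cluster than expected. Once the CRD is present, a minimal manifest like the one below should be accepted. Its contents are an assumption, since proxy-default.yaml itself was not posted.

```yaml
# Hypothetical proxy-default.yaml: apiVersion and kind must match a CRD that
# is registered in the target cluster, otherwise kubectl reports "no matches for kind".
apiVersion: consul.hashicorp.com/v1alpha1
kind: ProxyDefaults
metadata:
  # Consul requires the proxy-defaults config entry to be named "global".
  name: global
spec:
  meshGateway:
    # Send cross-datacenter traffic through the mesh gateway in the local datacenter.
    mode: local
```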

I got exactly the same error: agent.server.rpc: RPC request for DC is currently failing as no path was found: datacenter=dc1 method=ConfigEntry.Apply
I wonder how you solved the connectivity issue between the two datacenters.

Same here. What's the solution for this?

Hello, I am facing the exact same issue while following the official documentation on federation (latest versions of Consul and the Helm chart, fresh installs of both the primary and secondary clusters).

Does anyone have any idea how to fix this?

Some logs and info:

Logs from consul-dc2 (the secondary cluster):
agent.server.rpc: RPC failed to server in DC: server=XX.X.XXX.XX:8300 datacenter=consul-dc1 method=ACL.Login error="rpc error making call: i/o deadline reached"

The mesh gateway won't start in dc2, and neither will the controller.

For debugging purposes, I checked connectivity and allowed communication on all ports and protocols between all services, but it did not change anything.

I've been stuck on this issue for a couple of days now, so any help would be much appreciated.