Hello,
I am experiencing stability and connectivity issues with Consul WAN federation via mesh gateways between two datacenters (dc1 and dc2). The setup initially appears functional, but cross-datacenter service connectivity over the mesh is unreliable: it oscillates between working and failing, and the failures frequently surface as "No path to datacenter" errors.
At a control-plane level, federation appears healthy:
$ consul members -wan
Node          Address          Status  Type    Build   Protocol  DC   Partition  Segment
master-1.dc1  <wan-dc-1>:8302  alive   server  1.22.0  2         dc1  default    <all>
master-1.dc2  <wan-dc-2>:8302  alive   server  1.22.1  2         dc2  default    <all>
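Because the failures are intermittent, a single snapshot of the WAN pool can look healthy even when membership is flapping. The following is a minimal sketch for sampling it over time. The helper name wan_not_alive and the 10-second interval are illustrative assumptions, and it relies only on the column layout of the output shown above.

```shell
#!/usr/bin/env bash
# Print every node whose Status column is not "alive" in `consul members -wan`
# output read from stdin. wan_not_alive is a hypothetical helper name.
wan_not_alive() {
  # Skip the header row; column 1 is Node, column 3 is Status.
  awk 'NR > 1 && $3 != "alive" { print $1, $3 }'
}

# Example: sample every 10 seconds and timestamp anything unhealthy.
#   while sleep 10; do
#     consul members -wan | wan_not_alive | sed "s/^/$(date -u +%FT%TZ) /"
#   done
```

An empty result over a window in which the "no path" errors still occur would suggest the problem is not in WAN gossip membership itself.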
Service catalogs are visible across datacenters:
$ consul catalog services -datacenter=dc2
consul
mesh-gateway-dc2
web
web-sidecar-proxy
$ consul catalog services -datacenter=dc1
consul
mesh-gateway-dc1
socat
socat-sidecar-proxy
However, runtime behavior contradicts this apparent health. Server logs in dc2 repeatedly report federation, ACL, and config replication failures due to missing WAN paths, alongside Connect CA initialization failures indicating the primary datacenter is intermittently unreachable:
RPC request for DC is currently failing as no path was found: datacenter=dc1
...
Failed to initialize Connect CA: primary datacenter is unreachable
...
handling error in Manager.Notify: CA is uninitialized and unable to sign certificates yet
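To quantify how often each of these errors occurs, the log can be summarized with something like the sketch below. The helper name count_federation_errors is hypothetical, the patterns are taken from the three messages quoted above, and the journalctl invocation assumes the server runs as a systemd unit named "consul", which may not match this setup.

```shell
#!/usr/bin/env bash
# Count the three federation-related messages in Consul server log text
# read from stdin. count_federation_errors is a hypothetical helper name.
count_federation_errors() {
  awk '
    /no path was found/                 { no_path++ }
    /primary datacenter is unreachable/ { ca_unreachable++ }
    /CA is uninitialized/               { ca_uninit++ }
    END {
      # Unset awk counters print as 0 with %d.
      printf "no_path=%d ca_unreachable=%d ca_uninitialized=%d\n",
             no_path, ca_unreachable, ca_uninit
    }
  '
}

# Example (assumes a systemd unit called "consul"):
#   journalctl -u consul --since "10 min ago" | count_federation_errors
```

Running this over consecutive time windows should show whether the errors cluster around specific moments, such as mesh gateway restarts, or are spread evenly.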
Envoy logs in the primary datacenter (dc1) intermittently show gRPC stream closures due to unauthenticated ACL access, despite using the bootstrap token and a fully bootstrapped ACL system:
unauthenticated: ACL system must be bootstrapped before making any requests that require authorization
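Since ACL-related errors appear on both sides, one further check on the dc2 server is the ACL replication status endpoint, GET /v1/acl/replication. The sketch below summarizes its JSON response. The helper name acl_replication_summary is hypothetical, it only extracts the Running and SourceDatacenter fields, and the curl example assumes the HTTP API on 127.0.0.1:8500 with a token in CONSUL_HTTP_TOKEN.

```shell
#!/usr/bin/env bash
# Summarize the JSON returned by GET /v1/acl/replication (read from stdin).
# acl_replication_summary is a hypothetical helper; it uses grep/sed instead
# of jq to avoid extra dependencies and only handles the flat fields below.
acl_replication_summary() {
  local json running src
  json=$(cat)
  # Extract the boolean after "Running": (with or without a space).
  running=$(printf '%s' "$json" | grep -o '"Running": *[a-z]*' | sed 's/.*: *//')
  # Extract the quoted datacenter name after "SourceDatacenter":.
  src=$(printf '%s' "$json" | grep -o '"SourceDatacenter": *"[^"]*"' | sed 's/.*: *"//; s/"$//')
  echo "running=${running:-unknown} source=${src:-unknown}"
}

# Example (assumes the local HTTP API and a token in CONSUL_HTTP_TOKEN):
#   curl -s --header "X-Consul-Token: $CONSUL_HTTP_TOKEN" \
#     http://127.0.0.1:8500/v1/acl/replication | acl_replication_summary
```

A result of running=false on the secondary, or a source other than dc1, would narrow the problem to replication configuration rather than the gateways.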
In summary, WAN membership and catalog visibility suggest a correct configuration, but federation stability, ACL replication, Connect CA initialization, and dataplane traffic are all intermittently failing. I am currently unable to identify the underlying misconfiguration or systemic issue and would appreciate guidance on where to focus further troubleshooting.
Thank you in advance.
DC1 Node Config:
datacenter = "dc1"
data_dir = "/opt/consul/data"
node_name = "master-1"
client_addr = "0.0.0.0"
advertise_addr = "10.0.0.4"
advertise_addr_wan = "<dc1-wan-addr>"
server = true
log_level = "DEBUG"
bootstrap_expect = 1
tls {
  defaults {
    ca_file = "/usr/local/share/ca-certificates/stack.crt"
    cert_file = "/etc/consul.d/tls/consul.crt"
    key_file = "/etc/consul.d/tls/consul.key"
  }
  internal_rpc {
    verify_incoming = true
    verify_outgoing = true
    verify_server_hostname = true
  }
}
ports {
  http = 8500
  https = 8501
  grpc = 8502
  grpc_tls = 8503
  dns = 8600
  serf_lan = 8301
  serf_wan = 8302
}
ui_config {
  enabled = true
}
acl {
  enabled = true
  default_policy = "deny"
  enable_token_persistence = true
  enable_token_replication = true
}
connect {
  enabled = true
  enable_mesh_gateway_wan_federation = true
}
DC1 Envoy Command:
consul connect envoy -gateway=mesh -register -expose-servers \
  -service mesh-gateway-dc1 \
  -address 10.0.0.4:8443 \
  -wan-address <dc1-node-wan-addr>:8443 \
  -ca-file /usr/local/share/ca-certificates/stack.crt \
  -token <dc1-bootstrap-token>
DC2 Node Config:
datacenter = "dc2"
primary_datacenter = "dc1"
data_dir = "/opt/consul/data"
node_name = "master-1"
client_addr = "0.0.0.0"
advertise_addr = "10.1.0.5"
advertise_addr_wan = "<wan-addr-dc2>"
server = true
bootstrap_expect = 1
log_level = "DEBUG"
retry_join = [
]
tls {
  defaults {
    ca_file = "/usr/local/share/ca-certificates/stack.crt"
    cert_file = "/etc/consul.d/tls/consul.crt"
    key_file = "/etc/consul.d/tls/consul.key"
  }
  internal_rpc {
    verify_incoming = true
    verify_outgoing = true
    verify_server_hostname = true
  }
}
ports {
  http = 8500
  https = 8501
  grpc = 8502
  grpc_tls = 8503
  dns = 8600
  serf_lan = 8301
  serf_wan = 8302
}
ui_config {
  enabled = true
}
acl {
  enabled = true
  default_policy = "deny"
  enable_token_persistence = true
  enable_token_replication = true
  down_policy = "extend-cache"
  tokens {
    replication = "<dc1-bootstrap-token>"
  }
}
primary_gateways = ["<wan-addr-dc-1>:8443"]
connect {
  enabled = true
  enable_mesh_gateway_wan_federation = true
}
DC2 Envoy Command:
consul connect envoy -gateway=mesh -register -service "gateway-secondary" -expose-servers \
  -ca-file=/usr/local/share/ca-certificates/stack.crt \
  -service mesh-gateway-dc2 \
  -address 10.1.0.5:8443 \
  -wan-address <dc2-node-wan-addr>:8443 \
  -token <dc1-bootstrap-token>