Greetings!

I’ve been struggling off and on for two weeks trying to get two Kubernetes clusters to federate in a mesh. `kubectl get proxydefaults global -n consul` returns `SYNCED=True`, but `consul members -wan` shows a status of `failed` for the members on each side of the mesh gateway.

It was my understanding that if you use `local` mode for the mesh gateway, all traffic goes through the local gateway, yet the members complain that they can’t reach the members on the other side. What am I missing?
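For reference, here is roughly how I ran those checks; the `consul` namespace and the `consul-server-0` pod name are the chart defaults in my install and may differ in yours:

```shell
# Check whether the ProxyDefaults resource has synced into Consul
kubectl get proxydefaults global -n consul

# Run the WAN membership check from inside one of the Consul server pods
kubectl exec -n consul consul-server-0 -- consul members -wan
```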
My mesh gateway values look like this:

```yaml
meshGateway:
  enabled: true
  replicas: 1
  service:
    enabled: true
    type: NodePort
  wanAddress:
    enabled: true
```
Hi @mister2d, did you by chance configure a `ProxyDefaults` that directs all services to go through the local mesh gateway? Something like the configuration below:
```yaml
apiVersion: consul.hashicorp.com/v1alpha1
kind: ProxyDefaults
metadata:
  name: global
spec:
  meshGateway:
    mode: local
```
We have a Learn tutorial that guides practitioners through securing traffic across two Kubernetes clusters. In that tutorial, mesh gateways in `local` mode are used.
# Secure Service Mesh Communication Across Kubernetes Clusters
Hi @karl-cardenas-coding!

Yes, I am using that exact configuration. Does it matter that my Kubernetes cluster binds to a non-routable interface? The Consul pods run in the 10.0.0.0/16 range. But I thought that didn’t matter as long as the mesh gateways were in `local` mode and show that they are in sync.
The Consul federation secret shows this in its server config value:

```json
{
  "primary_datacenter": "dc1",
  "primary_gateways": ["cluster1.example.com:30085"]
}
```
I’ve confirmed that `cluster1.example.com:30085` is reachable with netcat.
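For completeness, this is roughly how I inspected the federation secret and checked the gateway port (the secret name `consul-federation` and namespace `consul` match my values below):

```shell
# Decode the server config stored in the federation secret
kubectl get secret consul-federation -n consul \
  -o jsonpath='{.data.serverConfigJSON}' | base64 -d

# Confirm the primary mesh gateway's NodePort answers
nc -zv cluster1.example.com 30085
```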
How did you WAN-join the two datacenters? Also, can you share your datacenter configurations, please?
Here are my helm chart values.
DC1:
```yaml
global:
  datacenter: dc1
  name: consul
  domain: consul
  tls:
    enabled: true
    enableAutoEncrypt: true
    serverAdditionalDNSSANs:
      - "consul-server.consul.svc.cluster.local"
  federation:
    enabled: true
    createFederationSecret: true
  acls:
    manageSystemACLs: true
    createReplicationToken: true
  gossipEncryption:
    autoGenerate: true
  logJSON: true
connectInject:
  enabled: true
  default: false
controller:
  enabled: true
meshGateway:
  enabled: true
  replicas: 1
  service:
    enabled: true
    type: NodePort
    nodePort: 30085
  wanAddress:
    enabled: true
  hostNetwork: true
syncCatalog:
  enabled: true
  default: true
  toConsul: true
  toK8S: true
metrics:
  enabled: true
  prometheus:
    enabled: true
ui:
  enabled: true
  service:
    type: NodePort
    nodePort:
      https: 30084
server:
  replicas: 3
  securityContext:
    runAsNonRoot: false
    runAsUser: 0
  service:
    type: NodePort
client:
  securityContext:
    runAsNonRoot: false
    runAsUser: 0
```
DC2:
```yaml
global:
  datacenter: dc2
  name: consul
  domain: consul
  tls:
    enabled: true
    enableAutoEncrypt: true
    serverAdditionalDNSSANs:
      - "consul-server.consul.svc.cluster.local"
    caCert:
      secretName: consul-federation
      secretKey: caCert
    caKey:
      secretName: consul-federation
      secretKey: caKey
  acls:
    manageSystemACLs: true
    replicationToken:
      secretName: consul-federation
      secretKey: replicationToken
  federation:
    enabled: true
  gossipEncryption:
    secretName: consul-federation
    secretKey: gossipEncryptionKey
  logJSON: true
connectInject:
  enabled: true
  default: false
controller:
  enabled: true
meshGateway:
  enabled: true
  replicas: 1
  service:
    enabled: true
    type: NodePort
    nodePort: 30085
  wanAddress:
    enabled: true
syncCatalog:
  enabled: true
  default: true
  toConsul: true
  toK8S: true
metrics:
  enabled: true
  prometheus:
    enabled: true
ui:
  enabled: true
  service:
    type: NodePort
    nodePort:
      https: 30084
server:
  replicas: 1
  securityContext:
    runAsNonRoot: false
    runAsUser: 0
  extraVolumes:
    - type: secret
      name: consul-federation
      items:
        - key: serverConfigJSON
          path: config.json
      load: true
client:
  securityContext:
    runAsNonRoot: false
    runAsUser: 0
```
@karl-cardenas-coding I was able to figure it out. `primary_gateways` was set to the mesh gateway Pod’s service IP, which was not routable from the other cluster. I had to set the `wanAddress` to a static value reflecting the FQDN of the WAN interface. Only then did the clusters fully sync. When I run `consul members -wan`, all server nodes now report an “alive” status.
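The key change was pinning each mesh gateway’s WAN address to a hostname that is resolvable and routable from the other cluster, rather than relying on the auto-detected service IP. The relevant excerpt from my DC1 values is below (the FQDN is a placeholder for the externally reachable address; DC2 uses `dc2.example.com`):

```yaml
meshGateway:
  wanAddress:
    source: "Static"
    static: "dc1.example.com"   # externally resolvable FQDN of the WAN interface
    port: 30085                 # matches the mesh gateway's NodePort
```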
Working Helm values:
DC1:
```yaml
global:
  datacenter: dc1
  name: consul
  domain: consul
  tls:
    enabled: true
    enableAutoEncrypt: true
    serverAdditionalDNSSANs:
      - "consul-server.consul.svc.cluster.local"
  federation:
    enabled: true
    createFederationSecret: true
  acls:
    manageSystemACLs: true
    createReplicationToken: true
  gossipEncryption:
    autoGenerate: true
  logJSON: true
connectInject:
  enabled: true
  default: false
controller:
  enabled: true
meshGateway:
  enabled: true
  replicas: 1
  service:
    enabled: true
    type: NodePort
    nodePort: 30085
  wanAddress:
    enabled: true
    source: "Static"
    static: "dc1.example.com"
    port: 30085
syncCatalog:
  enabled: true
  default: true
  toConsul: true
  toK8S: true
metrics:
  enabled: true
  prometheus:
    enabled: true
ui:
  enabled: true
  service:
    type: NodePort
    nodePort:
      https: 30084
server:
  replicas: 3
  securityContext:
    runAsNonRoot: false
    runAsUser: 0
  service:
    type: NodePort
client:
  securityContext:
    runAsNonRoot: false
    runAsUser: 0
```
DC2:
```yaml
global:
  datacenter: dc2
  name: consul
  domain: consul
  tls:
    enabled: true
    enableAutoEncrypt: true
    serverAdditionalDNSSANs:
      - "consul-server.consul.svc.cluster.local"
    caCert:
      secretName: consul-federation
      secretKey: caCert
    caKey:
      secretName: consul-federation
      secretKey: caKey
  acls:
    manageSystemACLs: true
    replicationToken:
      secretName: consul-federation
      secretKey: replicationToken
  federation:
    enabled: true
  gossipEncryption:
    secretName: consul-federation
    secretKey: gossipEncryptionKey
  logJSON: true
connectInject:
  enabled: true
  default: false
controller:
  enabled: true
meshGateway:
  enabled: true
  replicas: 1
  service:
    enabled: true
    type: NodePort
    nodePort: 30085
  wanAddress:
    enabled: true
    source: "Static"
    static: "dc2.example.com"
    port: 30085
syncCatalog:
  enabled: true
  default: true
  toConsul: true
  toK8S: true
metrics:
  enabled: true
  prometheus:
    enabled: true
ui:
  enabled: true
  service:
    type: NodePort
    nodePort:
      https: 30084
server:
  replicas: 1
  securityContext:
    runAsNonRoot: false
    runAsUser: 0
  extraVolumes:
    - type: secret
      name: consul-federation
      items:
        - key: serverConfigJSON
          path: config.json
      load: true
client:
  securityContext:
    runAsNonRoot: false
    runAsUser: 0
```
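With these values applied on both sides, the WAN check from earlier (again assuming the default `consul` namespace and `consul-server-0` pod name) now lists every server in both datacenters as `alive`:

```shell
kubectl exec -n consul consul-server-0 -- consul members -wan
```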
lkysow:
Nice! Is there a way for us to document this better?
I would say yes, it should be documented better. I’ve seen a few topics about this same issue without a clear solution. I’m guessing you’d like me to submit a PR.