Hello guys,
I have a federation issue between a k8s cluster and VMs as follows.
I'm trying to set up a federated Consul cluster (named dc2) on k3s with Helm. The primary Consul cluster runs on VMs (named dc1).
To achieve this, I followed this guide: Kubernetes and VM cluster federation
So, on top of a running 3-node Consul cluster (dc1), I have:
- created a built-in CA (`consul tls ca create`) and copied the CA cert to each dc1 server
- created a certificate/key pair per dc1 server and deployed them (`consul tls cert create -server -dc dc1`)
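For reference, the files these two commands produce by default (the CA pair is the same one I feed into the Kubernetes secret later):
# run once; writes consul-agent-ca.pem and consul-agent-ca-key.pem
consul tls ca create
# run once per server; writes dc1-server-consul-<n>.pem and dc1-server-consul-<n>-key.pem
consul tls cert create -server -dc dc1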
- updated the configuration of the dc1 servers to use TLS:
"verify_incoming": true,
"verify_incoming_rpc": true,
"verify_outgoing": true,
"verify_server_hostname": true,
- updated the configuration of the dc1 servers to enable WAN federation through mesh gateways:
"connect": {
"ca_provider": "consul",
"enable_mesh_gateway_wan_federation": true
},
"primary_datacenter": "dc1"
- activated the ACLs and created a token to replicate the ACLs (according to the doc Federation Between VMs and Kubernetes)
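Roughly what I ran for the replication token (the policy name and rules file name are just mine; the rules themselves are the ones given in the doc above):
# policy rules copied from the federation doc into replication-policy.hcl
consul acl policy create -name acl-replication -rules @replication-policy.hcl
consul acl token create -description "ACL replication" -policy-name acl-replication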
- added the federation secrets in k3s:
kubectl create secret -n consul generic consul-federation \
--from-literal=caCert="$(cat consul-agent-ca.pem)" \
--from-literal=caKey="$(cat consul-agent-ca-key.pem)" \
--from-literal=replicationToken=<my_replication_token> \
--from-literal=gossipEncryptionKey=<my_gossip_encryption_key>
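The Helm values for dc2 (full file attached at the end) follow roughly this shape, per the federation guide; the secret names/keys match the command above:
global:
  name: consul
  datacenter: dc2
  tls:
    enabled: true
    caCert:
      secretName: consul-federation
      secretKey: caCert
    caKey:
      secretName: consul-federation
      secretKey: caKey
  acls:
    manageSystemACLs: true
    replicationToken:
      secretName: consul-federation
      secretKey: replicationToken
  gossipEncryption:
    secretName: consul-federation
    secretKey: gossipEncryptionKey
  federation:
    enabled: true
    primaryDatacenter: dc1
    primaryGateways:
      - "192.168.11.10:8555"
connectInject:
  enabled: true
meshGateway:
  enabled: true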
- created an ACL token for the mesh gateway of dc1 (according to the tutorial Connect Services Across Datacenters with Mesh Gateways)
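Roughly the policy I attached to that token: write on the gateway's own service name, read on the services and nodes it has to see (adapted from the tutorial, since my gateway is registered as gateway-dc1):
consul acl policy create -name gateway-dc1-policy -rules '
service "gateway-dc1" { policy = "write" }
service_prefix ""     { policy = "read" }
node_prefix ""        { policy = "read" }'
consul acl token create -description "mesh gateway dc1" -policy-name gateway-dc1-policy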
- launched an instance of the Envoy mesh gateway (port 443 was already in use, so I chose an unused port for both dc1 and dc2):
consul connect envoy -gateway=mesh -register \
-service "gateway-dc1" \
-address "192.168.11.10:8555" \
-wan-address "192.168.11.10:8555" \
-token=<my_meshgateway_dc1_acl_token> \
-expose-servers
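To verify the gateway on the dc1 side, I used roughly:
# the gateway should show up in the catalog and Envoy should listen on the chosen port
consul catalog services | grep gateway-dc1
ss -ltnp | grep 8555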
dc1 is composed of three VMs with IPs 192.168.11.10, 192.168.11.11, and 192.168.11.12. The dc1 mesh gateway is on 192.168.11.10. The k3s cluster is a single node with IP 192.168.11.20.
These 4 machines can ping each other and there are no firewall restrictions; for example, I can ping the dc1 mesh gateway from one of the Consul server containers in k3s.
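Since ping only proves ICMP, I also checked the gateway's TCP port from the k3s node:
# mesh gateway port reachable from the k3s side
nc -vz 192.168.11.10 8555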
However, the dc2 Consul server shows the following errors at startup (complete logs in an attached file at the end to avoid flooding the post):
2022-03-16T22:44:50.751Z [INFO] agent.server.gateway_locator: updated fallback list of primary mesh gateways: mesh_gateways=[192.168.11.10:8555]
2022-03-16T22:44:50.751Z [INFO] agent: Refreshing mesh gateways completed
2022-03-16T22:44:50.751Z [INFO] agent: Retry join is supported for the following discovery methods: cluster=WAN discovery_methods="aliyun aws azure digitalocean gce k8s linode mdns os packet scaleway softlayer tencentcloud triton vsphere"
2022-03-16T22:44:50.751Z [INFO] agent: Joining cluster...: cluster=WAN
2022-03-16T22:44:50.751Z [INFO] agent: (WAN) joining: wan_addresses=[*.dc1/192.0.2.2]
2022-03-16T22:44:50.751Z [WARN] agent: (WAN) couldn't join: number_of_nodes=0 error="1 error occurred:
* Failed to join 192.0.2.2:8302: Remote DC has no server currently reachable
"
2022-03-16T22:44:50.751Z [WARN] agent: Join cluster failed, will retry: cluster=WAN retry_interval=30s error=<nil>
I was confused by the TEST-NET-1 address `192.0.2.2` in the logs, but it seems to be expected: apparently it's a placeholder that gets rewritten to the mesh gateway address at dial time (according to agent/retry_join.go:68).
I also have other pods that are pending because of missing ACLs:
2022-03-17T11:17:42.784Z [ERROR] Failure: calling /agent/self to get datacenter: err="Unexpected response code: 403 (ACL not found)"
2022-03-17T11:17:42.784Z [INFO] Retrying in 1s
but I guess that's expected, as the first step is to connect to the primary DC and only then sync the ACLs.
After checking all my config, I've run out of ideas; if you have a clue, I'd be happy to hear about it.
Additional files, in case they help:
- Server logs (dc2) dc2_consul_server_logs.txt (12.3 KB)
- Helm chart values (dc2) values.yaml.txt (685 Bytes)
- Server config (dc1) config.json.txt (2.2 KB)