Hi @ishustava1,
Thanks for your reply!
I will open an issue on GitHub. For now I found a workaround: I changed my nodes' names and recreated the certificates to match them (e.g. from ceph-1.hirsingue.infra.mydomain.fr to ceph-1).
I found the following topic: Mesh Gateway federation woes!.
So I have:
- changed dc2's mesh gateway so that it is exposed through a NodePort service:
```yaml
meshGateway:
  enabled: true
  replicas: 1
  service:
    nodePort: 30555
    enabled: true
    type: NodePort
  wanAddress:
    enabled: true
    source: "Static"
    static: "192.168.11.20"
    port: 30555
```
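To rule out a basic connectivity problem, here is a quick sketch of how one could check from a dc1 server node that the dc2 gateway's NodePort is actually reachable (the IP and port come from the `wanAddress` values above; this assumes bash is available for `/dev/tcp`):

```shell
# Probe the dc2 mesh gateway NodePort from a dc1 node.
# 192.168.11.20:30555 are taken from the wanAddress settings above.
if timeout 3 bash -c 'cat < /dev/null > /dev/tcp/192.168.11.20/30555'; then
  echo "gateway reachable"
else
  echo "gateway NOT reachable"
fi
```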
- created a ProxyDefaults config. Unless I'm mistaken, this will affect the services later on but doesn't influence the federation itself, right?
```yaml
apiVersion: consul.hashicorp.com/v1alpha1
kind: ProxyDefaults
metadata:
  name: global
spec:
  meshGateway:
    mode: local
```
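As far as I know, that assumption is right: the proxy-defaults mesh gateway mode only governs how sidecar proxies route service-to-service traffic across datacenters, not the servers' WAN gossip/RPC. For reference, the same entry written as HCL (an alternative to the CRD, applied with `consul config write proxy-defaults.hcl`; not part of my setup) would be:

```hcl
# Equivalent proxy-defaults config entry in HCL form
Kind = "proxy-defaults"
Name = "global"
MeshGateway {
  Mode = "local"
}
```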
However, the second datacenter still runs into errors: it gets no responses to its requests to dc1.
dc2 consul server logs:
```
2022-03-23T16:33:31.488Z [INFO] agent.server.serf.wan: serf: EventMemberJoin: ceph-2.dc1 192.168.11.11
2022-03-23T16:33:31.488Z [INFO] agent.server: Handled event for server in area: event=member-join server=ceph-2.dc1 area=wan
2022-03-23T16:34:10.054Z [INFO] agent.server.memberlist.wan: memberlist: Suspect ceph-3.dc1 has failed, no acks received
2022-03-23T16:34:31.494Z [WARN] agent.server.memberlist.wan: memberlist: Refuting a suspect message (from: consul-consul-server-0.dc2)
2022-03-23T16:34:50.055Z [INFO] agent.server.memberlist.wan: memberlist: Suspect ceph-2.dc1 has failed, no acks received
2022-03-23T16:35:20.056Z [INFO] agent.server.memberlist.wan: memberlist: Marking ceph-2.dc1 as failed, suspect timeout reached (0 peer confirmations)
2022-03-23T16:35:20.056Z [INFO] agent.server.serf.wan: serf: EventMemberFailed: ceph-2.dc1 192.168.11.11
```
I guess dc1 is trying to contact dc2's server on its Kubernetes "private" IP (10.42.0.77), which is obviously not reachable. When I list the members:
```
root@ceph-2:~ # consul members -wan
Node                        Address             Status  Type    Build   Protocol  DC   Partition  Segment
ceph-1.dc1                  192.168.11.10:8302  alive   server  1.11.2  2         dc1  default    <all>
ceph-2.dc1                  192.168.11.11:8302  alive   server  1.11.2  2         dc1  default    <all>
ceph-3.dc1                  192.168.11.12:8302  alive   server  1.11.2  2         dc1  default    <all>
consul-consul-server-0.dc2  10.42.0.77:8302     alive   server  1.11.2  2         dc2  default    <all>
```
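A hypothetical one-liner to flag which WAN members advertise an unroutable 10.x pod IP, in case it helps others check the same thing (fed here with the captured output above; in practice, pipe the real `consul members -wan` output into the awk command directly):

```shell
# Print the name of every WAN member whose advertised address is a
# 10.x.x.x pod IP (field 2 of `consul members -wan` output).
printf '%s\n' \
  'ceph-1.dc1                  192.168.11.10:8302  alive' \
  'consul-consul-server-0.dc2  10.42.0.77:8302     alive' \
| awk '$2 ~ /^10\./ {print $1}'
# prints: consul-consul-server-0.dc2
```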
On the dc1 server side, I have the following logs:
```
2022-03-23T17:12:38.483+0100 [ERROR] agent.server.memberlist.wan: memberlist: Failed to send gossip to 10.42.0.77:8302: read tcp 192.168.11.10:42026->192.168.11.10:30555: read: connection reset by peer
2022-03-23T17:12:38.787+0100 [ERROR] agent.server.rpc: RPC failed to server in DC: server=10.42.0.77:8300 datacenter=dc2 method=Internal.ServiceDump error="rpc error getting client: failed to get conn: read tcp 192.168.11.10:54879->192.168.11.10:30555: read: connection reset by peer"
2022-03-23T17:12:38.817+0100 [ERROR] agent.server.rpc: RPC failed to server in DC: server=10.42.0.77:8300 datacenter=dc2 method=Internal.ServiceDump error="rpc error getting client: failed to get conn: read tcp 192.168.11.10:34443->192.168.11.10:30555: read: connection reset by peer"
2022-03-23T17:12:40.978+0100 [ERROR] agent.server.rpc: RPC failed to server in DC: server=10.42.0.77:8300 datacenter=dc2 method=Internal.ServiceDump error="rpc error getting client: failed to get conn: read tcp 192.168.11.10:35321->192.168.11.10:30555: read: connection reset by peer"
```
I have no idea why dc1 does not use the IP 192.168.11.20.
Thanks for your help!