Hi @fmp88,
From your DC names and the public IP, I guessed that you are running on DigitalOcean, so I tested the same setup there and reproduced your issue.
I figured out that it is due to how the droplet networking is set up. Consul is trying to reach the primary Mesh Gateway via the Private Interface on the VM, which doesn’t work.
You can test the same behaviour by using curl to force the traffic via a specific interface:
curl -I --interface eth0 www.google.com <== this works
curl -I --interface eth1 www.google.com <== this doesn't
The fix is to add the following iptables rule so that Consul is able to talk to the primary mesh gateway and trigger the initial replication.
iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
Once the above rule is added, you will see that the replication succeeds. After this, you can go ahead and start the secondary Mesh Gateway, and all the requests will now start to flow through the mesh gateways.
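Once this is done, you can remove the POSTROUTING rule (iptables -F -t nat) and everything will continue to work. Note that -F flushes every rule in the nat table; if you have other NAT rules, delete only this one with iptables -t nat -D POSTROUTING -o eth0 -j MASQUERADE.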
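If you want to script the temporary rule, here is a minimal sketch. The function names and the IPTABLES override variable are my own (hypothetical), and it assumes eth0 is the public interface as on my droplets. It only uses the standard -C (check), -A (append) and -D (delete) operations, so it removes just this one rule instead of flushing the table.

```shell
#!/bin/sh
# Hypothetical helpers; IPTABLES can be overridden (e.g. for a dry run).
IPTABLES=${IPTABLES:-iptables}

# Add the MASQUERADE rule for the given interface unless it already exists.
add_masquerade() {
  # -C exits 0 if an identical rule is present, non-zero otherwise.
  "$IPTABLES" -t nat -C POSTROUTING -o "$1" -j MASQUERADE >/dev/null 2>&1 ||
    "$IPTABLES" -t nat -A POSTROUTING -o "$1" -j MASQUERADE
}

# Delete only that rule, leaving the rest of the nat table untouched.
remove_masquerade() {
  "$IPTABLES" -t nat -D POSTROUTING -o "$1" -j MASQUERADE
}
```

Run add_masquerade eth0 as root before the initial replication, and remove_masquerade eth0 once the secondary mesh gateway is up.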
root@ubuntu-s-1vcpu-1gb-lon1-01:~# curl https://localhost:8501/v1/acl/replication?pretty -k
{
    "Enabled": true,
    "Running": true,
    "SourceDatacenter": "fra1",
    "ReplicationType": "tokens",
    "ReplicatedIndex": 62,
    "ReplicatedRoleIndex": 1,
    "ReplicatedTokenIndex": 334,
    "LastSuccess": "2023-08-16T02:43:33Z",
    "LastError": "2023-08-16T02:42:55Z",
    "LastErrorMessage": "failed to retrieve remote ACL tokens: rpc error getting client: failed to get conn: Remote DC has no server currently reachable"
}
root@ubuntu-s-1vcpu-1gb-lon1-01:~# consul members -wan
Node            Address         Status  Type    Build   Protocol  DC    Partition  Segment
server-01.fra1  10.19.0.5:8302  alive   server  1.16.1  2         fra1  default    <all>
server-01.lon1  10.16.0.5:8302  alive   server  1.16.1  2         lon1  default    <all>
# Testing a cross-dc query
root@ubuntu-s-1vcpu-1gb-lon1-01:~# consul catalog services -datacenter fra1
consul
mesh-gateway
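In the replication output above, LastErrorMessage still shows the old failure even though replication has recovered; what matters is that LastSuccess is newer than LastError. A rough way to check that from a script is sketched below. The helper names are mine (hypothetical), it parses the JSON with sed instead of a real JSON parser, and it assumes both timestamps are UTC in the same format as shown above, so a plain string comparison orders them correctly.

```shell
#!/bin/sh
# Print the value of a top-level string field from JSON on stdin.
# $1 = field name. Crude sed parsing, good enough for this flat object.
json_field() {
  sed -n "s/.*\"$1\": *\"\([^\"]*\)\".*/\1/p" | head -n 1
}

# Exit 0 if the replication status on stdin shows LastSuccess later than
# LastError, non-zero otherwise. Runs in a subshell to keep variables local.
replication_healthy() (
  body=$(cat)
  last_success=$(printf '%s\n' "$body" | json_field LastSuccess)
  last_error=$(printf '%s\n' "$body" | json_field LastError)
  [ -n "$last_success" ] || exit 1
  [ -n "$last_error" ] || exit 0
  # Same-format UTC timestamps compare correctly as strings;
  # the "x" prefix forces expr to do a string comparison.
  expr "x$last_success" \> "x$last_error" >/dev/null
)
```

For example: curl -sk 'https://localhost:8501/v1/acl/replication?pretty' | replication_healthy && echo "replication OK"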
Logs from secondary:
2023-08-16T02:58:48.552Z [DEBUG] agent: Node info in sync
2023-08-16T02:58:48.552Z [DEBUG] agent: Service in sync: service=mesh-gateway
2023-08-16T02:58:48.552Z [DEBUG] agent: Check in sync: check=service:mesh-gateway
2023-08-16T02:58:49.473Z [DEBUG] agent.server.memberlist.wan: memberlist: Stream connection from=10.16.0.5:36004
2023-08-16T02:58:50.737Z [DEBUG] agent: Check status updated: check=service:mesh-gateway status=passing
2023-08-16T02:59:00.563Z [DEBUG] agent.server.replication.acl.token: finished fetching acls: amount=7
2023-08-16T02:59:00.563Z [DEBUG] agent.server.replication.acl.token: acl replication: local=7 remote=7
2023-08-16T02:59:00.564Z [DEBUG] agent.server.replication.acl.token: acl replication: deletions=0 updates=0
2023-08-16T02:59:00.564Z [DEBUG] agent.server.replication.acl.token: ACL replication completed through remote index: index=334
2023-08-16T02:59:00.738Z [DEBUG] agent: Check status updated: check=service:mesh-gateway status=passing
2023-08-16T02:59:10.740Z [DEBUG] agent: Check status updated: check=service:mesh-gateway status=passing
I hope this helps.