Connecting to a private datacenter over a WAN from behind my router

My apologies. I have tried dozens of permutations of -bind, -client, -advertise, -advertise-wan and translate_wan_addrs: true, set in the systemd/system/consul.service file or in the consul.hcl file, and afterward issued consul retry-join, consul join, and consul join -wan, both on the joining server and on the already-joined servers. I'm looking for an exact roadmap if someone knows one.

I have a 5-server Consul server group in an AWS VPC (to function as a backing store for localhost Vaults). It runs fine with the internal IPs as the retry-join list, and at least one of the servers has -advertise-wan set to its external IP.

My local workstation runs the same version of Consul, and its NAT IP is of course not the same as the external IP assigned to my router. I ALWAYS clear the data directory before each test. The test is “service consul start”, then monitoring consul members and consul info.

On several occasions I have seen the entire list of internal IPs of the AWS servers in the consul members output on my local machine, and of course it shows the sync as failed (even though it was able to get the cluster members from the single machine it joined; I was pleasantly surprised to see that). In most cases I see the leader on the cluster start to stick (slow reporting on consul info and consul members) and list my local machine as not in sync, but usually my machine just does not show anything other than itself in the consul members listing.

So, please … to connect a local machine behind a standard NAT router, through the public IPs, to an established 5- or 6-machine AWS EC2 Consul cluster that uses internal IPs … what is the exact configuration of:

a) the /etc/consul.d/consul.hcl
b) the /etc/consul.d/server.hcl
c) the /etc/systemd/system/consul.service

files.

Consul version 1.73
Ubuntu 18.04 on all machines
All configurations list the same datacenter name
All configurations list the same encryption key
All configurations set the performance raft_multiplier to 1 (for a single datacenter split across a WAN, do I have to change that to make it connect despite the high WAN latency?)
All servers have an identically named data_dir
All servers are set as servers (in the server file)
All have bootstrap-expect set to at least one more than the server itself
All ports in the security groups and UFW firewalls are open for both UDP and TCP: 8300, 8301, 8302, 8600
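
A quick way to sanity-check the TCP side of that port list from the workstation is a small connect test. This is only a sketch: the function names are made up for illustration, the port roles in the comment are the usual Consul defaults, and it says nothing about UDP, which gossip also needs and which a simple connect test cannot verify.

```python
import socket

# Ports from the list above, with their usual default roles in Consul:
# 8300 server RPC, 8301 LAN gossip, 8302 WAN gossip, 8600 DNS.
CONSUL_PORTS = [8300, 8301, 8302, 8600]


def tcp_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def check_ports(host, ports=CONSUL_PORTS, timeout=2.0):
    """Map each port to whether a TCP connect to it succeeded."""
    return {port: tcp_port_open(host, port, timeout) for port in ports}
```

Calling check_ports with the public IP of one of the AWS servers returns a dict of port to True/False. A False only means the TCP connect failed from where the script ran; it does not say whether the security group, UFW, or the router is the cause.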

After three days on this and reading everything I can find, I am running out of combinations. Is there a bible somewhere for this that works every time?

Any pointers really really really appreciated. Thank you

Thanks for reporting!

From your description I can’t tell whether you are trying to join the WAN or the LAN pool. Joining the LAN pool would be a problem, since Consul LAN gossip is tuned to expect the low latency of a single datacenter, which isn’t guaranteed in your setup.

Independently of the WAN or LAN question, Consul gossip uses UDP, which doesn’t play well with NAT in most cases.

You could try establishing a VPN connection and see whether that helps.

Thanks,
Hans

Thanks for the response. Seriously thank you.

Assuming UDP through the router is not the issue, on the question of assumed latency, would changing the raft multiplier to a number higher than one make that timing requirement more forgiving?

All this is is a shared storage platform for mostly static Vault secrets, accessed through Vault on localhost from Node.js apps. It's not a highly dynamic data system.

Thoughts? Thanks again.

The raft multiplier adjusts Raft parameters. Raft is the underlying consensus protocol, in which only servers participate. If I understand correctly, you want your machine to join as a Consul client, and in that case the raft multiplier doesn’t have any effect.
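
To make "adjusts Raft parameters" concrete, here is a rough sketch of the scaling. The base timeouts are assumptions (what I believe the values are at raft_multiplier = 1), not figures from this thread, so check them against the documentation for your Consul version; the function name is made up for illustration.

```python
# ASSUMED base timeouts in milliseconds at raft_multiplier = 1.
# These are not taken from the thread; verify them for your Consul version.
BASE_TIMEOUTS_MS = {
    "heartbeat_timeout": 1000,
    "election_timeout": 1000,
    "leader_lease_timeout": 500,
}


def scaled_raft_timeouts(raft_multiplier):
    """Multiply each assumed base timeout by raft_multiplier (allowed range 1 to 10)."""
    if not 1 <= raft_multiplier <= 10:
        raise ValueError("raft_multiplier must be between 1 and 10")
    return {name: ms * raft_multiplier for name, ms in BASE_TIMEOUTS_MS.items()}
```

Under those assumptions, scaled_raft_timeouts(5) gives servers five times as long before they consider a leader lost. That only affects servers taking part in Raft; a client agent never uses these timers.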

I have been setting it up as a remote server (part of the cluster), not a client. Does that matter?

Yes, that complicates matters. Not only does LAN gossip need to work, but so does Raft consensus, at a much higher latency than is expected inside a single datacenter.

You could start your server in another datacenter and WAN-join the private datacenter. That would avoid the Raft problems, because Raft is scoped to each datacenter, and it would use WAN gossip, whose latency expectations are adjusted to work better over a WAN.

But let me reiterate: this is by no means a supported use case for Consul, and nothing we recommend doing.
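
As a rough illustration of that layout, the sketch below builds a config for a one-server datacenter on the workstation that WAN-joins the AWS servers. The config keys are standard Consul agent options, but the function name, the datacenter name, the data_dir and every address are hypothetical placeholders, and it assumes the router forwards the WAN gossip and server RPC ports to the workstation. It is not a tested or supported recipe.

```python
import json


def wan_server_config(datacenter, public_ip, remote_wan_addrs, encrypt_key,
                      data_dir="/opt/consul"):
    """Build a config dict for a single-server datacenter that WAN-joins another one."""
    return {
        "datacenter": datacenter,          # must differ from the AWS datacenter name
        "server": True,
        "bootstrap_expect": 1,             # this datacenter has only one server
        "data_dir": data_dir,              # hypothetical path
        "encrypt": encrypt_key,            # same gossip key as the AWS servers
        "advertise_addr_wan": public_ip,   # the router's public IP, not the NAT IP
        "translate_wan_addrs": True,
        "retry_join_wan": list(remote_wan_addrs),  # public IPs of the AWS servers
    }


if __name__ == "__main__":
    # All values are placeholders (documentation-only IP ranges).
    cfg = wan_server_config(
        datacenter="home",
        public_ip="198.51.100.7",
        remote_wan_addrs=["203.0.113.10", "203.0.113.11"],
        encrypt_key="REPLACE_WITH_SHARED_GOSSIP_KEY",
    )
    print(json.dumps(cfg, indent=2))
```

Consul reads JSON files from its configuration directory as well as HCL, so the printed output could be saved as a .json file there. The AWS servers would each need their own advertise_addr_wan set to a public IP for the join to work in both directions.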

Thanks,
Hans