Consul client not able to talk to consul Server

I'm trying to make a Consul agent on one VM talk to a Consul server on another VM. I'm getting the following error on the agent:

    2020-12-15T17:38:18.522Z [INFO]  agent.client.serf.lan: serf: EventMemberLeave: scsadpcbackendvm000000 10.90.2.4
    2020-12-15T17:38:21.394Z [WARN]  agent.router.manager: No servers available
    2020-12-15T17:38:21.394Z [ERROR] agent.anti_entropy: failed to sync remote state: error="No known Consul servers"

I can ping from the agent to the server; both are on the same subnet.
Is there any debugging I can enable on the client?
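As a first check beyond ping: ICMP alone doesn't prove gossip connectivity, since Consul's LAN gossip (Serf) uses port 8301 over both TCP and UDP by default. A quick reachability sketch, assuming the server's IP is 10.90.2.100 as in the server config shown in this thread (adjust for your environment):

```shell
# Check that the Serf LAN port (default 8301) on the server is reachable
# from the agent VM. 10.90.2.100 is the server's advertise_addr.
nc -vz 10.90.2.100 8301    # TCP check
nc -vzu 10.90.2.100 8301   # UDP check (success here only means no ICMP rejection)
```

If the TCP check fails, look at NSG rules or host firewalls between the two VMs.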

Can you share your configs and the way you start the agents?

Yes, you can set log_level to debug, or even trace, IIRC. But, yes, as @Wolfsrudel says, we really need to see your configs if we’re going to provide useful suggestions.
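For reference, the log level can be raised either in the config file or on the command line. A minimal fragment, assuming the config path used elsewhere in this thread:

```hcl
# /etc/consul.d/consul.hcl (fragment)
log_level = "debug"   # or "trace" for maximum verbosity
```

Equivalently, pass it at startup: `consul agent -config-dir=/etc/consul.d/ -log-level=debug`.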

advertise_addr = "${local_ipv4}"
retry_join = ["provider=azure tag_name=Env tag_value=consul tenant_id=... client_id=xxxxxxx subscription_id=.. secret_access_key=..."]
EOF

I start the Consul agent with:

sudo consul agent -config-dir=/etc/consul.d/

Server config

cat << EOF > /etc/consul.d/consul.hcl
datacenter = "dc1"
data_dir = "/opt/consul"

ui = true
EOF

cat << EOF > /etc/consul.d/server.hcl
server = true
bootstrap_expect = 1
client_addr = "0.0.0.0"
advertise_addr = "10.90.2.100"
retry_join = ["provider=azure tag_name=Env tag_value=consul tenant_id=.. client_id=.. subscription_id=.. secret_access_key=.."]
EOF
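If the Azure cloud auto-join is suspect (wrong tags, credentials, or permissions), a quick way to rule it out is to point the client's retry_join at the server directly. A sketch using the advertise_addr from the server config above:

```hcl
# Client-side fragment: bypass Azure auto-join and join the server by IP.
retry_join = ["10.90.2.100"]
```

If the client joins with a static IP but not with `provider=azure`, the problem is in the auto-join tags or credentials rather than in networking.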

I have the exact same issue with a newly installed Consul on Ubuntu 22.04.2 (single node):
/etc/consul.d/consul.hcl:

data_dir = "/opt/consul"
bind_addr = "10.211.55.36" # bind explicitly to this host's IP
bootstrap_expect = 0

log_level = "TRACE"

I simply want to start it to test the bootstrapping, but I can’t get past it:

[ERROR] agent.anti_entropy: failed to sync remote state: error="No known Consul servers"

I’m using consul 1.15.1.

I’ve also tried without ‘bootstrap_expect’. I’ve heard this might be related to the IP Consul is binding to. I needed to bind it explicitly to an IP, because Consul complains otherwise.

Funnily enough, a 3-node cluster with ACLs and mTLS works without any issue on the same version 🙂

Never mind, I added server = true and it got past that. I have a different error now, but that’s for a different thread 🙂
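For anyone hitting the same wall: the missing piece was server = true — without it the agent runs in client mode and has no server to sync with. A minimal single-node sketch based on the config above:

```hcl
# /etc/consul.d/consul.hcl — minimal single-node server (sketch)
data_dir = "/opt/consul"
bind_addr = "10.211.55.36"  # this host's IP
server = true               # without this, the agent runs in client mode
bootstrap_expect = 1        # a single server is expected to bootstrap the cluster
log_level = "TRACE"
```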