Failed to start Consul client

I am trying to install Consul on one of my servers (I have ten others already) via Ansible using the Consul Ansible role, but I get the following error:

  TASK [idealista.consul-role : Consul | Create ACL] ***************************************************************************************
failed: [php7d] (item={'key': 'Agent', 'value': {'token_type': 'client', 'token': 'xxxxxxxxxxx', 'rules': [{'node': '', 'policy': 'write'}, {'service': '', 'policy': 'read'}]}}) => {"ansible_loop_var": "item", "changed": false, "item": {"key": "Agent", "value": {"rules": [{"node": "", "policy": "write"}, {"policy": "read", "service": ""}], "token": "xxxxxxxxxxx", "token_type": "client"}}, "msg": "Could not connect to consul agent at xxx.xx.x.xx:xxxx, error was HTTPConnectionPool(host='xx.xx.x.x', port=8500): Max retries exceeded with url: /v1/acl/list?token=******** (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fcb7ead26d0>: Failed to establish a new connection: [Errno 110] Connection timed out',))"}

I went ahead and compared the consul.json file to one of my other servers and everything matches outside of the new IP.

{
  "node_name": "php7d",
  "ui": true,
  "addresses":{
    "http": "0.0.0.0"
  },
  "data_dir": "/opt/consul/data",
  "log_level": "INFO",
  "domain": "consul",
  "datacenter": "xxxxx",
  "bind_addr": "xxx.xx.x.xx",
  "advertise_addr": "xxx.xx.x.xx",
  "ports": {
    "http": 8500,
    "dns": 8600
  },
  "server": false,
  "enable_syslog": false,
  "retry_join": ["xxx.xx.x.xx", "xxx.xx.x.xx", "xxx.xx.x.xx", "xxx.xx.x.xx", "xxx.xx.x.xx"],
  "rejoin_after_leave": true,
  "start_join": ["xxx.xx.x.xx", "xxx.xx.x.xx", "xxx.xx.x.xx", "xxx.xx.x.xx", "xxx.xx.x.xx"]
}

Since setup is failing via Ansible, I tried running the following manually: consul agent -config-dir=/opt/consul/consul.d, but I got this back:

==> Starting Consul agent...
==> Error starting agent: Failed to start Consul client: Failed to start lan serf: Failed to create memberlist: Could not set up network transport: failed to obtain an address: Failed to start TCP listener on "xx.xxx.xxx.xxx" port xxxx: listen tcp xx.xxx.xxx.xxx:xxxx: bind: address already in use

Would anyone have any recommendations on how to fix this?

Hi @albertski,

I saw your post on Stack Overflow (https://stackoverflow.com/questions/62961945/could-not-connect-to-consul-agent-error-was-httpconnectionpool) and already commented there. I'll share my reply here as well.

From the output, it looks like Ansible can't reach the Consul API. Have you verified that the HTTP API on the Consul agent is in fact listening on the IP and port that Ansible is trying to reach?

You can check this using netstat -an -f inet -p tcp | grep '8500' (use 8501 for HTTPS).
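The same question can also be asked from the machine that runs Ansible by attempting a plain TCP connection to the agent's HTTP port. The following is a minimal Python sketch, not part of the Ansible role; the helper name can_connect is hypothetical, and the host and port passed to it are whatever address Ansible is trying to reach.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts (such as Errno 110) and DNS failures.
        return False
```

For example, can_connect("127.0.0.1", 8500) run on the Consul host itself checks the local listener, while running it from the Ansible controller with the server's address also exercises any firewall in between.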

Thank you @blake for your help.

When I run netstat -an -f inet -p tcp | grep '8500', I get the following error:

netstat: feature `AF BLUETOOTH' not supported.
Please recompile `net-tools' with newer kernel source or full configuration.

I'm able to run the command locally on my Mac, but on the server (Ubuntu 18.04.3 LTS) I get that error. Still investigating how to get it to run.

@blake, without the -f argument it runs:

    netstat -an inet -p tcp | grep '8500'
    (Not all processes could be identified, non-owned process info will not be shown, you would have to be root to see it all.)
    tcp6       0      0 :::8500                 :::*                    LISTEN      18675/consul

Does this seem right to you?

@albertski, makes sense. I forgot that -f is the address family argument on BSD systems (e.g., macOS). Linux uses -A.
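The flag difference can be captured in a small helper that picks the netstat arguments by operating system. This is only a Python sketch: it assumes the net-tools netstat on Linux and a BSD-style netstat everywhere else, and the function name netstat_tcp_command is hypothetical.

```python
import platform
from typing import List


def netstat_tcp_command(system: str) -> List[str]:
    """Return a netstat invocation that lists IPv4 TCP sockets numerically."""
    if system == "Linux":
        # net-tools: -A selects the address family, -t limits output to TCP.
        return ["netstat", "-an", "-A", "inet", "-t"]
    # Assumed BSD-style netstat (macOS, FreeBSD): -f is the family, -p the protocol.
    return ["netstat", "-an", "-f", "inet", "-p", "tcp"]


print(" ".join(netstat_tcp_command(platform.system())))
```

Restricting the family to inet hides IPv6 (tcp6) sockets; drop the -A/-f pair to see both families.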

Anyhow, it looks like Consul is only listening on IPv6, not IPv4. Try using the -bind option to force it to use IPv4.

āÆ consul agent -dev -bind "0.0.0.0"
==> Starting Consul agent...
           Version: 'v1.8.0'
           Node ID: '2e0c5bcf-8861-9363-c6a8-45574b8bd96d'
         Node name: 'TheB.local'
        Datacenter: 'dc1' (Segment: '<all>')
            Server: true (Bootstrap: false)
       Client Addr: [127.0.0.1] (HTTP: 8500, HTTPS: -1, gRPC: 8502, DNS: 8600)
      Cluster Addr: 10.0.0.83 (LAN: 8301, WAN: 8302)
           Encrypt: Gossip: false, TLS-Outgoing: false, TLS-Incoming: false, Auto-Encrypt-TLS: false

==> Log data will now stream in as it occurs:

The equivalent option in the configuration file would be bind_addr.

Should I replace "0.0.0.0" with my IP? I replaced it with my IP, but I'm getting the same error:

    consul agent -dev -bind "xx.xxx.xxx.xxx"
==> Starting Consul agent...
==> Error starting agent: Failed to start Consul server: Failed to start LAN Serf: Failed to create memberlist: Could not set up network transport: failed to obtain an address: Failed to start TCP listener on "xx.xxx.xxx.xxx0" port 8301: listen tcp xx.xxx.xxx.xxx:8301: bind: address already in use

It's weird that it shows port 8301 (also for consul agent -config-dir=/opt/consul/consul.d), but consul.json shows 8500.

ā€œ0.0.0.0ā€ means ā€˜listen on all interfaces/ ip addressesā€™ . Including yours. Try this.

Port 8301 is for LAN Serf (building the cluster member list). Port 8500 in the JSON configuration file is for the HTTP API.
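For reference, Consul's default ports can be laid out as a small lookup table, which makes it easier to tell which subsystem an error message is talking about. This is a Python sketch only; the names CONSUL_DEFAULT_PORTS and describe_port are hypothetical, and any of these ports can be overridden in the "ports" block of the configuration file.

```python
# Consul's default ports (each can be overridden under "ports" in the config).
CONSUL_DEFAULT_PORTS = {
    8300: "server RPC",
    8301: "LAN Serf (gossip between agents in a datacenter)",
    8302: "WAN Serf (gossip between servers across datacenters)",
    8500: "HTTP API and UI",
    8600: "DNS interface",
}


def describe_port(port: int) -> str:
    """Map a port number to the Consul subsystem that uses it by default."""
    return CONSUL_DEFAULT_PORTS.get(port, "not a Consul default port")


print(describe_port(8301))
print(describe_port(8500))
```

Since the consul.json above only sets the http and dns ports, the agent would still use the default 8301 for LAN Serf, which is the port named in the bind error.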

I ran consul agent -dev -bind "0.0.0.0" and I got back:

==> Multiple private IPv4 addresses found. Please configure one with 'bind' and/or 'advertise'.

When I run consul kv get my/path I get back (verified it works on another server):

Error querying Consul agent: Unexpected response code: 500

Also, consul members doesn't return anything.

OK, this is a hard one. Then back to binding to your IP. The error

bind: address already in use

tells us that something else is already listening on this port. Is another agent still running on port 8301?

Alternatively try

consul agent -dev -bind 0.0.0.0 -advertise <your-ip> 
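The "bind: address already in use" error is ordinary socket behavior rather than anything specific to Consul: a second listener on the same address and port is rejected while the first one is still open. A minimal Python sketch that reproduces it on the loopback interface; the function name second_bind_errno is hypothetical.

```python
import errno
import socket
from typing import Optional


def second_bind_errno(host: str = "127.0.0.1") -> Optional[int]:
    """Bind a listener, then try to bind a second socket to the same port.

    Returns the errno of the second bind's failure, or None if it succeeded.
    """
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        first.bind((host, 0))  # port 0 lets the OS pick a free port
        first.listen(1)
        port = first.getsockname()[1]
        try:
            second.bind((host, port))
        except OSError as exc:
            return exc.errno
        return None
    finally:
        first.close()
        second.close()


print(second_bind_errno() == errno.EADDRINUSE)
```

In the same way, starting a second consul agent by hand fails for as long as an earlier agent process still holds port 8301.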

This is the output of checking 8301:

$ netstat -an inet -p tcp | grep '8301'
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)
tcp        0      0 CURRENT.SERVER.IP:8301     0.0.0.0:*                  LISTEN      27377/consul
tcp        0      1 CURRENT.SERVER.IP:38160    ANOTHER.SERVER.IP:8301     SYN_SENT    27377/consul
udp        0      0 CURRENT.SERVER.IP:8301     0.0.0.0:*                              27377/consul

This is the same command on another server that doesn't have any issues:

$ netstat -an inet -p tcp | grep '8301'
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)
tcp        0      0 THAT.SERVER.IP:8301    0.0.0.0:*               LISTEN      1600/consul
tcp        0      0 THAT.SERVER.IP:8301    ANOTHER.SERVER.IP:58162     TIME_WAIT   -
tcp        0      0 THAT.SERVER.IP:59404   ANOTHER.SERVER.IP2:8301     TIME_WAIT   -
tcp        0      0 THAT.SERVER.IP:8301    ANOTHER.SERVER.IP2:32810    TIME_WAIT   -
udp        0      0 THAT.SERVER.IP:8301    0.0.0.0:*                           1600/consul

Here is the output with -advertise:

consul agent -dev -bind 0.0.0.0 -advertise xx.xxx.xxx.xxx
==> Starting Consul agent...
==> Error starting agent: Failed to start Consul server: Failed to start LAN Serf: Failed to create memberlist: Could not set up network transport: failed to obtain an address: Failed to start TCP listener on "0.0.0.0" port 8301: listen tcp 0.0.0.0:8301: bind: address already in use

Thanks for your help @blake and @Wolfsrudel. I was able to solve my problem. I started to compare my new droplet with my old droplet (I have this up on DigitalOcean) and I noticed that there were some extra tags on my old droplet: web php74. Once I added those tags, the issues went away. I'm guessing the Ansible script that sets up the servers somehow needs those tags.
