Hello,
I have created a cluster on AWS using https://github.com/hashicorp/nomad-autoscaler. The server and client nodes work fine and can find each other. All of those nodes run Ubuntu.
Now I have a Windows 2016 instance on AWS (in the same subnet as one of the Linux clients), where I have installed Nomad and Consul. Nomad should join the servers through Consul auto-join, as the Linux clients do, but this does not work on the Windows instance. Note that I have already tagged the AWS instance with ConsulAutoJoin = auto-join.
The HCL config file for Consul is:
datacenter = "dc1"
data_dir = "C:\\Consul\\data"
advertise_addr = "10.241.238.196"
bind_addr = "0.0.0.0"
client_addr = "0.0.0.0"
log_level = "INFO"
retry_join = ["provider=aws tag_key=ConsulAutoJoin tag_value=auto-join"]
ui = true
and the logs are:
PS C:\Users\Administrator> consul.exe agent -config-dir=C:\Consul\config\
==> Starting Consul agent...
Version: '1.8.4'
Node ID: 'cfc5d53c-1747-8945-ce84-50cdef8d40cd'
Node name: 'EC2AMAZ-IMF3L1P'
Datacenter: 'dc1' (Segment: '')
Server: false (Bootstrap: false)
Client Addr: [0.0.0.0] (HTTP: 8500, HTTPS: -1, gRPC: -1, DNS: 8600)
Cluster Addr: 10.241.238.196 (LAN: 8301, WAN: 8302)
Encrypt: Gossip: false, TLS-Outgoing: false, TLS-Incoming: false, Auto-Encrypt-TLS: false
==> Log data will now stream in as it occurs:
2020-10-01T10:38:14.559Z [INFO] agent.client.serf.lan: serf: EventMemberJoin: EC2AMAZ-IMF3L1P 10.241.238.196
2020-10-01T10:38:14.657Z [INFO] agent.router: Initializing LAN area manager
2020-10-01T10:38:14.658Z [INFO] agent: Started DNS server: address=0.0.0.0:8600 network=udp
2020-10-01T10:38:14.659Z [INFO] agent: Started DNS server: address=0.0.0.0:8600 network=tcp
2020-10-01T10:38:14.659Z [INFO] agent: Started HTTP server: address=[::]:8500 network=tcp
2020-10-01T10:38:14.659Z [INFO] agent: started state syncer
==> Consul agent running!
2020-10-01T10:38:14.659Z [WARN] agent.router.manager: No servers available
2020-10-01T10:38:14.660Z [ERROR] agent.anti_entropy: failed to sync remote state: error="No known Consul servers"
2020-10-01T10:38:14.659Z [INFO] agent: Retry join is supported for the following discovery methods: cluster=LAN discovery_methods="aliyun aws azure digitalocean gce k8s linode mdns os packet scaleway softlayer tencentcloud triton vsphere"
2020-10-01T10:38:14.660Z [INFO] agent: Joining cluster...: cluster=LAN
2020-10-01T10:38:14.660Z [INFO] agent: discover-aws: Address type is not supported. Valid values are {private_v4,public_v4,public_v6}. Falling back to 'private_v4': cluster=LAN
2020-10-01T10:38:14.660Z [INFO] agent: discover-aws: Region not provided. Looking up region in metadata...: cluster=LAN
2020-10-01T10:38:43.098Z [WARN] agent.router.manager: No servers available
2020-10-01T10:38:43.098Z [ERROR] agent.anti_entropy: failed to sync remote state: error="No known Consul servers"
2020-10-01T10:38:55.439Z [ERROR] agent: Cannot discover address: cluster=LAN address="provider=aws tag_key=ConsulAutoJoin tag_value=auto-join" error="discover-aws: GetInstanceIdentityDocument failed: EC2MetadataRequestError: failed to get EC2 instance identity document
caused by: RequestError: send request failed
caused by: Get "http://169.254.169.254/latest/dynamic/instance-identity/document": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
2020-10-01T10:38:55.440Z [WARN] agent: Join cluster failed, will retry: cluster=LAN retry_interval=30s error="No servers to join"
2020-10-01T10:38:58.905Z [WARN] agent.router.manager: No servers available
2020-10-01T10:38:58.905Z [ERROR] agent.anti_entropy: failed to sync remote state: error="No known Consul servers"
2020-10-01T10:39:16.803Z [WARN] agent.router.manager: No servers available
2020-10-01T10:39:16.803Z [ERROR] agent.anti_entropy: failed to sync remote state: error="No known Consul servers"
2020-10-01T10:39:25.450Z [INFO] agent: discover-aws: Address type is not supported. Valid values are {private_v4,public_v4,public_v6}. Falling back to 'private_v4': cluster=LAN
2020-10-01T10:39:25.450Z [INFO] agent: discover-aws: Region not provided. Looking up region in metadata...: cluster=LAN
2020-10-01T10:39:33.389Z [WARN] agent.router.manager: No servers available
2020-10-01T10:39:33.389Z [ERROR] agent.anti_entropy: failed to sync remote state: error="No known Consul servers"
2020-10-01T10:39:55.108Z [WARN] agent.router.manager: No servers available
2020-10-01T10:39:55.108Z [ERROR] agent.anti_entropy: failed to sync remote state: error="No known Consul servers"
2020-10-01T10:40:06.213Z [ERROR] agent: Cannot discover address: cluster=LAN address="provider=aws tag_key=ConsulAutoJoin tag_value=auto-join" error="discover-aws: GetInstanceIdentityDocument failed: EC2MetadataRequestError: failed to get EC2 instance identity document
caused by: RequestError: send request failed
caused by: Get "http://169.254.169.254/latest/dynamic/instance-identity/document": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
2020-10-01T10:40:06.216Z [WARN] agent: Join cluster failed, will retry: cluster=LAN retry_interval=30s error="No servers to join"
2020-10-01T10:40:10.507Z [WARN] agent.router.manager: No servers available
2020-10-01T10:40:10.507Z [ERROR] agent.anti_entropy: failed to sync remote state: error="No known Consul servers"
2020-10-01T10:40:10.597Z [INFO] agent: Caught: signal=interrupt
2020-10-01T10:40:10.597Z [INFO] agent: Gracefully shutting down agent...
2020-10-01T10:40:10.598Z [INFO] agent.client: client starting leave
2020-10-01T10:40:10.599Z [INFO] agent.client.serf.lan: serf: EventMemberLeave: EC2AMAZ-IMF3L1P 10.241.238.196
2020-10-01T10:40:13.601Z [INFO] agent: Graceful exit completed
2020-10-01T10:40:13.601Z [INFO] agent: Requesting shutdown
2020-10-01T10:40:13.602Z [INFO] agent.client: shutting down client
2020-10-01T10:40:13.605Z [INFO] agent: consul client down
2020-10-01T10:40:13.605Z [INFO] agent: shutdown complete
2020-10-01T10:40:13.606Z [INFO] agent: Stopping server: protocol=DNS address=0.0.0.0:8600 network=tcp
2020-10-01T10:40:13.609Z [INFO] agent: Stopping server: protocol=DNS address=0.0.0.0:8600 network=udp
2020-10-01T10:40:13.609Z [INFO] agent: Stopping server: protocol=HTTP address=[::]:8500 network=tcp
2020-10-01T10:40:13.612Z [INFO] agent: Waiting for endpoints to shut down
2020-10-01T10:40:13.613Z [INFO] agent: Endpoints down
2020-10-01T10:40:13.613Z [INFO] agent: Exit code: code=0
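The discovery error shows the request to the EC2 metadata endpoint (169.254.169.254) timing out, but I’m not sure whether the cause is the Windows firewall, DNS settings, or something else.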
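One way to narrow this down might be to check, from the Windows instance itself, whether the metadata URL that appears in the error above answers at all. Below is a minimal sketch in Python, assuming Python 3 is available on the instance. The function name `metadata_reachable` and the 2-second timeout are illustrative choices of mine and are not part of Consul or the AWS SDK. The check deliberately bypasses any configured HTTP proxy, so it only tests the direct route.

```python
import urllib.error
import urllib.request

# URL copied from the "Cannot discover address" error in the Consul log.
METADATA_URL = "http://169.254.169.254/latest/dynamic/instance-identity/document"


def metadata_reachable(url=METADATA_URL, timeout=2.0):
    """Return True if `url` sends back any HTTP response within `timeout` seconds."""
    # An empty ProxyHandler disables system/environment proxies (assumption:
    # we want to test the direct network path, not a proxy).
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The endpoint answered with an error status, so the route itself works.
        return True
    except OSError:
        # Covers URLError, refused connections, and timeouts: nothing answered.
        return False
```

Calling `metadata_reachable()` on the Windows instance and on one of the working Linux clients should show whether the timeout is specific to the Windows host. A `False` result only says that nothing answered in time; it does not identify the cause.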
Could you advise please?
Thanks
Marco