Nomad Service Discovery on macOS returns loopback IP

I am trying to follow the Nomad tutorial [1] but cannot get the referenced starter project from GitHub [2] to work. My problem is that Nomad’s service discovery resolves {{ .Address }} (for instance in pytechco-web.nomad.hcl [7]) to a loopback address (127.0.0.1), which the web service cannot use to reach Redis. The GitHub project already contains an issue about this [3], which suggests replacing {{ .Address }} with a reference to host.docker.internal. Is there a more flexible solution that lets Nomad service discovery provide a usable IP address independently of the OS? I have read that the Docker driver supports bridge networking mode only on Linux [4]. Is bridge mode a requirement for working IP discovery? I also had a look at Consul [5], but Consul service discovery likewise returns 127.0.0.1 for redis-svc.service.consul. The configuration mentioned in the FAQ [6] does not work for me with en0 and my public (LAN) IP: when I try to add the Redis service, I get a “bind: cannot assign requested address” error.

So is there a DNS/dynamic IP assignment method that works cross-platform, or is there something I can do to make Nomad service discovery work reliably on macOS?
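What I was hoping for is something OS-independent in the Nomad client configuration, for example pinning the interface there. A minimal, untested sketch (the interface name is just an example):

client {
  enabled = true

  # Pin the interface Nomad fingerprints and advertises. If I read the docs
  # correctly, this field also accepts go-sockaddr templates, e.g.
  # network_interface = "{{ GetDefaultInterfaces | attr \"name\" }}"
  network_interface = "en0"
}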

[1] Create a Cluster | Nomad | HashiCorp Developer
[2] GitHub - hashicorp-education/learn-nomad-getting-started: Companion repo for the HashiCorp Nomad Get Started collection of tutorials
[3] cannot connect into redis through localhost connection · Issue #3 · hashicorp-education/learn-nomad-getting-started · GitHub
[4] Drivers: Docker | Nomad | HashiCorp Developer
[5] Understanding Networking in Nomad
[6] Frequently Asked Questions | Nomad | HashiCorp Developer
[7] learn-nomad-getting-started/pytechco-web.nomad.hcl at main · hashicorp-education/learn-nomad-getting-started · GitHub

Hey @aleneum, thanks for the message and sorry to hear of the issue you’re facing.

So that I understand correctly: the loopback address is coming from the web job when it queries the Redis address, right? When you run the redis job and inspect its allocation, which address does it use, the loopback (127.0.0.1)?
nomad alloc status <ALLOC-ID> | grep -A 3 -i allocation

Additionally, what is the advertised address of your client?
nomad node status -verbose <NODE-ID> | grep -i advertise

A quick test on my end shows the same internal address on the client node and the queried service: 192.168.50.210. I believe the service discovery feature uses this advertised address, so I just wanted to check whether that might be the underlying issue.

Presumably you’re running a local cluster on your Mac. Are you passing any additional flags to the agent command, or using any custom configuration files?

Hello @tonino,

I’d say it’s what the Nomad/Consul service discovery returns and consequently assigns to REDIS_HOST={{ .Address }} in pytechco-web.nomad.hcl. doggo redis-svc.service.consul @tcp://127.0.0.1:8600 also returned 127.0.0.1 when I tried a minimal Consul setup, so it’s not necessarily related to the web job.
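(For the record, the same check with plain dig, for anyone without doggo, against Consul’s default DNS port:)

dig @127.0.0.1 -p 8600 redis-svc.service.consul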

Just for the sake of completeness: I start nomad without any specific config:

nomad agent -dev --bind 0.0.0.0

and then run the redis job from the example repo’s jobs folder:

nomad run pytechco-redis.nomad.hcl

Afterwards, I retrieve the ALLOC_ID of redis and the NODE_ID of my Nomad client from Nomad’s web UI and execute your suggested commands:

nomad alloc status 9699a3fc | grep -A 3 -i allocation

Allocation Addresses:
Label   Dynamic  Address
*redis  yes      127.0.0.1:27769 -> 6379
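
(Side note: the 127.0.0.1:27769 -> 6379 entry is Nomad’s dynamic port mapping; if I read the repo correctly, it comes from a network stanza in pytechco-redis.nomad.hcl roughly like this, quoted from memory:)

network {
  port "redis" {
    to = 6379
  }
}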

nomad node status -verbose a8b4a766 | grep -i advertise
nomad.advertise.address   = 10.136.xxx.xxx:4646

10.136.xxx.xxx is the address of en4. I have three additional interfaces: lo0, bridge100 (192.168.105.1), and en2 (10.132.xxx.xxx). One of the enX interfaces is my Wi-Fi connection, the other a docking-station Ethernet connection. And (I guess this is at least part of the issue) the underlying Colima VM (I don’t use Docker Desktop) has additional interfaces:
lo, eth1 (192.168.107.2), eth0 (192.168.5.15), docker0 (172.17.0.1).

Current workaround

In pytechco-web.nomad.hcl and all the other jobs that refer to redis, I changed:

{{ range nomadService "redis-svc" }}
REDIS_HOST={{ .Address }}
REDIS_PORT={{ .Port }}
{{ end }}

to

{{ range nomadService "redis-svc" }}
REDIS_HOST=host.docker.internal
REDIS_PORT={{ .Port }}
{{ end }}

host.docker.internal resolves to 192.168.5.2 in the Colima VM.
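
For context, that snippet sits inside the job’s template stanza, which looks roughly like this in the repo (quoted from memory; the destination path may differ):

template {
  data        = <<EOF
{{ range nomadService "redis-svc" }}
REDIS_HOST=host.docker.internal
REDIS_PORT={{ .Port }}
{{ end }}
EOF
  destination = "local/env.txt"
  env         = true
}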

Interesting setup! Could you try setting the -network-interface flag of the agent command to one of your Colima VM interfaces, since host.docker.internal resolves to one of them? Something like:
nomad agent -dev --bind 0.0.0.0 -network-interface=eth0

I swear I stuck to the docs :sweat_smile: [1]
And this is of course not meant for production.

That interface isn’t visible from macOS, unfortunately. However, Colima runs Alpine Linux under the hood, so I logged into the Colima VM, installed Nomad there, and ran it from inside:

colima start --cpu 4 --memory 8 --arch amd64 --network-address
colima ssh
sudo apk add nomad --repository=https://dl-cdn.alpinelinux.org/alpine/edge/community
nomad agent -dev --bind 0.0.0.0 -network-interface=col0

So where does col0 come from? Usually Colima forwards all ports to macOS, which means that localhost:4646 works instantly. However, localhost:5000 returns an ‘access denied’ error because of the forwarded port (the web server probably does not expect connections from the host machine’s IP). Colima introduced --network-address in 0.4.0, which creates the aforementioned col0 interface and makes the VM accessible via its own IP.
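
(To double-check that the flag worked, the VM’s address should show up both in Colima’s list output and on col0 inside the VM, if I remember the CLI correctly:)

colima list        # the ADDRESS column should show the VM’s IP
colima ssh         # then, inside the VM:
ip addr show col0  # col0 should carry that address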

Thanks to the port forwarding, everything else can happen on macOS now:

nomad run pytechco-redis.nomad.hcl
nomad run pytechco-web.nomad.hcl
nomad node status -verbose $(nomad job allocs pytechco-web | grep -i running | awk '{print $2}') | grep -i ip-address | awk -F "=" '{print $2}' | xargs | awk '{print "http://"$1":5000"}' # http://192.168.106.2:5000

The returned address http://192.168.106.2:5000 is reachable from MacOS as well. :partying_face:

A bit off-topic and just for the record: I tried to use host_volume in this setup as well. A path in my home directory (e.g. /Users/<username>/nomad_volume) caused permission issues with the stateful workloads example from the docs [2]. When I entered a system path like /tmp/nomad_volume, I got “directory not found” errors. Both are related to the fact that the actual work happens INSIDE the VM: when I created /tmp/nomad_volume on macOS, the VM, which has its own separate /tmp folder, could not access it. /tmp could be mounted, of course, but by default it is not. I ran into similar issues when using Podman in combination with WSL. I will probably just use a VM-internal path since I don’t plan to access volumes from macOS anyway.
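
For reference, the client stanza I mean, as a sketch with a VM-internal path (the volume name and path are just examples):

client {
  enabled = true

  # The path must exist inside the Colima VM, not on macOS.
  host_volume "pytechco" {
    path      = "/data/nomad_volume"
    read_only = false
  }
}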

Thanks @tonino for your support! Much appreciated!

[1] GitHub - abiosoft/colima: Container runtimes on macOS (and Linux) with minimal setup
[2] Stateful Workloads with Nomad Host Volumes | Nomad | HashiCorp Developer

Glad to hear that you solved it!

Today I learned about Colima which sounds pretty cool, so thanks for that!