Proxy sidecar unexpectedly advertises localhost via DNS

I am trying to set up a basic Nomad + Consul integration. My limited understanding of what Consul provides in that scenario is that, in production, Nomad should run the workloads on localhost, and the proxy sidecars then handle the communication over the public network? My expectation would be that service A on host A talks to its local proxy, which asks Consul which machine service B runs on, gets back the public IP of host B, and when it talks to the proxy on host B, that traffic is routed through Envoy on host B to service B on that machine?

This mental model seems to match what I am seeing when running Nomad on my public network: my services can connect to one another, I can control whether they are allowed to communicate with a Consul intention, and that seems to prevent them from talking directly to one another.

When I change the Nomad interface to lo, though, my frontend service can no longer talk to my backend; it gets an Envoy 111 error, which after some googling seems to indicate that nothing is listening on the IP:port being dialed. When I do some basic DNS queries against Consul, I get back the bind address configured in Nomad. My speculation is that, with 127.0.0.1, service A connects to its proxy, the proxy looks up where to send the request, finds 127.0.0.1 as the advertised address, connects to its own 127.0.0.1 inside the sidecar where nothing is running, and gets the 111 error.

This leads to a couple of questions. First, mostly for my own edification: is this understanding of the behavior of Consul service mesh correct? Second, and more important: should the Consul query for my backend service return the public IP rather than 127.0.0.1? If so, what do I need to do so that it advertises correctly even when the workload itself runs on localhost? And if my guess is wrong and it should advertise localhost, what lookup should I use to figure out which machine is running it?

Maybe my presuppositions are wrong, and running Nomad workloads on localhost isn’t applicable to a Nomad + Consul cluster? Reading the warning at the bottom of Networking | Nomad | HashiCorp Developer, my assumption was that we should always bind to loopback, but maybe that is only the case when we don’t want any kind of external access, including from the service mesh?

  1. I think that, in a multi-host setup, services should register with an IP address that is reachable by other hosts, which is typically the host’s private or public IP rather than the localhost address (see the sketch below).
  2. How do you discover which machine is running a service? Consul’s DNS or HTTP API can be used to look up a service’s address. For example, a DNS query for servicename.service.consul should return the IP addresses of all instances of that service.
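
To make point 1 concrete, here is a minimal Consul service registration sketch; the service name, address, and port are hypothetical:

# backend-service.hcl (hypothetical example)
service = {
  name = "backend"
  # Routable host IP. If this field is omitted, the service inherits
  # the advertise address of the agent it registers with.
  address = "192.0.2.20"
  port = 9090
}

With a registration like this, a DNS lookup for backend.service.consul should return 192.0.2.20 rather than a loopback address.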

I suspect you are right. But the warning in that case is confusing to me. What is the practical application of

Warning: To prevent any type of external access when using bridge network mode make sure to bind your workloads to the loopback interface only.

What kinds of jobs would we want to deploy that are inaccessible?

Your understanding of Consul service mesh seems spot-on. Regarding the issue: when running Nomad on localhost, it’s possible that DNS queries return 127.0.0.1, causing the Envoy errors. You might want to check your Consul configuration for service registration and ensure it advertises the correct IP, perhaps the external IP, even when the workload binds to localhost. Double-check your DNS settings too. If issues persist, reaching out to HashiCorp support or the community might yield more insights. Happy coding!

@MWinther, it sounds like you’re trying to configure Consul or Nomad so that certain services, like the HTTP API, are only accessible over the loopback address. If so, you’ll need to configure the client_addr parameter in Consul to an address that is different from the bind_addr.

See this recent thread for more info.
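
As a rough sketch of that (the addresses here are hypothetical), a client agent configuration could keep the HTTP and DNS APIs on loopback while the agent itself communicates over a routable interface:

# consul-agent.hcl (hypothetical addresses)
# Routable address used for cluster communication (gossip/RPC).
bind_addr = "192.0.2.10"

# Address the HTTP and DNS APIs bind to. Keeping this on loopback
# means those APIs are only reachable from the local host.
client_addr = "127.0.0.1"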

Well, possibly. My mental model was that we would always want to bind our services to localhost, and that Envoy in turn would route the service mesh requests over a routable address back to localhost, in order to prevent any direct communication with the services over the network. But it seems that might not be the desired model; rather, we bind our services to a routable interface if we want them to be available through the service mesh?

That is the recommended way to configure applications that have been deployed into Consul service mesh. However, the address that is registered into the Consul catalog for the service and its sidecar needs to be routable from other nodes.

The -advertise CLI flag or advertise_addr configuration option specifies the address that the Consul client agent/node advertises to other nodes in the cluster. (This value defaults to the -bind address.) The other nodes use this address to gossip with the client agent, so this address must be routable from all other nodes in the cluster.
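
For illustration (the addresses are hypothetical), a client agent config that pins the advertise address explicitly could look like this:

# consul-client.hcl (hypothetical addresses)
# Interface the agent binds to locally.
bind_addr = "0.0.0.0"

# Routable address gossiped to other agents and recorded as the
# node's Address in the catalog. Defaults to bind_addr when unset.
advertise_addr = "192.0.2.10"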

The advertise address is also the Address that is associated with the node when it is registered into Consul’s catalog. Services and their associated sidecar proxies that are registered to that node will inherit the node’s address unless a different address has been specified in the Address field of the service registration.

I’ll use a more concrete example config to better explain.

  1. Consul client agent consul-client-1 has an advertise address of 192.0.2.10.
  2. Service foo is registered to the agent.
    1. The Port is set to 8080.
    2. The Address field is omitted from the service registration payload.
    3. The service is configured as being Connect/mesh-enabled. E.g.,
      service = {
        name = "foo"
        port = 8080
        connect = { sidecar_service = {} }
      }
      
  3. The foo service is configured to bind only to the loopback address.

When the service is registered to the agent, the resultant service registration gets expanded into two separate service registrations that look something like this.

# foo-service.hcl
service = {
  name = "foo"
  address = "192.0.2.10"
  port = 8080
}
# foo-sidecar-service.hcl
service = {
  kind = "connect-proxy"
  name = "foo-sidecar-proxy"
  address = "192.0.2.10"

  # The dynamically assigned port the service mesh proxy listens on.
  # This port range is controlled by the `sidecar_min_port` and
  # `sidecar_max_port` configuration options.
  port = 21000

  proxy {
    destination_service_name = "foo"
    # The address the proxy should use to reach the local application.
    # Defaults to 127.0.0.1.
    local_service_address = "127.0.0.1"

    # The port the proxy should use to reach the local application.
    # Defaults to the associated service's `port`.
    local_service_port = 8080
  }
}

DNS lookups for foo.service.consul, or an HTTP catalog query for the foo service, will return 192.0.2.10 as the service’s address, along with 8080 as the application’s configured port. However, foo will not be directly reachable by downstream services at this address because it does not bind to a routable interface. The only way to reach foo is through its sidecar proxy.

DNS lookups for foo.connect.consul, or an HTTP query against /v1/health/connect/foo, will return an address of 192.0.2.10 and a port of 21000. This is the address and port used by downstream proxies to reach the foo service over the mesh.

When a downstream application attempts to connect to foo using the local port associated with that upstream service, the downstream’s proxy will connect to foo’s sidecar proxy at 192.0.2.10 port 21000. The upstream proxy will then forward the request to the foo application that is listening on 127.0.0.1 port 8080.

So, to summarize, this traffic flow would not work because foo is not bound to the routable interface.

downstream app -> 192.0.2.10:8080 -> foo

But this traffic flow would work because the proxy is appropriately configured to forward the connection to foo on the loopback interface.

(downstream app) -> 127.0.0.1:1234 (upstream listener for foo) -> (downstream proxy) -> 192.0.2.10:21000 (upstream proxy) -> 127.0.0.1:8080 (local app)
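
In Nomad terms, a rough jobspec sketch of that wiring could look like the following; the driver, images, and ports are all hypothetical. The frontend reaches the backend only through its local upstream listener on 127.0.0.1:1234, and the backend application itself binds only to loopback.

# demo.nomad.hcl (hypothetical sketch)
job "demo" {
  datacenters = ["dc1"]

  group "backend" {
    network {
      mode = "bridge"
    }

    service {
      name = "backend"
      port = "8080"
      connect {
        sidecar_service {}
      }
    }

    task "backend" {
      driver = "docker"
      config {
        # Hypothetical image; the app binds only to loopback.
        image = "hashicorp/http-echo"
        args  = ["-listen", "127.0.0.1:8080", "-text", "hello from backend"]
      }
    }
  }

  group "frontend" {
    network {
      mode = "bridge"
    }

    service {
      name = "frontend"
      port = "9090"
      connect {
        sidecar_service {
          proxy {
            upstreams {
              destination_name = "backend"
              local_bind_port  = 1234
            }
          }
        }
      }
    }

    task "frontend" {
      driver = "docker"
      config {
        # Hypothetical frontend image.
        image = "example/frontend"
      }
      env {
        # Talk to the backend via the local upstream listener, never directly.
        BACKEND_ADDR = "http://127.0.0.1:1234"
      }
    }
  }
}

The only listeners on a routable address in this sketch should be the two Envoy sidecars, which Nomad registers in Consul against the node’s advertise address.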

I hope this helps. If something is still not clear, please let me know and I will try to better clarify.