External nodes and DNS

I have nodes external to Consul that Consul can’t run on, but I’d still like to reach them via the .node.consul DNS suffix. Is this possible, or is .node.consul reserved for nodes actually running Consul, either as an agent or as a server?

You will need to configure your external node to forward .consul DNS requests to a Consul server’s DNS port (8600).
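For example, with dnsmasq on the external node, a minimal sketch (the server address 10.0.0.1 is illustrative; point it at one of your actual Consul servers):

```
# /etc/dnsmasq.d/10-consul
# Forward all *.consul lookups to Consul's DNS interface on port 8600
server=/consul/10.0.0.1#8600
```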

More here:

That’s not what I want to do.
I want to access node “foo” by “foo.node.consul”, but the node “foo” can’t run Consul.
I basically want to register nodes like one would register services.

You can do this with the Consul API. In particular, you want to use the Catalog API.
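For example, a minimal body (node name, address, and service here are illustrative) that you’d PUT to the /v1/catalog/register endpoint:

```json
{
    "Node": "foo",
    "Address": "10.0.0.50",
    "Service": {
        "ID": "web",
        "Service": "web",
        "Port": 80
    }
}
```

After this, "foo.node.consul" should resolve via Consul DNS even though no agent runs on the node.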

If you also want the benefit of health checking those external nodes and services, then you could always use consul-esm. Consul ESM is built specifically for that type of situation: health checking nodes/services that you cannot run a full Consul agent on.

To use Consul ESM, you register services via the API with node metadata indicating that they are externally managed, and optionally with health check definitions. When running, Consul ESM will find those external services and perform the health checks defined within the service definition, as well as a default node-alive probing check.


@mkeeler What would be best practice: keeping ESM on a Consul server, or creating separate instances?

In the case of separate instances, the Consul agent should run as a client, I suspect? And what would be the least permissive ACL in this case?

Thank you

@vasilij-icabbi I had to edit my previous response after diving a bit deeper into Consul ESM. It performs health checking on already-registered external services and will not itself register them.

Normally it’s best to isolate the Consul servers and not run other agents/tools on the same instances.

For consul-esm there is no server/client distinction. It always acts as a “client”, similar to a Consul client agent.

As for the ACL privileges needed I believe they are the following:

  • node:write on all external node names
  • service:write on all the services on those external nodes whose health check status you want it to be able to update
  • key:write on the consul-esm/ prefix. If you change consul-esm’s consul_kv_path configuration to something other than the default, the prefix needed here changes accordingly.
  • session:write on the name of the agent that you point consul-esm at to use its HTTP API

At the very least that should be close, although I haven’t tested it.
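Sketched as an ACL policy in HCL (the node prefix "external-" and the empty service/session prefixes are illustrative; narrow them to match your environment):

```hcl
# Policy for the consul-esm token -- a sketch, untested
node_prefix "external-" {
  policy = "write"   # update catalog entries for external nodes
}
service_prefix "" {
  policy = "write"   # update health check status for their services
}
key_prefix "consul-esm/" {
  policy = "write"   # ESM coordination state (matches default consul_kv_path)
}
session_prefix "" {
  policy = "write"   # sessions on the agent ESM talks to
}
```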

Also the External Services Guide would be good to look through for this use case.


If I understand correctly, you still need a Consul agent running locally that will be used by the Consul ESM daemon?

Just double-checking whether a local agent is optional and I can point directly at the Consul cluster’s HTTP address.

You need a Consul instance serving the HTTP API. Whether that is a Consul server or a Consul client doesn’t matter to consul-esm.


So I’ve managed to use the catalog API to register the node and service, but when I try to add health checks they never seem to get updated. Here’s my definition:

{
    "Node": "controller",
    "Address": "10.0.0.100",
    "Datacenter": "home",

    "Service": {
        "ID": "controller",
        "Service": "controller",
        "Address": "10.0.0.100",
        "Port": 8443
    },

    "Checks": [
        {
            "Node": "controller",
            "CheckID": "node:controller",
            "Name": "Node is alive",

            "Definition": {
                "TCP": "10.0.0.100:22",
                "Interval": "15s",
                "Timeout": "10s"
            }
        },
        {
            "Node": "controller",
            "CheckID": "service:controller",
            "ServiceID": "controller",
            "Name": "Service is up",

            "Definition": {
                "HTTP": "https://10.0.0.100:8443",
                "Interval": "15s",
                "Timeout": "10s",
                "TLSSkipVerify": true
            }
        }
    ]
}

I can’t tell why they’re failing; there doesn’t seem to be anything in the CLI that lets me monitor this, and the web UI just shows blank boxes.

I’ve also been trying to get ESM to work, but I can’t even manage to start it. Logs:

systemd[1]: Starting Consul ESM...
consul-esm[570]: 2019/08/07 01:27:24 [INFO] Connecting to Consul on 127.0.0.1:8500...
consul-esm[570]: Consul ESM running!
consul-esm[570]:             Datacenter: "home"
consul-esm[570]:                Service: "consul-esm"
consul-esm[570]:            Service Tag: ""
consul-esm[570]:             Service ID: "consul-esm:80ea96ae-dae2-13e1-6c31-da3a763aba12"
consul-esm[570]: Node Reconnect Timeout: "72h0m0s"
consul-esm[570]: Log data will now stream in as it occurs:
consul-esm[570]: 2019/08/07 01:27:24 [INFO] Trying to obtain leadership...
consul-esm[570]: 2019/08/07 01:27:24 [INFO] Obtained leadership
consul-esm[570]: 2019/08/07 01:27:25 [INFO] Rebalanced 0 external nodes across 1 ESM instances
systemd[1]: consul-esm.service: Start operation timed out. Terminating.
consul-esm[570]: 2019/08/07 01:28:54 [INFO] Caught signal: terminated
consul-esm[570]: 2019/08/07 01:28:54 [INFO] Shutting down...
consul-esm[570]: 2019/08/07 01:28:54 [INFO] Caught signal: continued
consul-esm[570]: 2019/08/07 01:28:54 [WARN] Error querying for health check info: Get http://127.0.0.1:8500/v1/health/state/any?dc=home&index=34&node-meta=external-node%3Atrue: context canceled
consul-esm[570]: 2019/08/07 01:28:54 [WARN] Error querying for node watch list: Get http://127.0.0.1:8500/v1/kv/consul-esm/agents/consul-esm:80ea96ae-dae2-13e1-6c31-da3a763aba12?dc=home&index=37: context canceled
consul-esm[570]: 2019/08/07 01:28:55 [WARN] Error querying for health check info: Get http://127.0.0.1:8500/v1/health/service/consul-esm?dc=home&index=34&passing=1: context canceled
consul-esm[570]: 2019/08/07 01:28:55 [WARN] Error getting external node list: Get http://127.0.0.1:8500/v1/catalog/nodes?dc=home&index=10&node-meta=external-node%3Atrue: context canceled
systemd[1]: consul-esm.service: Failed with result 'timeout'.
systemd[1]: Failed to start Consul ESM.

This was with Consul 1.5.3 and ESM 0.3.3.

So I found a couple of issues on my end, but I also blame the lack of documentation.
I tried to run this as a non-root systemd service, so when ESM tried to probe with UDP it was denied. Changing the ping_type option to “socket” also failed, because for whatever reason ESM needs access to raw sockets rather than stream sockets, so I had to grant the binary that capability.

Here’s a working systemd service definition for whoever comes next:

[Unit]
Description=Consul ESM
Documentation=https://github.com/hashicorp/consul-esm

# wait for network stack
Requires=network-online.target
After=network-online.target

[Service]
User=consul
Group=consul
ExecStart=/usr/local/bin/consul-esm -config-dir /etc/consul-esm.d/
KillMode=process
Restart=on-failure
RestartSec=2

# give ESM the right to open ICMP sockets
PermissionsStartOnly=true
ExecStartPre=/sbin/setcap 'cap_net_raw=+ep' /usr/local/bin/consul-esm

[Install]
WantedBy=multi-user.target

Make sure to set the option ping_type = "socket" and create the user and group (or just reuse the ones used by Consul itself).
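For completeness, a minimal consul-esm config sketch to go with the unit above (the datacenter name and addresses are illustrative; adjust to your cluster):

```hcl
# /etc/consul-esm.d/config.hcl
log_level  = "INFO"
http_addr  = "127.0.0.1:8500"   # Consul instance serving the HTTP API
datacenter = "home"

# "socket" pings need cap_net_raw on the binary, granted by the
# setcap ExecStartPre line in the systemd unit
ping_type = "socket"
```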

Accompanying JSON body for registering the node and service with the catalog API, letting ESM handle the health checks (note the NodeMeta values):

{
    "Node": "controller",
    "Address": "10.0.0.100",

    "NodeMeta": {
        "external-node": "true",
        "external-probe": "true"
    },

    "Service": {
        "ID": "controller",
        "Service": "controller",
        "Address": "10.0.0.100",
        "Port": 8443
    },

    "Check": {
        "Node": "controller",
        "CheckID": "service:controller",
        "ServiceID": "controller",
        "Name": "Service is up",

        "Definition": {
            "HTTP": "https://10.0.0.100:8443",
            "Interval": "30s",
            "Timeout": "10s",
            "TLSSkipVerify": true
        }
    }
}

@p3lim So did registering your service with Consul via the /catalog/register endpoint make it resolvable through Consul’s DNS server? Or is the registration strictly for health checks?