Connect sidecar listening health check failing

Hi everyone!

I’m using Nomad with Consul Connect to deploy Docker containers. However, for every job I run, the Connect sidecar listening health check fails.

I tried to go back to the tutorials, but the problem persisted.

Version:
Nomad v0.12.4
Consul v1.8.4
# consul-config-file
acl = {
  enabled = true
  default_policy = "deny"
  enable_token_persistence = true
}

connect {
  enabled = true
}
# nomad-config-file
consul {
  token = "<Nomad Demo Agent Token>"
}

I first ran consul according to this tutorial.

Then, I ran nomad according to this tutorial.

I followed the tutorials exactly, but when I ran nomad run countdash.nomad, the problem surfaced.

I’m wondering whether the two tutorials are missing a step, or whether anyone else has the same problem. Thanks!

Hi

In your health check, try adding address_mode = "driver" and expose = true.

Example:

check {
  type         = "http"
  path         = "/health"
  interval     = "10s"
  timeout      = "2s"
  address_mode = "driver"
  expose       = true

  check_restart {
    limit           = 3
    grace           = "120s"
    ignore_warnings = false
  }
}

Hi @CarelvanHeerden! Thanks for the reply, but I didn’t define that health check myself. The “Connect Sidecar Listening” check is created automatically by the sidecar_service stanza.
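For reference, this is roughly what the relevant part of the count-api group looks like in the tutorial job (from memory, so treat it as a sketch); the empty sidecar_service block is what makes Consul register the Envoy sidecar and its “Connect Sidecar Listening” check:

group "api" {
  network {
    mode = "bridge"
  }

  service {
    name = "count-api"
    port = "9001"

    connect {
      # Registers an Envoy sidecar proxy for this service; the
      # "Connect Sidecar Listening" TCP check is created along with it.
      sidecar_service {}
    }
  }

  # ... task "web" running the counter-api Docker image ...
}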

After some debugging, I found out that it’s because I added a host_network stanza to my Nomad config file.

# nomad-config-file
consul {
  token = "<Nomad Demo Agent Token>"
}

client {
    host_network "myNetwork" {
        cidr = "xxx.xx.xxx.x/24"
    }
}

I thought it wasn’t important, so I omitted it in my previous post.
So if I define my own host network, the sidecar health check fails? But the task isn’t even on that network. Is this a bug? Thanks in advance.
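For context, as far as I understand a port only ends up on a named host network when the job asks for it explicitly in its group network stanza, something like the sketch below (the port label and values are just illustrative), and my job never does this:

network {
  mode = "bridge"

  port "http" {
    static       = 9002
    to           = 9002
    host_network = "myNetwork"  # only a port declared like this would use "myNetwork"
  }
}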

@martinkingtw - I haven’t defined host_network on any of my clients, but the Connect Sidecar Listening health check is still failing. Did you find a fix for it?

I have substantially newer versions of everything, and yet the same problem. I’m not really sure how to diagnose it.

I don’t have any host_network defined. I’ve simply been trying to follow the tutorial here: Consul Connect | Nomad by HashiCorp

My environment:

root@kain:~# nomad version
Nomad v1.0.2 (4c1d4fc6a5823ebc8c3e748daec7b4fda3f11037)
root@kain:~# consul version
Consul v1.9.1
Revision ca5c38943
Protocol 2 spoken by default, understands 2 to 3 (agent will automatically use protocol >2 when speaking to compatible agents)

My consul checks are failing:

I also get a warning from nomad job plan, but I’m not sure if it’s a red herring:

Scheduler dry-run:
- WARNING: Failed to place all allocations.
  Task Group "api" (failed to place 1 allocation):
    * Constraint "${attr.cpu.arch} = amd64": 1 nodes excluded by filter
    * Resources exhausted on 1 nodes
    * Dimension "network: no addresses available for \"\" network" exhausted on 1 nodes

This may simply be because one node is excluded: it’s an arm64 node and the demo containers don’t have arm64 images, so I set the constraint below to only include amd64 hosts.

  constraint {
    attribute = "${attr.cpu.arch}"
    value = "amd64"
  }

Hi! Sorry, once I removed host_network, the problem went away.

Is expose = true set in the check stanza of your task, as @CarelvanHeerden suggested?

Also, use netstat to check whether the task is actually listening on the port. When I had the problem, only the “Connect Sidecar Listening” check was failing; all the other checks were passing.

Alternatively, it seems that one of your tasks is not being placed because there are no resources left after the unusable nodes are excluded. Maybe this is that task?

Thanks for responding!

I already had expose = true in the api health check. I did add address_mode = "driver".

The task sure seems like it’s running. That warning is weird to me, because I should be able to run more than just one task on a host. I don’t know how to debug that problem, or if it’s an actual problem.

Here’s the latest allocation for the api group:

Setting address_mode = "driver" causes it to fail, as I don’t actually have a driver network:

failed to setup alloc: pre-run hook "group_services" failed: error getting address for check "api-health": cannot use address_mode="driver": no driver network exists

@martinkingtw Turns out my problem was very simple. I forgot to enable the gRPC port.

Once I did that, everything came up.
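For anyone else hitting this: the Envoy sidecars talk to the local Consul agent over gRPC, so the agent config needs the gRPC port opened. Mine now has something along these lines (8502 is just the conventional port; adjust to your setup):

# consul-config-file
ports {
  # Consul Connect sidecars (Envoy) fetch their configuration over gRPC
  grpc = 8502
}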

I do still have the network warning, but I’ve started a different thread for that.

@martinkingtw
I’ve prepared a PR with a fix.
You can test it locally.

Hello,

Did you enable the gRPC port on the Consul client or the Consul server? Is your Nomad agent connecting to a Consul client or directly to a server?