Connect sidecar listening healthcheck fail

martinkingtw · September 23, 2020, 10:07am

Hi everyone!

I’m using Nomad with Consul Connect to deploy Docker containers. However, for every job I run, the Connect sidecar listening health check fails.

I tried to go back to the tutorials, but the problem persisted.

Version:
Nomad v0.12.4
Consul v1.8.4

# consul-config-file
acl = {
  enabled = true
  default_policy = "deny"
  enable_token_persistence = true
}

connect {
  enabled = true
}

# nomad-config-file
consul {
  token = "<Nomad Demo Agent Token>"
}

I first ran consul according to this tutorial.

Then, I ran nomad according to this tutorial.

I followed the tutorials exactly, but when I ran nomad run countdash.nomad, the problem surfaced.

I’m wondering whether a step is missing from the two tutorials, or whether anyone else has the same problem. Thanks!

CarelvanHeerden · September 23, 2020, 8:59pm

Hi

In your health check, try adding address_mode = "driver" and expose = true.

Example:

  check {
    type         = "http"
    path         = "/health"
    interval     = "10s"
    timeout      = "2s"
    address_mode = "driver"
    expose       = true

    check_restart {
      limit           = 3
      grace           = "120s"
      ignore_warnings = false
    }
  }

martinkingtw · September 24, 2020, 1:48am

Hi @CarelvanHeerden! Thanks for the reply, but I didn’t define that health check myself. The “Connect Sidecar Listening” health check is registered automatically by the sidecar_service stanza.
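
For readers unfamiliar with this, here is a minimal sketch of a task group in which that check appears without any check stanza being written. It is modeled loosely on the countdash demo; the group, service, and task names, the port, and the image are assumptions for illustration, not the exact job from this thread.

```hcl
# Hypothetical group, loosely following the countdash demo job.
group "api" {
  network {
    mode = "bridge" # Connect sidecars need bridge networking
  }

  service {
    name = "count-api" # assumed service name
    port = "9001"      # assumed port the task listens on

    connect {
      # No check is defined here. Consul registers the
      # "Connect Sidecar Listening" TCP check for the proxy by default.
      sidecar_service {}
    }
  }

  task "web" {
    driver = "docker"

    config {
      image = "hashicorpnomad/counter-api:v1" # assumed demo image
    }
  }
}
```

Because the check targets the sidecar proxy's listener rather than the task itself, it can fail even when the application's own checks pass.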

After some debugging, I found out that it’s because I added a host_network stanza to my nomad-config-file.

# nomad-config-file
consul {
  token = "<Nomad Demo Agent Token>"
}

client {
    host_network "myNetwork" {
        cidr = "xxx.xx.xxx.x/24"
    }
}

I thought it wasn’t important, so I omitted it in my previous post.
So if I define my own network, then the sidecar health check will fail? But the task isn’t even in that network. Is this a bug? Thanks in advance.

jsanant · December 24, 2020, 12:01pm

@martinkingtw - I haven’t defined host_network in any of my clients, but the Connect Sidecar Listening health check is still failing. Did you find a fix for it?

BeepDog · January 17, 2021, 11:59pm

I have substantially newer versions of everything, and yet the same problem. I’m not really sure how to diagnose it.

I don’t have any host_network defined. I’ve simply been trying to follow the tutorial here: Consul Connect | Nomad by HashiCorp

My environment:

root@kain:~# nomad version
Nomad v1.0.2 (4c1d4fc6a5823ebc8c3e748daec7b4fda3f11037)
root@kain:~# consul version
Consul v1.9.1
Revision ca5c38943
Protocol 2 spoken by default, understands 2 to 3 (agent will automatically use protocol >2 when speaking to compatible agents)

My consul checks are failing:

I also get a warning from nomad job plan, though I’m not sure whether it’s a red herring:

Scheduler dry-run:
- WARNING: Failed to place all allocations.
  Task Group "api" (failed to place 1 allocation):
    * Constraint "${attr.cpu.arch} = amd64": 1 nodes excluded by filter
    * Resources exhausted on 1 nodes
    * Dimension "network: no addresses available for \"\" network" exhausted on 1 nodes

This may simply be because I have one node excluded as it’s an arm64 node, and the demo containers don’t have arm64 images, so I set that constraint to only include amd64 hosts.

  constraint {
    attribute = "${attr.cpu.arch}"
    value = "amd64"
  }

martinkingtw · January 18, 2021, 8:03am

Hi! Sorry, once I removed host_network, the problem went away.

martinkingtw · January 18, 2021, 8:20am

Are you using expose = true in your task’s check stanza, as @CarelvanHeerden suggested?

Also, use netstat to check whether the task is actually listening on the port. When I had the problem, only the “Connect Sidecar Listening” check was failing; all the other checks were working fine.

Alternatively, it seems that one of your tasks is not being run because no resources are left after excluding the unusable nodes. Maybe that is the task in question?

BeepDog · January 18, 2021, 5:13pm

Thanks for responding!

I already had expose = true in the api health check, and I have now added address_mode = "driver".

The task sure seems like it’s running. That warning is weird to me, because I should be able to run more than just one task on a host. I don’t know how to debug that problem, or if it’s an actual problem.

Here’s the latest allocation for the api group:

BeepDog · January 18, 2021, 5:15pm

Setting address_mode = "driver" causes it to fail, as I don’t actually have a “driver” network:

failed to setup alloc: pre-run hook "group_services" failed: error getting address for check "api-health": cannot use address_mode="driver": no driver network exists

BeepDog · January 18, 2021, 8:33pm

@martinkingtw Turns out my problem was very simple. I forgot to enable the gRPC port.

Once I did that, everything came up.
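
For anyone landing here with the same symptom, a minimal sketch of that change follows. It assumes an HCL Consul agent config file and the conventional gRPC port 8502; neither detail is stated in this thread, so adjust to your setup.

```hcl
# consul-config-file (assumed: HCL agent config on the Consul agent
# that the Nomad client talks to)
ports {
  grpc = 8502 # conventional gRPC port; an assumption, not confirmed in this thread
}
```

The Envoy sidecar fetches its configuration from the local Consul agent over gRPC, so without this port the proxy never opens its listener and the “Connect Sidecar Listening” check stays critical.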

I do still have the network warning, but I’ve started a different thread for that.

AndrewChubatiuk · February 11, 2021, 6:44pm

@martinkingtw
I’ve prepared a PR with a fix.
You can test it locally.

mshah071 · October 5, 2021, 3:25pm

Hello,

Did you enable the gRPC port on the Consul client or the Consul server? Is your Nomad agent connecting to a client or directly to a server?
