I have been trying to understand the Nomad Consul Connect integration, and yet here we are. Let me explain: I have a simple Nomad job, based on countdash but using Nicholas Jackson's excellent fake-service instead. It looks like this:
job "fakedash" {
  datacenters = ["dc1"]

  group "upstream" {
    network {
      mode = "bridge"
    }

    service {
      name = "upstream-api"
      port = 5000

      connect {
        sidecar_service {}
      }
    }

    task "api" {
      driver = "docker"

      config {
        image = "10.0.0.2:5000/nicholasjackson/fake-service:v0.25.1"
      }

      env {
        NAME = "Backend"
        LISTEN_ADDR = "0.0.0.0:5000"
        MESSAGE = "${NOMAD_ALLOC_ID}"
      }
    }
  }

  group "frontend" {
    network {
      mode = "bridge"

      port "web" {
        to = 9090
        static = 9090
      }
    }

    service {
      name = "frontend-web"
      port = "web"

      check {
        type = "http"
        port = "web"
        path = "health"
        interval = "15s"
        timeout = "1s"
      }

      connect {
        sidecar_service {
          proxy {
            upstreams {
              destination_name = "upstream-api"
              local_bind_port = 8080
            }
          }
        }
      }
    }

    task "web" {
      driver = "docker"

      config {
        image = "10.0.0.2:5000/nicholasjackson/fake-service:v0.25.1"
      }

      env {
        UPSTREAM_URIS = "http://${NOMAD_UPSTREAM_ADDR_upstream_api}"
        NAME = "Frontend"
        MESSAGE = "${NOMAD_ALLOC_ID}"
      }
    }
  }
}
I also have an intention in Consul that looks like this:
# cat /tmp/intention.hcl
Kind = "service-intentions"
Name = "upstream-api"
Sources = [
  {
    Name = "frontend-web"
    Action = "allow"
  }
]
The intention has been added to Consul. The job is up and running, and from my worker node I can curl both services and get responses on the ports Consul says they are running on, but my frontend cannot connect to the upstream API. This is the output from the frontend:
# curl 127.0.0.1:9090
{
  "name": "Frontend",
  "uri": "/",
  "type": "HTTP",
  "ip_addresses": [
    "172.26.64.7"
  ],
  "start_time": "2024-01-01T18:52:33.377389",
  "end_time": "2024-01-01T18:52:33.378765",
  "duration": "1.375753ms",
  "body": "73a2986c-7416-cb09-7f82-edf53ffbc8c2",
  "upstream_calls": {
    "http://127.0.0.1:8080": {
      "uri": "http://127.0.0.1:8080",
      "headers": {
        "Content-Length": "152",
        "Content-Type": "text/plain",
        "Date": "Mon, 01 Jan 2024 18:52:33 GMT",
        "Server": "envoy"
      },
      "code": 503,
      "error": "Error processing upstream request: http://127.0.0.1:8080/, expected code 200, got 503"
    }
  },
  "code": 500
}
So something is obviously wrong. If I curl the sidecar proxy of upstream-api from the worker node, I get this:
# curl -k --cert /opt/nomad/tls/cert.pem --key /opt/nomad/tls/key.pem https://127.0.0.1:21292
RBAC: access denied
That is, I assume, what the service mesh should say to an unauthorized connection. I get the same message when connecting to the sidecar for the frontend. So, as far as I know how to investigate, everything seems to be working.
If I go into the web container and curl the upstream, I get the following instead:
# curl 127.0.0.1:8080
upstream connect error or disconnect/reset before headers. reset reason: remote connection failure, transport failure reason: delayed connect error: 111
So, what do I investigate next? The Consul part seems like the likely problem, but it is also a black box right now. Are there things I can do to pry it open and figure out what's inside? I have been looking through the stderr logs, and to my eyes there is a bunch of Envoy output there but nothing that looks remotely like a connection error. Am I looking in the wrong place? Is there a way to get the sidecars up and running with debug output? Other suggestions?
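On the debug-output question, one candidate is the Nomad client node metadata key `connect.log_level`, which is documented as setting the log level of the Envoy sidecar. Below is a minimal sketch of the client agent config. It is untested in this setup, and it assumes the client agent is restarted and the allocations are rescheduled afterwards so that new sidecars pick up the setting:

```hcl
# Nomad client agent configuration (sketch, not verified on this cluster).
client {
  enabled = true

  meta {
    # Assumption: raises Envoy sidecar logging from the default "info".
    "connect.log_level" = "debug"
  }
}
```

If this works as documented, the extra output should show up in the stderr log of the sidecar task (the `connect-proxy-upstream-api` and `connect-proxy-frontend-web` tasks) rather than in the logs of the api or web tasks.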