@DerekStrickland after a little more persistence and digging, I’ve got it working! This blog post helped me out a lot:
I think I did something along the lines of this… Anything you can use?
Nomad:

```hcl
group {
  # blah, blah..

  network {
    mode = "bridge"
    port "metrics_envoy" { to = 9102 }
  }

  service {
    # blah, blah..

    meta {
      # Tag for prometheus scrape-targeting via consul (envoy)
      metrics_port_envoy = "${NOMAD_HOST_PORT_metrics_envoy}"
    }

    connect {
      sidecar_service {
        proxy {
          config {
            # Expose metrics for prometheus (envoy)
            envoy_prometheus_bind_addr = "0.0.0.0:9…
```
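In case it helps anyone else, here's a fuller (untested) sketch of that group stanza. The job/service/task names, application port, and image are placeholders, and I'm assuming the Envoy metrics port is 9102 to match the `metrics_envoy` port mapping above:

```hcl
job "example" {
  datacenters = ["dc1"]

  group "app" {
    network {
      mode = "bridge"
      # Host port that prometheus will scrape for envoy metrics
      port "metrics_envoy" { to = 9102 }
    }

    service {
      name = "app"  # placeholder service name
      port = "8080" # placeholder application port

      meta {
        # Advertised via consul so prometheus can relabel this into the scrape port
        metrics_port_envoy = "${NOMAD_HOST_PORT_metrics_envoy}"
      }

      connect {
        sidecar_service {
          proxy {
            config {
              # Have envoy expose its prometheus endpoint on the bridge network
              envoy_prometheus_bind_addr = "0.0.0.0:9102"
            }
          }
        }
      }
    }

    task "app" {
      driver = "docker"
      config {
        image = "example/app:latest" # placeholder image
      }
    }
  }
}
```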
Also, I was running Consul 1.10.1, but the fix for the issue below was backported to 1.10 in version 1.10.2. After upgrading my client, I was able to successfully scrape Envoy metrics from Prometheus!
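For the scrape side, here's a rough sketch of a prometheus job that discovers services via consul and rewrites the scrape address from that `metrics_port_envoy` meta field (the job name and consul address are assumptions for your setup):

```yaml
scrape_configs:
  - job_name: "envoy-sidecars"
    consul_sd_configs:
      - server: "127.0.0.1:8500"   # assumed local consul agent
    relabel_configs:
      # Only keep services that advertise an envoy metrics port
      - source_labels: ["__meta_consul_service_metadata_metrics_port_envoy"]
        regex: "(.+)"
        action: keep
      # Rewrite the scrape address to <host>:<metrics_port_envoy>
      - source_labels: ["__address__", "__meta_consul_service_metadata_metrics_port_envoy"]
        regex: "([^:]+)(?::\\d+)?;(\\d+)"
        replacement: "$1:$2"
        target_label: "__address__"
```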
opened 09:42PM - 01 Aug 21 UTC · closed 12:10AM - 10 Aug 21 UTC
Labels: type/bug · good first issue · theme/consul-nomad
#### Overview of the Issue
Running an envoy proxy with `consul connect envoy`… and specifying `-admin-bind` with an IP other than `127.0.0.1` breaks prometheus metrics, because the `self_admin` cluster does not receive the correct IP for the admin listener: it is always `127.0.0.1`, regardless of what the `consul connect envoy` command specified. This makes it impossible to bind the admin listener to an IP other than `127.0.0.1` and still correctly scrape prometheus metrics.
My guess is that this happens because the IP is [hard-coded into the bootstrap command](https://github.com/hashicorp/consul/blob/v1.10.1/command/connect/envoy/bootstrap_config.go#L604) and cannot be changed, regardless of what the admin bind flag was set to.
This problem was discovered due to a recent "bug fix" in Nomad that causes the admin listener for envoy sidecars to bind to `127.0.0.2` instead of `127.0.0.1`: https://github.com/hashicorp/nomad/pull/10883. This makes it impossible to use Nomad 1.1.3 and collect prometheus metrics from envoy.
#### Reproduction Steps
1. Start a local consul agent
```shell
consul agent -dev
```
2. In a second terminal, run the following:
```shell
/bin/cat <<"EOM" | consul config write -
Kind = "proxy-defaults"
Name = "global"
Config {
protocol = "http"
envoy_prometheus_bind_addr = "0.0.0.0:9114"
}
EOM
consul connect envoy \
-admin-bind=127.0.0.2:19002 \
-address=127.0.0.1:19001 \
-gateway=mesh \
-register
```
3. In a third terminal, get the listeners on the envoy proxy with `curl -s 127.0.0.2:19002/listeners`. This should show that a prometheus listener was registered, with output like the following:
```
envoy_prometheus_metrics_listener::0.0.0.0:9114
default:127.0.0.1:19001::127.0.0.1:19001
```
4. However, the upstream cluster for `self_admin` will have the wrong IP of `127.0.0.1`, not `127.0.0.2`. Running `curl -s 127.0.0.2:19002/clusters | grep self_admin | sort` confirms this with output like the following:
```
self_admin::127.0.0.1:19002::canary::false
self_admin::127.0.0.1:19002::cx_active::0
self_admin::127.0.0.1:19002::cx_connect_fail::0
self_admin::127.0.0.1:19002::cx_total::0
self_admin::127.0.0.1:19002::health_flags::healthy
self_admin::127.0.0.1:19002::hostname::
self_admin::127.0.0.1:19002::local_origin_success_rate::-1.0
self_admin::127.0.0.1:19002::priority::0
self_admin::127.0.0.1:19002::region::
self_admin::127.0.0.1:19002::rq_active::0
self_admin::127.0.0.1:19002::rq_error::0
self_admin::127.0.0.1:19002::rq_success::0
self_admin::127.0.0.1:19002::rq_timeout::0
self_admin::127.0.0.1:19002::rq_total::0
self_admin::127.0.0.1:19002::sub_zone::
self_admin::127.0.0.1:19002::success_rate::-1.0
self_admin::127.0.0.1:19002::weight::1
self_admin::127.0.0.1:19002::zone::
```
5. And consequently, curling the prometheus listener with `curl -s localhost:9114/metrics` results in a 503:
```
upstream connect error or disconnect/reset before headers. reset reason: connection failure
```
### Consul info for both Client and Server
<details>
<summary>Client info</summary>

```
agent:
	check_monitors = 0
	check_ttls = 0
	checks = 1
	services = 1
build:
	prerelease =
	revision = db839f18
	version = 1.10.1
consul:
	acl = disabled
	bootstrap = false
	known_datacenters = 1
	leader = true
	leader_addr = 127.0.0.1:8300
	server = true
raft:
	applied_index = 77
	commit_index = 77
	fsm_pending = 0
	last_contact = 0
	last_log_index = 77
	last_log_term = 2
	last_snapshot_index = 0
	last_snapshot_term = 0
	latest_configuration = [{Suffrage:Voter ID:1c8a1e81-16d4-86a6-bd21-2af1a0a4de76 Address:127.0.0.1:8300}]
	latest_configuration_index = 0
	num_peers = 0
	protocol_version = 3
	protocol_version_max = 3
	protocol_version_min = 0
	snapshot_version_max = 1
	snapshot_version_min = 0
	state = Leader
	term = 2
runtime:
	arch = amd64
	cpu_count = 8
	goroutines = 131
	max_procs = 8
	os = linux
	version = go1.16.6
serf_lan:
	coordinate_resets = 0
	encrypted = false
	event_queue = 1
	event_time = 2
	failed = 0
	health_score = 0
	intent_queue = 0
	left = 0
	member_time = 1
	members = 1
	query_queue = 0
	query_time = 1
serf_wan:
	coordinate_resets = 0
	encrypted = false
	event_queue = 0
	event_time = 1
	failed = 0
	health_score = 0
	intent_queue = 0
	left = 0
	member_time = 1
	members = 1
	query_queue = 0
	query_time = 1
```
</details>
### Operating system and Environment details
`envoy --version`
```
envoy version: 98c1c9e9a40804b93b074badad1cdf284b47d58b/1.18.3/clean-getenvoy-b76c773-envoy/RELEASE/BoringSSL
```
Thank you for considering taking another look at this. I still think a Learn guide would be helpful to Nomad users, though!
Good luck with Nomad!