Consul and Prometheus with ACLs enabled

Hi folks,
I’ve enabled ACLs on my dev Consul cluster, which is monitored by Prometheus, and now I’m getting error messages like this:

19:48:16.332+0300 [ERROR] agent.http: Request error: method=GET url=/v1/agent/metrics?format=prometheus from=10.18.13.25:51628 error="Permission denied"

I understand that I have no corresponding policy and token in place, but I’m struggling to work out how to write a Consul policy that permits Prometheus to access the metrics URL. Which rules should it contain?

Hi @mchumakov,

Our API documentation contains a table under each of the API paths which calls out the necessary permissions. In the case of /v1/agent/metrics, the required permission is agent:read.

Here’s an example policy which grants this permission for the node named consul-client-1.

agent "consul-client-1" {
  policy = "read"
}

Alternatively, you can use agent_prefix to grant access to a set of nodes whose hostnames begin with the “consul-client-” prefix.

agent_prefix "consul-client-" {
  policy = "read"
}

Hi @blake,

Thank you for your help and for pointing me in the right direction. I’ve created the following ACL policy:

[root@vml-hq-dev-consul-1 tmp]# consul acl policy read -name vml-hq-prod-prometheus-1
ID:           415da84f-2033-a976-e19b-1dcc205b2501
Name:         vml-hq-prod-prometheus-1
Description:  vml-hq-prod-prometheus-1 node
Datacenters:
Rules:
agent "vml-hq-prod-prometheus-1" {
  policy = "read"
}

and this token:

AccessorID:       46b803ce-2f2e-e4d3-5bb8-e8c0fa40b208
Description:      vml-hq-prod-prometheus-1 token
Local:            false
Create Time:      2020-07-13 22:55:07.481883119 +0300 MSK
Legacy:           false
Policies:
   415da84f-2033-a976-e19b-1dcc205b2501 - vml-hq-prod-prometheus-1

which I configured as bearer_token in the corresponding job section of the Prometheus config:

  - job_name: consul_dev_cluster
    metrics_path: /v1/agent/metrics
    params:
      format:
      - prometheus
    bearer_token: '77db8f37-f785-1b6f-74a7-c4deecc838b2'
    static_configs:
    - labels:
        env: dev
        consul_dc: dev1
      targets:
      - 10.18.8.102:8500
      - 10.18.8.103:8500
      - 10.18.8.104:8500

From a tcpdump trace (taken on one of the Consul servers) I can see that Prometheus does set the Authorization HTTP header:

Hypertext Transfer Protocol
    GET /v1/agent/metrics?format=prometheus HTTP/1.1\r\n
    Host: 10.18.8.102:8500\r\n
    User-Agent: Prometheus/2.19.0\r\n
    Accept: application/openmetrics-text; version=0.0.1,text/plain;version=0.0.4;q=0.5,*/*;q=0.1\r\n
    Accept-Encoding: gzip\r\n
    Authorization: Bearer 77db8f37-f785-1b6f-74a7-c4deecc838b2\r\n
    X-Prometheus-Scrape-Timeout-Seconds: 10.000000\r\n
    \r\n

But I’m still getting the same errors.
I’m using Consul v1.7.4.
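
One way to take Prometheus out of the picture is to replay the same scrape by hand and look at the status code each agent returns. The following is a minimal sketch using only the Python standard library, not an official tool: the function names build_metrics_request and scrape_status are made up for illustration, and it assumes plain HTTP on the agent address, as in the scrape config above.

```python
import urllib.error
import urllib.request


def build_metrics_request(target: str, token: str) -> urllib.request.Request:
    """Build the same GET request Prometheus sends to a Consul agent."""
    url = f"http://{target}/v1/agent/metrics?format=prometheus"
    # Prometheus' bearer_token setting ends up in this Authorization header.
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def scrape_status(target: str, token: str, timeout: float = 10.0) -> int:
    """Send the request and return the HTTP status code of the response."""
    request = build_metrics_request(target, token)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # Non-2xx answers (for example an ACL denial) arrive as HTTPError.
        return err.code
```

Calling scrape_status("10.18.8.102:8500", token) for each target, with the token's SecretID, should show whether every agent rejects the token or only some of them. As far as I know Consul reports an ACL denial as HTTP 403, but treat that as an assumption and check the agent log as well.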

I’ve only managed to solve this by changing the resource statement from agent to agent_prefix and leaving the segment part empty:

[root@vml-hq-dev-consul-1 consul.d]# consul acl policy read -id 415da84f-2033-a976-e19b-1dcc205b2501
ID:           415da84f-2033-a976-e19b-1dcc205b2501
Name:         vml-hq-prod-prometheus-1
Description:  vml-hq-prod-prometheus-1 prometheus read policy
Datacenters:
Rules:
agent_prefix "" {
  policy = "read"
}

What I still don’t understand is how to make this policy more restrictive. I don’t want a token bound to this policy to be usable anywhere else, effectively granting read access to metrics.
Hence the question: how does Consul determine the segment part of the policy for clients such as Prometheus, or for a plain curl request from the CLI?

Hi @mchumakov,

My apologies, I missed your earlier reply.

The token used by Prometheus needs agent:read permission on each agent it targets in order to scrape the metrics. If the hostnames of your targets are consul-server-1 through consul-server-3, the policy should be:

agent "consul-server-1" {
  policy = "read"
}

agent "consul-server-2" {
  policy = "read"
}

agent "consul-server-3" {
  policy = "read"
}
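
To make the matching concrete: the name in an agent rule is compared with the node name of the Consul agent that receives the request, not with the host that sends it. The toy model below only illustrates that rule as described in the ACL documentation (an exact rule wins over a prefix rule, and the longest matching prefix wins). It is not Consul's implementation, and the function names are hypothetical.

```python
from typing import Dict, Iterable, Optional


def agent_policy_for(node_name: str,
                     exact: Dict[str, str],
                     prefix: Dict[str, str]) -> Optional[str]:
    """Return the policy that applies to node_name, or None if no rule matches."""
    # An exact `agent "<name>"` rule takes precedence over any prefix rule.
    if node_name in exact:
        return exact[node_name]
    # Otherwise the longest matching `agent_prefix` rule applies.
    matches = [p for p in prefix if node_name.startswith(p)]
    if not matches:
        return None
    return prefix[max(matches, key=len)]


def render_agent_rules(node_names: Iterable[str]) -> str:
    """Render one `agent` read rule per node name, in HCL syntax."""
    blocks = [f'agent "{name}" {{\n  policy = "read"\n}}' for name in node_names]
    return "\n\n".join(blocks) + "\n"


servers = ["consul-server-1", "consul-server-2", "consul-server-3"]

# A rule naming the Prometheus host never matches the agents being scraped.
print(agent_policy_for("consul-server-1", {"vml-hq-prod-prometheus-1": "read"}, {}))
# The empty prefix matches every node name, so it works but is very broad.
print(agent_policy_for("consul-server-1", {}, {"": "read"}))
# One exact rule per scraped agent is the narrowest option.
print(render_agent_rules(servers))
```

Note that a policy like this narrows which agents the token can read. It does not tie the token to the Prometheus host, so anyone holding the SecretID can still use it against those agents.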

I hope this is clear. Let me know if you have any other questions.