Consul and Prometheus with ACLs and TLS enabled

Davide_Salerno_GBS · July 16, 2021, 2:25pm

Hi all,

We are using Consul (v1.9.7-ent) on Kubernetes and would like to monitor not only the service mesh activity but also the platform itself with Prometheus and Grafana.

We have enabled ACLs and TLS: is it feasible to reach our goal with this setup, or can we only observe the metrics coming from the service mesh?

If the answer to the question is “Yes”, where can we find a guide for the setup?

From reading the documentation and from some preliminary tests, it seems that when TLS is enabled we have to set the property global.metrics.enableAgentMetrics to false in the Helm values YAML, and this prevents Prometheus from scraping all the consul_* metrics.

Is there a workaround?

Thank you very much,
Davide

lkysow · July 16, 2021, 3:56pm

Hi Davide,
Currently the only workaround would be to use a Prometheus ServiceMonitor, which provides more configurability around scraping metrics. You could then configure it with a CA cert so it can pull from Consul’s TLS endpoints. It doesn’t work out of the box because the Prometheus annotations don’t support TLS.

Unfortunately we don’t have any instructions for how to set this up. Here is the TLSConfig that can be used to configure a ServiceMonitor CRD (prometheus-operator/api.md at master · prometheus-operator/prometheus-operator · GitHub).
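
For readers who want a starting point, here is a rough sketch of what such a ServiceMonitor might look like. It is not an official or tested manifest. The resource name, namespace, label values, port name, CA Secret name and key, and serverName are all assumptions about a typical Helm install and will need to be adjusted to match your cluster. The CA Secret has to live in the same namespace as the ServiceMonitor.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: consul-server-metrics        # hypothetical name
  namespace: consul                  # assumption: Consul is installed here
spec:
  selector:
    matchLabels:
      app: consul                    # assumption: labels on the server Service
      component: server
  namespaceSelector:
    matchNames:
      - consul
  endpoints:
    - port: https                    # assumption: name of the 8501 Service port
      scheme: https
      path: /v1/agent/metrics
      params:
        format:
          - prometheus               # ask Consul for Prometheus-format output
      interval: 60s
      tlsConfig:
        ca:
          secret:
            name: consul-ca-cert     # assumption: Secret holding the Consul CA
            key: tls.crt
        serverName: server.dc1.consul  # assumption: a SAN on the server cert
```

With ACLs enabled the endpoint will also expect a token, so a manifest like this only covers the TLS half of the problem.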

devilmind81 · July 19, 2021, 1:20pm

Hi @lkysow ,
How can I test it? When global.tls.enabled and global.tls.httpsOnly are set to true and Consul agent metrics are also enabled, the installation errors out during Helm template rendering.
Only with both enabled would I be able to investigate the Prometheus ServiceMonitor settings you suggested. Am I missing something?

Thanks a lot.

Antonio

lkysow · July 20, 2021, 6:26pm

Ahh I see. I think for now you would need to set global.metrics.enableAgentMetrics to false in the Helm chart and then use server.extraConfig and client.extraConfig to manually set the metrics config:

global:
  metrics:
    enableAgentMetrics: false
server:
  extraConfig: |
    {
      "telemetry": {
        "prometheus_retention_time": "1m"
      }
    }
client:
  extraConfig: |
    {
      "telemetry": {
        "prometheus_retention_time": "1m"
      }
    }

lkysow · July 20, 2021, 8:01pm

In addition you’d need to create an ACL policy/token with agent:read permissions:

agent_prefix "" {
  policy = "read"
}

that’s passed via the X-Consul-Token HTTP header.
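
As a hedged illustration of passing the token from Prometheus: Consul's HTTP API also accepts the token as an Authorization: Bearer header, which maps onto the authorization block of a Prometheus scrape job. The sketch below assumes a Prometheus that is configured directly through prometheus.yml; the job name, file paths, and target are placeholders, not values from this thread.

```yaml
# prometheus.yml fragment (sketch; names and paths below are assumptions)
scrape_configs:
  - job_name: consul-agents                # hypothetical job name
    scheme: https
    metrics_path: /v1/agent/metrics
    params:
      format:
        - prometheus
    authorization:
      type: Bearer                         # sent as "Authorization: Bearer <token>"
      credentials_file: /etc/prometheus/secrets/consul-token   # hypothetical path to the ACL token
    tls_config:
      ca_file: /etc/prometheus/secrets/consul-ca.crt           # hypothetical path to the Consul CA
    static_configs:
      - targets:
          - consul-server.example.internal:8501                # placeholder target
```

If you go the ServiceMonitor route instead, the endpoint spec has an analogous authorization field that reads the token from a Kubernetes Secret.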

Davide_Salerno_GBS · July 21, 2021, 6:55pm

@lkysow Thank you very much for these really useful tips.

Is there also a way to enable the Prometheus scraping options? (On server: https://github.com/hashicorp/consul-helm/blob/v0.31.1/templates/server-statefulset.yaml#L53 / on client: https://github.com/hashicorp/consul-helm/blob/v0.31.1/templates/client-daemonset.yaml#L44)

lkysow · July 21, 2021, 10:04pm

If you’re using the ServiceMonitor CRD then I think you can specify what it scrapes based on other labels, i.e. you don’t need the annotations.
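
A minimal sketch of label-based selection, under some assumptions: since the client agents run as pods that may not have a Service in front of them, this example uses a PodMonitor (the pod-level counterpart of ServiceMonitor in the Prometheus Operator) rather than a ServiceMonitor. The label values, container port name, Secret name, and serverName are guesses about a typical install and should be checked against your own pods.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: consul-client-metrics        # hypothetical name
  namespace: consul                  # assumption: Consul is installed here
spec:
  selector:
    matchLabels:
      app: consul                    # assumption: labels on the client pods
      component: client
  podMetricsEndpoints:
    - port: https                    # assumption: name of the 8501 container port
      scheme: https
      path: /v1/agent/metrics
      params:
        format:
          - prometheus
      tlsConfig:
        ca:
          secret:
            name: consul-ca-cert     # assumption: Secret holding the Consul CA
            key: tls.crt
        serverName: client.dc1.consul  # assumption: adjust to a SAN on the client cert
```

Because the selection is done purely through the selector, the prometheus.io/* scrape annotations on the pods are not needed for this to work.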

jeanmorais · September 25, 2021, 4:03am

@lkysow Thanks for the explanations.

Any update on this? Is there still no other solution?

We are not able to scrape agent metrics with ACLs and TLS enabled, which is the recommended setup for production environments.

MageshSrinivasulu · July 18, 2022, 2:53pm

@lkysow Is there any proper documentation on how to overcome this from the Consul side? How can Prometheus be made to scrape the TLS endpoint in Consul? This is an important feature; without it, a proper monitoring solution cannot be implemented.

@Davide_Salerno_GBS Did you try implementing the solution? If it was successful, please share the details. It would be useful for the community.

dehuszar · September 24, 2022, 7:21pm

Hi there,

I have gotten this working. The docs aren’t super clear on this, but you basically just need to use a tls_config block inside the job definition.

This implementation assumes that you are running a grafana agent on a local node, but if you were doing a consul_catalog style scrape, the block configurations should be fairly similar.

Below is a snippet from a consul-template file I use to generate a configuration yaml for grafana-agent which runs on all my nodes. You’ll see that you can use similar configuration for scraping metrics from Consul, Nomad, and Vault agents if they are running on your target node.

I’m also using Consul-Template to look up configuration values from my Consul KV so I can control dozens of agents from a single set of config values. There are some risks of doing this (like accidentally creating a metrics blackout through misconfiguration) but it’s pretty powerful.

metrics:
  wal_directory: {{ keyOrDefault "services/grafana-cloud/METRICS_WAL_DIRECTORY" "/tmp/grafana-agent-wal" }}
  global:
    scrape_interval: {{ keyOrDefault "services/grafana-cloud/METRICS_GLOBAL_SCRAPE_INTERVAL" "60s"}}
    remote_write:
      - basic_auth: {{ with secret "secret/services/grafana-cloud" }}
          username: {{ .Data.data.METRICS_REMOTE_WRITE_BASIC_AUTH_USERNAME }}
          password: {{ .Data.data.METRICS_REMOTE_WRITE_BASIC_AUTH_PASSWORD }}{{ end }}
        url: {{ keyOrDefault "services/grafana-cloud/METRICS_GLOBAL_REMOTE_WRITE_URL" "https://prometheus-us-central1.grafana.net/api/prom/push" }}
        write_relabel_configs:
        - source_labels: {{ keyOrDefault "services/grafana-cloud/METRICS_REMOTE_WRITE_WRITE_LABEL_CONFIGS_SOURCE_LABELS" "[__name__]" }}
          regex: {{ key "services/grafana-cloud/METRICS_REMOTE_WRITE_WRITE_LABEL_CONFIGS_REGEX" }}
          action: {{ keyOrDefault "services/grafana-cloud/METRICS_REMOTE_WRITE_WRITE_LABEL_CONFIGS_ACTION" "keep" }}

  configs:
    - name: integrations
      scrape_configs:
      # all nodes are Consul Clients
      - job_name: integrations/consul
        metrics_path: {{ keyOrDefault "services/grafana-cloud/METRICS_CONFIGS_INTEGRATIONS_SCRAPE_CONFIGS_CONSUL_METRICS_PATH" "/v1/agent/metrics" }}
        params:
          format:
            - prometheus
        scheme: {{ keyOrDefault "services/grafana-cloud/METRICS_CONFIGS_INTEGRATIONS_SCRAPE_CONFIGS_CONSUL_SCHEME" "https"}}
        scrape_interval: {{ keyOrDefault "services/grafana-cloud/METRICS_CONFIGS_INTEGRATIONS_SCRAPE_CONFIGS_CONSUL_SCRAPE_INTERVAL" "60s"}}
        static_configs:
          - targets:
            - localhost:8501
            labels:
              datacenter: "home"
              job: "consul_agents"
              host: {{ env "HOSTNAME" }}
              instance: {{ env "HOSTNAME" }}
              node: {{ env "CONSUL_NODE_NAME" }}
        tls_config:
          ca_file: {{ env "CONSUL_CACERT" }}
          cert_file: {{ env "CONSUL_CLIENT_CERT" }}
          key_file: {{ env "CONSUL_CLIENT_KEY" }}
          server_name: {{ keyOrDefault "services/grafana-cloud/METRICS_CONFIGS_INTEGRATIONS_SCRAPE_CONFIGS_CONSUL_SD_CONFIG_SERVER_NAME" "localhost" }}
          insecure_skip_verify: {{ keyOrDefault "services/grafana-cloud/METRICS_CONFIGS_INTEGRATIONS_SCRAPE_CONFIGS_CONSUL_SD_CONFIG_INSECURE_SKIP_VERIFY" "false" }}
      {{ if keyExists ( or $isNomadClient $isNomadServer ) }}
      # if Nomad Client or Server
      - job_name: integrations/nomad
        metrics_path: {{ keyOrDefault "services/grafana-cloud/METRICS_CONFIGS_INTEGRATIONS_SCRAPE_CONFIGS_NOMAD_METRICS_PATH" "/v1/metrics" }}
        params:
          format:
            - prometheus
        scheme: {{ keyOrDefault "services/grafana-cloud/METRICS_CONFIGS_INTEGRATIONS_SCRAPE_CONFIGS_CONSUL_SCHEME" "https"}}
        scrape_interval: {{ keyOrDefault "services/grafana-cloud/METRICS_CONFIGS_INTEGRATIONS_SCRAPE_CONFIGS_NOMAD_SCRAPE_INTERVAL" "60s"}}
        static_configs:
          - targets: 
            - localhost:4646
            labels:
              datacenter: "home"
              job: "nomad_agents"
              host: {{ env "HOSTNAME" }}
              instance: {{ env "HOSTNAME" }}
              node: {{ env "CONSUL_NODE_NAME" }}
        tls_config:
          ca_file: {{ env "NOMAD_CACERT" }}
          cert_file: {{ env "NOMAD_CLIENT_CERT" }}
          key_file: {{ env "NOMAD_CLIENT_KEY" }}
          server_name: {{ keyOrDefault "services/grafana-cloud/METRICS_CONFIGS_INTEGRATIONS_SCRAPE_CONFIGS_CONSUL_SD_CONFIG_SERVER_NAME" "localhost" }}
          insecure_skip_verify: {{ keyOrDefault "services/grafana-cloud/METRICS_CONFIGS_INTEGRATIONS_SCRAPE_CONFIGS_CONSUL_SD_CONFIG_INSECURE_SKIP_VERIFY" "false" }}
      {{ end }}
      {{ if keyExists $isVaultServer }}
      # if Vault Server
      - job_name: integrations/vault
        metrics_path: {{ keyOrDefault "services/grafana-cloud/METRICS_CONFIGS_INTEGRATIONS_SCRAPE_CONFIGS_VAULT_METRICS_PATH" "/v1/sys/metrics" }}
        params:
          format:
            - prometheus
        scheme: {{ keyOrDefault "services/grafana-cloud/METRICS_CONFIGS_INTEGRATIONS_SCRAPE_CONFIGS_CONSUL_SCHEME" "https"}}
        scrape_interval: {{ keyOrDefault "services/grafana-cloud/METRICS_CONFIGS_INTEGRATIONS_SCRAPE_CONFIGS_NOMAD_SCRAPE_INTERVAL" "60s"}}
        bearer_token: {{ env "VAULT_TOKEN" }}
        static_configs:
          - targets: 
            - localhost:8200
            labels:
              datacenter: "home"
              job: "vault_agents"
              host: {{ env "HOSTNAME" }}
              instance: {{ env "HOSTNAME" }}
              node: {{ env "CONSUL_NODE_NAME" }}
        tls_config:
          ca_file: {{ env "VAULT_CACERT" }}
          cert_file: {{ env "VAULT_CLIENT_CERT" }}
          key_file: {{ env "VAULT_CLIENT_KEY" }}
          server_name: {{ keyOrDefault "services/grafana-cloud/METRICS_CONFIGS_INTEGRATIONS_SCRAPE_CONFIGS_CONSUL_SD_CONFIG_SERVER_NAME" "localhost" }}
          insecure_skip_verify: {{ keyOrDefault "services/grafana-cloud/METRICS_CONFIGS_INTEGRATIONS_SCRAPE_CONFIGS_CONSUL_SD_CONFIG_INSECURE_SKIP_VERIFY" "false" }}
      {{ end }}
