Consul and Prometheus with ACLs and TLS enabled

Hi all,

We are using Consul (v1.9.7-ent) on Kubernetes, and we would like to use Prometheus and Grafana to monitor not only the service mesh activity but also the platform itself.

We have enabled ACLs and TLS. Is it feasible to reach our goal with this setup, or can we only observe the metrics coming from the service mesh?

If the answer is “Yes”, where can we find a guide for the setup?

From the documentation and some preliminary tests, it seems that when TLS is enabled we have to set the property global.metrics.enableAgentMetrics to false in the Helm values YAML, and this prevents Prometheus from scraping all the consul_* metrics.

Is there a workaround?

Thank you very much,
Davide

Hi Davide,
Currently the only workaround would be to use a Prometheus ServiceMonitor, which provides more configurability around scraping metrics. You could then configure it with a CA cert so it can pull from Consul’s TLS endpoints. The reason it doesn’t work out of the box is that the Prometheus annotations don’t support TLS.

Unfortunately we don’t have any instructions for how to set this up. Here is the TLSConfig that can be used to configure a ServiceMonitor CRD (prometheus-operator/api.md at master · prometheus-operator/prometheus-operator · GitHub).

Hi @lkysow ,
how can I test it? When global.tls.enabled and global.tls.httpsOnly are set to true and Consul agent metrics are also enabled, the installation errors out during Helm template rendering.
Only with both enabled would I be able to investigate the right settings for the Prometheus ServiceMonitor you suggested. Am I missing something?

Thanks a lot.

Antonio

Ahh I see. I think for now you would need to set global.metrics.enableAgentMetrics to false in the Helm chart and then use server.extraConfig and client.extraConfig to manually set the metrics config:

global:
  metrics:
    enableAgentMetrics: false
server:
  extraConfig: |
    {
      "telemetry": {
        "prometheus_retention_time": "1m"
      }
    }
client:
  extraConfig: |
    {
      "telemetry": {
        "prometheus_retention_time": "1m"
      }
    }

In addition you’d need to create an ACL policy/token with agent:read permissions:

agent_prefix "" {
  policy = "read"
}

that’s passed via the X-Consul-Token HTTP header.

@lkysow Thank you very much for these really useful tips.

Is there also a way to enable the Prometheus scraping options? (On server: https://github.com/hashicorp/consul-helm/blob/v0.31.1/templates/server-statefulset.yaml#L53 / on client: https://github.com/hashicorp/consul-helm/blob/v0.31.1/templates/client-daemonset.yaml#L44)

If you’re using the ServiceMonitor CRD then I think you can specify what it scrapes based on other labels, i.e. you don’t need the annotations.
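For anyone trying this, here is a rough sketch of what such a ServiceMonitor might look like for the Consul servers. It has not been tested against this chart version, and several names are assumptions rather than values from this thread: the consul namespace, the app/component labels, the https port name, the Secret names consul-metrics-acl-token and consul-ca-cert, and the server.dc1.consul server name. It relies on Consul accepting the ACL token as an Authorization: Bearer header as an alternative to X-Consul-Token. A ServiceMonitor only selects Services, so the client agents would likely need a PodMonitor with similar settings.

```yaml
# Hypothetical Secret holding an ACL token that carries the agent read policy.
apiVersion: v1
kind: Secret
metadata:
  name: consul-metrics-acl-token   # assumed name
  namespace: consul                # assumed namespace
stringData:
  token: "<acl-token-secret-id>"   # placeholder, replace with the real token
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: consul-server
  namespace: consul                # same namespace as the Secrets referenced below
spec:
  # Select the server Service by labels instead of scrape annotations.
  selector:
    matchLabels:
      app: consul                  # assumed labels, check your Service
      component: server
  endpoints:
    - port: https                  # assumed name of the 8501 port on the Service
      scheme: https
      path: /v1/agent/metrics
      params:
        format: ["prometheus"]
      interval: 60s
      # Sent as "Authorization: Bearer <token>".
      bearerTokenSecret:
        name: consul-metrics-acl-token
        key: token
      tlsConfig:
        ca:
          secret:
            name: consul-ca-cert   # assumed Secret containing the Consul CA
            key: tls.crt
        # Must match a SAN in the server certificate (assumed value).
        serverName: server.dc1.consul
```

Check the label and port names with kubectl get svc -n consul --show-labels before applying, since they depend on the Helm release.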

@lkysow Thanks for the explanations.

Any update on this? Is there still no other solution?

We are not able to scrape agent metrics with ACLs + TLS enabled, which is the recommended setup for production environments.

@lkysow Is there any proper documentation on how to overcome this from the Consul side? How do we make Prometheus scrape the TLS endpoint in Consul? This is an important feature that needs to be in place; without it, a proper monitoring solution cannot be implemented.

@Davide_Salerno_GBS Did you try implementing the solution? If it was successful, please share the details. It would be useful for the community.

Hi there,

I have gotten this working. The docs aren’t super clear on this, but you basically just need to use a tls_config block inside the job definition.

This implementation assumes that you are running a grafana agent on a local node, but if you were doing a consul_catalog style scrape, the block configurations should be fairly similar.

Below is a snippet from a consul-template file I use to generate a configuration yaml for grafana-agent which runs on all my nodes. You’ll see that you can use similar configuration for scraping metrics from Consul, Nomad, and Vault agents if they are running on your target node.

I’m also using Consul-Template to look up configuration values from my Consul KV so I can control dozens of agents from a single set of config values. There are some risks of doing this (like accidentally creating a metrics blackout through misconfiguration) but it’s pretty powerful.

metrics:
  wal_directory: {{ keyOrDefault "services/grafana-cloud/METRICS_WAL_DIRECTORY" "/tmp/grafana-agent-wal" }}
  global:
    scrape_interval: {{ keyOrDefault "services/grafana-cloud/METRICS_GLOBAL_SCRAPE_INTERVAL" "60s"}}
    remote_write:
      - basic_auth: {{ with secret "secret/services/grafana-cloud" }}
          username: {{ .Data.data.METRICS_REMOTE_WRITE_BASIC_AUTH_USERNAME }}
          password: {{ .Data.data.METRICS_REMOTE_WRITE_BASIC_AUTH_PASSWORD }}{{ end }}
        url: {{ keyOrDefault "services/grafana-cloud/METRICS_GLOBAL_REMOTE_WRITE_URL" "https://prometheus-us-central1.grafana.net/api/prom/push" }}
        write_relabel_configs:
        - source_labels: {{ keyOrDefault "services/grafana-cloud/METRICS_REMOTE_WRITE_WRITE_LABEL_CONFIGS_SOURCE_LABELS" "[__name__]" }}
          regex: {{ key "services/grafana-cloud/METRICS_REMOTE_WRITE_WRITE_LABEL_CONFIGS_REGEX" }}
          action: {{ keyOrDefault "services/grafana-cloud/METRICS_REMOTE_WRITE_WRITE_LABEL_CONFIGS_ACTION" "keep" }}

  configs:
    - name: integrations
      scrape_configs:
      # all nodes are Consul Clients
      - job_name: integrations/consul
        metrics_path: {{ keyOrDefault "services/grafana-cloud/METRICS_CONFIGS_INTEGRATIONS_SCRAPE_CONFIGS_CONSUL_METRICS_PATH" "/v1/agent/metrics" }}
        params:
          format:
            - prometheus
        scheme: {{ keyOrDefault "services/grafana-cloud/METRICS_CONFIGS_INTEGRATIONS_SCRAPE_CONFIGS_CONSUL_SCHEME" "https"}}
        scrape_interval: {{ keyOrDefault "services/grafana-cloud/METRICS_CONFIGS_INTEGRATIONS_SCRAPE_CONFIGS_CONSUL_SCRAPE_INTERVAL" "60s"}}
        static_configs:
          - targets:
            - localhost:8501
            labels:
              datacenter: "home"
              job: "consul_agents"
              host: {{ env "HOSTNAME" }}
              instance: {{ env "HOSTNAME" }}
              node: {{ env "CONSUL_NODE_NAME" }}
        tls_config:
          ca_file: {{ env "CONSUL_CACERT" }}
          cert_file: {{ env "CONSUL_CLIENT_CERT" }}
          key_file: {{ env "CONSUL_CLIENT_KEY" }}
          server_name: {{ keyOrDefault "services/grafana-cloud/METRICS_CONFIGS_INTEGRATIONS_SCRAPE_CONFIGS_CONSUL_SD_CONFIG_SERVER_NAME" "localhost" }}
          insecure_skip_verify: {{ keyOrDefault "services/grafana-cloud/METRICS_CONFIGS_INTEGRATIONS_SCRAPE_CONFIGS_CONSUL_SD_CONFIG_INSECURE_SKIP_VERIFY" "false" }}
      {{ if keyExists ( or $isNomadClient $isNomadServer ) }}
      # if Nomad Client or Server
      - job_name: integrations/nomad
        metrics_path: {{ keyOrDefault "services/grafana-cloud/METRICS_CONFIGS_INTEGRATIONS_SCRAPE_CONFIGS_NOMAD_METRICS_PATH" "/v1/metrics" }}
        params:
          format:
            - prometheus
        scheme: {{ keyOrDefault "services/grafana-cloud/METRICS_CONFIGS_INTEGRATIONS_SCRAPE_CONFIGS_CONSUL_SCHEME" "https"}}
        scrape_interval: {{ keyOrDefault "services/grafana-cloud/METRICS_CONFIGS_INTEGRATIONS_SCRAPE_CONFIGS_NOMAD_SCRAPE_INTERVAL" "60s"}}
        static_configs:
          - targets: 
            - localhost:4646
            labels:
              datacenter: "home"
              job: "nomad_agents"
              host: {{ env "HOSTNAME" }}
              instance: {{ env "HOSTNAME" }}
              node: {{ env "CONSUL_NODE_NAME" }}
        tls_config:
          ca_file: {{ env "NOMAD_CACERT" }}
          cert_file: {{ env "NOMAD_CLIENT_CERT" }}
          key_file: {{ env "NOMAD_CLIENT_KEY" }}
          server_name: {{ keyOrDefault "services/grafana-cloud/METRICS_CONFIGS_INTEGRATIONS_SCRAPE_CONFIGS_CONSUL_SD_CONFIG_SERVER_NAME" "localhost" }}
          insecure_skip_verify: {{ keyOrDefault "services/grafana-cloud/METRICS_CONFIGS_INTEGRATIONS_SCRAPE_CONFIGS_CONSUL_SD_CONFIG_INSECURE_SKIP_VERIFY" "false" }}
      {{ end }}
      {{ if keyExists $isVaultServer }}
      # if Vault Server
      - job_name: integrations/vault
        metrics_path: {{ keyOrDefault "services/grafana-cloud/METRICS_CONFIGS_INTEGRATIONS_SCRAPE_CONFIGS_VAULT_METRICS_PATH" "/v1/sys/metrics" }}
        params:
          format:
            - prometheus
        scheme: {{ keyOrDefault "services/grafana-cloud/METRICS_CONFIGS_INTEGRATIONS_SCRAPE_CONFIGS_CONSUL_SCHEME" "https"}}
        scrape_interval: {{ keyOrDefault "services/grafana-cloud/METRICS_CONFIGS_INTEGRATIONS_SCRAPE_CONFIGS_NOMAD_SCRAPE_INTERVAL" "60s"}}
        bearer_token: {{ env "VAULT_TOKEN" }}
        static_configs:
          - targets: 
            - localhost:8200
            labels:
              datacenter: "home"
              job: "vault_agents"
              host: {{ env "HOSTNAME" }}
              instance: {{ env "HOSTNAME" }}
              node: {{ env "CONSUL_NODE_NAME" }}
        tls_config:
          ca_file: {{ env "VAULT_CACERT" }}
          cert_file: {{ env "VAULT_CLIENT_CERT" }}
          key_file: {{ env "VAULT_CLIENT_KEY" }}
          server_name: {{ keyOrDefault "services/grafana-cloud/METRICS_CONFIGS_INTEGRATIONS_SCRAPE_CONFIGS_CONSUL_SD_CONFIG_SERVER_NAME" "localhost" }}
          insecure_skip_verify: {{ keyOrDefault "services/grafana-cloud/METRICS_CONFIGS_INTEGRATIONS_SCRAPE_CONFIGS_CONSUL_SD_CONFIG_INSECURE_SKIP_VERIFY" "false" }}
      {{ end }}
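If you don't use consul-template, the Consul job above boils down to something like the following static scrape config. This is only a sketch under a few assumptions: the file paths are made up, the authorization block needs a reasonably recent Prometheus (older versions and some agent builds use bearer_token_file instead), and the token is one with the agent read policy mentioned earlier in the thread, sent as a bearer token rather than X-Consul-Token.

```yaml
scrape_configs:
  - job_name: consul_agents
    scheme: https
    metrics_path: /v1/agent/metrics
    params:
      format: ["prometheus"]
    scrape_interval: 60s
    # ACL token file (hypothetical path), sent as "Authorization: Bearer <token>".
    authorization:
      type: Bearer
      credentials_file: /etc/prometheus/secrets/consul-acl-token
    tls_config:
      # CA that signed the Consul agent certificate (hypothetical path).
      ca_file: /etc/prometheus/tls/consul-ca.pem
      # Client cert and key are only needed if the agent verifies incoming HTTPS clients.
      cert_file: /etc/prometheus/tls/consul-client.pem
      key_file: /etc/prometheus/tls/consul-client-key.pem
      # Must match a name in the agent certificate.
      server_name: localhost
    static_configs:
      - targets: ["localhost:8501"]
```

A quick check is to look at the target on the Prometheus targets page: a 403 usually points at the token or policy, while an x509 error points at ca_file or server_name.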