We are using Consul (v1.9.7-ent) on Kubernetes and we would like to monitor not only the service mesh activity but also the platform itself with Prometheus and Grafana.
We have enabled ACLs and TLS: is it feasible to reach our goal with this setup, or can we only observe the metrics coming from the service mesh?
If the answer is “yes”, where can we find a guide for the setup?
Reading the documentation, and after some preliminary tests, it seems that when we enable TLS we have to set the property `global.metrics.enableAgentMetrics` to `false` in the Helm values YAML, and this prevents Prometheus from scraping any of the `consul_*` metrics.
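For reference, the relevant part of our values file currently looks roughly like this (trimmed to the options in question; exact layout may differ depending on the chart version):

```yaml
global:
  tls:
    enabled: true
    httpsOnly: true
  acls:
    manageSystemACLs: true
  metrics:
    enabled: true
    # With TLS enabled, this chart version forces us to keep agent metrics
    # disabled, otherwise Helm template rendering fails.
    enableAgentMetrics: false
```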
Hi Davide,
Currently the only workaround would be to use a Prometheus ServiceMonitor, which provides more configurability around scraping metrics. You could then configure it with a CA cert so it can pull from Consul’s TLS endpoints. The reason it doesn’t work out of the box is that the Prometheus annotations don’t support TLS.
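As a rough, untested sketch (the namespace, port name, and secret names below are assumptions that depend on your release name and how the CA cert was created), a ServiceMonitor against the server pods could look something like:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: consul-server
  namespace: consul
spec:
  namespaceSelector:
    matchNames:
      - consul
  selector:
    matchLabels:
      app: consul
      component: server
  endpoints:
    - port: https          # name of the HTTPS port on the consul-server Service
      scheme: https
      path: /v1/agent/metrics
      params:
        format: ["prometheus"]
      tlsConfig:
        ca:
          secret:
            name: consul-ca-cert   # secret holding the Consul CA; name depends on your release
            key: tls.crt
        serverName: server.dc1.consul
```

If your ACLs deny anonymous access to `/v1/agent/metrics`, you’d also need to pass a token, e.g. via `bearerTokenSecret` on the endpoint (Consul accepts `Authorization: Bearer` tokens).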
Hi @lkysow,
how can I test it? When `global.tls.enabled` and `global.tls.httpsOnly` are set to `true`, and Consul agent metrics are also enabled, the installation errors out during Helm template rendering.
Only if I have both enabled will I be able to investigate the right settings for the Prometheus ServiceMonitor you suggested… am I missing something?
Ahh, I see. I think for now you would need to set metrics to `false` in the Helm chart and then use `server.extraConfig` and `client.extraConfig` to set the metrics config manually:
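Something along these lines (the telemetry stanza follows Consul’s agent telemetry options; the retention time here is just an example value):

```yaml
server:
  # extraConfig is passed verbatim to the server agents as JSON
  extraConfig: |
    {
      "telemetry": {
        "prometheus_retention_time": "1m",
        "disable_hostname": true
      }
    }
client:
  # same telemetry settings for the client agents
  extraConfig: |
    {
      "telemetry": {
        "prometheus_retention_time": "1m",
        "disable_hostname": true
      }
    }
```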
@lkysow Is there any proper documentation on how to overcome this from the Consul side, i.e. how to make Prometheus scrape Consul’s TLS endpoint? This is an important feature that needs to be in place; without it, a proper monitoring solution cannot be implemented.
@Davide_Salerno_GBS Did you try implementing the solution? If it worked, please share the details; it will be useful for the community.
I have gotten this working. The docs aren’t super clear on this, but you basically just need to use a `tls_config` block inside the job definition.
This implementation assumes that you are running a Grafana Agent on the local node, but if you were doing a consul_catalog-style scrape, the block configuration should be fairly similar.
Below is a snippet from a consul-template file I use to generate a configuration YAML for grafana-agent, which runs on all my nodes. You’ll see that you can use a similar configuration for scraping metrics from Consul, Nomad, and Vault agents if they are running on your target node.
I’m also using consul-template to look up configuration values from my Consul KV so I can control dozens of agents from a single set of config values. There are some risks to doing this (like accidentally creating a metrics blackout through misconfiguration), but it’s pretty powerful.
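The relevant part boils down to something like this (trimmed for brevity; the CA path, server name, token file, port, and KV key are placeholders for whatever your environment actually uses):

```yaml
scrape_configs:
  - job_name: consul
    # value pulled from Consul KV by consul-template when the file is rendered
    scrape_interval: {{ key "config/grafana-agent/consul_scrape_interval" }}
    metrics_path: /v1/agent/metrics
    params:
      format: ["prometheus"]
    scheme: https
    tls_config:
      # CA that signed the local Consul agent's TLS certificate
      ca_file: /etc/consul.d/tls/consul-agent-ca.pem
      # must match a SAN on the agent's certificate
      server_name: server.dc1.consul
    # only needed if ACLs deny anonymous access to the metrics endpoint
    bearer_token_file: /etc/consul.d/tokens/metrics.token
    static_configs:
      - targets: ["127.0.0.1:8501"]
  # Nomad (/v1/metrics?format=prometheus) and Vault (/v1/sys/metrics?format=prometheus)
  # jobs look almost identical: swap the target port, path, and TLS material.
```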