Are there any high-level documents describing an analytics workflow with HashiStack telemetry output?

This technically includes Nomad and Vault as well. I have gone through the Nomad learn guide for setting up telemetry and having Prometheus scrape metrics via Consul, and I know there is a separate learn guide for setting up Telegraf to pipe telemetry out of Consul.

That said, I can’t seem to get the Nomad learn guide to work as documented, and I’m not sure what to make of the difference in implementation patterns.

Is it particular to the Hashi product being used, or just an artifact of being written by different authors? It would be nice to get some guidance around how to think about the monitoring/logging/tracing story. I am exploring Prometheus, Grafana, Loki, and Jaeger, but any sort of big-picture walkthrough of how to think about how Hashi tools feed into monitoring/logging/tracing tools would be really great.

I also got stuck on one of the guides at some point. Can you name the exact step where it fails, so that it can be debugged and corrected, or so that a workaround can be offered in the meantime?

The guide stops working at Step 5. Basically, Prometheus attempts to connect to Consul but is unable to, and therefore cannot suss out the intended targets.

When I review the Nomad logs for the Prometheus job, here’s the output I get:

level=info ts=2020-02-08T12:08:06.576Z caller=main.go:294 msg="no time or size retention was set so using the default time retention" duration=15d
level=info ts=2020-02-08T12:08:06.576Z caller=main.go:330 msg="Starting Prometheus" version="(version=2.15.2, branch=HEAD, revision=d9613e5c466c6e9de548c4dae1b9aabf9aaf7c57)"
level=info ts=2020-02-08T12:08:06.576Z caller=main.go:331 build_context="(go=go1.13.5, user=root@688433cf4ff7, date=20200106-14:50:51)"
level=info ts=2020-02-08T12:08:06.577Z caller=main.go:332 host_details="(Linux 5.3.0-29-lowlatency #31-Ubuntu SMP PREEMPT Fri Jan 17 18:32:27 UTC 2020 x86_64 9ff6f83ddbb7 (none))"
level=info ts=2020-02-08T12:08:06.577Z caller=main.go:333 fd_limits="(soft=1048576, hard=1048576)"
level=info ts=2020-02-08T12:08:06.577Z caller=main.go:334 vm_limits="(soft=unlimited, hard=unlimited)"
level=info ts=2020-02-08T12:08:06.580Z caller=main.go:648 msg="Starting TSDB ..."
level=info ts=2020-02-08T12:08:06.580Z caller=web.go:506 component=web msg="Start listening for connections" address=
level=info ts=2020-02-08T12:08:06.594Z caller=head.go:584 component=tsdb msg="replaying WAL, this may take awhile"
level=info ts=2020-02-08T12:08:06.594Z caller=head.go:632 component=tsdb msg="WAL segment loaded" segment=0 maxSegment=0
level=info ts=2020-02-08T12:08:06.597Z caller=main.go:663 fs_type=EXT4_SUPER_MAGIC
level=info ts=2020-02-08T12:08:06.597Z caller=main.go:664 msg="TSDB started"
level=info ts=2020-02-08T12:08:06.597Z caller=main.go:734 msg="Loading configuration file" filename=/etc/prometheus/prometheus.yml
level=info ts=2020-02-08T12:08:06.599Z caller=main.go:762 msg="Completed loading of configuration file" filename=/etc/prometheus/prometheus.yml
level=info ts=2020-02-08T12:08:06.599Z caller=main.go:617 msg="Server is ready to receive web requests."
level=error ts=2020-02-08T12:08:06.600Z caller=consul.go:269 component="discovery manager scrape" discovery=consul msg="Error retrieving datacenter name" err="Get dial tcp connect: connection refused"

…and then that last line just repeats endlessly for the life of the job.

I am running the Nomad and Consul agents in dev mode while working through the guide.
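Since the error is "connection refused", one way to narrow it down is to check whether Consul's HTTP port accepts TCP connections from wherever Prometheus actually runs (for example, from inside the task's container). The following is only a rough sketch: the helper name `can_connect` is made up for illustration, and port 8500 is assumed because it is Consul's default HTTP port. The address Prometheus really dials is not visible in the log above.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection raises OSError (e.g. ConnectionRefusedError, timeout)
        # when nothing is listening or the host is unreachable.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Hypothetical usage: 8500 is Consul's default HTTP port; the host is an assumption.
# print(can_connect("127.0.0.1", 8500))
```

If the check succeeds on the host but fails from inside the Prometheus task, the likely culprit is the address Consul is bound to rather than Prometheus itself. A dev-mode Consul agent listens on 127.0.0.1 unless told otherwise, which a container with its own network namespace cannot reach.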

At first glance it could be due to the variable NOMAD_IP_prometheus_ui. But this is only a theoretical assumption. I’ll play it through tonight.
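To make that hypothesis concrete: if the Prometheus config builds the Consul address from `NOMAD_IP_prometheus_ui` plus a port, then an unset or unexpected value in that variable would send Prometheus to the wrong place. Below is a small sketch of that check. It assumes the address has the form `<ip>:8500`, which is not confirmed by the log above, and the function name `consul_server_address` is hypothetical.

```python
import os


def consul_server_address(env=None, var="NOMAD_IP_prometheus_ui", port=8500):
    """Build '<ip>:<port>' from a Nomad-provided IP variable.

    Raises ValueError when the variable is missing or empty, because a bare
    ':8500' would typically be dialed as localhost.
    """
    if env is None:
        env = os.environ
    ip = env.get(var, "").strip()
    if not ip:
        raise ValueError(f"{var} is not set; the Consul address would be ':{port}'")
    return f"{ip}:{port}"


# Hypothetical usage with an example value (not taken from the thread):
# consul_server_address({"NOMAD_IP_prometheus_ui": "10.0.0.5"})
```

Printing the variable from inside the running task and comparing it with the address Consul is actually listening on would confirm or rule out this guess.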

Because the repo given in the guide didn’t work for me, I used HashiQube instead. With it, I could walk through the guide completely.

If you want to finish the guide, that would be an alternative.
