General recommendations for logging

So, I have a logging setup for the docker driver (thanks to this post) that gives me logs with enriched metadata.

However, my questions are more about the exec and raw_exec drivers. I can’t find enough docs or config options to get this up and running myself, hence asking here.

Currently, Nomad stores the stdout/stderr logs inside the alloc/logs/ directory. If the task is running under exec, there’s no way to access these logs from the host. The typical solution of running a log collection agent (like vector) as a system job that reads logs from a common directory (like /var/logs) isn’t possible, because Nomad currently doesn’t provide an option to configure a logging directory.

So, is the only choice to run a log collection agent as a sidecar task, which has access to the /alloc/logs directory and collects the logs from there?

EDIT: I can also run vector as a system job and collect logs from the /opt/nomad/data/allocs/ folder. But the biggest issue here is that I won’t know which particular job / group an alloc belongs to, unless I query the Nomad API somehow before collecting the logs.
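For anyone reading along, here is a rough sketch of what that lookup could look like. It assumes an unauthenticated local agent at http://127.0.0.1:4646 and uses the `/v1/node/<node_id>/allocations` endpoint. The function names, the `allocs_root` parameter, and the `<allocs_root>/<alloc ID>/alloc/logs` layout are my own assumptions, so check them against your node before relying on this.

```python
import json
import urllib.request
from pathlib import PurePosixPath

# Assumption: local Nomad agent, no TLS and no ACL token.
NOMAD_ADDR = "http://127.0.0.1:4646"


def fetch_node_allocations(node_id, addr=NOMAD_ADDR):
    """Return the allocations placed on one node, as parsed JSON."""
    url = f"{addr}/v1/node/{node_id}/allocations"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def log_dir_labels(allocs, allocs_root):
    """Map each allocation's log directory to its job/group metadata.

    allocs_root is the directory holding one sub-directory per alloc ID
    (hypothetical layout: <allocs_root>/<alloc ID>/alloc/logs).
    """
    labels = {}
    for alloc in allocs:
        log_dir = PurePosixPath(allocs_root) / alloc["ID"] / "alloc" / "logs"
        labels[str(log_dir)] = {
            "namespace": alloc.get("Namespace", "default"),
            "job": alloc["JobID"],
            "group": alloc["TaskGroup"],
        }
    return labels
```

A collector could call `fetch_node_allocations` periodically and use the resulting mapping to tag each log line by the directory it was read from.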

This seems harder than it should be :sweat_smile: Just wondering if I am missing something :slight_smile:

The way I approached this problem, also loosely based on the blog post you mention, is to run a system promtail job that scrapes everything I’m interested in and pushes it to Grafana Cloud.

Because I use Consul, I pick the services that have promtail=true (chosen arbitrarily), take the ID of the task from Consul (the _nomad-task-* part) and slurp the alloc/logs/ directory (I mount the system’s /var/nomad/alloc directory into the job in ro mode).

It’s probably not the most elegant solution, but it works for me :sweat_smile:

This week I hope to do a small write-up on the subject. If you’re interested in implementation details I can reach out to you (I don’t want to spam the forum with external resources of my own making, so I will check the forum rules on that matter :see_no_evil:)

I’ve been using nomad-vector-logger in production for some time. It’s working well for the use case I described above.

The daemon runs alongside vector (same group, different tasks, so that they can share the same alloc_dir). It periodically fetches the list of allocations on the current node with their log file paths and templates out a vector config. The user can provide their own extra vector templates to configure extra transformations (JSON/logfmt parsing, splitting namespaces etc) and then finally route to sink providers (Clickhouse/Elastic/S3 etc).
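To give a feel for the templating step, here is a minimal sketch of the idea. It is not the tool’s actual template: the component names, the `allocs_root` parameter and the log directory layout are assumptions made for illustration. Each allocation gets a Vector `file` source plus a `remap` transform that stamps the job, group and alloc ID onto every event.

```python
from string import Template

# Hypothetical per-allocation fragment of a Vector TOML config.
ALLOC_TEMPLATE = Template('''\
[sources.nomad_$key]
type = "file"
include = ["$log_dir/*"]

[transforms.nomad_meta_$key]
type = "remap"
inputs = ["nomad_$key"]
source = """
.nomad.job = "$job"
.nomad.group = "$group"
.nomad.alloc_id = "$alloc_id"
"""
''')


def render_vector_config(allocs, allocs_root):
    """Render one source/transform pair per allocation.

    Values are inserted as-is; a job name containing a double quote
    would need escaping, which this sketch does not do.
    """
    blocks = []
    for alloc in sorted(allocs, key=lambda a: a["ID"]):
        blocks.append(ALLOC_TEMPLATE.substitute(
            key=alloc["ID"].replace("-", "_"),  # keep component IDs simple
            log_dir=f"{allocs_root}/{alloc['ID']}/alloc/logs",
            job=alloc["JobID"],
            group=alloc["TaskGroup"],
            alloc_id=alloc["ID"],
        ))
    return "\n".join(blocks)
```

The rendered text would be written to a file that Vector watches, with the user’s own transforms and sinks taking the `nomad_meta_*` components as inputs.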

When the allocs are stopped, this daemon also removes them from the templated file after a configurable delay period, to ensure that Vector has finished processing all the remaining logs.
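The delayed removal can be pictured roughly like this. It is a sketch of the idea, not the daemon’s real code: `AllocTracker` and `grace_seconds` are made-up names. A stopped alloc stays in the rendered set until the grace period has elapsed since it was first seen missing.

```python
import time


class AllocTracker:
    """Keep stopped allocations around for a grace period before dropping them."""

    def __init__(self, grace_seconds):
        self.grace_seconds = grace_seconds
        self._known = {}         # alloc ID -> last seen alloc record
        self._missing_since = {}  # alloc ID -> time it was first seen missing

    def update(self, running_allocs, now=None):
        """Feed the current running allocs; return the allocs to keep in the config."""
        now = time.monotonic() if now is None else now
        running = {a["ID"]: a for a in running_allocs}

        # Running allocs are always kept, and any pending removal is cancelled.
        for alloc_id, alloc in running.items():
            self._known[alloc_id] = alloc
            self._missing_since.pop(alloc_id, None)

        # Allocs that disappeared are dropped only after the grace period.
        for alloc_id in list(self._known):
            if alloc_id in running:
                continue
            first_missing = self._missing_since.setdefault(alloc_id, now)
            if now - first_missing >= self.grace_seconds:
                del self._known[alloc_id]
                del self._missing_since[alloc_id]

        return list(self._known.values())
```

Calling `update` on every poll and re-rendering the config from its return value gives Vector time to drain the remaining log lines before the source disappears.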

If anyone is facing issues with logging for exec/raw_exec tasks, check this out :slight_smile: Feedback welcome!