Viewing logs for failed jobs

When running a Nomad job, tasks sometimes fail before they can even start. One possible cause I found is a typo in the image name given to the Docker driver, which means the image cannot be found at deployment time.

Errors are very hard to debug when there are no logs. Is there a place where Nomad writes logs for failed allocations, with a hint about why they failed? I am trying to use meta and variables in the Docker image tag, and I cannot see what Nomad evaluates it to or why the deployment fails.

nomad alloc logs only works for allocations that have started. If a task fails before starting (say, there was a problem pulling the image from Docker Hub), or there is another error in the Nomad job file (say, variables are used and some of them do not have proper values), the allocation logs cannot be used.

In this case, the following is seen:

nomad alloc logs f713a4
Error reading file: Unexpected response code: 404 (task "experiment-container-task" not started yet. No logs available)

Is there a log file that covers what happens before the task starts, such as pulling the image or starting the task? In our case, we use the following in the job file:

meta {
    IMAGE_TAG = "ae2ba0b"

and then use
config {

In this case, if IMAGE_TAG does not exist, the alloc is not scheduled, and the task continuously fails and gets restarted.

The Nomad client log will have this information. You can also usually see the major events in the output of nomad alloc status.
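As a rough sketch of that second suggestion: the helper below wraps nomad alloc status for a given allocation ID, such as the f713a4 from the question. The function name alloc_events is made up for illustration. The task events section of the output is typically where a failure before start (for example, a failed image pull) is reported, though the exact output depends on your Nomad version.

```shell
# Hypothetical helper (the name is illustrative, not part of Nomad):
# print the status of one allocation, including its recent task events.
alloc_events() {
    alloc_id="$1"
    if [ -z "$alloc_id" ]; then
        echo "usage: alloc_events <alloc-id>" >&2
        return 2
    fi
    # -verbose prints full IDs and additional detail for the allocation.
    nomad alloc status -verbose "$alloc_id"
}
```

Usage would look like alloc_events f713a4, run on a machine where the nomad CLI can reach the cluster.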

Thanks. You mention

Nomad client log will have this information

How do I find this client log? I am unable to see anything in the /opt/nomad or /var/log folders. I followed the production setup steps exactly as outlined, on multiple Linux servers.

You’re probably running Nomad as a systemd service, in which case the Nomad client log is in the journal. You can access it with something like journalctl -u nomad. (The DigitalOcean folks have a great guide to journalctl.)
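A minimal sketch of how that might look in practice, assuming the systemd unit is really named nomad (check with systemctl if unsure). The helper name nomad_client_log and the default one-hour window are assumptions chosen for illustration. It filters the journal for lines mentioning a pattern, such as an allocation ID or an image name.

```shell
# Hypothetical helper: search recent Nomad client log lines in the journal.
#   $1 = pattern to look for (e.g. an allocation ID or image name)
#   $2 = optional start time understood by journalctl (default: "1 hour ago")
nomad_client_log() {
    pattern="$1"
    since="${2:-1 hour ago}"
    # -u selects the unit, --since limits the time window,
    # --no-pager writes straight to stdout so the output can be piped.
    journalctl -u nomad --since "$since" --no-pager | grep -- "$pattern"
}
```

For example, nomad_client_log f713a4 would show journal lines from the last hour that mention that allocation. Reading the journal may require root or membership in a group such as systemd-journal, depending on the distribution.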
