Viewing logs for failed jobs

When running a Nomad job, tasks sometimes fail before they can even start. One possible cause I found is a typo in the image name given to the Docker driver, which means the image cannot be found at deployment time.

Errors that happen without any logs are very hard to debug. Is there a place where Nomad writes logs for failed allocations, with a hint about why they failed? I am trying to use meta and variables in the Docker image tag, and I cannot tell what Nomad evaluates the tag to when the deployment fails.

nomad alloc logs only works for allocations that have started. If a task fails before starting (say, there was a problem pulling the image from Docker Hub), or there is another error in the Nomad job file (say, variables are used and some of them do not have proper values), the allocation logs cannot be used.

In this case, the following is seen:

nomad alloc logs f713a4
Error reading file: Unexpected response code: 404 (task "experiment-container-task" not started yet. No logs available)

Is there a log file that covers what happens before the task starts, such as pulling the image or starting the task? In our case, we use the following in the job file:

meta {
  IMAGE_TAG = "ae2ba0b"
}

and then use
config {
  image = "myinternalrepo.com/appcontainer:${NOMAD_META_IMAGE_TAG}"
}

In this case, if IMAGE_TAG does not exist, the alloc is not scheduled, and the task continuously fails and gets restarted.

The Nomad client log will have this information. You can also usually see the major events in the output of nomad alloc status.
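
As a rough sketch of the second suggestion in the reply above: the helper below wraps nomad alloc status with the -verbose flag. Its output includes each task's recent events, which is where driver failures such as a failed image pull usually appear. The function name alloc_events is made up for illustration and is not part of Nomad.

```shell
# Hypothetical helper (not part of Nomad): print the status of one
# allocation, including the recent events recorded for each of its tasks.
# Usage: alloc_events <alloc-id>
alloc_events() {
  nomad alloc status -verbose "$1"
}
```

For the allocation from the earlier post this would be called as alloc_events f713a4.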

Thanks. You mention

Nomad client log will have this information

How do I find this client log? I cannot see anything in the /opt/nomad or /var/log folders. I followed the production setup steps exactly as outlined in https://www.nomadproject.io/docs/install/production/deployment-guide/ on multiple Linux servers.

You’re probably running Nomad as a systemd service, in which case the Nomad client log is in the journal. Access it with something like journalctl -u nomad. (The DigitalOcean folks have a great guide to journalctl.)
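
A minimal sketch of the journalctl suggestion above, assuming the systemd unit is named nomad as in that reply. The helper name nomad_journal and the default one-hour window are illustrative choices only.

```shell
# Hypothetical helper: print recent Nomad agent log lines from the journal.
# Assumes the systemd unit is called "nomad".
# Usage: nomad_journal ["<time spec>"]   e.g. nomad_journal "10 minutes ago"
nomad_journal() {
  since="${1:-1 hour ago}"
  journalctl -u nomad --since "$since" --no-pager
}
```

Piping the output through grep with the task name (experiment-container-task in the earlier post) can narrow it down to the failing task.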

Hi,

I read the conversation but cannot find an answer: where can I find logs for failed jobs?

Thanks,
Shachar

Check the system journal. You can also configure agent logging in the configuration file: https://www.nomadproject.io/docs/configuration

log_file = "/var/log/nomad/"
log_level = "DEBUG"

I have added:
log_file = "/var/log/nomad/"
log_level = "DEBUG"
to /etc/nomad.d/nomad.hcl (server agent) and restarted the service.
I then tried to run a job, which stays in status Pending.
The folder /var/log/nomad/ does not exist and no log file is created.
What could be the reason?

Have you created /var/log/nomad and given it the correct permissions?

The log_file directive is a little confusing. In fact, it represents the directory under which the log files are written, and that directory must be created beforehand (Nomad does not create it for you).

Create /var/log/nomad first, and the logs will be written underneath that directory as nomad-<timestamp>.log.
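
A small sketch of that step. It assumes the agent runs as a user called nomad, which may not match your setup (an agent running as root does not need the ownership change). The function name prepare_log_dir is made up for illustration.

```shell
# Hypothetical helper: create the log directory and, if the agent's user
# exists on this machine, make that user the owner of the directory.
# Usage: prepare_log_dir <directory> [agent-user]
prepare_log_dir() {
  log_dir="$1"
  agent_user="${2:-nomad}"   # assumption: the agent runs as "nomad"
  mkdir -p "$log_dir" || return 1
  # Skip the ownership change when no such user exists.
  if id "$agent_user" >/dev/null 2>&1; then
    chown "$agent_user" "$log_dir"
  fi
}
```

Run as root, prepare_log_dir /var/log/nomad followed by a restart of the Nomad service should be enough for the log files to start appearing.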

It worked after I created the folder and set the permissions; the logs are now being created.
Thanks a lot!!

Thank you for your assistance. In my case I found the logs for the job that failed to deploy in the /tmp/ folder, but I wonder: is there any way to get those logs via the command line or the API?