When running a Nomad job, tasks sometimes fail before they can even start. One possible reason I found is a typo in the image name given to the Docker driver, which means the image is not found at deployment time.
It is very hard to debug errors when there are no logs. Is there a place where Nomad writes logs for failed allocations, with a hint on why they failed? I am trying to use meta and variables in the image tag with Docker, and I cannot see what Nomad evaluates them to or why the deployment fails.
nomad alloc logs only works for allocations that have started. If a task fails before starting (say, there was an issue pulling the image from Docker Hub), or there is another error in the Nomad job file (say, variables are used and some of them do not have proper values), the allocation logs cannot be used.
In this case, the following is seen:
nomad alloc logs f713a4
Error reading file: Unexpected response code: 404 (task "experiment-container-task" not started yet. No logs available)
Is there a log file that covers what happens before the task starts, such as pulling the image or starting the task? In our case, we use the following in the job file:
You’re probably running Nomad as a systemd service? In that case the Nomad client log is in the journal. Access it with something like journalctl -u nomad. (The DO folks have a great guide to journalctl.)
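To make the journal suggestion concrete, here is a minimal Python sketch that reads the journal for the unit and keeps only the lines mentioning one allocation. It assumes the systemd unit is really called nomad, that the client log lines include the allocation ID, and that the user running it may read the journal. The helper names read_nomad_journal and lines_for_alloc are made up for this illustration.

```python
import subprocess


def read_nomad_journal(since="1 hour ago", unit="nomad"):
    """Return recent journal lines for the given systemd unit.

    Assumption: Nomad runs as a systemd unit named `unit`.
    """
    result = subprocess.run(
        ["journalctl", "-u", unit, "--since", since, "--no-pager"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()


def lines_for_alloc(lines, alloc_id):
    """Keep only the lines that contain the allocation ID (a prefix also matches)."""
    return [line for line in lines if alloc_id in line]
```

Calling lines_for_alloc(read_nomad_journal(), "f713a4") would then return whatever the client logged about that allocation, if anything. Errors raised before an allocation exists would not carry an allocation ID, so reading the unfiltered output is still worthwhile.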
I have added:
log_file = "/var/log/nomad/"
log_level = "DEBUG"
to /etc/nomad.d/nomad.hcl (server agent) and restarted the service.
I tried to run a job, and it stays in status Pending.
The folder /var/log/nomad/ does not exist and no log file was created.
What could be the reason?
The log_file directive is a little confusing. With a value like this one, it names the directory under which the log files are written, and that directory must be pre-created (Nomad does not create it for you).
Pre-create /var/log/nomad and the logs will be written under that directory as nomad-<timestamp>.log.
Thank you for your assistance. In my case I found the logs for the job that failed to deploy in the /tmp/ folder, but I wonder: is there any way to get those logs via the command line or the API?
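On the command line / API question, one option that may help: the allocation record itself carries per-task events (for example driver failures), which nomad alloc status prints and which GET /v1/allocation/<id> returns under TaskStates. The agent log itself can also be streamed with nomad monitor on recent Nomad versions. Below is a minimal Python sketch of the API route. It assumes the agent is reachable at NOMAD_ADDR (falling back to http://127.0.0.1:4646), that the full allocation UUID is used rather than the short prefix, and that the helper names fetch_allocation and task_events are just illustrative.

```python
import json
import os
import urllib.request


def fetch_allocation(alloc_id, addr=None, token=None):
    """GET /v1/allocation/<alloc_id> from the Nomad HTTP API.

    Assumption: alloc_id is the full allocation UUID, not the short prefix.
    """
    addr = addr or os.environ.get("NOMAD_ADDR", "http://127.0.0.1:4646")
    request = urllib.request.Request(f"{addr}/v1/allocation/{alloc_id}")
    token = token or os.environ.get("NOMAD_TOKEN")
    if token:
        # Only needed when ACLs are enabled on the cluster.
        request.add_header("X-Nomad-Token", token)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def task_events(allocation):
    """Flatten TaskStates into (task name, event type, message) tuples."""
    rows = []
    for task, state in (allocation.get("TaskStates") or {}).items():
        for event in state.get("Events") or []:
            message = event.get("DisplayMessage") or event.get("Message") or ""
            rows.append((task, event.get("Type", ""), message))
    return rows
```

Printing task_events(fetch_allocation(full_alloc_id)) should list the events recorded for each task, which is often enough to spot a bad image name without opening the agent log.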