Can't use check stanza under service using nomad provider

Hi, I am new to Nomad and trying to run the job below:

job "slurm-cn" {
  priority = 95 # 100 is the highest priority
  datacenters = ["mydc"]
  type = "system" 
  group "slurm-cn" {
    task "slurmd" {
      driver = "raw_exec"
      user = "root"
      config {
        command = "/usr/sbin/slurmd"
        args = ["-D"]
      }
    }
    service {
      provider = "nomad"
      name = "slurmd"
      port = "slurmd"
      check {
        name = "slurmd"
        type = "tcp"
        interval = "9s"
        timeout = "3s"
      }
    }
    network {
      port "slurmd" {
        static = 6818 # host port mapped to TCP 6818
      }
    }
  }
}

My problem is that I am getting this error:

$ nomad job run slurm-job.hcl 
Error submitting job: Unexpected response code: 500 (rpc error: 1 error occurred:
	* Task group slurm-cn validation failed: 1 error occurred:
	* Task group service validation failed: 1 error occurred:
	* Service[0] slurmd validation failed: 1 error occurred:
	* Service with provider nomad cannot include Check blocks)

The documentation says I can use the “check” stanza inside the “service” one with the nomad provider: check Block - Job Specification | Nomad by HashiCorp

What am I doing wrong?

thank you

Hi @masuberu,

What version of Nomad are you running? The check block with Nomad service registrations is only supported from 1.4.0-beta.1 onwards.

Thanks,
jrasell and the Nomad team

Nomad v1.3.5 (1359c2580fed080295840fb888e28f0855e42d50)

Is there any other way I can configure this? I would like to either check that a port is listening, or run a command and use its exit code to verify the job is running successfully.

I don’t understand why this can’t be configured… I can see Nomad restarts tasks twice by default; why can’t I parametrize this?

client.alloc_runner.task_runner: restarting task: alloc_id=ba52b730-50e7-e43c-29dd-cc6d21550fa3 task=munged reason="Restart within policy" delay=16.307460919s

Hi @masuberu,

> Is there any other way I can configure this?

If you are running Consul you could use Consul service registration and health checks, otherwise you’re best either using an external tool to perform health checking or upgrading Nomad to 1.4.0 once the GA version is released.
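For reference, the same TCP check works on Nomad 1.3.x when registered through Consul. A minimal sketch of the service block (assuming a Consul agent is running alongside the Nomad client; everything else in the job stays as in the original):

```hcl
service {
  provider = "consul" # register in Consul instead of Nomad
  name     = "slurmd"
  port     = "slurmd" # references the network port label

  check {
    name     = "slurmd"
    type     = "tcp"  # Consul dials the port to verify it accepts connections
    interval = "9s"
    timeout  = "3s"
  }
}
```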

> I can see nomad is trying to restart services twice as default configuration, why can’t I parametrize this?

I believe the restart job specification block would help you configure this behaviour how you want.
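The restart block goes at the group level. A sketch of how it could look for this job (the values here are illustrative, not defaults; the thread above notes the default is two restart attempts):

```hcl
group "slurm-cn" {
  restart {
    attempts = 5       # restarts allowed within the interval below
    interval = "10m"   # sliding window in which attempts are counted
    delay    = "15s"   # wait between restart attempts
    mode     = "delay" # keep retrying after attempts are exhausted, instead of failing the task
  }

  # ... task, service, and network blocks as in the original job
}
```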

Thanks,
jrasell and the Nomad team