Having trouble running jobs on a cluster

Hey guys,

I am running on Ubuntu 20.04,
Nomad v1.0.4.

I just created a simple local cluster:
3 servers and 3 clients.

Output of nomad server members:

Name            Address         Port  Status  Leader  Protocol  Build  Datacenter  Region
nomadm1.global  192.168.14.151  4648  alive   false   2         1.0.4  dc1         global
nomadm2.global  192.168.14.152  4648  alive   true    2         1.0.4  dc1         global
nomadm3.global  192.168.14.153  4648  alive   false   2         1.0.4  dc1         global

cat /etc/nomad.d/nomad.hcl (on a client node)
data_dir = "/opt/nomad/data"
bind_addr = "192.168.14.155"

client {
  enabled = true
  servers = ["192.168.14.151:4647","192.168.14.152:4647","192.168.14.153:4647"]
}

cat /etc/nomad.d/nomad.hcl (on a server node; each server has the other servers in retry_join)
data_dir = "/opt/nomad/data"
bind_addr = "192.168.14.152"

server {
  enabled = true
  bootstrap_expect = 3  
  server_join {
    retry_join = ["192.168.14.151:4648","192.168.14.153:4648"]
  }
}

I used bind_addr because, when trying to join nodes without it, they would join on the Docker bridge interface.
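As an aside, bind_addr should also accept a go-sockaddr template, which avoids hardcoding a different IP on every node (the interface name below is only an example and would need to match the host's NIC):

bind_addr = "{{ GetInterfaceIP \"ens18\" }}"   # example NIC name, adjust per host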

Now I can see all nodes in the management interface.
Trying to run any job results in this error:
Failed to start container 3db72093caeb1b654e5d3d543b77edc113700e733a75e873f24ce33dd038fcf5: API error (500): error while creating mount source path '/opt/nomad/data/alloc/ebd04c8d-02b5-1006-75ff-7611cb38a903/alloc': mkdir /opt/nomad: read-only file system

My goal is to create an offline cluster that uses local images for my jobs, so I have another web server that I can point the artifact stanza at.

Currently my job HCL looks like this:

job "test1" {
  # Run the tasks in this job in the dc1 datacenter.
  datacenters = ["dc1"]

  # Run this job as a "service" type. Each job type has different
  # properties. See the documentation below for more examples.
  type = "service"

  # Specify this job to have rolling updates, two-at-a-time, with
  # 30 second intervals.
  update {
    stagger      = "30s"
    max_parallel = 2
  }


  # A group defines a series of tasks that should be co-located
  # on the same client (host). All tasks within a group will be
  # placed on the same host.
  group "webs" {
    # Specify the number of these tasks we want.
    count = 3

    network {
      # This requests a dynamic port named "http". This will
      # be something like "46283", but we refer to it via the
      # label "http".
      port "http" {}
    }

    # The service block tells Nomad how to register this service
    # with Consul for service discovery and monitoring.
    service {
      # This tells Consul to monitor the service on the port
      # labelled "http". Since Nomad allocates high dynamic port
      # numbers, we use labels to refer to them.
      port = "http"

      check {
        type     = "http"
        path     = "/health"
        interval = "10s"
        timeout  = "2s"
      }
    }

    # Create an individual task (unit of work). This particular
    # task utilizes a Docker container to front a web application.
    task "frontend" {
      # Specify the driver to be "docker". Nomad supports
      # multiple drivers.
      driver = "docker"
      
      artifact {
        source = "http://192.168.14.250/webappv1.0.1.tar"
      }

      # Configuration is specific to each driver.
      config {
        load  = "webappv1.0.1.tar"
        image = "webappv1.0.1"
      }
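
      # Note: "docker load" restores whatever repository:tag was embedded by
      # "docker save", so the "image" value above must match that exact name
      # (Docker assumes ":latest" when no tag is given), for example
      # "webappv1.0.1:latest" if that is how the image was tagged.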

      resources {
        cpu    = 500 # MHz
        memory = 128 # MB
      }
    }
  }
}

Any help would be greatly appreciated!

  • Which user are you running Nomad as on your client nodes? (You can check with the commands below.)
  • Who is the owner of /opt/nomad/data?
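
Something like this should show the user the agent is running as (assuming the systemd unit is named nomad.service):

systemctl show nomad -p User,MainPID
ps -o user= -p "$(pidof nomad)"   # prints the user of the running nomad process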

I have a user called nomad, which is the owner of /opt/nomad/data.

I have added the line
User=root
to /etc/systemd/system/nomad.service,
restarted the service, and made sure it runs as root
on all clients.

Then I changed the owner of /opt/nomad to root:
sudo chown -R root /opt/nomad

I tried again but have the same issue.

edit:

Sorry it was a job configuration issue!
I can run jobs smoothly.

Thanks for your help!

Glad to hear it’s working now @fisher.shai :grinning_face_with_smiling_eyes:

Moving forward, make sure your Nomad clients are running as root. This is needed so they are able to manage things like chroot environments.

Your servers can run as non-root users; they only need write access to the data_dir.

For more information, please check our deployment guide and security model.
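
For example, a minimal sketch of both setups (assuming the systemd unit is named nomad.service and a nomad user owns the server data_dir):

# On the clients, run the agent as root via a systemd drop-in:
sudo systemctl edit nomad
#   ...and add these two lines to the drop-in, then restart:
#   [Service]
#   User=root
sudo systemctl restart nomad

# On the servers, a non-root user only needs write access to the data_dir:
sudo chown -R nomad:nomad /opt/nomad/data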


Could you describe the issue with the job config? I seem to be getting this error now.