An Issue with Nomad on Kubernetes: "Failed to mount shared directory for task"

I’m attempting to run Nomad within a Kubernetes cluster, basing my approach in large part on Kelsey Hightower’s nomad-on-kubernetes repo.

So far, everything seems to go fine until I try to run a job. Consul and Nomad are running, and the Nomad servers are able to connect to each other.

But when I try to run a simple exec job, I get the following error:

task_dir: Failed to mount shared directory for task: permission denied

Any ideas on what’s going on here and how I can avoid it?

I’ve tried various adjustments to volumes and permissions within the Nomad deployment configuration, but nothing has changed this error.

For reference, here is the Kubernetes configuration I use to deploy Nomad:

apiVersion: v1
kind: Service
metadata:
  name: nomad-cluster-service
  labels:
    name: nomad
spec:
  type: LoadBalancer
  ports:
    - name: http
      port: 4646
      protocol: "TCP"
    - name: rpc
      port: 4647
      protocol: "TCP"
  selector:
    app: nomad
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: nomad-cluster-configmap
  labels:
    app: nomad
data:
  server.hcl: |
    datacenter = "dc1"
    data_dir = "/opt/nomad/data"

    bind_addr = "0.0.0.0"

    server {
        enabled = true
        bootstrap_expect = 3
    }

    client {
        enabled = true
        options {
            "driver.raw_exec.enable" = "1"
            "docker.privileged.enabled" = "true"
        }
    }

    consul {
      address = "consul-server.default.svc:8500"
    }
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nomad-cluster-deployment
  labels:
    app: nomad
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nomad
  template:
    metadata:
      labels:
        app: nomad
    spec:
      containers:
      - name: nomad-instance
        image: noenv/nomad
        imagePullPolicy: IfNotPresent
        args:
        - "agent"
        - "-config=/etc/nomad/nomad.d/server.hcl"
        ports:
        - name: http
          containerPort: 4646
          protocol: "TCP"
        - name: rpc
          containerPort: 4647
          protocol: "TCP"
        - name: serf-tcp
          containerPort: 4648
          protocol: "TCP"
        - name: serf-udp
          containerPort: 4648
          protocol: "UDP"
        volumeMounts:
        - name: nomad-config
          mountPath: /etc/nomad/nomad.d
        - name: nomad-data
          mountPath: /opt/nomad/data
      securityContext:
        fsGroup: 1000
      volumes:
      - name: nomad-config
        configMap:
          name: nomad-cluster-configmap
      - name: nomad-data
        emptyDir: {}

And here is the configuration for the simple job I’m using to test the setup. I have another Kubernetes pod running an http-echo container on port 3030, which this job attempts to reach:

job "example-job" {
  datacenters = ["dc1"]

  group "example-group" {
    count = 3

    task "curl-task" {
      driver = "exec"

      config {
        command = "curl"
        args = [
          "http-echo-service.default.svc:3030",
          "-o", "nomad-output.txt"
        ]
      }
    }
  }
}

If it would be helpful, I can definitely provide more pieces of my setup (the Consul deployment, the http-echo deployment, etc.).

Hi @nasanos, for obvious reasons there likely aren’t many k8s experts on this forum.

However, I suspect the error you’re seeing comes from the fact that the exec driver requires root-level permissions, as it uses the mount() system call to create a shared directory for the tasks within a group. You could try using the raw_exec driver, but again, Nomad was not at all intended to be run inside k8s.
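
If you want to try that, the only change to the job file should be the driver line (your client config already enables raw_exec via "driver.raw_exec.enable" = "1"); roughly:

task "curl-task" {
  # raw_exec runs the command directly on the host with no chroot and
  # no mount() call, so it doesn't need the root privileges exec does
  driver = "raw_exec"

  config {
    command = "curl"
    args = [
      "http-echo-service.default.svc:3030",
      "-o", "nomad-output.txt"
    ]
  }
}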

Thanks for the response, @seth.hoenig! I definitely understand that; I’d seen a few people around the web trying out this kind of setup, but it’s clearly niche and not something I would expect most of the community to have much experience with.

That said, your insights helped lead me to a solution! Specifically, I attempted to address Nomad’s access needs by adding the securityContext block below to the Kubernetes configuration, within the spec block that holds the Nomad container and the Nomad data volume.

securityContext:
  runAsUser: 1001
  runAsGroup: 1001
  fsGroup: 1001
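
For context, that puts the block at the pod level of the Deployment template, roughly like this (abridged; the container and volume definitions are unchanged from the Deployment above):

  template:
    spec:
      # pod-level securityContext, applied to the Nomad container and data volume
      securityContext:
        runAsUser: 1001
        runAsGroup: 1001
        fsGroup: 1001
      containers:
      - name: nomad-instance
        # ... unchanged from the Deployment above
      volumes:
      # ... unchanged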

I think your comment anticipates this, but the Nomad clients no longer seem able to use the exec driver, just raw_exec. Using the raw_exec driver, though, the clients are now able to mount the task_dir and run the curl command successfully.

The one issue the job still has is that the task “fails” even when the raw_exec command succeeds. In other words: cURL runs, fetches the expected response, and saves that response to a file, but the task keeps getting a “failed” status and restarting.
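
My first (untested) guess is that the job is defaulting to type "service", so Nomad restarts the task whenever it exits, even cleanly; if that’s right, marking it as a batch job might be enough, something like:

job "example-job" {
  datacenters = ["dc1"]

  # a batch job treats a task that exits 0 as complete, instead of
  # restarting it the way the default "service" job type does
  type = "batch"

  # group and task stanzas unchanged from the job file above
}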

But that’s obviously distinct from my initial issue, so unless someone has an immediate insight, I figure I can work it out for myself.

Again, thanks very much for the help!