Nomad OCI Shim Runtime Error

We updated Nomad from version 1.0.4 to 1.2.6 about a week ago.

Since then we have been seeing a lot of this error:

find or create container on worker main-general-4bafd20a-nomad-c34-prod-7: starting task: new task: failed to create shim: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:385: applying cgroup configuration for process caused: mkdir /sys/fs/nomad/shared: no such file or directory: unknown

Docker version on the Nomad clients:

Client: Docker Engine - Community
 Version:           20.10.14
 API version:       1.41
 Go version:        go1.16.15
 Git commit:        a224086
 Built:             Thu Mar 24 01:47:57 2022
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.14
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.16.15
  Git commit:       87a90dc
  Built:            Thu Mar 24 01:45:46 2022
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.5.11
  GitCommit:        3df54a852345ae127d1fa3092b95168e4a88e2f8
 runc:
  Version:          1.0.3
  GitCommit:        v1.0.3-0-gf46b6ba
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

Has anyone else seen this?

Hey @OlaSegha

Did you ever figure out the cause of this error?

@OlaSegha Can you describe your environment in more detail? The operating system, kernel version, and the output of mount -l | grep cgroup would help.

The underlying error mkdir /sys/fs/nomad/shared: no such file or directory is very strange; it appears the cgroup path is being computed incorrectly by Docker or containerd on your system. The correct path should be /sys/fs/cgroup/cpuset/nomad/shared.

Did you unmount the cgroup controller on this system?
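
If you get a chance, the output of something like the following on an affected client would also help narrow this down (the cpuset path is the expected location mentioned above, and the docker info template fields are assumed to be available in Docker 20.10):

# confirm Nomad's cpuset hierarchy exists where runc expects it
ls -ld /sys/fs/cgroup/cpuset/nomad /sys/fs/cgroup/cpuset/nomad/shared
# check which cgroup driver and cgroup version Docker thinks it is using
docker info --format 'cgroup driver: {{.CgroupDriver}}, cgroup version: {{.CgroupVersion}}'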

OS version = Ubuntu 20.04
Kernel = 5.4.0-109-generic

rpg@msp-nomad-client-prod-34:~$ mount -l | grep cgroup
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,mode=755)
cgroup2 on /sys/fs/cgroup/unified type cgroup2 (rw,nosuid,nodev,noexec,relatime)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,name=systemd)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpu,cpuacct)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb)
cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls,net_prio)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
cgroup on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,pids)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
cgroup on /sys/fs/cgroup/rdma type cgroup (rw,nosuid,nodev,noexec,relatime,rdma)

Also note that this only started after we upgraded to Nomad 1.2.6.

A little more context: we currently use Nomad to manage the orchestration and deployment of Concourse CI workers. Our Nomad job file looks like this:

job "concourse-workers-[[.concourse_team_name]]-[[.concourse_worker_group_name]]" {
    datacenters = ["cicd-vsphere"]

    type = "service"

    priority = 20

    constraint {
        attribute = "${node.class}"
        value = "[[.rpg_environment]]"
    }

    constraint {
        attribute = "${meta.service}"
        value = "concourse"
    }

    affinity {
        attribute = "${meta.concourse.job_type}"
        value = "worker"
        weight = 100
    }

    affinity {
        attribute = "${meta.instance_type}"
        value = "[[.nomad_instance_type]]"
        weight = 90
    }

    spread {
        attribute = "${meta.concourse.job_type}"
        target "worker" {
            percent = 100
        }
    }

    vault {
        policies = ["concourse"]
    }

    update {
        max_parallel      = 1
        canary            = 1
        health_check      = "checks"
        min_healthy_time  = "1m"
        healthy_deadline  = "3m"
        progress_deadline = "1h"
        auto_revert       = true
        auto_promote      = true
        stagger           = "30s"
    }

    group "worker" {

        count = [[.concourse_worker_instances]]

        restart {
            attempts = 0
            mode = "fail"
        }

        reschedule {
            delay          = "15s"
            delay_function = "fibonacci"
            max_delay      = "15m"
            unlimited      = true
        }

        ephemeral_disk {
            size = [[multiply .concourse_nomad_log_max_files .concourse_nomad_log_max_file_size | add 1 ]]
        }

        task "worker" {

            service {
                tags = [
                    "concourse",
                    "${NOMAD_TASK_NAME}",
                    "${NOMAD_JOB_NAME}"[[ if .concourse_include_tags ]],[[end]]
                    [[ if .concourse_include_tags ]]"[[ range $index, $value := .concourse_worker_tags ]][[if ne $index 0]],[[end]][[$value]][[end]]"[[end]]
                ]
                name = "${NOMAD_JOB_NAME}"
                port = "bind"

                check {
                    type = "tcp"
                    port = "healthcheck"
                    interval = "10s"
                    timeout = "5s"
                }
            }

            resources {
                cpu = [[.nomad_task_cpu]]
                memory = [[.nomad_task_memory]]
                network {
                    [[if .nomad_task_network]]mbits = [[.nomad_task_network]][[end]]
                    port "bind" {[[if .concourse_bind_port]]static = [[.concourse_bind_port]][[end]]}
                    port "baggageclaim" {[[if .concourse_baggageclaim_port]]static = [[.concourse_baggageclaim_port]][[end]]}
                    port "healthcheck" {[[if .concourse_healthcheck_port]]static = [[.concourse_healthcheck_port]][[end]]}
                    port "debug" {[[if .concourse_debug_port]]static = [[.concourse_debug_port]][[end]]}
                    port "bcdebug" {[[if .concourse_bcdebug_port]]static = [[.concourse_bcdebug_port]][[end]]}
                }
            }

            env {
                CONCOURSE_BAGGAGECLAIM_BIND_PORT = "${NOMAD_PORT_baggageclaim}"
                CONCOURSE_BAGGAGECLAIM_DEBUG_BIND_PORT = "${NOMAD_PORT_bcdebug}"
                CONCOURSE_BAGGAGECLAIM_DRIVER = "[[.concourse_baggageclaim_driver]]"
                CONCOURSE_BAGGAGECLAIM_LOG_LEVEL = "[[.concourse_baggageclaim_log_level]]"
                CONCOURSE_BIND_PORT = "${NOMAD_PORT_bind}"
                CONCOURSE_DEBUG_BIND_PORT = "${NOMAD_PORT_debug}"
                CONCOURSE_DEFAULT_BUILD_LOGS_TO_RETAIN = "[[.concourse_default_build_logs_to_retain]]"
                CONCOURSE_EPHEMERAL = "[[.concourse_ephemeral]]"
                CONCOURSE_HEALTHCHECK_BIND_PORT = "${NOMAD_PORT_healthcheck}"
                CONCOURSE_LOG_LEVEL = "[[.concourse_log_level]]"
                CONCOURSE_MAX_BUILD_LOGS_TO_RETAIN = "[[.concourse_max_build_logs_to_retain]]"
                CONCOURSE_RUNTIME = "[[.concourse_runtime]]"
                CONCOURSE_TSA_HOST = "[[.concourse_tsa_host]]"
                CONCOURSE_TSA_PUBLIC_KEY = "[[.concourse_tsa_public_key]]"
                CONCOURSE_TSA_WORKER_PRIVATE_KEY = "[[.concourse_tsa_worker_private_key]]"
                [[ if .concourse_main_team ]][[ else ]]CONCOURSE_TEAM = "[[.concourse_team_name]]"[[ end ]]
                [[ if .concourse_include_tags ]]
                CONCOURSE_TAGS = "[[ range $index, $value := .concourse_worker_tags ]][[if ne $index 0]],[[end]][[$value]][[end]]"
                [[end]]
                DEPLOY_TIMESTAMP = "[[.timestamp]]"
            }

            # Concourse templates - Concourse keys
            template {
                data = "{{with secret \"service/concourse/concourse-keys\"}}{{index .Data \"tsa_host_key.pub\"}}{{end}}"
                destination = "secrets/concourse-keys/tsa_host_key.pub"
                perms = "0400"
            }
            template {
                data = "{{with secret \"service/concourse/concourse-keys\"}}{{.Data.worker_key}}{{end}}"
                destination = "secrets/concourse-keys/worker_key"
                perms = "0400"
            }

            # Generate Concourse name and include rendered timestamp, to force allocation recreation on restart
            template {
                data = <<EOH
# Timestamp of deployment
CURRENT_TIMESTAMP="{{ timestamp }}"
# Unique Concourse name
CONCOURSE_NAME="[[.concourse_team_name]]-[[.concourse_worker_group_name]]-{{ env "NOMAD_ALLOC_ID" | regexReplaceAll "([^-]+)-.+" "$1" }}-{{ env "node.unique.name" }}-{{ env "NOMAD_ALLOC_INDEX" }}"
EOH

                destination = "secrets/concourse-name.env"
                change_mode = "restart"
                splay = "15s"
                env = true
            }

            driver = "docker"

            # The SIGUSR2 Signal will tell the worker to retire when receiving the kill signal
            kill_signal = "SIGUSR2"
            kill_timeout = "60m"

            config {
                image = "docker.artifactory.corp.code42.com/concourse/concourse:[[.concourse_version]]"
                privileged = true
                command = "worker"
                hostname = "${node.unique.name}"
                volumes = [
                    "secrets/concourse-keys:/concourse-keys"
                ]
            }
            logs {
                max_files     = [[.concourse_nomad_log_max_files]]
                max_file_size = [[.concourse_nomad_log_max_file_size]]
            }
        }
    }
}
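
For anyone reading along: the [[ … ]] placeholders are template variables that are rendered before the job is submitted. The delimiters and helper functions match Levant-style templating, and if Levant is the renderer the deploy step would look roughly like this (file names are illustrative):

levant deploy -var-file=concourse-workers.yaml concourse-workers.nomad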

We only started seeing this problem after we upgraded to 1.2.6. It also happens when we manually drain an allocation: the error shows up on the rescheduled allocation, i.e. on the destination client.
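
For reference, we trigger the reschedule with the standard CLI commands, roughly like this (the IDs are placeholders):

# drain a client so its allocations get rescheduled elsewhere
nomad node drain -enable -yes <node-id>
# or stop a single allocation and let Nomad reschedule it
nomad alloc stop <alloc-id>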