Nomad OCI Shim Runtime Error

We updated Nomad from version 1.0.4 to 1.2.6 about a week ago.

Since then we have been seeing a lot of this error:

find or create container on worker main-general-4bafd20a-nomad-c34-prod-7: starting task: new task: failed to create shim: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:385: applying cgroup configuration for process caused: mkdir /sys/fs/nomad/shared: no such file or directory: unknown

Docker version on the Nomad clients:

Client: Docker Engine - Community
 Version:           20.10.14
 API version:       1.41
 Go version:        go1.16.15
 Git commit:        a224086
 Built:             Thu Mar 24 01:47:57 2022
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.14
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.16.15
  Git commit:       87a90dc
  Built:            Thu Mar 24 01:45:46 2022
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.5.11
  GitCommit:        3df54a852345ae127d1fa3092b95168e4a88e2f8
 runc:
  Version:          1.0.3
  GitCommit:        v1.0.3-0-gf46b6ba
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

Has anyone else seen this?

Hey @OlaSegha

Did you ever figure out the cause of this error?

@OlaSegha Can you describe your environment in more detail? The operating system, kernel version, and the output of mount -l | grep cgroup would help.

The underlying error mkdir /sys/fs/nomad/shared: no such file or directory is very strange; it appears the cgroup path is being computed incorrectly by Docker or containerd on your system. The correct path should be /sys/fs/cgroup/cpuset/nomad/shared.

Did you unmount the cgroup controller on this system?
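
If you get a chance, the output of something like the following on an affected client would also help narrow this down (the cpuset path is the expected location mentioned above, and the docker info template fields are assumed to be available in Docker 20.10):

# confirm Nomad's cpuset hierarchy exists where runc expects it
ls -ld /sys/fs/cgroup/cpuset/nomad /sys/fs/cgroup/cpuset/nomad/shared
# check which cgroup driver and cgroup version Docker thinks it is using
docker info --format 'cgroup driver: {{.CgroupDriver}}, cgroup version: {{.CgroupVersion}}'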

OS version = Ubuntu 20.04
Kernel = 5.4.0-109-generic

rpg@msp-nomad-client-prod-34:~$ mount -l | grep cgroup
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,mode=755)
cgroup2 on /sys/fs/cgroup/unified type cgroup2 (rw,nosuid,nodev,noexec,relatime)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,name=systemd)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpu,cpuacct)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb)
cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls,net_prio)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
cgroup on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,pids)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
cgroup on /sys/fs/cgroup/rdma type cgroup (rw,nosuid,nodev,noexec,relatime,rdma)

Also note that this only started after we upgraded to Nomad 1.2.6.

A little more context: we currently use Nomad to manage the orchestration and deployment of Concourse CI workers. Our Nomad job file looks like this:

job "concourse-workers-[[.concourse_team_name]]-[[.concourse_worker_group_name]]" {
    datacenters = ["cicd-vsphere"]

    type = "service"

    priority = 20

    constraint {
        attribute = "${node.class}"
        value = "[[.rpg_environment]]"
    }

    constraint {
        attribute = "${meta.service}"
        value = "concourse"
    }

    affinity {
        attribute = "${meta.concourse.job_type}"
        value = "worker"
        weight = 100
    }

    affinity {
        attribute = "${meta.instance_type}"
        value = "[[.nomad_instance_type]]"
        weight = 90
    }

    spread {
        attribute = "${meta.concourse.job_type}"
        target "worker" {
            percent = 100
        }
    }

    vault {
        policies = ["concourse"]
    }

    update {
        max_parallel      = 1
        canary            = 1
        health_check      = "checks"
        min_healthy_time  = "1m"
        healthy_deadline  = "3m"
        progress_deadline = "1h"
        auto_revert       = true
        auto_promote      = true
        stagger           = "30s"
    }

    group "worker" {

        count = [[.concourse_worker_instances]]

        restart {
            attempts = 0
            mode = "fail"
        }

        reschedule {
            delay          = "15s"
            delay_function = "fibonacci"
            max_delay      = "15m"
            unlimited      = true
        }

        ephemeral_disk {
            size = [[multiply .concourse_nomad_log_max_files .concourse_nomad_log_max_file_size | add 1 ]]
        }

        task "worker" {

            service {
                tags = [
                    "concourse",
                    "${NOMAD_TASK_NAME}",
                    "${NOMAD_JOB_NAME}"[[ if .concourse_include_tags ]],[[end]]
                    [[ if .concourse_include_tags ]]"[[ range $index, $value := .concourse_worker_tags ]][[if ne $index 0]],[[end]][[$value]][[end]]"[[end]]
                ]
                name = "${NOMAD_JOB_NAME}"
                port = "bind"

                check {
                    type = "tcp"
                    port = "healthcheck"
                    interval = "10s"
                    timeout = "5s"
                }
            }

            resources {
                cpu = [[.nomad_task_cpu]]
                memory = [[.nomad_task_memory]]
                network {
                    [[if .nomad_task_network]]mbits = [[.nomad_task_network]][[end]]
                    port "bind" {[[if .concourse_bind_port]]static = [[.concourse_bind_port]][[end]]}
                    port "baggageclaim" {[[if .concourse_baggageclaim_port]]static = [[.concourse_baggageclaim_port]][[end]]}
                    port "healthcheck" {[[if .concourse_healthcheck_port]]static = [[.concourse_healthcheck_port]][[end]]}
                    port "debug" {[[if .concourse_debug_port]]static = [[.concourse_debug_port]][[end]]}
                    port "bcdebug" {[[if .concourse_bcdebug_port]]static = [[.concourse_bcdebug_port]][[end]]}
                }
            }

            env {
                CONCOURSE_BAGGAGECLAIM_BIND_PORT = "${NOMAD_PORT_baggageclaim}"
                CONCOURSE_BAGGAGECLAIM_DEBUG_BIND_PORT = "${NOMAD_PORT_bcdebug}"
                CONCOURSE_BAGGAGECLAIM_DRIVER = "[[.concourse_baggageclaim_driver]]"
                CONCOURSE_BAGGAGECLAIM_LOG_LEVEL = "[[.concourse_baggageclaim_log_level]]"
                CONCOURSE_BIND_PORT = "${NOMAD_PORT_bind}"
                CONCOURSE_DEBUG_BIND_PORT = "${NOMAD_PORT_debug}"
                CONCOURSE_DEFAULT_BUILD_LOGS_TO_RETAIN = "[[.concourse_default_build_logs_to_retain]]"
                CONCOURSE_EPHEMERAL = "[[.concourse_ephemeral]]"
                CONCOURSE_HEALTHCHECK_BIND_PORT = "${NOMAD_PORT_healthcheck}"
                CONCOURSE_LOG_LEVEL = "[[.concourse_log_level]]"
                CONCOURSE_MAX_BUILD_LOGS_TO_RETAIN = "[[.concourse_max_build_logs_to_retain]]"
                CONCOURSE_RUNTIME = "[[.concourse_runtime]]"
                CONCOURSE_TSA_HOST = "[[.concourse_tsa_host]]"
                CONCOURSE_TSA_PUBLIC_KEY = "[[.concourse_tsa_public_key]]"
                CONCOURSE_TSA_WORKER_PRIVATE_KEY = "[[.concourse_tsa_worker_private_key]]"
                [[ if .concourse_main_team ]][[ else ]]CONCOURSE_TEAM = "[[.concourse_team_name]]"[[ end ]]
                [[ if .concourse_include_tags ]]
                CONCOURSE_TAGS = "[[ range $index, $value := .concourse_worker_tags ]][[if ne $index 0]],[[end]][[$value]][[end]]"
                [[end]]
                DEPLOY_TIMESTAMP = "[[.timestamp]]"
            }

            # Concourse templates - Concourse keys
            template {
                data = "{{with secret \"service/concourse/concourse-keys\"}}{{index .Data \"tsa_host_key.pub\"}}{{end}}"
                destination = "secrets/concourse-keys/tsa_host_key.pub"
                perms = "0400"
            }
            template {
                data = "{{with secret \"service/concourse/concourse-keys\"}}{{.Data.worker_key}}{{end}}"
                destination = "secrets/concourse-keys/worker_key"
                perms = "0400"
            }

            # Generate Concourse name and include rendered timestamp, to force allocation recreation on restart
            template {
                data = <<EOH
# Timestamp of deployment
CURRENT_TIMESTAMP="{{ timestamp }}"
# Unique Concourse name
CONCOURSE_NAME="[[.concourse_team_name]]-[[.concourse_worker_group_name]]-{{ env "NOMAD_ALLOC_ID" | regexReplaceAll "([^-]+)-.+" "$1" }}-{{ env "node.unique.name" }}-{{ env "NOMAD_ALLOC_INDEX" }}"
EOH

                destination = "secrets/concourse-name.env"
                change_mode = "restart"
                splay = "15s"
                env = true
            }

            driver = "docker"

            # The SIGUSR2 Signal will tell the worker to retire when receiving the kill signal
            kill_signal = "SIGUSR2"
            kill_timeout = "60m"

            config {
                image = "docker.artifactory.corp.code42.com/concourse/concourse:[[.concourse_version]]"
                privileged = true
                command = "worker"
                hostname = "${node.unique.name}"
                volumes = [
                    "secrets/concourse-keys:/concourse-keys"
                ]
            }
            logs {
                max_files     = [[.concourse_nomad_log_max_files]]
                max_file_size = [[.concourse_nomad_log_max_file_size]]
            }
        }
    }
}
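
For anyone reading along: the [[ … ]] placeholders are template variables that are rendered before the job is submitted. The delimiters and helper functions match Levant-style templating, and if Levant is the renderer the deploy step would look roughly like this (file names are illustrative):

levant deploy -var-file=concourse-workers.yaml concourse-workers.nomad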

We only started seeing this problem after we upgraded to 1.2.6. It also happens when we manually drain an allocation: the error shows up on the rescheduled allocation, i.e. on the destination client.
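
For reference, we trigger the reschedule with the standard CLI commands, roughly like this (the IDs are placeholders):

# drain a client so its allocations get rescheduled elsewhere
nomad node drain -enable -yes <node-id>
# or stop a single allocation and let Nomad reschedule it
nomad alloc stop <alloc-id>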