Dimension cpu exhausted on 3 nodes (all the nodes I have!)

I have two very similar jobs, and after deploying one, the other always fails with Dimension cpu exhausted on 3 nodes. Oddly enough, I’m only deploying 2 containers per service onto my Docker Nomad cluster. After the first service is deployed, the second will not deploy at all!

Unfortunately, I couldn’t find a way to copy the rendered job definitions out of the Nomad UI, which I would have liked to paste here for reference. Here’s the template I’m using to deploy the service. Keep in mind it is rendered by Ansible, with the variables plugged in as appropriate:

job "benthos-{{ alm_app }}" {
  datacenters = ["us-ind-{{ alm_env }}"]
  type = "service"
  group "benthos-{{ alm_app }}" {
    count = 2
    network {
      mode = "bridge"
      port "stream" {
        to = 4195
      }
    }
    task "benthos-{{ alm_app }}" {
      vault {
        policies = ["deployer"]
      }
      resources {
        cores = {{ benthos_resource_cores[alm_env] }}
        memory = {{ benthos_resource_memory[alm_env] }}
      }
      driver = "docker"
      config {
        image = "{{ benthos_config_image }}:{{ benthos_config_version[alm_env] }}"
        # This overrides the secrets.yml file that is already there.
        volumes = [
          "local/infrastructure.yml:/infrastructure.yml"
        ]
      }
      template {
        destination = "${NOMAD_SECRETS_DIR}/secrets.yml"
        left_delimiter = "[["
        right_delimiter = "]]"
        data = <<EOF
---
{% for benthos_secret in benthos_secrets %}
[[ with secret "secret/data/{{ benthos_secret['vault_path'] }}" ]]
{{ benthos_secret['var_file_key'] }}: "[[ .Data.data.{{ benthos_secret['vault_key'] }} ]]"
[[ end ]]
{% endfor %}
        EOF
      }
      template {
        left_delimiter = "[["
        right_delimiter = "]]"
        destination = "local/infrastructure.yml"
        # So if we remove the quotes, jinja should format the 'val' part of this
        # correctly regardless of whether this is a string, dict, or list
        data = <<EOF
---
{% for benthos_component in benthos_infrastructure %}
{{ benthos_component['key'] }}: {{ benthos_component['val'] }}
{% endfor %}
        EOF
      }
      env {
        UPDOX_ENV = "{{ alm_env }}"
        UPDOX_LOC = "{{ alm_loc }}"
        UPDOX_APP = "benthos-{{ alm_app }}"
      }
      service {
        name = "benthos-{{ alm_app }}"
        port = "stream"
        provider = "consul"
      }
    }
  }
}

Keep in mind that EACH OF THESE DEPLOYS WORKS, as long as the other isn’t running. So what is stepping on whose toes?

Hi @acziryak,

Dimension cpu exhausted on 3 nodes

This indicates that once the first job has been registered and scheduled on your cluster, there is not enough allocatable CPU left to schedule the second job. Typically, to resolve this you would either run more nodes, or run nodes with more CPU that can accommodate the required resources. In this case, the job you have detailed only has a group count of 2, which makes me think something else is happening. It would be interesting to see the output of nomad node status -verbose <nodeID> for all three nodes.
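To make the failure mode concrete, here is a rough sketch of the CPU fit check the scheduler performs. All node sizes below are hypothetical, not taken from the reporter's cluster; the point is that `cores = N` reserves N whole cores exclusively per task, so four tasks (two jobs at `count = 2`, each asking for 2 cores) can exhaust three small nodes even though either job fits on its own:

```python
# Hypothetical illustration of a whole-core fit check, in the spirit of
# Nomad's bin-packing. Node core counts are assumptions for illustration.

def place(tasks, cores_per_task, nodes):
    """Greedy placement: reserve whole cores per task, like `cores = N`."""
    free = dict(nodes)  # node name -> free allocatable cores
    placed = 0
    for _ in range(tasks):
        for name, avail in free.items():
            if avail >= cores_per_task:
                free[name] -= cores_per_task
                placed += 1
                break
    return placed

# Assume three small nodes with 2 allocatable cores each (hypothetical).
small = {"node-1": 2, "node-2": 2, "node-3": 2}

# One job alone: 2 tasks x 2 cores -> both place fine.
assert place(2, 2, small) == 2

# Both jobs together: 4 tasks x 2 cores, but only 6 cores exist,
# so the fourth task cannot fit -> "Dimension cpu exhausted on 3 nodes".
assert place(4, 2, small) == 3
```

This is only a sketch of the dedicated-cores case; the real scheduler also accounts for system reservations and other running allocations on each node.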

the Nomad UI doesn’t allow me to copy definitions

You should be able to see the job specification at a URL similar to http://localhost:4646/ui/jobs/<jobID>@<namespace>/definition, adjusting for your cluster's address. You can also use the nomad job inspect <jobID> command to view the job definition from the CLI.

Thanks,
jrasell and the Nomad team


This turned out to be the case. I changed the CPU allocation from cores = 2 to cpu = 500: cores = 2 dedicates two whole cores exclusively to each task, while cpu = 500 only reserves a 500 MHz share, which significantly reduced each container's footprint. That freed up a significant amount of resources from Nomad's point of view, and it was able to place all the containers against the new requirements.
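For anyone hitting the same thing, a back-of-the-envelope comparison shows why the change is so dramatic. The per-core clock speed below is an assumed 2500 MHz for illustration (Nomad reports reserved cores in MHz terms based on the node's actual clock), not a value measured from this cluster:

```python
# Rough comparison of total CPU reserved across both jobs (4 tasks),
# assuming 2500 MHz per core for illustration.
CORE_MHZ = 2500
tasks = 4  # two jobs, each with count = 2

cores_reservation = 2 * CORE_MHZ * tasks  # cores = 2 per task
cpu_reservation = 500 * tasks             # cpu = 500 per task

assert cores_reservation == 20000  # 20 GHz-equivalent, dedicated
assert cpu_reservation == 2000     # 2 GHz-equivalent, shared
```

Under these assumptions the original spec reserved roughly ten times the CPU of the fixed one, which is why the cluster could place either job alone but not both.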