Advice on infrastructure

Hi, we are currently looking for an orchestration solution for our game services (we are a game-development company) and have a couple of questions related to the right architecture for our tasks.

Currently we have:

  • 3 regions (EU/US/ASIA)
  • All regions run on different bare-metal servers
  • Dockerized game-servers (C# Unity)
  • Every server runs: N game-server containers + 1 cadvisor container

Our current deployment is done through Ansible and is pretty inefficient, so we are looking to move to Nomad. But there are a couple of questions to solve first; maybe someone could share a good way to do things.

Questions:

  • What is the right way to have, say, two/three/four container (game-server) versions running in production? Splitting everything up into task groups?
  • How can we bind every running container to a single CPU core? The docker task driver + HCL?
  • With 3 regions, should we have 3 Nomad servers running in every region? Should those machines run only the Nomad server? What CPU/memory is enough for such a machine?

Hi @m1ome!

It really depends on what isolation and performance requirements you have. If you're deploying N identical game servers, then you could deploy them as a single task group with a count field, and run the cadvisor task as a separate job entirely.

If you're using the Docker task driver, you can use the cpuset_cpus config option. Most other task drivers don't support this, although we have an open issue nomad/#2303 for it.
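For reference, here's a minimal sketch of what that looks like in a jobspec (the image name and core index are placeholders, not from this thread):

```hcl
# Sketch: pin a Docker task to a specific CPU core.
# Image name and core index are placeholders.
task "game-server" {
  driver = "docker"

  config {
    image       = "game-server:latest" # placeholder image
    cpuset_cpus = "0"                  # pin this container to core 0
  }
}
```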

You might want to take a look at the Reference Architecture and Requirements documentation. But in short, the Nomad servers for a single cluster (“region” in Nomad’s terminology) need to be in the same region. Depending on your reliability requirements you may be able to get away with having the Nomad client nodes running in far-away regions.

You can also federate regions to make managing multiple regions easier to deal with. And there’s a Nomad Enterprise feature for deploying a single job across multiple regions.
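For the Enterprise multi-region case, the jobspec gets a multiregion block. Roughly (region names and counts here are placeholders, a sketch rather than a tested config):

```hcl
# Sketch of the Nomad Enterprise multiregion block;
# region names and counts are placeholders.
multiregion {
  strategy {
    max_parallel = 1 # roll out one region at a time
  }

  region "eu" {
    count = 10
  }

  region "us" {
    count = 10
  }
}
```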

Hi! Thanks for the quick response, a lot of things look much clearer now!

But in terms of cpuset_cpus: there is the dynamic port feature, and let's say I want to spawn 10 containers and bind each of them to a separate CPU core. Is this somehow possible without using HCL?

And about groups: you mean that if I want to have 3 versions of servers, I should create 3 separate task groups in a job?

But in terms of cpuset_cpus: there is the dynamic port feature, and let's say I want to spawn 10 containers and bind each of them to a separate CPU core. Is this somehow possible without using HCL?

Not currently. If you want to spawn 10 containers on the same host, each with its own CPU, you'll need to create 10 tasks in that task group. You can use HCL2 dynamic blocks to make this a little easier.

And about groups: you mean that if I want to have 3 versions of servers, I should create 3 separate task groups in a job?

The way to look at it is:

  • job is the unit of deployment: the whole job is deployed when you do a nomad job run.
  • group is the unit of placement: each count of a task group will create 1 allocation (a collection of containers that run on the same host and share resources).
  • task is what’s actually running: a single container.

So you could do:

job "job" {
  group "game-servers" {
    count = 10
    task "game-server" {}
    task "logging-sidecar" {}
  }

  group "cadvisor" {
    task "cadvisor" {}
  }
}

That would result in 11 allocations:

  • 10 allocations, each with a game server container + a logger container
  • 1 cadvisor allocation with just the one cadvisor container

But you could also split those two groups into different jobs, depending on what the lifecycle of your deployments is likely to be.
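Since cadvisor usually wants to run once on every host, one option for the split (a sketch not taken from the thread above; the image tag is a placeholder) is to make it a separate system job, which Nomad schedules on every eligible client:

```hcl
# Sketch: run cadvisor once per client node via a system job.
# Image tag is a placeholder.
job "cadvisor" {
  type = "system" # one allocation per eligible node

  group "cadvisor" {
    task "cadvisor" {
      driver = "docker"

      config {
        image = "gcr.io/cadvisor/cadvisor:latest" # placeholder tag
      }
    }
  }
}
```

This decouples the monitoring sidecar's deployment lifecycle from the game servers', which is exactly the "different jobs" split mentioned above.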

You mean I should do it in a task group without a count, creating N tasks in the group manually or with HCL?

And, for example, if I know every client's (host's) overall capacity (e.g. I have 2 nodes with a Nomad client on them; one can handle 10 servers and the other can handle 20), is the best way to somehow get this data from the client (provision clients with some sort of metadata)? And if I have 3 versions to roll out, A/B and C, should I create 3 task groups, divide each client's overall capacity into percentages for every version, and use HCL to spin up all the needed containers?

I am just trying to shape up the whole idea. Sorry if it looks like I am asking dumb questions.

You mean I should do it in a task group without a count, creating N tasks in the group manually or with HCL?

Right, if you knew you wanted 10 tasks together on the same host, that would be 1 task group with count = 1 and 10 tasks. Usually you won’t want to try to “manually” bin pack like this though (see below).

And, for example, if I know every client's (host's) overall capacity (e.g. I have 2 nodes with a Nomad client on them; one can handle 10 servers and the other can handle 20), is the best way to somehow get this data from the client (provision clients with some sort of metadata)?

Nomad fingerprints the resources available on each node and either binpacks onto the hosts (by default) or spreads the allocations for a job across as many hosts as possible (using spread). Generally speaking you’ll want to tell Nomad how much cpu/memory the application needs, and Nomad is supposed to figure out the placements for you. There’s usually no need to manually specify resource-related metadata or try to do fine-grained control over placements. You can declare affinity or constraints to give Nomad hints about placement.
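As a sketch of what "telling Nomad what the application needs" looks like (the CPU/memory numbers and image are placeholders, not recommendations):

```hcl
# Sketch: declare per-task resources and let the scheduler place them.
# Counts, image, and CPU/memory numbers are placeholders.
group "game-servers" {
  count = 30

  spread {
    attribute = "${node.unique.id}" # spread across hosts instead of binpacking
  }

  task "game-server" {
    driver = "docker"

    config {
      image = "game-server:latest" # placeholder image
    }

    resources {
      cpu    = 500 # MHz
      memory = 256 # MB
    }
  }
}
```

With only the resources block, Nomad binpacks; the optional spread block flips the behavior to distribute allocations across nodes.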

The exception you may be running into is cpu pinning, because that feature is a bit under-developed and it’s not part of the host fingerprinting information. In that case you’ll probably need to use dynamic blocks and some metadata to help placement.

So in the example job below, you end up with 10 allocations, each landing on a different “large” host. Each of those allocations has 5 tasks, and each task is pinned to a specific CPU. As you can see, there are a whole lot of options for influencing scheduling.

variable "task_ids" {
  type    = list(string)
  default = ["0", "1", "2", "3", "4"]
}

job "gameservers-large-hosts" {

  group "gameservers" {
    count = 10

    constraint {
      operator = "distinct_hosts"
      value    = "true"
    }

    constraint {
      attribute = "${meta.host_class}"
      value     = "large"
    }

    dynamic "task" {
      for_each = var.task_ids
      labels   = [task.value]

      content {
        driver = "docker"

        config {
          image       = "game-server:latest" # placeholder image
          cpuset_cpus = task.value
        }
      }
    }
  }
}


This should be affinity, as I understand it, and it means Nomad should bind to specific hosts based on constraints?

One more question related to that one sentence: 10 allocations with 5 tasks each. If we have 5 hosts (clients) with the large meta tag, will it spin up 50 Docker containers on each host? And say we have different bare-metal servers and want to say explicitly: spin up N Docker containers of this task on each server, where N is, let's say, ${meta.<group_identifier>} (btw, is there any way to read something from meta based on the name of the current task group?). Would it look something like this?

job "gameservers-first-cluster" { 
    count = 1

    affinity {
        operator  = "distinct_hosts"
        value     = "true"
     }

    constraint {
      attribute = "${meta.host_class}"
      value     = "gameservers-first-cluster"
    }

    variable "task_ids" {
        type    = list("${meta.host_capacity}")
    }

    dynamic "task" {
      image = "game-server:${env["SERVER_VERSION"]}"
      for_each = var.task_ids
      labels = [var.task_ids.value]
      driver = "docker"
      config {
        cpuset_cpu = var.task_ids.value
      }
    }
  } 

This should be affinity, as I understand it, and it means Nomad should bind to specific hosts based on constraints?

Affinity expresses a soft placement preference, encouraging allocations to land on matching hosts. So in the example I gave I used the distinct_hosts constraint so that exactly one allocation would land on each host.

10 allocations with 5 tasks each. If we have 5 hosts (clients) with the large meta tag, will it spin up 50 Docker containers on each host?

That's why I had the distinct_hosts constraint. But if you use that constraint with count = 10 and only 5 hosts, the scheduler will reject the job because the constraint can't be met.

For the jobspec you gave, unfortunately you can’t do this:

variable "task_ids" {
  type = list("${meta.host_capacity}")
}

HCL2 variables are processed by the CLI at job submission time, but the meta value for a host isn't known until after placement.
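So the capacity value has to come from outside the cluster, e.g. as a variable default or passed at submission time (a sketch; the values and filename are placeholders):

```hcl
# Sketch: supply per-host capacity from the CLI instead of node meta.
# Override at submission time with, e.g.:
#   nomad job run -var='task_ids=["0","1","2"]' gameservers.hcl
variable "task_ids" {
  type    = list(string)
  default = ["0", "1", "2", "3", "4"]
}
```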