How to run a raw_exec task as a non-root user

Hello,

Is it possible to run the processes of a raw_exec task as a non-root user? Ideally, I want to run Nomad as root and then do something like this:

job "show_id_job" {
  datacenters = ["dc1"]
  priority = 100
  type = "batch"
  constraint {
    attribute = "${attr.unique.hostname}"
    value = "myhost.company.com"
  }
  group "show_id_group" {
    network {
      mode = "host"
    }
    task "show_id" {
      driver = "raw_exec"
      config {
        command = "/usr/bin/su"
        args = ["--login", "regularuser", "/usr/bin/id"]
      }
    }
  }
}

But when I run this job it fails:

Oct 23 19:51:03 myhost.company.com nomad[300160]: client: allocation updates applied: added=0 removed=0 updated=4 ignored=4 errors=0
Oct 23 19:51:03 myhost.company.com su[385531]: pam_unix(su-l:session): session closed for user regularuser
Oct 23 19:51:03 myhost.company.com nomad[300160]:     2020-10-23T19:51:03.822-0400 [ERROR] client.driver_mgr.raw_exec: error receiving stream from Stats executor RPC, closing stream: alloc_id=fbe2e6d9-930e-acff-83c7-9d0f83b2e085 driver=raw_exec task_name=show_id error="rpc error: code = Unavailable desc = transport is closing"
Oct 23 19:51:03 myhost.company.com nomad[300160]:     2020-10-23T19:51:03.822-0400 [ERROR] client.alloc_runner.task_runner.task_hook.stats_hook: failed to start stats collection for task: alloc_id=fbe2e6d9-930e-acff-83c7-9d0f83b2e085 task=show_id error="rpc error: code = Canceled desc = grpc: the client connection is closing"

I could not find any parameter in the documentation that would let me do this.

Has anyone run into this issue?

Thanks!

Could this help?

*** I am not sure whether raw_exec supports user (I haven’t experimented with the user parameter recently).

Hello Shantanu,

Sadly, user is not supported for raw_exec:

job "show_id_job" {
  datacenters = ["dc1"]
  priority = 100
  type = "batch"
  constraint {
    attribute = "${attr.unique.hostname}"
    value = "weelxavt017d.striketechnologies.com"
  }
  group "show_id_group" {
    network {
      mode = "host"
    }
    task "show_id" {
      driver = "raw_exec"
      config {
        user = "regularuser"
        command = "/usr/bin/id"
        args = ["--name", "--user"]
      }
    }
  }
}

From journalctl:

Oct 24 15:25:14 myhost.company.com nomad[829897]: client.alloc_runner.task_runner: running driver failed: alloc_id=b879749f-b1b9-a65a-e621-f4c6fe49d609 task=show_id error="2 errors occurred:
                                                                           * failed to parse config:
                                                                           * Invalid label: No argument or block type is named "user".

The documentation says user is supported for driver=exec and driver=docker (Docker makes this easy, since it supports it directly, even at image build time). I can switch my task to exec so it is restricted with cgroups, but then the allocation takes forever to start, because the exec driver copies data from the original locations into the chroot (you control what gets copied by listing directories in the chroot_env section of the client block in /etc/nomad/conf.hcl). My new job file with exec:

job "show_id_job" {
  datacenters = ["dc1"]
  priority = 100
  type = "batch"
  constraint {
    attribute = "${attr.unique.hostname}"
    value = "myhost.company.com"
  }
  group "show_id_group" {
    network {
      mode = "host"
    }
    task "show_id" {
      driver = "exec"
      user   = "regularuser"
      config {
        command = "/usr/bin/id"
        args = ["--name", "--user"]
      }
    }
  }
}

So at this point it seems that the best way is to run Nomad as the target user from day 0, and not to bother with exec, as it copies (rather than links) the chroot directories. I’m still skimming through the documentation to make sure I’m not missing something else that could speed up the task startup or even let it run :slight_smile:

I am curious: why doesn’t raw_exec have the capability to drop to another user?

In case anyone from @hashicorp is reading this … could you please clarify? Thanks!

Just tagging a few usual suspects! :grinning:

@preetapan @tgross @angrycub @schmichael

I had created an issue long ago …

Now I am wondering if Nomad’s raw_exec did support user once upon a time!?! :thinking:

Oh, this is interesting. Yeah, it’s a bummer that it is not supported.

As a manual workaround you could use setpriv (if you don’t want or need to go through dedicated PAM session handling), runuser, or a plain su as a prefix to the command.
It’s kind of messy, OTOH, but it could work (I’ve never tested it myself in the context of Nomad, tbh).

I think it indeed works, just not documented correctly! :smile:

Hello Matya,

I tried with ‘su’ and it caused an error; the task execution did not work. I’ll explore the ‘setpriv’ and ‘runuser’ options to see if they are useful in this case.

Thanks for your help.

It would be nice if someone from HashiCorp could clarify this. So far our group is liking K8s more and more because of little things like this :slight_smile:

I don’t have a sample raw_exec job right now … can you check whether specifying user at the task level works?

Yep, it’s at the beginning of my post. Broken…

@jnunezgts I just realized this when I read the OP (and the subsequent posts) in detail…

raw_exec does indeed work with user.

The user should be specified inside the task block, not inside the config block.

Oh. Well, that’s nice actually. I will give it a try and comment here.

Starts to smell like a classic “RTFM” error (Read The Fine Manual).


Hi @jnunezgts, there is one more minor issue that I faced … the Nomad data_dir needs to have the execute bit set, so the following job of mine was working on some nodes … but not on others …

clue: https://github.com/hashicorp/nomad/issues/1919

I see a scary error message:

Driver Failure  failed to launch command with executor: rpc error: code = Unknown desc = failed to start command path="/var/lib/nomad/alloc/d94090cc-3ee1-4fa8-c6c6-002e35df3bb8/redis/local/runme.bash" --- args=["/var/lib/nomad/alloc/d94090cc-3ee1-4fa8-c6c6-002e35df3bb8/redis/local/runme.bash"]: fork/exec /var/lib/nomad/alloc/d94090cc-3ee1-4fa8-c6c6-002e35df3bb8/redis/local/runme.bash: permission denied

After a chmod 0755 /var/lib/nomad on all my nodes, the job works fine now.

job "example" {
  datacenters = ["dc1"]
  type        = "batch"

  constraint {
    attribute = "${node.class}"
    value     = "special-node"
  }

  group "cache" {
    task "redis" {
      driver = "raw_exec"
      user   = "special-user"

      template {
        data = <<EOF
#!/bin/bash

set -u
set -e
set -x

echo 2>&1

sleep 5
sync

env | sort
hostname
id -a

touch /tmp/something.txt

ls -l /tmp/something.txt

echo "here"
sync

sleep 60
exit 0
EOF

        destination = "local/runme.bash"
        perms       = "755"
      }

      config {
        command = "local/runme.bash"

        #command = "/bin/bash"
        #args    = ["-c", "local/runme.bash"]

        #command = "/usr/bin/id"
        #args    = ["-a"]
      }

      resources {
        cpu    = 500
        memory = 256
      }
    }
  }
}

HTH


Yeah, well, when switching users you still have the environment and directories as they were, so as long as the resulting user lacks the “x” (traverse) permission on any of the parent directories up to where the alloc lives, it will fail with access denied. But that’s more a feature of Linux than an issue that can be fixed in Nomad without some nasty workarounds or hacks, IMHO.
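
The traverse requirement can be checked mechanically: every directory from / down to the one holding the task’s file needs the search (x) bit for whichever class (owner, group or other) applies to the task user. Below is a minimal Python sketch, assuming only classic Unix mode bits matter (POSIX ACLs and SELinux are ignored); the function name untraversable_ancestors and the script name check_traverse.py are hypothetical.

```python
import os
import pwd
import stat
import sys
from pathlib import Path


def untraversable_ancestors(target, uid, gids):
    """Return the directories above `target` that uid/gids cannot search.

    Only classic mode bits are considered: POSIX ACLs, SELinux/AppArmor and
    similar mechanisms are ignored, as is the target file's own x bit.
    """
    if uid == 0:
        return []  # root bypasses directory search permission checks
    blocked = []
    # Walk from "/" down to the directory that directly contains `target`.
    for directory in reversed(Path(target).absolute().parents):
        st = directory.stat()
        # Exactly one class applies: owner if uid matches, else group, else other.
        if st.st_uid == uid:
            search_bit = stat.S_IXUSR
        elif st.st_gid in gids:
            search_bit = stat.S_IXGRP
        else:
            search_bit = stat.S_IXOTH
        if not st.st_mode & search_bit:
            blocked.append(directory)
    return blocked


if __name__ == "__main__":
    # Usage: python3 check_traverse.py USER PATH  (run as root)
    user, target = sys.argv[1], sys.argv[2]
    pw = pwd.getpwnam(user)
    groups = set(os.getgrouplist(pw.pw_name, pw.pw_gid))
    for directory in untraversable_ancestors(target, pw.pw_uid, groups):
        print(f"{user} cannot traverse {directory}")
```

Run as root against a file under the alloc directory, any directory it prints (for example a 0700 /var/lib/nomad) is one the task user cannot enter. Granting group search permission is usually a narrower fix than opening the directory to everyone.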

The regular exec driver could do something about it since it uses chroot, but I’m not sure exactly how that works under the hood without reading the code itself (I use Docker for the moment).

Nevertheless, I recommend not using su or sudo when you don’t need a dedicated session for the user but “just” want to drop privileges. Many real programming languages support changing uid/euid when run as root.

(But please don’t chmod anything to 777; that’s a very bad thing to do on any of the Nomad directories. I recommend working with group-level permissions to restrict access to the alloc folders.)
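
As a rough illustration of dropping privileges in code rather than through su or sudo, here is a minimal Python wrapper sketch. The script name dropuser.py and its USER COMMAND [ARGS...] usage are hypothetical; it assumes it is started as root and, unlike su or runuser, it opens no PAM session (no limits, no login environment).

```python
import os
import pwd
import sys


def drop_privileges(username):
    """Permanently switch this process to `username`'s uid, gid and groups.

    Must be called as root. Order matters: supplementary groups and gid are
    changed first and the uid last, because once the uid is dropped the
    process is no longer allowed to change its groups.
    """
    if os.geteuid() != 0:
        raise PermissionError("drop_privileges() must be called as root")
    pw = pwd.getpwnam(username)  # raises KeyError if the user does not exist
    os.initgroups(pw.pw_name, pw.pw_gid)  # supplementary groups of the user
    os.setgid(pw.pw_gid)
    os.setuid(pw.pw_uid)  # as root this sets real, effective and saved uid
    if os.getuid() != pw.pw_uid or os.geteuid() != pw.pw_uid:
        raise RuntimeError("failed to drop privileges")
    # Only these variables are adjusted; everything else is inherited as-is.
    os.environ.update(HOME=pw.pw_dir, USER=pw.pw_name, LOGNAME=pw.pw_name)
    return pw


def main(argv):
    if len(argv) < 3:
        sys.exit(f"usage: {argv[0]} USER COMMAND [ARGS...]")
    drop_privileges(argv[1])
    # Replace this process with the command; raises OSError if it cannot run.
    os.execvp(argv[2], argv[2:])


if __name__ == "__main__":
    main(sys.argv)
```

Started as root with python3 dropuser.py regularuser /usr/bin/id, it should report regularuser’s identity. Where Nomad’s task-level user parameter is available, that remains the simpler option; a wrapper like this only helps when finer control is needed.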


+1 … so I made it 0755