Nomad agent failing all of a sudden with errors pointing to nvidia

Hi all, I used to run a script that launched Nomad and some other things, and it worked fine until yesterday. Now everything looks normal until it gets to the line that runs nomad agent -dev. Running that command now outputs the following:

==> Starting Nomad agent...
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x11 pc=0x4066ce]

goroutine 1 [running]:
_cgo_gotypes.go:170, 0xc000a1f200, 0x41253b) +0x54
, 0x29026c0) +0x73
*nvmlDriver).Initialize(...)
, 0x150, 0x150) +0x28
, 0xc000124440, 0x30e7cb8, 0xc0007264b0, 0xc00035e000) +0x34
, 0xc000124440, 0x30e7cb8, 0xc0007264b0, 0x0, 0x0) +0x49
*PluginLoader).initInternal(0xc000726510, 0xc0007262d0, 0xc000726540, 0x0, 0x0, 0x0) +0x1dd
*PluginLoader).init(0xc000726510, 0xc000a1f6c0, 0x2, 0x2) +0x87
, 0x30ec578, 0xc0009a0f90, 0x30e7cb8) +0x45d
*Agent).setupPlugins(0xc0004181e0, 0xc000311500, 0x0) +0x15f
, 0x30ec578, 0xc0009a0f90, 0x3070ac0, 0xc0007762d0, 0xc0007c7d60, 0x0, 0x0, 0x2569200) +0x1fb
*Command).setupAgent(0xc000967ce0, 0xc000313600, 0x30ec578, 0xc0009a0f90, 0x3070ac0, 0xc0007762d0, 0xc0007c7d60, 0x0, 0x2) +0xb0
*Command).Run(0xc000967ce0, 0xc00004e1a0, 0x1, 0x1, 0x0) +0x4cc
*CLI).Run(0xc000964b40, 0xc000964b40, 0xc00012dc98, 0x37) +0x41a
main.RunCustom(0xc00004e190, 0x2, 0x2, 0xc000060598) +0x4a7
main.main() +0x65

I don’t care about the GPU right now, so I tried putting this in my nomad.hcl:

plugin "nvidia-gpu" {
  config {
    enabled = false
  }
}

But that doesn’t seem to have solved it. I tried uninstalling and then reinstalling Nomad. No luck there either. I have no idea why this thing stopped working.

Hi @astolman,

Would you be able to provide the script you are using along with version information for Nomad?
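
If it helps, something along these lines should gather the version details in one go. This is only a rough sketch: report is a hypothetical helper name, and the nvidia-smi query assumes the NVIDIA driver tools are installed on the host (the helper just says so if they are not).

```shell
#!/bin/sh
# Sketch for collecting version details to paste into the thread.
# "report" is a hypothetical helper: it runs a command if it exists on PATH,
# and otherwise prints a short note instead of failing.
report() {
  if command -v "$1" >/dev/null 2>&1; then
    "$@"
  else
    echo "$1: not found on PATH"
  fi
}

# Nomad build and version string.
report nomad version

# NVIDIA driver version, since the panic is inside the nvml driver code.
report nvidia-smi --query-gpu=driver_version --format=csv,noheader
```

The nvidia-smi line is included because a driver change on the host would be worth ruling out, not because it is known to be the cause.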

jrasell and the Nomad team

The Nomad version is 1.1.6, but I’ve tried rolling back all the way to 1.1.3 and that didn’t help. I’m not sure the rest of the script is relevant, since this happens when I call nomad from the command line all on its own. The script basically just launches a Docker container first, and then brings up Nomad.