Java driver config not working

I have a brand new cluster and can run Python jobs without issue using raw_exec. I can also run a basic Java program using raw_exec, but I am not sure how to get the Java driver to work. Or is that even needed? Can someone point me to a good doc on what needs to be done to get the Java driver working in both the Nomad server and client config?

Nomad Server Config

plugin "raw_exec" {
  config {
    enabled    = true
    no_cgroups = true
  }
}

Client

plugin "raw_exec" {
  config {
    enabled    = true
    no_cgroups = true
  }
}

client {
  options {
    "driver.allowlist" = "exec,java,raw_exec"
  }
}

If I run a simple Java program with raw_exec it works fine, but if I switch from

      driver = "raw_exec"
      config {
        command = "/apps/bin/java"
        args    = ["-jar", "/tmp/xxxxx.jar"]
      }

to

      driver = "java"
      config {
        command = "/apps/bin/java"
        args    = ["-jar", "/tmp/xxxxx.jar"]
      }

it fails with this error when I plan and run it from the UI:

Constraint missing drivers filtered 1 node

How do we get the Java driver to work? I even tried adding jar_path, jvm_options and an artifact block, but it still doesn't work.

Also, here is the output of nomad node status <client_id>:

nomad node status d3f254c1
ID              = xxxxx
Name            = xxx
Class           = <none>
DC              = xxx
Drain           = false
Eligibility     = eligible
Status          = ready
CSI Controllers = <none>
CSI Drivers     = <none>
Uptime          = 164h55m51s
Host Volumes    = <none>
Host Networks   = <none>
CSI Volumes     = <none>
Driver Status   = raw_exec

See the java task driver client requirements. Can you share the part of the client configuration (nomad.hcl) where you enable the java driver?

Hi @sammy676776, if you show the output of nomad node status -verbose <id> there should be a Drivers section showing which drivers have or have not been detected.

If this is a Linux machine, then I suspect the problem is that your java install is not in the default chroot. You’ll need to add /apps/bin/java to the chroot_env on the Client before it can be detected.

Hi!

I wrote a Medium article a little while back about using the Java driver with Nomad. Check it out and see if it helps. I have a few full examples in there.

Here are some key points:

  • for the Nomad client setup, you just need to install Java on it (I used OpenJDK11)
  • jar is passed to the Java driver via jar_path
  • you need to use the artifact stanza to obtain your .jar file
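
To make the key points above concrete, here is a minimal jobspec sketch (not taken from the article). The job name, group name, datacenter, artifact URL and memory figures are placeholders. It assumes the artifact is downloaded to the task's local/ directory, which is Nomad's default artifact destination. Note that, as far as I know, the java driver's config block has no command field, so the JVM that runs is the one the client detected:

```hcl
# Hypothetical job: names, datacenter, and URL are placeholders.
job "java-example" {
  datacenters = ["dc1"]

  group "app" {
    task "webservice" {
      driver = "java"

      # Fetch the jar into the task directory (default destination: local/).
      artifact {
        source = "https://example.com/simple.jar"
      }

      config {
        # Relative to the task directory, not the host filesystem.
        jar_path    = "local/simple.jar"
        jvm_options = ["-Xmx256m", "-Xms128m"]
      }

      resources {
        memory = 512 # MiB; keep this above the -Xmx heap size
      }
    }
  }
}
```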

My nomad client stanza to invoke Java

client {
  enabled       = true
  gc_max_allocs = 100
  gc_interval   = "10m"

  server_join {
    retry_join = ["xxxxx:4647", "xxxxx:4647", "xxxx:4647"]
  }

  options {
    "driver.allowlist" = "exec,java,raw_exec"
  }

  chroot_env {
    "/bin"            = "/bin"
    "/etc"            = "/etc"
    "/lib"            = "/lib"
    "/lib32"          = "/lib32"
    "/lib64"          = "/lib64"
    "/run/resolvconf" = "/run/resolvconf"
    "/sbin"           = "/sbin"
    "/usr"            = "/usr"
    "/apps/bin/java/" = "/apps/bin/java/"
  }
}

plugin "raw_exec" {
  config {
    enabled    = true
    no_cgroups = true
  }
}

Please note that when Nomad comes up it does show this line:

[INFO]  agent: detected plugin: name=java type=driver plugin_version=0.1.0

However, the Drivers section of the nomad node status -verbose <nodeid> output shows that Java is not healthy for some reason:

Driver    Detected  Healthy  Message  Time
exec      true      true     Healthy  2023-03-09T15:25:08-05:00
java      false     false    <none>   2023-03-09T15:25:08-05:00
raw_exec  true      true     Healthy  2023-03-09T15:25:08-05:00

So the Java driver is still not available.

OK, FIXED. Basically, JAVA_HOME and JAVA_PATH were not being picked up even after defining them in the unit files. Once I passed them at the point where it invokes "/bin/nomad agent -config", it started working.

Drivers
Driver    Detected  Healthy  Message  Time
exec      true      true     Healthy  2023-03-09T15:56:51-05:00
java      true      true     Healthy  2023-03-09T15:56:51-05:00
raw_exec  true      true     Healthy  2023-03-09T15:56:51-05:00

However, this brings back an old problem, which we had fixed by not running as root :) Now when we run any task on this client I get:

client.alloc_runner.task_runner: prestart failed: alloc_id=93b9290e-a652-8378-f042-86f6bb4f099d task=webservice error="prestart hook \"task_dir\" failed: Failed to mount shared directory for task: operation not permitted"
2023-03-09T15:59:40.060-0500 [INFO]  client.alloc_runner.task_runner: not restarting task: alloc_id=93b9290e-a652-8378-f042-86f6bb4f099d task=webservice reason="Error was unrecoverable"

Which directory is it actually complaining about? Nomad is running as root and the entire directory structure is also owned by root:

stat nomad_xxxx
Device: 29h/41d	Inode: 6462200812  Links: 4
Access: (0755/drwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)

The Nomad Client needs to run as root. The directory structure of a Nomad Client in a production environment should look like this:

[drwxr-xr-x root    ]  /opt/nomad
[drwx------ root    ]  ├── data
[drwx--x--x root    ]  │   ├── alloc
[drwx------ root    ]  │   └── client
[drwxr-xr-x root    ]  └── plugins

If you try to run the Client as non-root (which isn’t supported, but can be done with caveats and tweaks), then of course that structure would need different permissions.
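
For reference, an agent configuration that matches that layout might look like the sketch below. The two paths mirror the tree above; the rest is an assumption about a typical client-only agent, so adjust it to your own setup:

```hcl
# Sketch of a client agent config matching the /opt/nomad layout above.
# Nomad creates the alloc/ and client/ subdirectories under data_dir itself.
data_dir   = "/opt/nomad/data"
plugin_dir = "/opt/nomad/plugins"

client {
  enabled = true
}
```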

Thanks @seth.hoenig. Right now I am running as root and the entire directory structure is owned by root. As soon as I did that, my Java driver issue was resolved, but I still run into this task_dir problem for every job.

client.alloc_runner.task_runner: prestart failed: alloc_id=93b9290e-a652-8378-f042-86f6bb4f099d task=webservice error="prestart hook \"task_dir\" failed: Failed to mount shared directory for task: operation not permitted"
2023-03-09T15:59:40.060-0500 [INFO]  client.alloc_runner.task_runner: not restarting task: alloc_id=93b9290e-a652-8378-f042-86f6bb4f099d task=webservice reason="Error was unrecoverable"

@seth.hoenig Thank you. I am getting close: switching to data_dir = "/opt/nomad" made it work. Fingers crossed, let's see.

Thanks for all the help so far @seth.hoenig @Neutrollized @brucellino1. I hope my various issues will help someone in the future. I still have one more issue left with Java. I have a simple Java program that writes something to a file, so I know when the job is successful. It works with raw_exec, but with the Java driver it fails with Exit Code 127. Any ideas? Same jar file, same hosts, but different results with a different driver.

Java driver is loading fine on node

Drivers
Driver    Detected  Healthy  Message  Time
exec      true      true     Healthy  2023-03-09T19:01:22-05:00
java      true      true     Healthy  2023-03-09T19:01:22-05:00
raw_exec  true      true     Healthy  2023-03-09T19:01:22-05:00

If I use the Java driver I get the following error and the job doesn't complete. Any ideas?

nomad alloc status d43c54e1
ID                  = d43c54e1-f27a-d83a-d235-7cf2ba806337
Eval ID             = f67d9585
Name                = xxxxxx.cache[0]
Node ID             = 92628112
Node Name           = redacted
Job ID              = xxxxxxx
Job Version         = 0
Client Status       = pending
Client Description  = No tasks have started
Desired Status      = run
Desired Description = <none>
Created             = 1m ago
Modified            = 15s ago

Task "webservice" is "pending"
Task Resources
CPU        Memory       Disk     Addresses
0/100 MHz  0 B/300 MiB  300 MiB  

Task Events:
Started At     = 2023-03-11T21:01:59Z
Finished At    = N/A
Total Restarts = 3
Last Restart   = 2023-03-11T16:01:59-05:00

Recent Events:
Time                       Type                   Description
2023-03-11T16:01:59-05:00  Restarting             Task restarting in 15.319076743s
2023-03-11T16:01:59-05:00  Terminated             Exit Code: 127
2023-03-11T16:01:59-05:00  Started                Task started by client
2023-03-11T16:01:38-05:00  Restarting             Task restarting in 17.944166401s
2023-03-11T16:01:38-05:00  Terminated             Exit Code: 127
2023-03-11T16:01:38-05:00  Started                Task started by client
2023-03-11T16:01:36-05:00  Restarting             Task restarting in 17.543176045s
2023-03-11T16:01:36-05:00  Terminated             Exit Code: 127
2023-03-11T16:01:36-05:00  Started                Task started by client
2023-03-11T16:01:33-05:00  Downloading Artifacts  Client is downloading artifacts

Working raw_exec code

      driver = "raw_exec"
      config {
        command = "/apps/java/bin/java"
        args    = ["-jar", "/tmp/simple.jar"]
      }
}

Failing JAVA driver

task "webservice" {
      driver = "java"
      config {
        jar_path    = "/tmp/simple.jar"
        jvm_options = ["-Xmx2048m", "-Xms256m"]
      }
      artifact {
        source = "https://xxxxx:4444/simple.jar"
      }
}

To summarize: same hosts, same jar file, raw_exec works and the Java driver fails with Exit Code 127.

Is there a way to make the node pick up the right java when using the Java driver in the job? There are multiple flavors of Java installed, so maybe we need to specify one explicitly in the job?

What do the logs say?

2023-03-13T12:42:33-04:00  Not Restarting  Exceeded allowed attempts 3 in interval 24h0m0s and mode is "fail"
2023-03-13T12:42:33-04:00  Terminated      Exit Code: 127
2023-03-13T12:42:33-04:00  Started         Task started by client
2023-03-13T12:42:12-04:00  Restarting      Task restarting in 18.411032403s
2023-03-13T12:42:12-04:00  Terminated      Exit Code: 127
2023-03-13T12:42:12-04:00  Started         Task started by client
2023-03-13T12:41:54-04:00  Restarting      Task restarting in 15.690231165s
2023-03-13T12:41:54-04:00  Terminated      Exit Code: 127
2023-03-13T12:41:54-04:00  Started         Task started by client
2023-03-13T12:41:51-04:00  Restarting      Task restarting in 17.910123115s

@sammy676776 those are Task Events, I’m asking about the logs generated by the task itself

I don't see any logs.
nomad alloc logs -job just hangs and if I try
nomad alloc logs -job

So I went to “/opt/nomad/alloc/e4021037-ae67-1a6c-67da-611c2e18154e/webservice/alloc/logs”

more webservice.stderr.0
/apps/jdk-11.0.18/bin/java: error while loading shared libraries: libjli.so: cannot open shared object file: No such file or directory
/apps/jdk-11.0.18/bin/java: error while loading shared libraries: libjli.so: cannot open shared object file: No such file or directory

Do you think the classpath is not being picked up?

task "webservice" {
      driver = "java"
      config {
        jar_path    = "local/blah.jar"
        jvm_options = ["-Xmx2048m", "-Xms256m"]
        class_path  = "/apps/jdk-11.0.18/"
      }
}

Can you remove the /tmp in the jar_path field and just have the jar file name in there?

i.e.

task "webservice" {
      driver = "java"
      config {
        jar_path    = "simple.jar"
        jvm_options = ["-Xmx2048m", "-Xms256m"]
      }
      artifact {
        source = "https://xxxxx:4444/simple.jar"
        mode   = "file"
      }
}

If you normally don’t need to pass in a classpath when you run it, then you shouldn’t need to specify it in your jobspec either.

Tried that too. Taking out the chroot_env block and putting it back in gives different errors. The current error is:

Driver Failure	failed to launch command with executor: rpc error: code = Unknown desc = file /apps/jdk-11.0.18/bin/java not found under path /opt/nomad/alloc/a23ac027-0c43-08bb-a02f-4fa4d0f3a5b7/raw

nomad node status nodeid shows JAVA driver is fine and loaded

Drivers
Driver    Detected  Healthy  Message  Time
exec      true      true     Healthy  2023-03-14T14:59:30-04:00
java      true      true     Healthy  2023-03-14T14:59:30-04:00
raw_exec  true      true     Healthy  2023-03-14T14:59:30-04:00

chroot_env {
  "/bin"            = "/bin"
  "/etc"            = "/etc"
  "/lib"            = "/lib"
  "/lib32"          = "/lib32"
  "/lib64"          = "/lib64"
  "/run/resolvconf" = "/run/resolvconf"
  "/sbin"           = "/sbin"
  "/usr"            = "/usr"
}
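
One thing worth checking against the chroot_env above: it does not map the JDK directory. That would be consistent with the "not found under path" error, and possibly with the earlier libjli.so failure too, since the java binary loads its shared libraries from its own install tree. Below is a sketch of a client block that maps the whole JDK tree. It assumes the JDK lives under /apps/jdk-11.0.18, as the error messages suggest, and that a custom chroot_env replaces Nomad's default list instead of adding to it:

```hcl
client {
  enabled = true

  chroot_env {
    # Default entries restated, since a custom chroot_env replaces them.
    "/bin"            = "/bin"
    "/etc"            = "/etc"
    "/lib"            = "/lib"
    "/lib32"          = "/lib32"
    "/lib64"          = "/lib64"
    "/run/resolvconf" = "/run/resolvconf"
    "/sbin"           = "/sbin"
    "/usr"            = "/usr"

    # Assumption: map the whole JDK tree, not just bin/java, so the
    # launcher can find its lib/ directory (including libjli.so).
    "/apps/jdk-11.0.18" = "/apps/jdk-11.0.18"
  }
}
```

The client needs a restart to pick up the change.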

It is one thing or another with the Java driver. The same job with some changes works with raw_exec. I am at the point where I am questioning whether there is any benefit to running with the Java driver instead of raw_exec, which works and is probably more widely used than the others.

I am going to use raw_exec and give up on the Java driver, as it is quite unstable in our environment and I have already spent so much time trying to set it up. I am happy to work with anyone from HashiCorp or others in this group who is willing to help fix this issue for other customers, as I can reproduce this Java driver failure quite easily with a couple of simple jar files.