Encountering RPC Errors and 500s

Hi everyone,

I’m hoping someone can shed some light on RPC errors that I’m seeing from my Nomad clients.

When running a job, the allocation eventually fails with the error:

 Missing: nomad.var.block(nomad/jobs/api@default.global) 
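
In case it helps to read that identifier: my understanding (an assumption on my part, not something I have confirmed in the docs) is that it has the form path@namespace.region, so here it would be path nomad/jobs/api, namespace default, region global. A small Python sketch that splits it up and builds the matching path for the variables read endpoint (/v1/var/<path>); the function names are my own, purely for illustration:

```python
import re
from urllib.parse import quote, urlencode

# Assumed shape of the identifier: nomad.var.block(<path>@<namespace>.<region>)
_DEP = re.compile(
    r"^nomad\.var\.block\((?P<path>[^@]+)@(?P<namespace>[^.]+)\.(?P<region>[^)]+)\)$"
)


def parse_var_dependency(dep: str) -> dict:
    """Split the identifier into path, namespace and region (hypothetical helper)."""
    match = _DEP.match(dep.strip())
    if match is None:
        raise ValueError(f"unrecognised dependency string: {dep!r}")
    return match.groupdict()


def var_api_path(dep: str) -> str:
    """Build the HTTP API path and query string for reading that variable."""
    parts = parse_var_dependency(dep)
    query = urlencode({"namespace": parts["namespace"], "region": parts["region"]})
    # quote() leaves "/" intact, so the variable path keeps its segments.
    return f"/v1/var/{quote(parts['path'])}?{query}"


if __name__ == "__main__":
    print(var_api_path("nomad.var.block(nomad/jobs/api@default.global)"))
```

Requesting that path from a server directly (with the right TLS certs and token) would be one way to check whether the variable is readable outside of the template runner.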

I do have variables defined under the path nomad/jobs/api. Looking at the Nomad client logs, I see the following, which I assume is related:

Dec 28 15:55:43 api-server-0 nomad[262171]:     2023-12-28T15:55:43.911Z [ERROR] client.rpc: error performing RPC to server: error="rpc error: msgpack decode error [pos 712]: invalid byte descriptor for decoding bytes, got: 0xd2" rpc=AC>
Dec 28 15:55:43 api-server-0 nomad[262171]:     2023-12-28T15:55:43.911Z [ERROR] client.rpc: error performing RPC to server which is not safe to automatically retry: error="rpc error: msgpack decode error [pos 712]: invalid byte descrip>
Dec 28 15:55:43 api-server-0 nomad[262171]:     2023-12-28T15:55:43.911Z [ERROR] http: error authenticating built API request: error="rpc error: msgpack decode error [pos 712]: invalid byte descriptor for decoding bytes, got: 0xd2" url=>
Dec 28 15:55:43 api-server-0 nomad[262171]:     2023-12-28T15:55:43.912Z [WARN]  agent: (view) nomad.var.block(nomad/jobs/api@default.global): Unexpected response code: 500 (Server error authenticating request) 

I believe the 500 response is from the Nomad server, but I could be wrong.
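
On the decode error itself: going by the MessagePack spec, 0xd2 is the marker for a 32-bit signed integer, so as far as I can tell the decoder expected a bytes field at that position and found an integer instead. I don’t know why that would happen. Below is a rough Python sketch for looking up a descriptor byte; describe_descriptor and is_bytes_like are names I made up, and extension types are only labelled, not decoded:

```python
# Markers in the 0xC0-0xDF range, per the MessagePack format spec.
_NAMED = {
    0xC0: "nil", 0xC1: "(never used)", 0xC2: "false", 0xC3: "true",
    0xC4: "bin 8", 0xC5: "bin 16", 0xC6: "bin 32",
    0xC7: "ext 8", 0xC8: "ext 16", 0xC9: "ext 32",
    0xCA: "float 32", 0xCB: "float 64",
    0xCC: "uint 8", 0xCD: "uint 16", 0xCE: "uint 32", 0xCF: "uint 64",
    0xD0: "int 8", 0xD1: "int 16", 0xD2: "int 32", 0xD3: "int 64",
    0xD4: "fixext 1", 0xD5: "fixext 2", 0xD6: "fixext 4",
    0xD7: "fixext 8", 0xD8: "fixext 16",
    0xD9: "str 8", 0xDA: "str 16", 0xDB: "str 32",
    0xDC: "array 16", 0xDD: "array 32", 0xDE: "map 16", 0xDF: "map 32",
}


def describe_descriptor(byte: int) -> str:
    """Return the MessagePack type name for a leading descriptor byte."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("descriptor must be a single byte")
    if byte <= 0x7F:
        return "positive fixint"
    if byte <= 0x8F:
        return "fixmap"
    if byte <= 0x9F:
        return "fixarray"
    if byte <= 0xBF:
        return "fixstr"
    if byte >= 0xE0:
        return "negative fixint"
    return _NAMED[byte]


def is_bytes_like(byte: int) -> bool:
    """True if the descriptor introduces a bin or str value."""
    name = describe_descriptor(byte)
    return name.startswith(("bin", "str")) or name == "fixstr"


if __name__ == "__main__":
    print(hex(0xD2), describe_descriptor(0xD2), is_bytes_like(0xD2))
```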

Here’s my Nomad server configuration:

data_dir   = "/opt/nomad/data"
bind_addr  = "10.0.0.3"

server {
  enabled          = true
  bootstrap_expect = 3
  server_join {
    retry_join     = ["10.0.0.5","10.0.0.2","10.0.0.3"]
    retry_max      = 0
    retry_interval = "15s"
  }
}

consul {
  grpc_ca_file = "/etc/consul.d/tls/consul-agent-ca.pem"
  grpc_address = "127.0.0.1:8503"
  ca_file      = "/etc/consul.d/tls/consul-agent-ca.pem"
  ssl          = true
  verify_ssl   = false

  token   = ""
  address = "127.0.0.1:8501"

  auto_advertise   = true
  server_auto_join = true
  client_auto_join = true
}

tls {
  http = true
  rpc  = true

  ca_file   = "/etc/nomad/tls/nomad-agent-ca.pem"
  cert_file = "/etc/nomad/tls/global-server-nomad.pem"
  key_file  = "/etc/nomad/tls/global-server-nomad-key.pem"

  verify_server_hostname = true
  verify_https_client    = true
}

And my Nomad client configuration:

data_dir  = "/opt/nomad"
log_level = "DEBUG"

consul {
  grpc_ca_file = "/etc/consul.d/tls/consul-agent-ca.pem"
  grpc_address = "127.0.0.1:8503"
  ca_file      = "/etc/consul.d/tls/consul-agent-ca.pem"
  ssl          = true
  verify_ssl   = false

  token   = "" # redacted
  address = "127.0.0.1:8501"

  auto_advertise   = true
  server_auto_join = true
  client_auto_join = true
}

tls {
  http = true
  rpc  = true

  ca_file   = "/etc/nomad/tls/nomad-agent-ca.pem"
  cert_file = "/etc/nomad/tls/global-client-nomad.pem"
  key_file  = "/etc/nomad/tls/global-client-nomad-key.pem"

  verify_server_hostname = true
  verify_https_client    = false
}

client {
  enabled = true
  servers = ["10.0.0.3", "10.0.0.2", "10.0.0.5"]
  meta {
    "raft_id"     = ""
    "droplet_id"  = "392288170"
    "floating_ip" = ""
    "role"        = "api"
  }
}

Googling the error hasn’t turned up much.

Thanks for any help you can offer!

I forgot to mention that nothing related shows up in the logs on any of the Nomad servers, which seems odd if a 500 error is happening.