This server has too many xDS streams open, please try another

Hi,

I get the following error when launching multiple Consul Connect enabled jobs:

2022-12-10T12:43:14.877+0100 [ERROR] agent.envoy: Error handling ADS delta stream: xdsVersion=v3 error="rpc error: code = ResourceExhausted desc = this server has too many xDS streams open, please try another"
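
For reference, the jobs are similar to this minimal sketch (job name, service name, port, and image are placeholders, not my real workload):

job "example" {
  datacenters = ["dc1"]

  group "api" {
    network {
      mode = "bridge"
    }

    service {
      name = "api"
      port = "9001"

      connect {
        # Nomad launches one Envoy sidecar per Connect service,
        # and each sidecar opens an xDS stream towards Consul.
        sidecar_service {}
      }
    }

    task "api" {
      driver = "docker"

      config {
        image = "hashicorpdev/counter-api:v3"
      }
    }
  }
}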

I looked at the Consul code and understood that this error happens when too many proxies connect to a single Consul server via gRPC. Ideally, the Envoy proxies should distribute their streams among the Consul servers. But the Envoy proxies started by Nomad communicate with the Nomad client via a Unix socket, and the Nomad client forwards the gRPC requests to a Consul server, which in my case is always the local one.
So my question is: why does the Nomad client forward the gRPC requests only to the local Consul server instead of distributing them among the Consul servers?
Or did I misread the code, and does this error originate from somewhere else?
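
As far as I can tell, the forwarding target is simply the grpc_address in the consul stanza of the Nomad client config; Envoy itself only ever sees the Unix socket. Excerpt from my full config below:

consul {
  # Nomad proxies Envoy's xDS traffic from a Unix socket inside the
  # allocation to this address, i.e. always the local Consul agent.
  grpc_address = "127.0.0.1:8502"
}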

My Setup

Machine A: Consul Server/Client, Nomad Server/Client
Machine B: Consul Server/Client, Nomad Server/Client
Machine C: Consul Client, Nomad Server/Client
Machine D: Consul Client, Nomad Client

This error happens on Machine B, and it also happens with a three-node Consul cluster.

Consul Config A/B:

server = true
bootstrap_expect = 2
ui_config {
  enabled = true
}

client_addr = "127.0.0.1 10.9.0.2"
bind_addr = "10.9.0.2"
advertise_addr = "10.9.0.2"

datacenter = "dc1"
data_dir = "/opt/consul"
encrypt = "....."
tls {
  defaults {
    ca_file = "/etc/consul.d/consul-agent-ca.pem"
    cert_file = "/etc/consul.d/dc1-server-consul.pem"
    key_file = "/etc/consul.d/dc1-server-consul-key.pem"
    verify_incoming = true
    verify_outgoing = true
  }
  internal_rpc {
    verify_server_hostname = true
  }
}
auto_encrypt {
  allow_tls = true
}

acl = {
  enabled = true
  default_policy = "deny"
  enable_token_persistence = true
  tokens {
    agent = "...."
    default = "...."
  }
}

performance {
  raft_multiplier = 1
}

ports {
  https = 8501
  grpc = 8502
  grpc_tls = 8503
}

connect {
  enabled = true
}

retry_join = ["10.9.0.1", "10.9.0.2"]

Nomad Config A/B/C:

data_dir = "/opt/nomad/data"

advertise {
  http = "10.9.0.2"
  rpc  = "10.9.0.2"
  serf = "10.9.0.2"
}

server {
  # license_path is required as of Nomad v1.1.1+
  #license_path = "/etc/nomad.d/nomad.hcl"
  enabled = true
  bootstrap_expect = 3
}

client {
  enabled = true
  network_interface = "wg0"

  min_dynamic_port = 26000
  max_dynamic_port = 32000
}

consul {
  address          = "127.0.0.1:8501"
  grpc_address     = "127.0.0.1:8502"
  token            = "...."
  ssl              = true
  ca_file          = "/etc/nomad.d/consul-agent-ca.pem"
  cert_file        = "/etc/nomad.d/dc1-server-consul.pem"
  key_file         = "/etc/nomad.d/dc1-server-consul-key.pem"
  auto_advertise   = true
  server_auto_join = true
  client_auto_join = true
}

acl {
  enabled = true
}

telemetry {
  prometheus_metrics         = true
  publish_allocation_metrics = true
  publish_node_metrics       = true
}

vault {
  enabled = true
  address = "http://vault.service.consul:8200"
  task_token_ttl = "1h"
  create_from_role = "nomad-cluster"
  token = "...."
  allow_unauthenticated = false
}

Hi @Gabscap,

I wonder if the problem you are seeing is related to Consul issue #15753?

Thanks,
jrasell and the Nomad team