Hi,
I get the following error when launching multiple Consul Connect-enabled jobs:
2022-12-10T12:43:14.877+0100 [ERROR] agent.envoy: Error handling ADS delta stream: xdsVersion=v3 error="rpc error: code = ResourceExhausted desc = this server has too many xDS streams open, please try another"
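For context, the jobs enable Connect through the standard Nomad sidecar stanza; a minimal sketch (service name and port are illustrative, not my actual job):

```hcl
# Illustrative service stanza; each such service gets an Envoy sidecar
# proxy launched by Nomad, which opens an xDS stream to Consul.
service {
  name = "web"          # hypothetical service name
  port = "9090"         # hypothetical port label/value
  connect {
    sidecar_service {}  # default Envoy sidecar, no extra config
  }
}
```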
I looked at the Consul code and understood that this error happens when too many proxies connect to a single Consul server via gRPC. Ideally, the Envoy proxies should distribute their requests among the Consul servers. But Envoy proxies started by Nomad communicate via a Unix socket with the Nomad client, which forwards the gRPC requests to a Consul server, and in my case that is always the local Consul server.
So my question is: why doesn't the Nomad client distribute the requests among the Consul servers instead of only forwarding the gRPC requests to the local Consul server?
Or did I misunderstand the code, and this error originates somewhere else?
My Setup
Machine A: Consul Server/Client, Nomad Server/Client
Machine B: Consul Server/Client, Nomad Server/Client
Machine C: Consul Client, Nomad Server/Client
Machine D: Consul Client, Nomad Client
This error happens on Machine B, and it also happens with a three-node Consul cluster.
Consul Config A/B:
server = true
bootstrap_expect = 2

ui_config {
  enabled = true
}

client_addr = "127.0.0.1 10.9.0.2"
bind_addr = "10.9.0.2"
advertise_addr = "10.9.0.2"
datacenter = "dc1"
data_dir = "/opt/consul"
encrypt = "....."

tls {
  defaults {
    ca_file = "/etc/consul.d/consul-agent-ca.pem"
    cert_file = "/etc/consul.d/dc1-server-consul.pem"
    key_file = "/etc/consul.d/dc1-server-consul-key.pem"
    verify_incoming = true
    verify_outgoing = true
  }
  internal_rpc {
    verify_server_hostname = true
  }
}

auto_encrypt {
  allow_tls = true
}

acl = {
  enabled = true
  default_policy = "deny"
  enable_token_persistence = true
  tokens {
    agent = "...."
    default = "...."
  }
}

performance {
  raft_multiplier = 1
}

ports {
  https = 8501
  grpc = 8502
  grpc_tls = 8503
}

connect {
  enabled = true
}

retry_join = ["10.9.0.1", "10.9.0.2"]
Nomad Config A/B/C:
data_dir = "/opt/nomad/data"

advertise {
  http = "10.9.0.2"
  rpc = "10.9.0.2"
  serf = "10.9.0.2"
}

server {
  # license_path is required as of Nomad v1.1.1+
  #license_path = "/etc/nomad.d/nomad.hcl"
  enabled = true
  bootstrap_expect = 3
}

client {
  enabled = true
  network_interface = "wg0"
  min_dynamic_port = 26000
  max_dynamic_port = 32000
}

consul {
  address = "127.0.0.1:8501"
  grpc_address = "127.0.0.1:8502"
  token = "...."
  ssl = true
  ca_file = "/etc/nomad.d/consul-agent-ca.pem"
  cert_file = "/etc/nomad.d/dc1-server-consul.pem"
  key_file = "/etc/nomad.d/dc1-server-consul-key.pem"
  auto_advertise = true
  server_auto_join = true
  client_auto_join = true
}

acl {
  enabled = true
}

telemetry {
  prometheus_metrics = true
  publish_allocation_metrics = true
  publish_node_metrics = true
}

vault {
  enabled = true
  address = "http://vault.service.consul:8200"
  task_token_ttl = "1h"
  create_from_role = "nomad-cluster"
  token = "...."
  allow_unauthenticated = false
}