Hello guys,
I’m facing an issue with a sidecar proxy in a cluster with TLS enabled. I’m trying to deploy a service that should connect through a terminating gateway to a service outside the service mesh. I have registered the external service, and then deployed a job containing the terminating gateway service and my own service, which I want to run with a sidecar proxy.
Job.hcl
  datacenters = ["dc1"]
  type        = "service"

  group "gateway" {
    network {
      mode = "bridge"
    }

    service {
      name = "sso-gateway"

      connect {
        gateway {
          proxy {}

          # The terminating block belongs inside the gateway block.
          terminating {
            service {
              name = "sso"
            }
          }
        }

        sidecar_task {
          config {
            image = "xxxxxxxxxxx/library/envoy"
          }
        }
      }
    }
  }

  group "testaccount1" {
    count = 1

    network {
      mode = "bridge"

      port "http" {
        to     = 8080
        static = 8080
      }
    }

    service {
      name     = "testaccount1"
      port     = "http"
      provider = "consul"

      connect {
        sidecar_service {
          proxy {
            upstreams {
              destination_name = "sso"
              local_bind_port  = 443
            }
          }
        }

        sidecar_task {
          config {
            image = "xxxxxxxxxx/library/envoy"
          }
        }
      }
    }

    task "testaccount1" {
      driver = "docker"

      env {
      }

      config {
        image = "xxxxxxxxx/account"
        ports = ["http"]

        auth {
          username = xxxxx
          password = xxxxx
        }
      }
    }
  }
}
This job deploys the terminating gateway and my service with its sidecar proxy, but Consul’s health check on that sidecar proxy fails with the error "dial tcp 10.4.5.26:25299: connect: connection refused". In the Envoy sidecar logs I can see this:
envoy logs
[2023-10-03 13:32:50.415][1][info][admin] [source/server/admin/admin.cc:66] admin address: 127.0.0.2:19001
[2023-10-03 13:32:50.416][1][info][config] [source/server/configuration_impl.cc:131] loading tracing configuration
[2023-10-03 13:32:50.416][1][info][config] [source/server/configuration_impl.cc:91] loading 0 static secret(s)
[2023-10-03 13:32:50.416][1][info][config] [source/server/configuration_impl.cc:97] loading 1 cluster(s)
[2023-10-03 13:32:50.467][1][info][config] [source/server/configuration_impl.cc:101] loading 0 listener(s)
[2023-10-03 13:32:50.467][1][info][config] [source/server/configuration_impl.cc:113] loading stats configuration
[2023-10-03 13:32:50.468][1][info][runtime] [source/common/runtime/runtime_impl.cc:463] RTDS has finished initialization
[2023-10-03 13:32:50.468][1][info][upstream] [source/common/upstream/cluster_manager_impl.cc:221] cm init: initializing cds
[2023-10-03 13:32:50.468][1][warning][main] [source/server/server.cc:802] there is no configured limit to the number of allowed active connections. Set a limit via the runtime key overload.global_downstream_max_connections
[2023-10-03 13:32:50.469][1][info][main] [source/server/server.cc:923] starting main dispatch loop
[2023-10-03 13:33:29.302][1][warning][config] [./source/common/config/grpc_stream.h:191] DeltaAggregatedResources gRPC config stream to local_agent closed since 38s ago: 14, upstream connect error or disconnect/reset before headers. reset reason: connection failure, transport failure reason: immediate connect error: No such file or directory
[2023-10-03 13:33:45.667][1][warning][config] [./source/common/config/grpc_stream.h:191] DeltaAggregatedResources gRPC config stream to local_agent closed since 55s ago: 14, upstream connect error or disconnect/reset before headers. reset reason: connection failure, transport failure reason: immediate connect error: No such file or directory
[2023-10-03 13:34:08.535][1][warning][config] [./source/common/config/grpc_stream.h:191] DeltaAggregatedResources gRPC config stream to local_agent closed since 78s ago: 14, upstream connect error or disconnect/reset before headers. reset reason: connection failure, transport failure reason: immediate connect error: No such file or directory
[2023-10-03 13:34:16.799][1][warning][config] [./source/common/config/grpc_stream.h:191] DeltaAggregatedResources gRPC config stream to local_agent closed since 86s ago: 14, upstream connect error or disconnect/reset before headers. reset reason: connection failure, transport failure reason: immediate connect error: No such file or directory
[2023-10-03 13:34:17.366][1][warning][config] [./source/common/config/grpc_stream.h:191] DeltaAggregatedResources gRPC config stream to local_agent closed since 86s ago: 14, upstream connect error or disconnect/reset before headers. reset reason: connection failure, transport failure reason: immediate connect error: No such file or directory
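The repeated warnings above all report gRPC status 14 (UNAVAILABLE) on the config stream to local_agent, with the transport failure reason "No such file or directory". To confirm that every retry fails the same way, a small script along these lines could group the warnings. This is only a sketch: the regular expression and the function name summarize_stream_failures are made up for illustration and assume the log format shown above.

```python
import re
from collections import Counter

# Matches Envoy's "gRPC config stream to <cluster> closed" warnings.
# Group 1: xDS cluster name, group 2: numeric gRPC status code,
# group 3: transport failure reason.
STREAM_CLOSED = re.compile(
    r"gRPC config stream to (\S+) closed since \d+s ago: (\d+), .*?"
    r"transport failure reason: (.+)$"
)


def summarize_stream_failures(lines):
    """Count (cluster, grpc_status, reason) combinations in Envoy log lines."""
    counts = Counter()
    for line in lines:
        match = STREAM_CLOSED.search(line)
        if match:
            key = (match.group(1), int(match.group(2)), match.group(3).strip())
            counts[key] += 1
    return counts


if __name__ == "__main__":
    # One warning copied from the log above, plus an unrelated info line.
    sample = [
        "[2023-10-03 13:33:29.302][1][warning][config] "
        "[./source/common/config/grpc_stream.h:191] DeltaAggregatedResources "
        "gRPC config stream to local_agent closed since 38s ago: 14, upstream "
        "connect error or disconnect/reset before headers. reset reason: "
        "connection failure, transport failure reason: immediate connect "
        "error: No such file or directory",
        "[2023-10-03 13:32:50.469][1][info][main] "
        "[source/server/server.cc:923] starting main dispatch loop",
    ]
    for (cluster, status, reason), count in summarize_stream_failures(sample).items():
        print(f"{count}x cluster={cluster} grpc_status={status} reason={reason}")
```

If all warnings collapse into a single group, the sidecar never reached its xDS source at all, rather than connecting and being rejected later.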
Those last messages in the log above made me think that gRPC is not working as it should. I have TLS enabled in both Nomad and Consul.
nomad server config
datacenter = "dc1"
data_dir   = "/opt/nomad/data"
bind_addr  = "0.0.0.0"

server {
  enabled          = true
  bootstrap_expect = 3
  encrypt          = "xxxxxxxxxx"
}

tls {
  http                   = true
  rpc                    = true
  ca_file                = "/etc/pki/nomad/nomad-agent-ca.pem"
  cert_file              = "/etc/pki/nomad/global-server-nomad.pem"
  key_file               = "/etc/pki/nomad/global-server-nomad-key.pem"
  verify_server_hostname = true
  verify_https_client    = true
}

client {
  enabled = false
}

consul {
  address      = "127.0.0.1:8501"
  token        = "xxxxxxxxxxxxx"
  grpc_ca_file = "/etc/pki/consul/consul-agent-ca.pem"
  grpc_address = "127.0.0.1:8503"
  ca_file      = "/etc/pki/consul/consul-agent-ca.pem"
  cert_file    = "/etc/pki/consul/dc1-server-consul-1.pem"
  key_file     = "/etc/pki/consul/dc1-server-consul-1-key.pem"
  ssl          = true
}

acl {
  enabled = true
}
consul server config
data_dir       = "/opt/consul"
node_name      = "server2"
client_addr    = "0.0.0.0"
bind_addr      = "10.4.5.22"
advertise_addr = "10.4.5.22"

encrypt                 = "xxxxxxxxxxxxxxxxx"
encrypt_verify_incoming = true
encrypt_verify_outgoing = true

ui_config {
  enabled = true
}

rejoin_after_leave = true

verify_incoming        = true
verify_outgoing        = true
verify_server_hostname = true
ca_file                = "/etc/pki/consul/consul-agent-ca.pem"
cert_file              = "/etc/pki/consul/dc1-server-consul-1.pem"
key_file               = "/etc/pki/consul/dc1-server-consul-1-key.pem"

ports = {
  https    = 8501
  http     = 8500
  grpc     = 8502
  grpc_tls = 8503
  dns      = -1
}

acl {
  enabled        = true
  default_policy = "deny"
  tokens {
    default = "xxxxxxxxxxxxx"
  }
}

server           = true
bootstrap_expect = 3

log_level            = "DEBUG"
log_file             = "/var/log/consul/"
log_rotate_max_files = 30
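Since the ports block above exposes both a plaintext gRPC port (8502) and a TLS gRPC port (8503), one basic sanity check is whether each port accepts TCP connections at all on the node where the sidecar runs. The following is a rough sketch of such a probe, not part of my setup: the helper name port_accepts_tcp is made up, the host 127.0.0.1 is an assumption taken from the grpc_address in the Nomad config, and the probe only tests TCP reachability, not the TLS handshake or ACLs.

```python
import socket


def port_accepts_tcp(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Port numbers taken from the Consul "ports" block above;
    # the host is an assumption for a probe run on the agent's own node.
    for name, port in (("grpc", 8502), ("grpc_tls", 8503)):
        state = "open" if port_accepts_tcp("127.0.0.1", port) else "closed or unreachable"
        print(f"{name} ({port}): {state}")
```

An open port here would only show that the Consul agent is listening. It would not explain the "No such file or directory" reason in the Envoy log, which is a failure to open a local file path rather than a refused TCP connection.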
used versions
Nomad v1.6.2
BuildDate 2023-09-13T16:47:25Z
Revision 73e372ad94033db2ceaf53468b270a31544c23fd
Consul v1.16.2
Revision 68f81912
Build Date 2023-09-19T19:29:18Z
I’m not sure what could be wrong in my case.
Best Regards