Hello,
We’re having a strange issue with Consul’s service mesh: the wrong virtual IP address is being added to the Envoy configuration (consul-dataplane), preventing communication between services.
Environment: Kubernetes 1.28 (EKS), Consul 1.17.1, consul-dataplane 1.3.1, with Karpenter for node auto-scaling.
The sidecar proxy registration in the Consul catalog shows the correct virtual IP address, 240.0.0.25:
curl 127.0.0.1:8500/v1/catalog/service/broken-service-sidecar-proxy|jq
[
{
"ID": "",
"Node": "ip-10-16-xx-xxx.eu-central-1.compute.internal-virtual",
"Address": "10.16.xx.xxx",
"Datacenter": "datacenter",
"TaggedAddresses": null,
"NodeMeta": {
"synthetic-node": "true"
},
"ServiceKind": "connect-proxy",
"ServiceID": "broken-service-6958df5d68-5hpsd-broken-service-sidecar-proxy",
"ServiceName": "broken-service-sidecar-proxy",
"ServiceTags": [],
"ServiceAddress": "10.30.29.117",
"ServiceTaggedAddresses": {
"consul-virtual": {
"Address": "240.0.0.25",
"Port": 20000
},
"virtual": {
"Address": "10.97.38.160",
"Port": 9000
}
},
"ServiceWeights": {
"Passing": 1,
"Warning": 1
},
"ServiceMeta": {
"k8s-namespace": "broken-service",
"k8s-service-name": "broken-service",
"managed-by": "consul-k8s-endpoints-controller",
"pod-name": "broken-service-6958df5d68-5hpsd",
"pod-uid": "9eeb2ba6-63d6-4957-8059-57cfee6266a9",
"synthetic-node": "true"
},
"ServicePort": 20000,
"ServiceSocketPath": "",
"ServiceEnableTagOverride": false,
"ServiceProxy": {
"DestinationServiceName": "broken-service",
"DestinationServiceID": "broken-service-6958df5d68-5hpsd-broken-service",
"LocalServiceAddress": "127.0.0.1",
"LocalServicePort": 9000,
"Mode": "transparent",
"Config": {
"envoy_prometheus_bind_addr": "0.0.0.0:20200"
},
"MeshGateway": {},
"Expose": {}
},
"ServiceConnect": {},
"ServiceLocality": {
"Region": "eu-central-1",
"Zone": "eu-central-1b"
},
"CreateIndex": 5778295,
"ModifyIndex": 5778295
},
{
"ID": "",
"Node": "ip-10-16-x-xxx.eu-central-1.compute.internal-virtual",
"Address": "10.16.x.xxx",
"Datacenter": "datacenter",
"TaggedAddresses": null,
"NodeMeta": {
"synthetic-node": "true"
},
"ServiceKind": "connect-proxy",
"ServiceID": "broken-service-6958df5d68-6tsjc-broken-service-sidecar-proxy",
"ServiceName": "broken-service-sidecar-proxy",
"ServiceTags": [],
"ServiceAddress": "10.30.10.117",
"ServiceTaggedAddresses": {
"consul-virtual": {
"Address": "240.0.0.25",
"Port": 20000
},
"virtual": {
"Address": "10.97.38.160",
"Port": 9000
}
},
"ServiceWeights": {
"Passing": 1,
"Warning": 1
},
"ServiceMeta": {
"k8s-namespace": "broken-service",
"k8s-service-name": "broken-service",
"managed-by": "consul-k8s-endpoints-controller",
"pod-name": "broken-service-6958df5d68-6tsjc",
"pod-uid": "af61d800-f2b5-4bdf-91b3-3a7930a57f40",
"synthetic-node": "true"
},
"ServicePort": 20000,
"ServiceSocketPath": "",
"ServiceEnableTagOverride": false,
"ServiceProxy": {
"DestinationServiceName": "broken-service",
"DestinationServiceID": "broken-service-6958df5d68-6tsjc-broken-service",
"LocalServiceAddress": "127.0.0.1",
"LocalServicePort": 9000,
"Mode": "transparent",
"Config": {
"envoy_prometheus_bind_addr": "0.0.0.0:20200"
},
"MeshGateway": {},
"Expose": {}
},
"ServiceConnect": {},
"ServiceLocality": {
"Region": "eu-central-1",
"Zone": "eu-central-1a"
},
"CreateIndex": 5778298,
"ModifyIndex": 5778298
},
{
"ID": "",
"Node": "ip-10-16-x-xxx.eu-central-1.compute.internal-virtual",
"Address": "10.16.x.xxx",
"Datacenter": "datacenter",
"TaggedAddresses": null,
"NodeMeta": {
"synthetic-node": "true"
},
"ServiceKind": "connect-proxy",
"ServiceID": "broken-service-6958df5d68-m78p6-broken-service-sidecar-proxy",
"ServiceName": "broken-service-sidecar-proxy",
"ServiceTags": [],
"ServiceAddress": "10.30.10.116",
"ServiceTaggedAddresses": {
"consul-virtual": {
"Address": "240.0.0.25",
"Port": 20000
},
"virtual": {
"Address": "10.97.38.160",
"Port": 9000
}
},
"ServiceWeights": {
"Passing": 1,
"Warning": 1
},
"ServiceMeta": {
"k8s-namespace": "broken-service",
"k8s-service-name": "broken-service",
"managed-by": "consul-k8s-endpoints-controller",
"pod-name": "broken-service-6958df5d68-m78p6",
"pod-uid": "cd7b71b7-066a-40a9-b349-3c82568a6224",
"synthetic-node": "true"
},
"ServicePort": 20000,
"ServiceSocketPath": "",
"ServiceEnableTagOverride": false,
"ServiceProxy": {
"DestinationServiceName": "broken-service",
"DestinationServiceID": "broken-service-6958df5d68-m78p6-broken-service",
"LocalServiceAddress": "127.0.0.1",
"LocalServicePort": 9000,
"Mode": "transparent",
"Config": {
"envoy_prometheus_bind_addr": "0.0.0.0:20200"
},
"MeshGateway": {},
"Expose": {}
},
"ServiceConnect": {},
"ServiceLocality": {
"Region": "eu-central-1",
"Zone": "eu-central-1a"
},
"CreateIndex": 5778305,
"ModifyIndex": 5778305
}
]
The service defaults and service intentions are configured as follows, with a wildcard source:
---
apiVersion: consul.hashicorp.com/v1alpha1
kind: ServiceDefaults
metadata:
  name: broken-service
  namespace: broken-service
spec:
  protocol: grpc
  transparentProxy:
    dialedDirectly: true
---
apiVersion: consul.hashicorp.com/v1alpha1
kind: ServiceIntentions
metadata:
  name: broken-service
  namespace: broken-service
spec:
  destination:
    name: broken-service
  sources:
    - name: "*"
      action: allow
DNS also resolves correctly:
# nslookup broken-service.virtual.consul
Server: 127.0.0.53
Address: 127.0.0.53#53
Non-authoritative answer:
Name: broken-service.virtual.consul
Address: 240.0.0.25
However, the consul-dataplane Envoy configuration for this service differs between pods. In some pods the correct virtual IP is applied:
"filter_chain_match": {
  "prefix_ranges": [
    {
      "address_prefix": "10.97.38.160",
      "prefix_len": 32
    },
    {
      "address_prefix": "240.0.0.25",
      "prefix_len": 32
    }
  ]
},
In other pods the wrong virtual IP appears (240.0.0.23 instead of 240.0.0.25):
"filter_chain_match": {
  "prefix_ranges": [
    {
      "address_prefix": "10.97.38.160",
      "prefix_len": 32
    },
    {
      "address_prefix": "240.0.0.23",
      "prefix_len": 32
    }
  ]
},
kubectl -n broken-service get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
broken-service ClusterIP 10.97.38.160 <none> 9000/TCP 65m
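To tell affected pods apart from healthy ones, a small check along these lines might help. It is only a rough sketch: it assumes the Envoy configuration of each pod has been saved as JSON (for example from Envoy's admin `/config_dump` endpoint), and the function names `prefix_range_lists` and `mismatched_vips` are made up for illustration. It looks for every `prefix_ranges` list that contains the service's ClusterIP and reports any other address in that list that is not the expected virtual IP.

```python
import json


def prefix_range_lists(node):
    """Yield the address_prefix values of every prefix_ranges list in the config."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "prefix_ranges" and isinstance(value, list):
                yield [
                    entry["address_prefix"]
                    for entry in value
                    if isinstance(entry, dict) and "address_prefix" in entry
                ]
            else:
                yield from prefix_range_lists(value)
    elif isinstance(node, list):
        for item in node:
            yield from prefix_range_lists(item)


def mismatched_vips(config, cluster_ip, expected_vip):
    """Return addresses paired with cluster_ip that are not the expected virtual IP."""
    unexpected = set()
    for prefixes in prefix_range_lists(config):
        if cluster_ip in prefixes:
            for prefix in prefixes:
                if prefix not in (cluster_ip, expected_vip):
                    unexpected.add(prefix)
    return sorted(unexpected)


# Fragment taken from one of the affected pods, wrapped in braces to make it valid JSON.
bad_pod = json.loads("""
{"filter_chain_match": {"prefix_ranges": [
    {"address_prefix": "10.97.38.160", "prefix_len": 32},
    {"address_prefix": "240.0.0.23", "prefix_len": 32}
]}}
""")

print(mismatched_vips(bad_pod, "10.97.38.160", "240.0.0.25"))
```

An empty list means the pod's filter chain for the ClusterIP carries only the expected virtual IP; anything else lists the unexpected addresses found for that pod.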
We tried removing and re-adding the app, but the problem persists. We don’t know where to look further; any advice is greatly appreciated.