We already have a Consul server cluster running in AWS on EC2. Now we are extending some services to run in Azure AKS. To secure service-to-service connections, we are about to deploy Consul clients in AKS. As the AKS API endpoint is private, I am wondering whether the Consul servers would be able to connect to the AKS API (in Azure, a private endpoint can be reached only from the VNet where the AKS cluster lives). I read somewhere that
k8sAuthMethodHost should be set to the address of your Kubernetes API server so that the Consul servers can validate a Kubernetes service account token when using the Kubernetes auth method with consul login
Is the private AKS endpoint an obstacle to integrating AKS services with the Consul cluster in AWS?
If it is, what would be the solution for integrating AKS services with the Consul servers? Should I deploy another Consul cluster in AKS and connect it somehow with the existing one in AWS?
What I realized just now: a Consul client running in Kubernetes, in order to join a Consul cluster running outside of Kubernetes, has to be on the same LAN as that Consul cluster. That’s another reason why a Consul client running in AKS cannot join the Consul cluster on the AWS side. That’s what the docs say, at least.
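For reference, a minimal sketch of what the clients-only setup discussed above might look like in Helm values. The host names are placeholders made up for this example, and the keys (externalServers.*, client.join, client.exposeGossipPorts) are as I remember them from the Consul Helm chart docs, so check them against your chart version. The point is that k8sAuthMethodHost has to be an address the Consul servers in AWS can actually reach, which a private AKS endpoint is not unless there is a network path (for example a VPN) from AWS into that VNet.

```yaml
# Sketch only: Consul clients in AKS joining servers that run outside Kubernetes.
global:
  enabled: false            # do not deploy Consul servers from this chart
client:
  enabled: true
  exposeGossipPorts: true   # expose client gossip ports on the node's host IP
  join:
    - "consul-server.example.internal"   # placeholder: address of the servers in AWS
externalServers:
  enabled: true
  hosts:
    - "consul-server.example.internal"   # placeholder
  # Must be reachable from the Consul servers so that they can validate
  # Kubernetes service account tokens during consul login.
  k8sAuthMethodHost: "https://aks-api.example.internal:443"   # placeholder
connectInject:
  enabled: true
```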
Thank you! I have 2 Consul clusters: an existing one running on VMs and a new one created in Kubernetes (AKS). I am facing some challenges joining those clusters (federating them through mesh gateways). Can you please help me: what is the common place to run a mesh gateway in a VM Consul cluster? Should I create a separate VM and install a Consul agent there, or can I host the mesh gateway on a Consul server, for example? I am also wondering which port to start the mesh gateway on. My VM cluster is TLS enabled.
I have just realized that I might not need a mesh gateway at all. We have the primary Consul cluster running in AWS on VMs, and the secondary in Kubernetes (AKS) in Azure. As we do have a VPN between AWS and Azure, it should be possible to federate the Consul datacenters using a single WAN gossip pool. That would avoid the additional complexity of mesh gateways.
Note that k8s federation through the Helm chart is really only supported via mesh gateways. Basically most of the federation features on k8s assume you’re using mesh gateways so if you don’t use mesh gateways you may run into some edge cases.
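To make that concrete, here is a rough sketch of Helm values for a Kubernetes secondary datacenter federated through mesh gateways with a primary on VMs. It assumes a Kubernetes secret named consul-federation (a name picked for this example) that you create by hand with the primary's CA certificate and key plus a serverConfigJSON entry; the datacenter name is the one that shows up in the logs later in this thread. Treat the keys as my reading of the chart docs rather than a verified config.

```yaml
# Sketch only: secondary datacenter on Kubernetes, federated via mesh gateways.
global:
  name: consul
  datacenter: clusterwest      # secondary DC name as seen in the logs in this thread
  tls:
    enabled: true
    caCert:
      secretName: consul-federation   # hypothetical secret holding the primary's CA
      secretKey: caCert
    caKey:
      secretName: consul-federation
      secretKey: caKey
  federation:
    enabled: true
connectInject:
  enabled: true
meshGateway:
  enabled: true
server:
  extraVolumes:
    # Mounts extra server config from the secret and loads it at startup.
    - type: secret
      name: consul-federation
      items:
        - key: serverConfigJSON
          path: config.json
      load: true
```

The serverConfigJSON value would carry primary_datacenter and primary_gateways for the VM cluster, so the Kubernetes servers know which mesh gateway addresses to dial.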
Hi @lkysow, I realized that! I now have one Consul cluster running on VMs and another one in the AKS cluster, connected through mesh gateways. The clusters joined and were working as expected. But now I am facing an issue when I enable ACLs in the Helm chart. ACLs are also enabled in the VM cluster, which is my primary Consul cluster. So, if I set
acls:
  manageSystemACLs: false
It works as expected. But if I change manageSystemACLs to true, the pod fails to start. More precisely, the consul-connect-inject-init container fails like this:
kubectl logs pods/fleet-mgr --container=consul-connect-inject-init
2021-11-05T15:45:39.955Z [ERROR] Consul login failed; retrying: error="error logging in: Unexpected response code: 403 (rpc error making call: rpc error making call: Permission denied)"
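One thing that may be worth checking here (I can't tell from this log alone whether it is the cause): with manageSystemACLs: true in a secondary datacenter, the chart expects an ACL replication token issued by the primary. A sketch, assuming the token is stored in a Kubernetes secret whose name and key below are made up for the example:

```yaml
# Sketch only: ACL settings for a secondary datacenter.
global:
  acls:
    manageSystemACLs: true
    replicationToken:
      secretName: consul-federation   # hypothetical secret name
      secretKey: replicationToken     # hypothetical key inside that secret
```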
Hi @lkysow, are you sure that I should follow the instructions from the link above? I would say that it is not my use case, as it refers to the scenario where only Consul clients run in Kubernetes and need to join a Consul server cluster running on VMs. My use case is different: Consul servers run in Kubernetes as well, not only clients. I have a mesh gateway in both Consul clusters, and those clusters joined successfully. The Kubernetes and VM Consul clusters in my use case are not on the same LAN. The VM cluster is running in AWS and the AKS cluster is in Azure. But I have VPN connectivity in between, and all Consul servers/agents can reach each other.
When I tried to adjust my Helm config following the instructions from your link, I got the following error when installing the Helm chart with that config:
helm install azure hashicorp/consul -f config.yaml --wait
Error: INSTALLATION FAILED: execution error at (consul/templates/server-acl-init-job.yaml:2:65): only one of server.enabled or externalServers.enabled can be set
And I can understand this, because I was trying to apply a config that does not match my use case.
This is the Helm config I ran the chart install above with:
I have also tried to set a different serviceAccount for the pod, like you proposed here,
and my service pod failed to start again. But I was able to see in the log that consul login was successful. It then failed with this error:
kubectl logs pods/fleet-mgr --container=consul-connect-inject-init
2021-11-05T16:48:51.169Z [INFO] Consul login complete
2021-11-05T16:48:51.171Z [ERROR] Unable to get Agent services: error="Unexpected response code: 403 (ACL not found)"
Actually, this is working now. It managed to connect in the end. It seems I just had to wait a bit longer:
root@west-vm:/home/azureuser# kubectl logs pods/fleet-mgr --container=consul-connect-inject-init
2021-11-08T11:41:06.161Z [INFO] Consul login complete
2021-11-08T11:41:06.162Z [ERROR] Unable to get Agent services: error="Unexpected response code: 403 (ACL not found)"
2021-11-08T11:41:07.163Z [ERROR] Unable to get Agent services: error="Unexpected response code: 403 (ACL not found)"
2021-11-08T11:41:08.164Z [ERROR] Unable to get Agent services: error="Unexpected response code: 403 (ACL not found)"
2021-11-08T11:41:09.166Z [ERROR] Unable to get Agent services: error="Unexpected response code: 403 (ACL not found)"
2021-11-08T11:41:10.168Z [ERROR] Unable to get Agent services: error="Unexpected response code: 403 (ACL not found)"
2021-11-08T11:41:11.170Z [ERROR] Unable to get Agent services: error="Unexpected response code: 403 (ACL not found)"
2021-11-08T11:41:12.172Z [ERROR] Unable to get Agent services: error="Unexpected response code: 403 (ACL not found)"
2021-11-08T11:41:13.174Z [ERROR] Unable to get Agent services: error="Unexpected response code: 403 (ACL not found)"
2021-11-08T11:41:14.175Z [ERROR] Unable to get Agent services: error="Unexpected response code: 403 (ACL not found)"
2021-11-08T11:41:15.176Z [ERROR] Unable to get Agent services: error="Unexpected response code: 403 (ACL not found)"
2021-11-08T11:41:16.177Z [ERROR] Unable to get Agent services: error="Unexpected response code: 403 (ACL not found)"
2021-11-08T11:41:17.178Z [ERROR] Unable to get Agent services: error="Unexpected response code: 403 (ACL not found)"
2021-11-08T11:41:18.179Z [ERROR] Unable to get Agent services: error="Unexpected response code: 403 (ACL not found)"
2021-11-08T11:41:19.181Z [ERROR] Unable to get Agent services: error="Unexpected response code: 403 (ACL not found)"
2021-11-08T11:41:20.182Z [ERROR] Unable to get Agent services: error="Unexpected response code: 403 (ACL not found)"
2021-11-08T11:41:21.183Z [ERROR] Unable to get Agent services: error="Unexpected response code: 403 (ACL not found)"
2021-11-08T11:41:22.185Z [ERROR] Unable to get Agent services: error="Unexpected response code: 403 (ACL not found)"
2021-11-08T11:41:23.186Z [ERROR] Unable to get Agent services: error="Unexpected response code: 403 (ACL not found)"
2021-11-08T11:41:24.187Z [ERROR] Unable to get Agent services: error="Unexpected response code: 403 (ACL not found)"
2021-11-08T11:41:25.188Z [ERROR] Unable to get Agent services: error="Unexpected response code: 403 (ACL not found)"
2021-11-08T11:41:26.189Z [ERROR] Unable to get Agent services: error="Unexpected response code: 403 (ACL not found)"
2021-11-08T11:41:27.190Z [ERROR] Unable to get Agent services: error="Unexpected response code: 403 (ACL not found)"
2021-11-08T11:41:28.191Z [ERROR] Unable to get Agent services: error="Unexpected response code: 403 (ACL not found)"
2021-11-08T11:41:29.193Z [ERROR] Unable to get Agent services: error="Unexpected response code: 403 (ACL not found)"
2021-11-08T11:41:30.194Z [ERROR] Unable to get Agent services: error="Unexpected response code: 403 (ACL not found)"
2021-11-08T11:41:31.196Z [ERROR] Unable to get Agent services: error="Unexpected response code: 403 (ACL not found)"
2021-11-08T11:41:32.198Z [ERROR] Unable to get Agent services: error="Unexpected response code: 403 (ACL not found)"
2021-11-08T11:41:33.200Z [ERROR] Unable to get Agent services: error="Unexpected response code: 403 (ACL not found)"
2021-11-08T11:41:34.204Z [ERROR] Unable to get Agent services: error="Unexpected response code: 403 (ACL not found)"
2021-11-08T11:41:35.205Z [ERROR] Unable to get Agent services: error="Unexpected response code: 403 (ACL not found)"
2021-11-08T11:41:36.208Z [INFO] Registered service has been detected: service=fleet-mgr-sidecar-proxy
2021-11-08T11:41:36.208Z [INFO] Registered service has been detected: service=fleet-mgr
2021-11-08T11:41:36.208Z [INFO] Connect initialization completed
Successfully applied traffic redirection rules
Hi @lkysow
After successfully starting the service pod with ACLs enabled on the Kubernetes side, I noticed that accessing Kubernetes services from the VM cluster UI now fails. I get a 500 error, like in the picture, when I try to switch to the Kubernetes datacenter where ACLs are enabled. At the same time, I can access Kubernetes services in a datacenter where ACLs are disabled. I am logged in to the VM Consul UI with the bootstrap token. Is there a way to view services from datacenters where ACLs are enabled from the VM Consul UI? How can I obtain a token that has the right to access another datacenter where ACLs are enabled?
The 500 is unlikely due to ACLs. What is the URL in that screenshot? Usually a 500 is because the datacenter where the UI is hosted can’t talk to the other datacenter you’re selecting. The steps to debug that would be to look at the server logs in the datacenter where the UI is hosted.
Yes, you are right. Not sure if this is specific to my environment, but after deploying a new secondary datacenter I have to restart the consul service on the Consul server VM where the mesh gateway is hosted (the VMs are my primary datacenter). Only then do I have connectivity with the Kubernetes cluster, and only then can I see all datacenters in the UI. Is that expected behavior?
Thank you a lot for your advice!
I attached some logs I captured around the time the consul service restart happened on the server where the mesh gateway is hosted. These are the IPs of the server VMs in the primary Consul cluster:
The error that I see quite often in the log on the VM is:
[ERROR] agent.server.memberlist.wan: memberlist: Failed to send gossip to 10.253.160.54:8302: write tcp 10.242.89.235:53140->10.242.89.235:19005: write: broken pipe
where 10.242.89.235:19005 is the mesh gateway address.
In fact, after deploying a new (or recreating an existing) secondary Consul cluster in Kubernetes (AKS), these are the logs I see on the Consul VM, along with the systemd service status for consul:
2021-11-09T11:09:52.359128+00:00 frame-consul10-242-89-235 consul[7764]: 2021-11-09T11:09:52.358Z [ERROR] agent.server.rpc: RPC failed to server in DC: server=10.253.168.27:8300 datacenter=clusterwest method=Internal.ServiceDump error="rpc error getting client: failed to get conn: read tcp 10.242.89.235:36928->10.242.89.235:19005: read: connection reset by peer"
2021-11-09T11:09:55.243253+00:00 frame-consul10-242-89-235 consul[7764]: 2021-11-09T11:09:55.242Z [ERROR] agent.server.rpc: RPC failed to server in DC: server=10.253.168.70:8300 datacenter=clusterwest method=Health.ServiceNodes error="rpc error getting client: failed to get conn: read tcp 10.242.89.235:36940->10.242.89.235:19005: read: connection reset by peer"
2021-11-09T11:09:58.937021+00:00 frame-consul10-242-89-235 consul[7764]: 2021-11-09T11:09:58.936Z [ERROR] agent.server.memberlist.wan: memberlist: Failed to send gossip to 10.253.168.27:8302: read tcp 10.242.89.235:36966->10.242.89.235:19005: read: connection reset by peer
2021-11-09T11:09:59.236650+00:00 frame-consul10-242-89-235 consul[7764]: 2021-11-09T11:09:59.236Z [ERROR] agent.server.memberlist.wan: memberlist: Failed to send ping: read tcp 10.242.89.235:36970->10.242.89.235:19005: read: connection reset by peer
2021-11-09T11:09:59.954264+00:00 frame-consul10-242-89-235 consul[7764]: 2021-11-09T11:09:59.953Z [INFO] agent.server.serf.wan: serf: EventMemberJoin: azure-server-1.clusterwest 10.253.168.66
2021-11-09T11:09:59.954500+00:00 frame-consul10-242-89-235 consul[7764]: 2021-11-09T11:09:59.954Z [INFO] agent.server: Handled event for server in area: event=member-join server=azure-server-1.clusterwest area=wan
2021-11-09T11:10:00.244459+00:00 frame-consul10-242-89-235 consul[7764]: 2021-11-09T11:10:00.244Z [ERROR] agent.server.memberlist.wan: memberlist: Failed to send gossip to 10.253.168.66:8302: read tcp 10.242.89.235:36982->10.242.89.235:19005: read: connection reset by peer
systemctl status consul -ll
● consul.service - "HashiCorp Consul - A service mesh solution"
Loaded: loaded (/etc/systemd/system/consul.service; enabled; vendor preset: disabled)
Active: active (running) since Mon 2021-11-08 23:00:48 UTC; 12h ago
Docs: https://www.consul.io/
Main PID: 7764 (consul)
CGroup: /system.slice/consul.service
└─7764 /usr/bin/consul agent -config-dir=/etc/consul.d
Nov 09 11:15:04 frame-consul10-242-89-235 consul[7764]: 2021-11-09T11:15:04.939Z [ERROR] agent.server.memberlist.wan: memberlist: Failed to send gossip to 10.253.168.27:8302: read tcp 10.242.89.235:38920->10.242.89.235:19005: read: connection reset by peer
Nov 09 11:15:05 frame-consul10-242-89-235 consul[7764]: 2021-11-09T11:15:05.747Z [ERROR] agent.server.memberlist.wan: memberlist: Failed to forward ack: read tcp 10.242.89.235:38948->10.242.89.235:19005: read: connection reset by peer from=10.253.168.27:8302
Nov 09 11:15:05 frame-consul10-242-89-235 consul[7764]: 2021-11-09T11:15:05.747Z [ERROR] agent.server.memberlist.wan: memberlist: Failed to send gossip to 10.253.168.70:8302: read tcp 10.242.89.235:38938->10.242.89.235:19005: read: connection reset by peer
Nov 09 11:15:08 frame-consul10-242-89-235 consul[7764]: 2021-11-09T11:15:08.682Z [ERROR] agent.server.memberlist.wan: memberlist: Failed to send gossip to 10.253.168.70:8302: read tcp 10.242.89.235:38958->10.242.89.235:19005: read: connection reset by peer
Nov 09 11:15:09 frame-consul10-242-89-235 consul[7764]: 2021-11-09T11:15:09.940Z [ERROR] agent.server.memberlist.wan: memberlist: Failed to send gossip to 10.253.168.66:8302: read tcp 10.242.89.235:38964->10.242.89.235:19005: read: connection reset by peer
Nov 09 11:15:09 frame-consul10-242-89-235 consul[7764]: 2021-11-09T11:15:09.943Z [ERROR] agent.server.memberlist.wan: memberlist: Failed to send gossip to 10.253.168.27:8302: read tcp 10.242.89.235:38974->10.242.89.235:19005: read: connection reset by peer
Nov 09 11:15:10 frame-consul10-242-89-235 consul[7764]: 2021-11-09T11:15:10.369Z [ERROR] agent.server.memberlist.wan: memberlist: Failed to send gossip to 10.253.168.70:8302: read tcp 10.242.89.235:38978->10.242.89.235:19005: read: connection reset by peer
Nov 09 11:15:10 frame-consul10-242-89-235 consul[7764]: 2021-11-09T11:15:10.437Z [ERROR] agent.server.memberlist.wan: memberlist: Failed to send gossip to 10.253.168.27:8302: read tcp 10.242.89.235:38982->10.242.89.235:19005: read: connection reset by peer
Nov 09 11:15:11 frame-consul10-242-89-235 consul[7764]: 2021-11-09T11:15:11.735Z [ERROR] agent.server.rpc: RPC failed to server in DC: server=10.253.168.70:8300 datacenter=clusterwest method=Health.ServiceNodes error="rpc error getting client: failed to get conn: read tcp 10.242.89.235:39000->10.242.89.235:19005: read: connection reset by peer"
Nov 09 11:15:11 frame-consul10-242-89-235 consul[7764]: 2021-11-09T11:15:11.736Z [ERROR] agent.server.memberlist.wan: memberlist: Failed to forward ack: read tcp 10.242.89.235:38986->10.242.89.235:19005: read: connection reset by peer from=10.253.168.70:8302
Only after restarting the consul service on the VM can I reach services running in the newly deployed Kubernetes Consul cluster.