I run HashiCorp Vault in a Kubernetes cluster, and several other clusters get their secrets from it using vault-csi.
I am using the Kubernetes authentication method and have not had any problems with it.
Now another Kubernetes cluster has been added. It is located in a different network, with a firewall between it and the existing network.
Now to my problem: although the firewall allows the connection, the new cluster is not able to authenticate and log in. I get a 403 error from Vault during the login itself. The connection between the new cluster in the new network and the Vault cluster in the old network works, and I can see the login attempts in the Vault logs, but I don’t know why they fail. As a test, I briefly connected the two networks directly so that the traffic did not pass through the firewall, and the login worked immediately, so the configuration itself is correct.
The only noticeable thing is the duration in the Vault log when it doesn’t work:
I get a permission denied error when I try to log in.
If I authenticate with another method, it works. The source IP also stays the same in both cases, and nothing is restricted by IP address anyway.
Can anyone give me tips on how to troubleshoot this more accurately?
We tried many things, including packet captures, and analyzed the results. We can’t see any packets being lost or anything similar. From our point of view, the firewall and the Azure network are working fine.
I looked at the Vault audit logs and can only see the request and the response. The only differences between a working and a failing login are that the failing response does not include the “auth” block and that it only arrives after a 30s timeout has been reached.
It looks like Vault is waiting for something that it never receives.
If I try to authenticate against a path that does not exist, I get the permission denied error immediately. When I use the correct path, it takes 30s before I get the permission denied error.
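The timing difference described above can be measured with a small script. The following is only a sketch: the mount name, role, and JWT passed to `timed_login` are placeholders for your own values, and the 5-second threshold in `classify` is an arbitrary assumption for telling an immediate denial apart from one that follows a long wait.

```python
import json
import time
import urllib.error
import urllib.request


def classify(status, elapsed_s, slow_after_s=5.0):
    """Interpret one login attempt from its HTTP status and duration."""
    if status == 200:
        return "ok"
    if status == 403 and elapsed_s >= slow_after_s:
        # Vault accepted the request but stalled before denying it,
        # which suggests Vault itself was waiting on something.
        return "slow-deny"
    if status == 403:
        return "fast-deny"
    return "other"


def timed_login(vault_addr, mount, role, jwt, timeout_s=60.0):
    """POST to auth/<mount>/login and return (status, elapsed seconds)."""
    url = f"{vault_addr}/v1/auth/{mount}/login"
    body = json.dumps({"role": role, "jwt": jwt}).encode()
    req = urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )
    start = time.monotonic()
    try:
        with urllib.request.urlopen(req, timeout=timeout_s) as resp:
            status = resp.status
    except urllib.error.HTTPError as err:
        # Non-2xx answers (such as the 403) arrive as HTTPError.
        status = err.code
    return status, time.monotonic() - start
```

Calling `classify(*timed_login(...))` once against the real mount and once against a non-existent one should show the two behaviours side by side: a "fast-deny" for the wrong path and a "slow-deny" for the right one.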
Hi @Tim-herbie, it sounds like you have a reasonably complex environment (multiple AKS clusters/networks).
Have you confirmed you can access Vault from the cluster it is running on?
Yes. The Vault server cluster itself, as well as all other Kubernetes clusters in the same VNet, are able to access it.
My familiarity with Vault and Kubernetes is mostly with our HCP Vault offering, so apologies if this suggestion is not relevant in your case. Is it possible that an incorrect cert was added to the auth method config?
For example, this is how I might set it up with HCP Vault
Thank you for your answer, but we have checked the cert more than once.
If it were wrong, the connection wouldn’t work when we change the network routing either. That is the confusing part. Every time we peer the two virtual networks, it works fine. But that is not an option for network security reasons.
I think I understand now - so when you peer the VNets, the k8s auth method works, when you break that peer and set up some other routing, that is when you get the 403?
When I break the peering, Azure changes the routing itself so that everything is routed over the configured hop (our FortiGate firewall).
Do you have to manage routes in both directions in your setup? Said another way, does the VNet/cluster that Vault is running on have a route back to your app (based on the error, it sounds like your app is reaching Vault)?
I am not familiar with FortiGate, but it sounds like you at least have Vault set up properly if authentication is working when the VNets are peered.
I was now able to solve the problem:
The issue was that I had only allowed traffic from vault-client → vault-server in the firewall.
But it seems that Vault opens an additional, dedicated connection from vault-server → vault-client, and that wasn’t allowed in the firewall.
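For context, the Kubernetes auth method validates the submitted service account token by calling the TokenReview API on the address set as `kubernetes_host` in the auth method config, so the Vault server needs to reach the client cluster's API server. A quick way to check that direction is a plain TCP connect test run from the Vault server's network. This is a minimal sketch; the `can_connect` helper is hypothetical, and the host and port you pass in are whatever your own `kubernetes_host` points at.

```python
import socket


def can_connect(host, port, timeout_s=5.0):
    """Return True if a TCP connection to host:port opens within timeout_s."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False
```

If this returns False from the Vault side while the client cluster can still reach Vault, the firewall is most likely only open in one direction.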
Great to hear @Tim-herbie - certainly sounded like a port or route not being opened.