Hi,
I’m trying to set up Consul clients on my AKS cluster (the Consul server is outside Kubernetes), but the deployment is failing with the following error on the consul-server-acl-init job pod:
Failure: calling /agent/self to get datacenter: err="Get "https://10.162.34.220:8501/v1/agent/self": remote error: tls: bad certificate"
I double-checked the CA certificate provided as a K8S secret and it looks OK. It’s correctly mapped to the pod and contains the correct data. The same certificate is used by another Consul client deployed outside Kubernetes and everything is OK.
Is your server configured to only allow mTLS connections, i.e. does it have verify_incoming set to true? Generally, if you have ACLs enabled, mTLS doesn’t provide any extra layers of security, so I’d recommend turning it off and just keeping “regular”, one-way TLS plus ACLs turned on.
Hi,
Thanks for the reply. Yes, my Consul server is configured with verify_incoming = true, and I would expect it to work with that config. After setting manageSystemACLs: false, my K8S clients joined the cluster; however, I still can’t enable connectInject for the same reason. The init container named get-auto-encrypt-client-ca in the consul-connect-injector-webhook pod is not able to get the client CA and fails with the same error:
[ERROR] Error retrieving CA roots from Consul: err="Get "https://consul server:8501/v1/agent/connect/ca/roots": remote error: tls: bad certificate"
Inspecting the init container shows that the command is running and produces the error message mentioned above:
I’m pretty sure that setting verify_incoming = false on my Consul server would solve the issue; however, according to the documentation it’s recommended to keep it enabled.
It looks like you are getting the same TLS error: bad certificate.
When you have verify_incoming set on the server, it will expect all clients, RPC or HTTP, to present a client certificate and will reject any connections that don’t do so.
In this case, our helm chart does not support servers that require HTTP clients to present client certs. So when some components are talking to the server over HTTPS, they don’t present a certificate, and you’re seeing the bad certificate error in your logs.
To fix your problem, I recommend turning ACLs in the helm chart back on and setting verify_incoming_https on the server to false, but verify_incoming_rpc to true. That way you can still ensure only trusted Consul clients are joining the cluster, but for HTTPS connections you don’t need that since you’re using ACLs already.
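As a rough sketch of that server-side change, assuming the top-level TLS options of the 1.8.x config format used in this thread: the file name below is hypothetical, and as far as I know any existing verify_incoming = true line would also need to be removed, since it applies to both RPC and HTTPS.

```shell
#!/bin/sh
set -eu

# Hypothetical output path for this sketch; point it at your server's config directory.
CONFIG_FILE="${CONFIG_FILE:-./verify-incoming.hcl}"

cat > "$CONFIG_FILE" <<'EOF'
# Do not require client certificates on the HTTPS API (ACL tokens guard it instead).
verify_incoming_https = false
# Still require client certificates for RPC, so only trusted agents can join.
verify_incoming_rpc = true
EOF

echo "wrote $CONFIG_FILE"
```

The servers would need a restart or reload to pick up the new settings.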
Hi @ishustava ,
Thanks for the answer.
According to the helm chart reference in the Consul documentation, manageSystemACLs: false should only be enabled if the Consul servers are running inside Kubernetes.
In my case the Consul servers are outside Kubernetes, installed on separate VMs.
Also, as far as I understand, the command consul-k8s get-consul-client-ca -output-file=/consul/tls/client/ca/tls.crt -server-addr=consulserver -server-port=8501 -ca-file=/consul/tls/ca/tls.crt executed in the init container tries to retrieve exactly the client certificate, which is expected in auto_encrypt mode.
I just want to clarify one more time that the problem here is not with the Consul client pods, because they are up, running, and joined to the cluster. The problem is with the consul-connect-injector-webhook pod:
To conclude from your answer, it sounds like it’s not possible to properly join K8S Consul clients to an outside Consul cluster and have verify_incoming enabled on the Consul servers together with
Apologies @andriktr, these docs were out of date. They have now been fixed. I think you already saw that we have more docs on how to enable ACLs with external servers here.
Also, as far as I understand, the command consul-k8s get-consul-client-ca -output-file=/consul/tls/client/ca/tls.crt -server-addr=consulserver -server-port=8501 -ca-file=/consul/tls/ca/tls.crt executed in the init container tries to retrieve exactly the client certificate, which is expected in auto_encrypt mode.
That’s not quite true. It’s retrieving the CA that has signed the Consul client’s certificate.
To conclude from your answer, it sounds like it’s not possible to properly join K8S Consul clients to an outside Consul cluster and have verify_incoming enabled on the Consul servers together with
You are correct. I’ll add it to our docs, thanks for pointing this out!
That’s not quite true. It’s retrieving the CA that has signed the Consul client’s certificate.
But the CA’s certificate is already provided in the command, in the argument -ca-file=/consul/tls/ca/tls.crt
This certificate exists in the container at the mentioned path /consul/tls/ca/tls.crt and it comes from the Kubernetes secret referenced in the helm chart TLS configuration:
So there are two CAs in play. There’s the CA for the servers, and then, when auto-encrypt is enabled, there’s a separate CA for the Consul clients.
What we need to do is talk to the server to get the separate client CA. To do that we use the get-consul-client-ca command. However, to make the call to the server, we need the server CA. That’s what’s being passed in by -ca-file=/consul/tls/ca/tls.crt. Then for subsequent calls to the Consul clients we need to use the client CA.
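To make the roles of the two files concrete, here is the same command from earlier in the thread, broken out with one variable per CA. The paths, address, and port are the ones already quoted above; the RUN switch is a hypothetical addition for this sketch so that it only prints the command unless you opt in.

```shell
#!/bin/sh
set -eu

# Server CA: already mounted from the Kubernetes secret. It is only used to
# verify the server's HTTPS certificate while making the call.
SERVER_CA_FILE="/consul/tls/ca/tls.crt"

# Client CA: does not exist yet. The command fetches it from the server and
# writes it here; later calls to the Consul clients are verified against it.
CLIENT_CA_FILE="/consul/tls/client/ca/tls.crt"

SERVER_ADDR="consulserver"   # placeholder address, as in the thread
SERVER_PORT="8501"

CMD="consul-k8s get-consul-client-ca -output-file=$CLIENT_CA_FILE -server-addr=$SERVER_ADDR -server-port=$SERVER_PORT -ca-file=$SERVER_CA_FILE"

# Dry run by default: print the command. RUN=1 is a hypothetical switch for
# this sketch that executes it instead.
if [ "${RUN:-0}" = "1" ]; then
  $CMD
else
  echo "$CMD"
fi
```

So -ca-file is an input (what we trust the server with) and -output-file is the result (what we will trust the clients with).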
Hey @ishustava @lkysow, maybe you will also have some insights on another issue I have with setting up the Consul client on K8S. The problem is that if I set the ACL default policy to deny, the K8S Consul client can’t join the cluster. The client container logs show the following:
==> Starting Consul agent...
Version: '1.8.3+ent'
Node ID: '5a64aa6b-178f-f732-d862-fe8482e0fbe6'
Node name: 'redacted'
Datacenter: 'consul-azure-dc' (Segment: '')
Server: false (Bootstrap: false)
Client Addr: [0.0.0.0] (HTTP: 8500, HTTPS: 8501, gRPC: 8502, DNS: 8600)
Cluster Addr: redacted (LAN: 8301, WAN: 8302)
Encrypt: Gossip: true, TLS-Outgoing: true, TLS-Incoming: false, Auto-Encrypt-TLS: true
==> Log data will now stream in as it occurs:
2020-08-17T07:44:47.304Z [INFO] agent.client.serf.lan: serf: EventMemberJoin: redacted redacted
2020-08-17T07:44:47.304Z [WARN] agent.client.manager: No servers available
2020-08-17T07:44:47.514Z [WARN] agent.client: AutoEncrypt failed: error="rpcinsecure error making call: rpcinsecure error making call: ACL not found"
2020-08-17T07:44:47.522Z [WARN] agent.client: AutoEncrypt failed: error="rpcinsecure error making call: rpcinsecure error making call: ACL not found"
2020-08-17T07:44:47.595Z [WARN] agent.client: AutoEncrypt failed: error="rpcinsecure error making call: ACL not found"
2020-08-17T07:44:47.595Z [WARN] agent.client: retrying AutoEncrypt: retry_interval=31.706936702s