Remote error: tls: bad certificate for K8S consul clients

Hi,
I’m trying to set up Consul clients on my AKS cluster (the Consul servers are outside of Kubernetes), but the deployment is failing with the following error in the consul-server-acl-init job pod:

Failure: calling /agent/self to get datacenter: err="Get "https://10.162.34.220:8501/v1/agent/self": remote error: tls: bad certificate"

I double-checked the CA certificate provided as a Kubernetes secret and it looks OK. It’s correctly mounted into the pod and contains the correct data. The same certificate is used by another Consul client deployed outside of Kubernetes and everything is OK there.

Here is my Consul Helm config file:

global:
  enabled: false
  name: consul
  datacenter: consul-azure-dc
  acls:
    manageSystemACLs: true
    bootstrapToken:
      secretName: consul-acl-token  
      secretKey: bootstrap-token  
  gossipEncryption:
    secretName: consul-gossip-encryption
    secretKey: gossip
  tls:
    enabled: true
    enableAutoEncrypt: true
    caCert:
      secretName: consul-ca-cert
      secretKey: tls.crt
externalServers:
  enabled: true
  hosts: ["10.162.34.220","10.162.34.222","10.162.34.221"]
  k8sAuthMethodHost: https://my-aks-cluster.hcp.westeurope.azmk8s.io:443
  #useSystemRoots: true
client:
  enabled: true
  join: ["10.162.34.220","10.162.34.222","10.162.34.221"]
connectInject:
  enabled: false
  default: false 

Maybe I’m doing something wrong? Any suggestions are welcome.
Thanks in advance

Hey @andriktr,

Is your server configured to only allow mTLS connections, i.e. does it have verify_incoming set to true? Generally, if you have ACLs enabled, mTLS doesn’t provide an extra layer of security, so I’d recommend turning it off and keeping “regular” one-way TLS plus ACLs turned on.
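
For illustration, one-way TLS on the server would look roughly like this in the agent’s HCL config (a sketch; file paths are placeholders):

# One-way TLS: the server presents a certificate, but does not
# require clients to present certificates of their own.
verify_incoming = false
verify_outgoing = true
ca_file   = "/etc/consul.d/consul-agent-ca.pem"
cert_file = "/etc/consul.d/server.pem"
key_file  = "/etc/consul.d/server-key.pem"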

Hi,
Thanks for the reply. Yes, my Consul server is configured with verify_incoming = true, and normally it should work with such a config, or at least I think so :slight_smile:. After setting manageSystemACLs: false my K8S clients joined the cluster; however, I still can’t enable connectInject, for the same reason. The init container named get-auto-encrypt-client-ca of the consul-connect-injector-webhook pod is not able to get the client CA and fails with the same error:

[ERROR] Error retrieving CA roots from Consul: err="Get "https://consul server:8501/v1/agent/connect/ca/roots": remote error: tls: bad certificate"

Inspecting the init container shows that the following command is running and produces the error message above:

consul-k8s get-consul-client-ca -output-file=/consul/tls/client/ca/tls.crt -server-addr=consulserver -server-port=8501 -ca-file=/consul/tls/ca/tls.crt

I’m pretty sure that setting verify_incoming = false on my Consul server would solve the issue; however, according to the documentation it’s recommended to keep it enabled.

Currently my config YAML looks like this:

global:
  enabled: false
  name: consul
  image: "consul:1.8.1"
  datacenter: consul-azure-dc
  acls:
    manageSystemACLs: false
    bootstrapToken:
      secretName: consul-acl-token  
      secretKey: bootstrap-token  
  gossipEncryption:
    secretName: consul-gossip-encryption
    secretKey: gossip
  tls:
    enabled: true
    enableAutoEncrypt: true
    caCert:
      secretName: consul-ca-cert
      secretKey: tls.crt

externalServers:
  enabled: true
  hosts: ["redacted"]
  k8sAuthMethodHost: redacted:443
  useSystemRoots: false

client:
  enabled: true
  join: ["redacted"]
  resources:
    requests:
      memory: "100Mi"
      cpu: "100m"
    limits:
      memory: "100Mi"
      cpu: "100m" 

connectInject:
  enabled: true
  default: false
  resources:
    requests:
      memory: "50Mi"
      cpu: "50m"
    limits:
      memory: "50Mi"
      cpu: "50m"
  centralConfig:
    enabled: false

Thanks

Hey @andriktr,

It looks like you are getting the same TLS error: bad certificate.

When you have verify_incoming set on the server, it will expect all clients, RPC or HTTP, to present a client certificate and will reject any connections that don’t do so.

In this case, our Helm chart does not support servers that require HTTP clients to present client certs. So when some components talk to the server over HTTPS, they don’t present a certificate, and you see the bad certificate error in your logs.

To fix your problem, I recommend turning ACLs in the Helm chart back on and setting verify_incoming_https to false on the server, while keeping verify_incoming_rpc set to true. That way you can still ensure that only trusted Consul clients join the cluster, but HTTPS connections don’t need client certs since you’re already using ACLs.
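
In the server’s HCL config that would look roughly like this (a sketch against your existing settings):

# Require client certs only for RPC (agent-to-agent), not for the HTTPS API:
verify_incoming_rpc   = true
verify_incoming_https = false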

Hi @ishustava,
Thanks for the answer.
According to the Helm chart reference in the Consul documentation, manageSystemACLs should only be enabled if the Consul servers are running inside Kubernetes.

In my case the Consul servers are outside of Kubernetes, installed on separate VMs.

Also, as far as I understand, the command consul-k8s get-consul-client-ca -output-file=/consul/tls/client/ca/tls.crt -server-addr=consulserver -server-port=8501 -ca-file=/consul/tls/ca/tls.crt executed in the init container is exactly trying to retrieve the client certificate, which is expected in auto_encrypt mode.

Just want to clarify one more time that the problem here is not with the Consul client pods, because they are up, running, and joined to the cluster. The problem is with the consul-connect-injector-webhook pod.

Concluding from your answer, it sounds like it’s not possible to properly join K8S Consul clients to an external Consul cluster and have verify_incoming enabled on the Consul servers together with

connectInject:
  enabled: true

on the K8S side.

If so, IMO this should probably be mentioned in the Consul documentation, in the Secure Consul Agents section or in the Consul Servers Outside of Kubernetes section.

Thank you.

Apologies @andriktr, those docs were out of date; they have now been fixed. I think you already saw that we have more docs on how to enable ACLs with external servers here.

Also, as far as I understand, the command consul-k8s get-consul-client-ca -output-file=/consul/tls/client/ca/tls.crt -server-addr=consulserver -server-port=8501 -ca-file=/consul/tls/ca/tls.crt executed in the init container is exactly trying to retrieve the client certificate, which is expected in auto_encrypt mode.

That’s not quite true. It’s retrieving the CA that has signed the Consul client’s certificate.

Concluding from your answer, it sounds like it’s not possible to properly join K8S Consul clients to an external Consul cluster and have verify_incoming enabled on the Consul servers together with connectInject enabled on the K8S side.

You are correct. I’ll add it to our docs, thanks for pointing this out!

Hi again @ishustava, thanks for the clarification.

That’s not quite true. It’s retrieving the CA that has signed the Consul client’s certificate.

But the CA certificate is already provided to the command in the argument -ca-file=/consul/tls/ca/tls.crt.

This certificate exists in the container at the mentioned path /consul/tls/ca/tls.crt, and it comes from the Kubernetes secret referenced in the Helm chart TLS configuration:

tls:
  enabled: true
  enableAutoEncrypt: true
  caCert:
    secretName: consul-ca-cert
    secretKey: tls.crt

What kind of CA does it try to retrieve, then?

Also, here are some discussions about consul-k8s get-consul-client-ca.

Thank you.

So there are two CAs in play: there’s the CA for the servers, and then, when auto-encrypt is enabled, there’s a separate CA for the Consul clients.

What we need to do is talk to the server to get that separate client CA. To do that we use the get-consul-client-ca command. However, to make the call to the server, we need the server CA; that’s what’s being passed in via -ca-file=/consul/tls/ca/tls.crt. Then for subsequent calls to the Consul clients we need to use the client CA.
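
To make that concrete, the flow looks roughly like this (a sketch; <consul-server> and <consul-client> are placeholders):

# 1. Fetch the client CA from the server; this call is verified with the
#    *server* CA passed via -ca-file:
consul-k8s get-consul-client-ca \
  -server-addr=<consul-server> \
  -server-port=8501 \
  -ca-file=/consul/tls/ca/tls.crt \
  -output-file=/consul/tls/client/ca/tls.crt

# 2. Subsequent calls to the Consul clients are then verified with the
#    client CA that was just written out, e.g.:
curl --cacert /consul/tls/client/ca/tls.crt https://<consul-client>:8501/v1/agent/self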

@lkysow Thanks for the clarification, now it’s clear.

The last thing I would like to figure out is this setting:

  acls:
    manageSystemACLs: false

Should I enable it if my Consul servers are outside the cluster? The description in the documentation really confuses me.

Thanks in advance

Hi, yes you can enable this if you want ACLs. We have a PR to update the docs but it hasn’t been merged yet.
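
Based on the values you posted earlier, that would just mean flipping the flag back on (a sketch of the relevant part):

global:
  acls:
    manageSystemACLs: true
    bootstrapToken:
      secretName: consul-acl-token
      secretKey: bootstrap-token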

@ishustava @lkysow Thanks for the advice, really appreciate it.

Hey @ishustava @lkysow, maybe you will also have some insight on another issue of mine with setting up the Consul client on K8S. The problem is that if I set the ACL default policy to deny, the K8S Consul client can’t join the cluster. The client container logs show the following:

==> Starting Consul agent...
           Version: '1.8.3+ent'
           Node ID: '5a64aa6b-178f-f732-d862-fe8482e0fbe6'
         Node name: 'redacted'     
        Datacenter: 'consul-azure-dc' (Segment: '')       
            Server: false (Bootstrap: false)
       Client Addr: [0.0.0.0] (HTTP: 8500, HTTPS: 8501, gRPC: 8502, DNS: 8600)
      Cluster Addr: redacted (LAN: 8301, WAN: 8302)
           Encrypt: Gossip: true, TLS-Outgoing: true, TLS-Incoming: false, Auto-Encrypt-TLS: true
==> Log data will now stream in as it occurs:
    2020-08-17T07:44:47.304Z [INFO]  agent.client.serf.lan: serf: EventMemberJoin: redacted redacted
    2020-08-17T07:44:47.304Z [WARN]  agent.client.manager: No servers available
    2020-08-17T07:44:47.514Z [WARN]  agent.client: AutoEncrypt failed: error="rpcinsecure error making call: rpcinsecure error making call: ACL not found"
    2020-08-17T07:44:47.522Z [WARN]  agent.client: AutoEncrypt failed: error="rpcinsecure error making call: rpcinsecure error making call: ACL not found"
    2020-08-17T07:44:47.595Z [WARN]  agent.client: AutoEncrypt failed: error="rpcinsecure error making call: ACL not found"
    2020-08-17T07:44:47.595Z [WARN]  agent.client: retrying AutoEncrypt: retry_interval=31.706936702s

Here is my Helm chart config.yaml:

global:
  enabled: false
  name: consul
  image: "hashicorp/consul-enterprise:1.8.3-ent"
  datacenter: consul-azure-dc
  acls:
    manageSystemACLs: true
    bootstrapToken:
      secretName: consul-acl-token  
      secretKey: bootstrap-token  
  gossipEncryption:
    secretName: consul-gossip-encryption
    secretKey: gossip
  tls:
    enabled: true
    enableAutoEncrypt: true
    verify: true
    httpsOnly: false
    caCert:
      secretName: consul-ca-cert
      secretKey: tls.crt
externalServers:
  enabled: true
  hosts: ["redacted"]
  k8sAuthMethodHost: redacted:443
  useSystemRoots: false
client:
  enabled: true
  join: ["redacted"]
  resources:
    requests:
      memory: "100Mi"
      cpu: "100m"
    limits:
      memory: "100Mi"
      cpu: "100m" 
connectInject:
  enabled: true
  default: false
  resources:
    requests:
      memory: "50Mi"
      cpu: "50m"
    limits:
      memory: "50Mi"
      cpu: "50m"
  centralConfig:
    enabled: false
syncCatalog:
  enabled: false
  toConsul: false
  toK8S: false
  default: false 
  k8sPrefix: null
  k8sAllowNamespaces: ["*"]
  k8sDenyNamespaces: ["kube-system", "kube-public", "kube-node-lease", "default", "kubernetes-dashboard", "velero", "kured"]

Here is the consul.hcl file on the Consul server:

datacenter = "consul-azure-dc"
retry_join = ["redacted"]
retry_join_wan = ["redacted"] 
data_dir = "/opt/consul"
encrypt = "redacted"
ca_file = "/etc/consul.d/consul-agent-ca.pem"
cert_file = "/etc/consul.d/consul-azure-dc-server-consul-0.pem"   
key_file = "/etc/consul.d/consul-azure-dc-server-consul-0-key.pem"
verify_incoming = false
verify_incoming_rpc = true
verify_outgoing = true
verify_server_hostname = true
acl = {
  enabled = true
  default_policy = "deny"
  enable_token_persistence = true
  tokens = {
    agent = "redacted"
  }
}
auto_encrypt {
  allow_tls = true
}
performance {
  raft_multiplier = 1
}
ports {
  https = 8501
}

Here is the part of the same Consul server configuration, separated out into server.hcl:

server = true
bootstrap_expect = 3
ui = true
client_addr = "0.0.0.0"
connect {
  enabled = true
}
primary_datacenter = "consul-azure-dc"

Thanks in advance!!!