Unable to use auto-encrypt feature in secondary consul cluster

Hello,

I have a federated Consul setup: the primary cluster runs on VMs and the secondary cluster runs on k8s.

I have installed my custom CA and verified that the services on primary cluster are able to communicate using the envoy proxy upstream and downstream configurations.

I now use the same Consul CA cert and key in the k8s cluster (by creating a Kubernetes secret).

I have the following features enabled:

global:
  datacenter: dc-k8s
  image: "hashicorp/consul:1.9.3"
  imageK8S: "hashicorp/consul-k8s:0.24.0"
  enablePodSecurityPolicies: true
  name: "nse"
  tls:
    enabled: true
    verify: true
    enableAutoEncrypt: true
    httpsOnly: true
    caCert:
      secretName: consul-ca-cert
      secretKey: tls.crt
    caKey:
      secretName: consul-ca-key
      secretKey: tls.key

  federation:
    enabled: true

  gossipEncryption:
    secretName: consul-encrypt-key
    secretKey: gossipEncryptionKey

server:
  enabled: true
  storageClass: local-storage          
  extraConfig: |
    {"primary_datacenter": "dc-vm", "primary_gateways":["<primary_mesh_gateway_ip>:8443"]}

connectInject:
  enabled: true
  default: true

controller:
  enabled: true

meshGateway:
  enabled: true
  replicas: 1
  service:
    enabled: true
    type: NodePort
    nodePort: 30002
  consulServiceName: "mesh-gateway"

Despite the above configuration, I see the following error in my Consul client logs:

[ERROR] agent.auto_config: AutoEncrypt.Sign RPC failed: addr=10.244.81.135:8300 error="rpcinsecure error making call: rpcinsecure error making call: CA is uninitialized and unable to sign certificates yet: no root certificate"

Why does it occur?

Consul: 1.9.3 in both DC
Consul-helm: 0.30.0
envoy: 1.16.0

The secondary Consul cluster comes up perfectly fine if I disable the auto-encrypt feature, but then it cannot federate with the primary cluster, where auto_encrypt is enabled.

What could the issue be?

Hi @ashwinkupatkar thanks for filing this issue.

Could you give Helm chart 0.31.0 a try in a test environment? That release includes a fix for an issue with certs when using auto-encrypt. If that doesn’t fix it, we should probably open a ticket on GitHub.

Thanks @kschoche … will check on it.

Hi @kschoche,

I upgraded to Consul Helm chart version 0.31.1, but the issue still persists.

Please find attached the events from the deployment:

consul-events.txt (11.3 KB)

The client pod logs the following output:

2021-04-27T20:31:08.911Z [ERROR] agent.auto_config: AutoEncrypt.Sign RPC failed: addr=10.244.81.162:8300 error="rpcinsecure error making call: rpcinsecure error making call: CA is uninitialized and unable to sign certificates yet: no root certificate"
2021-04-27T20:31:08.911Z [ERROR] agent.auto_config: No servers successfully responded to the auto-encrypt request
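
One way to narrow down a "no root certificate" error is to ask the secondary datacenter's servers whether the Connect CA currently reports an active root. The sketch below reads Consul's /v1/connect/ca/roots HTTP endpoint and looks for the entry whose ID matches ActiveRootID. The address, CA file path, and token handling are assumptions for illustration; depending on the TLS verify settings, a client certificate may also be needed, which this sketch does not cover.

```python
import json
import ssl
import urllib.request
from typing import Optional


def active_root(roots_payload: dict) -> Optional[dict]:
    """Return the active root entry from a /v1/connect/ca/roots response, or None."""
    active_id = roots_payload.get("ActiveRootID")
    for root in roots_payload.get("Roots") or []:
        # A usable root must match the active ID and actually carry a certificate.
        if root.get("ID") == active_id and root.get("RootCert"):
            return root
    return None


def fetch_roots(base_url: str, ca_file: str, token: Optional[str] = None) -> dict:
    """Fetch the Connect CA roots from a Consul agent over HTTPS.

    base_url and ca_file are placeholders (assumptions), for example
    "https://<server-address>:8501" and the CA certificate used for Consul TLS.
    """
    context = ssl.create_default_context(cafile=ca_file)
    request = urllib.request.Request(f"{base_url}/v1/connect/ca/roots")
    if token:
        request.add_header("X-Consul-Token", token)
    with urllib.request.urlopen(request, context=context, timeout=10) as response:
        return json.load(response)
```

Calling active_root(fetch_roots(...)) against a server in the secondary datacenter and getting None back would be consistent with the error above; it does not by itself explain why the CA has not been initialized.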

The setup works perfectly fine if I disable the auto-encrypt feature, but then it cannot federate with the primary cluster, where auto_encrypt is enabled.

@kschoche, can you suggest a reason for the issue based on the above input?

Hi @ashwinkupatkar - It’s a bit difficult to tell based on those logs alone. Would you be able to also show your VM server configurations?
Also, could you confirm that you followed the steps here for federation: Federation Between VMs and Kubernetes | Consul by HashiCorp?
Thanks!

Hi @kschoche,

Yes, I followed the steps mentioned in Federation Between VMs and Kubernetes | Consul by HashiCorp.

If you go through the document, you will find that enableAutoEncrypt is disabled in the doc.

So, if I also disable auto-encrypt, everything works like a charm and federation between these two clusters takes place.

What I have observed is that the controller pod and the connect injector pod go into the “init” phase and do not come up when auto-encrypt is enabled.

Hi @ashwinkupatkar, can you post the logs for those pods which are stuck? Thanks!

I cannot get the logs for those pods as they are in the init stage; if I try to view the logs, it says the pod is initializing.

Hi @kschoche, were you able to reproduce the issue?

Thanks

I am so surprised to see that the issue resolved itself. I did not make any changes to the config. I just enabled auto-encrypt on both the primary and secondary clusters so that I could reproduce the issue today, but the setup just worked like a charm. I am baffled.

Sorry for the delay @ashwinkupatkar. That’s great to hear. If you see issues, please let us know.

One thing that could have happened is that the CA was somehow mismatched. What I was going to ask is whether you used the same CA for both Connect and Consul’s TLS (those could be different CAs). To clarify, the Connect CA is used for client certificates when using auto-encrypt and for mTLS between services in the service mesh. Consul servers could use a different CA and certificates for TLS to communicate with each other. Not sure if this was your issue though!
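
A quick way to rule out a mismatched CA is to compare fingerprints of the certificate stored in the Kubernetes secret and the one used on the VMs. The following is a minimal sketch using only the Python standard library; it assumes both certificates have been exported to PEM text (for example, the tls.crt value of the consul-ca-cert secret and the CA file from a primary server), and the function names are made up for illustration.

```python
import hashlib
import ssl

PEM_HEADER = "-----BEGIN CERTIFICATE-----"
PEM_FOOTER = "-----END CERTIFICATE-----"


def cert_fingerprint(pem_text: str) -> str:
    """Return the SHA-256 fingerprint (hex) of the first certificate in a PEM string."""
    # Isolate the first certificate so surrounding whitespace or extra
    # certificates in a bundle do not affect the result.
    start = pem_text.index(PEM_HEADER)
    end = pem_text.index(PEM_FOOTER, start) + len(PEM_FOOTER)
    der = ssl.PEM_cert_to_DER_cert(pem_text[start:end])
    return hashlib.sha256(der).hexdigest()


def same_ca(pem_a: str, pem_b: str) -> bool:
    """True if both PEM strings start with the same certificate."""
    return cert_fingerprint(pem_a) == cert_fingerprint(pem_b)
```

If same_ca returns False for the two exported files, the clusters are not using the same CA certificate. A matching fingerprint only shows the certificates are identical; it says nothing about whether the private keys match or which CA Connect itself is configured with.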

Hi @ishustava, in the secondary cluster I have used the same CA cert and key as in my primary cluster.