TLS server issues (consul-k8s, Vault secrets backend)

Hi all,

We are having trouble getting ACLs to work. With manageSystemACLs turned off, Consul starts fine and there are no issues. With it turned on, we get a whole series of errors. The strangest one is that server-acl-init fails to connect to consul-server. We are using the Vault secrets backend to generate the certificates.

server-acl-init pod:

{"@level":"error","@message":"Failure: calling /agent/self to get datacenter","@timestamp":"2022-05-25T08:50:47.768244Z","err":"Get \"https://global-consul-consul-server-0.global-consul-consul-server.default.svc:8501/v1/agent/self\": x509: certificate signed by unknown authority"}

We write the consul-ca secret before applying the Helm chart. The value of consul-ca is:

apiVersion: v1
data:
  ca.crt: <secret>
kind: Secret
metadata:
  name: consul-ca
  namespace: default
type: Opaque

We created two PKI mounts:

pki/root-pki
pki/intermediate-pki

We followed the documentation here:

But it didn’t mention the CA PKI at all. It does mention it here: Generate mTLS Certificates for Consul with Vault | Consul - HashiCorp Learn

Instead of pki and pki_int we have:

pki/root-pki
pki/intermediate-pki

The value here is the same as the one we have in Vault (pki/intermediate-pki/cert/ca).
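
A claim like "the value is the same" can be checked byte-for-byte rather than by eye. The following is a minimal sketch, assuming the value in question is the base64-encoded ca.crt from the consul-ca Secret and that the Vault side is the PEM text returned from pki/intermediate-pki/cert/ca. The helper names (pem_fingerprint, secret_value_to_pem, same_certificate) are hypothetical and not part of any Consul or Vault tooling. It compares the decoded DER bytes, so trailing newlines or other whitespace differences do not cause a false mismatch.

```python
import base64
import hashlib
import ssl

PEM_HEADER = "-----BEGIN CERTIFICATE-----"


def pem_fingerprint(pem_text: str) -> str:
    """Return the SHA-256 hex digest of the DER bytes of a single PEM certificate."""
    pem_text = pem_text.strip()
    if pem_text.count(PEM_HEADER) != 1:
        # A bundle (for example intermediate + root) must be split before comparing.
        raise ValueError("expected exactly one certificate in the PEM text")
    der = ssl.PEM_cert_to_DER_cert(pem_text)
    return hashlib.sha256(der).hexdigest()


def secret_value_to_pem(b64_value: str) -> str:
    """Decode a Kubernetes Secret `data` value (base64) into PEM text."""
    return base64.b64decode(b64_value).decode("ascii")


def same_certificate(secret_b64: str, vault_pem: str) -> bool:
    """True if the Secret value and the Vault PEM encode the same certificate."""
    return pem_fingerprint(secret_value_to_pem(secret_b64)) == pem_fingerprint(vault_pem)
```

This only shows whether the two values are the same certificate. It does not show whether that certificate is the one that signed the certificate the servers present, which is what the x509 error is about.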

The role we pass to the values (consul-global) has the following permissions:

data "vault_policy_document" "consul" {
  rule {
    path         = "secret/data/consul/*"
    capabilities = ["read", "update", "list"]
    description  = "Allow access to consul secrets"
  }
  rule {
    path         = "secret/consul/*"
    capabilities = ["read", "update", "list"]
    description  = "Allow access to consul secrets"
  }
  rule {
    path         = "/sys/mounts"
    capabilities = [ "read" ]
    description  = "allow all on /pki"
  }
  rule {
    path         = "/sys/mounts/pki/root-pki"
    capabilities = [ "read" ]
    description  = "allow all on /pki"
  }
  rule {
    path         = "/sys/mounts/pki/intermediate-pki"
    capabilities = [ "read" ]
    description  = "allow all on /pki"
  }
  rule {
    path         = "/pki/root-pki/"
    capabilities = [ "read" ]
    description  = "Read root PKI"
  }
  rule {
    path         = "/pki/root-pki/root/sign-intermediate"
    capabilities = [ "update" ]
    description  = "allow all on /pki"
  }
  rule {
    path         = "/pki/intermediate-pki/*"
    capabilities = [ "create", "read", "update", "delete", "list" ]
    description  = "allow all on /pki"
  }
  rule {
    path         = "auth/token/renew-self"
    capabilities = [ "update" ]
    description  = "renew own token"
  }

  rule {
    path         = "auth/token/lookup-self"
    capabilities = [ "read" ]
    description  = "Read own token"
  }

}
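
To reason about which requests this policy covers, here is a rough sketch of the matching logic in Python. It is a simplified model and not Vault's implementation: it assumes a leading "/" in a policy path is ignored, that a rule without a trailing "*" matches only that exact path, that an exact match beats a glob, and that the longest glob prefix wins otherwise. It ignores "+" segment wildcards, deny, and policies attached from other sources. The function names are hypothetical.

```python
from typing import Dict, Optional, Set

# Rules copied from the policy document above (path -> capabilities).
POLICY: Dict[str, Set[str]] = {
    "secret/data/consul/*": {"read", "update", "list"},
    "secret/consul/*": {"read", "update", "list"},
    "/sys/mounts": {"read"},
    "/sys/mounts/pki/root-pki": {"read"},
    "/sys/mounts/pki/intermediate-pki": {"read"},
    "/pki/root-pki/": {"read"},
    "/pki/root-pki/root/sign-intermediate": {"update"},
    "/pki/intermediate-pki/*": {"create", "read", "update", "delete", "list"},
    "auth/token/renew-self": {"update"},
    "auth/token/lookup-self": {"read"},
}


def normalize(path: str) -> str:
    # Assumption: a leading "/" is not significant in a policy or request path.
    return path.lstrip("/")


def allowed(policy: Dict[str, Set[str]], request_path: str, capability: str) -> bool:
    """Return True if the best-matching rule grants the capability (simplified model)."""
    request = normalize(request_path)
    best: Optional[str] = None
    for rule_path in policy:
        rule = normalize(rule_path)
        if rule == request:
            best = rule_path  # an exact match takes priority over any glob
            break
        if rule.endswith("*") and request.startswith(rule[:-1]):
            if best is None or len(rule) > len(normalize(best)):
                best = rule_path  # keep the longest matching glob prefix
    return best is not None and capability in policy[best]


if __name__ == "__main__":
    checks = [
        ("pki/intermediate-pki/issue/consul-global", "update"),
        ("pki/intermediate-pki/cert/ca", "read"),
        ("pki/root-pki/cert/ca", "read"),
    ]
    for path, cap in checks:
        print(path, cap, allowed(POLICY, path, cap))
```

Under this model the rule on "/pki/root-pki/" has no trailing "*", so it covers only that exact path and nothing beneath it. Everything under pki/intermediate-pki is covered by the glob rule.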

Our Helm values:

global:
  acls:
    bootstrapToken:
      secretKey: bootstraptoken
      secretName: secret/consul/global
    manageSystemACLs: true
    replicationToken:
      secretKey: replicationtoken
      secretName: secret/consul/global
  datacenter: global
  enabled: true
  gossipEncryption:
    autoGenerate: false
    secretKey: gossipkey
    secretName: secret/consul/global
  logJSON: true
  secretsBackend:
    vault:
      connectCA:
        address: https://vault.somehost.com
        authMethodPath: kubernetes
        intermediatePKIPath: pki/intermediate-pki
        rootPKIPath: pki/root-pki
      consulCARole: consul-global
      consulClientRole: consul-global
      consulServerRole: consul-global
      manageSystemACLsRole: consul-global
      enabled: true

client:
  enabled: true
  extraVolumes:
  - load: "false"
    name: consul-ca
    type: secret
  grpc: true

  tls:
    caCert:
      secretName: pki/intermediate-pki/cert/ca
    enableAutoEncrypt: true
    enabled: true
    httpsOnly: false
    serverAdditionalDNSSANs:
    - '*.app.host'
    serverAdditionalIPSANs: []
    verify: true

connectInject:
  default: false
  enabled: true
  k8sAllowNamespaces:
  - staging
  - production

server:
  bootstrapExpect: 3
  connect: true
  disruptionBudget:
    maxUnavailable: 0
  enabled: true
  extraVolumes:
  - load: "false"
    name: consul-ca
    type: secret
  replicas: 3
  resources:
    limits:
      cpu: 100m
      memory: 100Mi
    requests:
      cpu: 20m
      memory: 20Mi
  serverCert:
    secretName: pki/intermediate-pki/issue/consul-global
  updatePartition: 0

I am really not sure where to go from here or where I made a mistake.

Hi @thecodeassassin!

I do not see a manageSystemACLsRole defined in your values.yaml.
If you follow the docs for partition, replication and bootstrap token setup (Storing the ACL Bootstrap Token in Vault | Consul by HashiCorp) and the problem persists, let me know!

Hi kschoche,

It’s there, I just forgot to copy and paste it. What’s weird is that server-acl-init now runs without errors, but I’m getting errors on a single consul-client pod:

2022-05-25T14:57:09.387Z [ERROR] agent.auto_config: AutoEncrypt.Sign RPC failed: addr=10.244.0.242:8300 error="rpcinsecure error making call: rpcinsecure error making call: EOF"
2022-05-25T14:57:09.663Z [ERROR] agent.auto_config: AutoEncrypt.Sign RPC failed: addr=10.244.1.253:8300 error="rpcinsecure error making call: rpcinsecure error making call: EOF"
2022-05-25T14:57:10.172Z [ERROR] agent.auto_config: AutoEncrypt.Sign RPC failed: addr=10.244.1.103:8300 error="rpcinsecure error establishing connection: x509: certificate signed by unknown authority"
2022-05-25T14:57:10.172Z [ERROR] agent.auto_config: No servers successfully responded to the auto-encrypt request
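
When there are many lines like these, it can help to group the failures by server address and error type. Below is a small sketch that does this for the AutoEncrypt.Sign lines in the format shown above. The regular expression and the category names ("eof", "unknown_ca", "other") are my own choices for illustration, not anything Consul defines.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable

# Matches lines such as:
#   ... AutoEncrypt.Sign RPC failed: addr=10.244.0.242:8300 error="..."
LINE_RE = re.compile(
    r'AutoEncrypt\.Sign RPC failed: addr=(?P<addr>\S+) error="(?P<error>.*)"\s*$'
)


def classify(error: str) -> str:
    """Bucket an error string into a coarse category (hypothetical labels)."""
    if "x509: certificate signed by unknown authority" in error:
        return "unknown_ca"
    if error.endswith("EOF"):
        return "eof"
    return "other"


def summarize(lines: Iterable[str]) -> Dict[str, Dict[str, int]]:
    """Count error categories per server address; non-matching lines are skipped."""
    counts: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            counts[match.group("addr")][classify(match.group("error"))] += 1
    return {addr: dict(kinds) for addr, kinds in counts.items()}
```

Applied to the four lines above, this gives one "eof" each for 10.244.0.242:8300 and 10.244.1.253:8300 and one "unknown_ca" for 10.244.1.103:8300, so the servers were not all failing in the same way.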

Even weirder, it just randomly seems to have started to work:

2022-05-25T15:26:10.052Z [ERROR] agent.auto_config: AutoEncrypt.Sign RPC failed: addr=10.244.1.253:8300 error="rpcinsecure error making call: rpcinsecure error making call: EOF"
2022-05-25T15:26:10.518Z [INFO]  agent.auto_config: automatically upgraded to TLS
2022-05-25T15:26:10.520Z [INFO]  agent.client.serf.lan: serf: EventMemberJoin: general-small-cg95w 10.244.0.185
2022-05-25T15:26:10.520Z [INFO]  agent.router: Initializing LAN area manager
2022-05-25T15:26:10.520Z [INFO]  agent.auto_config: auto-config started