Consul Ingress Gateway ACL permissions on secondary DC

Hi,
I’m attempting to deploy ingress gateways through consul-helm in a secondary datacenter. The service-init container fails to complete due to missing permissions:

Address "10.1.6.66" written to /tmp/address.txt successfully
Error registering service "ingress-gateway": Unexpected response code: 403 (Permission denied: Missing service:write on ingress-gateway)

I’ve confirmed that ACL replication is working on the secondary DC servers. The following ACL token (stored in the Kubernetes secret consul-ingress-gateway-ingress-gateway-acl-token) is configured with the appropriate policy. It was created by the server-acl-init job as part of the Helm deployment, and I can see that the policy is synced from the primary DC as well.
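(For reference, I checked replication status against the secondary’s ACL replication endpoint — a quick sanity check, using the same CONSUL_* environment variables as the script further down:)

curl --header "X-Consul-Token: ${CONSUL_HTTP_TOKEN}" \
  --cacert "${CONSUL_CACERT}" \
  --cert "${CONSUL_CLIENT_CERT}" \
  --key "${CONSUL_CLIENT_KEY}" \
  "${CONSUL_HTTP_ADDR}/v1/acl/replication"
# expect "Enabled": true, "Running": true, and an advancing "ReplicatedIndex"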

$ consul acl policy read -name ingress-gateway-ingress-gateway-token
ID:           1b245ea1-e799-612c-da5d-cdfffe6f4b50
Name:         ingress-gateway-ingress-gateway-token
Description:  ingress-gateway-ingress-gateway-token Token Policy
Datacenters:  
Rules:

  service "ingress-gateway" {
     policy = "write"
  }
  node_prefix "" {
    policy = "read"
  }
  service_prefix "" {
    policy = "read"
  }
$ consul acl token read -id 713fa743-2e91-12c4-3d7f-648420542872
AccessorID:       713fa743-2e91-12c4-3d7f-648420542872
SecretID:         <redacted, confirmed to match the k8s secret>
Description:      ingress-gateway-ingress-gateway-token Token
Local:            true
Create Time:      2021-05-28 15:55:30.376686108 +0000 UTC
Policies:
   1b245ea1-e799-612c-da5d-cdfffe6f4b50 - ingress-gateway-ingress-gateway-token

Any idea what’s going on here?

PS: When the original policy was created by consul-k8s on the primary DC, only that single DC was listed under the policy’s “valid datacenters”. I manually patched it to be valid in all DCs so it would be picked up by the secondary DC:

# Read the policy once and pull out the fields needed for the update
_policy_json=$(sudo -E -u consul consul acl policy read -name ingress-gateway-ingress-gateway-token -format=json)
_policy_id=$(jq -r '.ID' <<< "${_policy_json}")
_policy_rules=$(jq -r '.Rules' <<< "${_policy_json}")
_policy_description=$(jq -r '.Description' <<< "${_policy_json}")

# Build the update payload; an empty Datacenters array means "valid in all DCs"
jq -n --arg id "${_policy_id}" \
  --arg name "ingress-gateway-ingress-gateway-token" \
  --arg desc "${_policy_description}" \
  --arg rules "${_policy_rules}" \
  '{ID: $id, Name: $name, Description: $desc, Rules: $rules, Datacenters: []}' > policy-update.json

# PUT the updated policy back via the HTTP API
sudo -u consul curl -X PUT --data @policy-update.json \
  --header "X-Consul-Token: ${CONSUL_HTTP_TOKEN}" \
  --cacert "${CONSUL_CACERT}" \
  --cert "${CONSUL_CLIENT_CERT}" \
  --key "${CONSUL_CLIENT_KEY}" \
  "${CONSUL_HTTP_ADDR}/v1/acl/policy/${_policy_id}"
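After the update, re-reading the policy should show an empty Datacenters field (i.e., valid in all DCs):

sudo -E -u consul consul acl policy read -id "${_policy_id}"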

Hi Tom, I think the problem might be that the token is a local token, which means it’s only valid in the datacenter where it was created.
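(Your output above already shows Local: true; a quick way to check any token — accessor ID is a placeholder:)

consul acl token read -id <accessor-id> -format=json | jq '.Local'
# true = the token only exists in the datacenter where it was created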

  1. Can you share your Helm values for both datacenters?
  2. Is there an ingress gateway in both datacenters?
  3. If so, do they have the same name?

Thanks!

Hey @lkysow, here are the Helm values for each. Both ingress gateways are using the default name from the consul-helm chart. Could that potentially be the cause of this issue (a name conflict on the Consul side)? I ended up creating an identical policy and assigning it to the DC2 local token, which seems to have solved the problem.

The original problem seems to be that the policy was never applied to the local token in the secondary DC, even after I manually updated the policy to be valid in all datacenters (see the note in my original post above).
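For reference, the workaround was roughly this, run against the DC2 servers (the rules file path is just illustrative; its contents match the original policy shown below):

# create a DC2-local copy of the policy
sudo -E consul acl policy create -name cae-ingress-gateway-policy -rules @ingress-gateway-rules.hcl

# swap the existing local token over to the new policy
sudo -E consul acl token update -id 713fa743-2e91-12c4-3d7f-648420542872 -policy-name cae-ingress-gateway-policy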

DC1 (azure-canadacentral)

USER-SUPPLIED VALUES:
client:
  enabled: true
  extraVolumes:
  - load: true
    name: consul-client-acl-config
    type: secret
  join:
  - azure-canadacentral-consul-server-0.poc.example.com
  - azure-canadacentral-consul-server-1.poc.example.com
  - azure-canadacentral-consul-server-2.poc.example.com
connectInject:
  aclInjectToken:
    secretKey: token
    secretName: consul-connect-token
  default: true
  enabled: true
  k8sDenyNamespaces:
  - aad-pod-identity
  - aqua-security
  - azureoperator-system
  - calico-system
  - cert-manager
  - cloudability
  - datadog
  - default
  - dynatrace
  - external-dns
  - gatekeeper-system
  - helm-operator
  - ingress
  - ingress-nginx
  - ingress-azure
  - istio-operator
  - istio-system
  - kured
  - splunk
externalServers:
  enabled: true
  hosts:
  - azure-canadacentral-consul-server-0.poc.example.com
  - azure-canadacentral-consul-server-1.poc.example.com
  - azure-canadacentral-consul-server-2.poc.example.com
  k8sAuthMethodHost: cluster1-dns-109d08d7.hcp.canadacentral.azmk8s.io
global:
  acls:
    bootstrapToken:
      secretKey: token
      secretName: bootstrap-token
    manageSystemACLs: true
  datacenter: azure-canadacentral
  enabled: false
  gossipEncryption:
    secretKey: key
    secretName: consul-gossip-encryption-key
  image: internalregistry.azurecr.io/vendor/consul:1.9.4
  imageEnvoy: internalregistry.azurecr.io/vendor/envoy-alpine:v1.16.0
  imageK8S: internalregistry.azurecr.io/vendor/consul-k8s:0.25.0
  name: consul
  tls:
    caCert:
      secretKey: tls.crt
      secretName: consul-ca-cert
    enableAutoEncrypt: true
    enabled: true
grafana:
  enabled: true
ingressGateways:
  defaults:
    service:
      annotations: |
        service.beta.kubernetes.io/azure-load-balancer-internal: "true"
      type: LoadBalancer
  enabled: true
prometheus:
  enabled: true

DC2 (azure-canadaeast)

USER-SUPPLIED VALUES:
client:
  enabled: true
  extraVolumes:
  - load: true
    name: consul-client-acl-config
    type: secret
  join:
  - azure-canadaeast-consul-server-0.poc.example.com
  - azure-canadaeast-consul-server-1.poc.example.com
  - azure-canadaeast-consul-server-2.poc.example.com
connectInject:
  aclInjectToken:
    secretKey: token
    secretName: consul-connect-token
  default: true
  enabled: true
  k8sDenyNamespaces:
  - aad-pod-identity
  - aqua-security
  - azureoperator-system
  - calico-system
  - cert-manager
  - cloudability
  - datadog
  - default
  - dynatrace
  - external-dns
  - gatekeeper-system
  - helm-operator
  - ingress
  - ingress-nginx
  - ingress-azure
  - istio-operator
  - istio-system
  - kured
  - splunk
externalServers:
  enabled: true
  hosts:
  - azure-canadaeast-consul-server-0.poc.example.com
  - azure-canadaeast-consul-server-1.poc.example.com
  - azure-canadaeast-consul-server-2.poc.example.com
  k8sAuthMethodHost: cluster2-dns-1a83f08b.hcp.canadaeast.azmk8s.io
global:
  acls:
    bootstrapToken:
      secretKey: token
      secretName: bootstrap-token
    manageSystemACLs: true
  datacenter: azure-canadaeast
  enabled: false
  gossipEncryption:
    secretKey: key
    secretName: consul-gossip-encryption-key
  image: internalregistry.azurecr.io/vendor/consul:1.9.4
  imageEnvoy: internalregistry.azurecr.io/vendor/envoy-alpine:v1.16.0
  imageK8S: internalregistry.azurecr.io/vendor/consul-k8s:0.25.0
  name: consul
  tls:
    caCert:
      secretKey: tls.crt
      secretName: consul-ca-cert
    enableAutoEncrypt: true
    enabled: true
grafana:
  enabled: true
ingressGateways:
  defaults:
    service:
      annotations: |
        service.beta.kubernetes.io/azure-load-balancer-internal: "true"
      type: LoadBalancer
  enabled: true
prometheus:
  enabled: true

Original policy:

$ sudo -E consul acl policy read -id 1b245ea1-e799-612c-da5d-cdfffe6f4b50
ID:           1b245ea1-e799-612c-da5d-cdfffe6f4b50
Name:         ingress-gateway-ingress-gateway-token
Description:  ingress-gateway-ingress-gateway-token Token Policy
Datacenters:  
Rules:

  service "ingress-gateway" {
     policy = "write"
  }
  node_prefix "" {
    policy = "read"
  }
  service_prefix "" {
    policy = "read"
  }

Duplicate policy:

$ sudo -E consul acl policy read -id 8fb6ee02-92c8-5e9d-1720-95f5b9b71d89
ID:           8fb6ee02-92c8-5e9d-1720-95f5b9b71d89
Name:         cae-ingress-gateway-policy
Description:  
Datacenters:  
Rules:

  service "ingress-gateway" {
     policy = "write"
  }
  node_prefix "" {
    policy = "read"
  }
  service_prefix "" {
    policy = "read"
  }

DC2 local token (non-working state, with the original policy attached):

$ sudo -E consul acl token read -id 713fa743-2e91-12c4-3d7f-648420542872
AccessorID:       713fa743-2e91-12c4-3d7f-648420542872
SecretID:         <redacted>
Description:      ingress-gateway-ingress-gateway-token Token
Local:            true
Create Time:      2021-05-28 15:55:30.376686108 +0000 UTC
Policies:
   1b245ea1-e799-612c-da5d-cdfffe6f4b50 - ingress-gateway-ingress-gateway-token

$ klog consul-ingress-gateway-b87f8c5b4-qmxz9
[pod/consul-ingress-gateway-b87f8c5b4-qmxz9/get-auto-encrypt-client-ca] 2021-06-07T16:52:46.843789540Z Successfully wrote Consul client CA to: /consul/tls/client/ca/tls.crt
[pod/consul-ingress-gateway-b87f8c5b4-qmxz9/service-init] 2021-06-07T16:52:54.442175326Z Address "10.140.76.66" written to /tmp/address.txt successfully
[pod/consul-ingress-gateway-b87f8c5b4-qmxz9/service-init] 2021-06-07T16:52:57.439987072Z Error registering service "ingress-gateway": Unexpected response code: 403 (Permission denied: Missing service:write on ingress-gateway)

DC2 local token (working state, with the duplicate policy attached):

$ sudo -E consul acl token read -id 713fa743-2e91-12c4-3d7f-648420542872
AccessorID:       713fa743-2e91-12c4-3d7f-648420542872
SecretID:         <redacted>
Description:      ingress-gateway-ingress-gateway-token Token
Local:            true
Create Time:      2021-05-28 15:55:30.376686108 +0000 UTC
Policies:
   8fb6ee02-92c8-5e9d-1720-95f5b9b71d89 - cae-ingress-gateway-policy

# ingress gateway redeployed
$ klog consul-ingress-gateway-b87f8c5b4-mgzwg
[pod/consul-ingress-gateway-b87f8c5b4-mgzwg/get-auto-encrypt-client-ca] 2021-06-07T16:57:37.742160259Z Successfully wrote Consul client CA to: /consul/tls/client/ca/tls.crt
[pod/consul-ingress-gateway-b87f8c5b4-mgzwg/service-init] 2021-06-07T16:57:40.240892431Z Address "10.140.76.66" written to /tmp/address.txt successfully
[pod/consul-ingress-gateway-b87f8c5b4-mgzwg/service-init] 2021-06-07T16:57:43.039769299Z Registered service: ingress-gateway
[pod/consul-ingress-gateway-b87f8c5b4-mgzwg/ingress-gateway] 2021-06-07T16:57:46.442556577Z [2021-06-07 16:57:46.442][1][info][main] [source/server/server.cc:305] initializing epoch 0 (base id=0, hot restart version=disabled)
...

Yeah, that’s the problem. The server-acl-init job creates a local ACL token for the ingress gateway, and if it has the same name as a gateway in another datacenter, it thinks the token has already been created.

So it looks like we require separate names for each gateway across DCs.
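For example, something like this in the secondary DC’s values would do it (the name is just illustrative):

ingressGateways:
  enabled: true
  gateways:
  - name: ingress-gateway-azure-canadaeast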

I’ll create a ticket to document this and another ticket to error out during ACL initialization.

Well, actually, if you’re using federation through Helm then this works: we suffix the policies with the datacenter names. But it looks like you’re doing federation outside of Helm?
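(For context, federation through Helm means roughly these values on both sides — a sketch, not a complete config — which is when the chart creates the DC-suffixed policies:)

global:
  federation:
    enabled: true
  tls:
    enabled: true
meshGateway:
  enabled: true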