I have a federated Consul cluster running in Kubernetes across two regions. I’m seeing the following errors over a couple-minute span in the secondary datacenter. I don’t see any issue with the cluster while this is happening, and there are no relevant leave or join events at the same time as these errors.
[ERROR] agent.anti_entropy: failed to sync remote state: error="ACL not found"
[ERROR] agent.anti_entropy: failed to sync remote state: error="ACL not found"
[ERROR] agent: Coordinate update error: error="rpc error making call: ACL not found"
[ERROR] agent.anti_entropy: failed to sync remote state: error="ACL not found"
[ERROR] agent: Coordinate update error: error="rpc error making call: ACL not found"
[ERROR] agent.anti_entropy: failed to sync remote state: error="ACL not found"
consul keyring -list
shows the same keyring information in both DCs. I’ve run debug sessions and captured these errors, but there’s no supporting information as to what might be happening.
Consul version: 1.8.8
Amier
October 29, 2021, 8:29pm
Hey Duncan!
First off, welcome to the Consul community! Could you share the config files for your primary and secondary datacenters?
Also, how did you set the ACL tokens during your initial setup?
Hi @Amier, thanks for getting back to me.
Primary Helm config:
chart:
  repository: https://helm.releases.hashicorp.com
  version: v0.21.0
  name: consul
values:
  global:
    datacenter: primary
    image: "consul:1.8.8"
    imageK8S: "hashicorp/consul-k8s:0.15.0"
    imageEnvoy: "envoyproxy/envoy:v1.13.0"
    enablePodSecurityPolicies: true
    gossipEncryption:
      secretName: consul-gossip-encryption-key
      secretKey: key
    tls:
      enabled: true
      verify: true
      httpsOnly: true
      caCert:
        secretName: consul-ca-cert
        secretKey: 'tls.crt'
      caKey:
        secretName: consul-ca-key
        secretKey: 'tls.key'
    acls:
      manageSystemACLs: true
      createReplicationToken: true
    federation:
      enabled: true
      createFederationSecret: true
  connectInject:
    enabled: true
  meshGateway:
    enabled: true
    imageEnvoy: "envoyproxy/envoy:v1.13.0"
    wanAddress:
      source: Static
      static: 'consul-mesh'
    service:
      enabled: true
      type: LoadBalancer
      annotations: |-
        external-dns-enabled: "true"
        external-dns.alpha.kubernetes.io/hostname: "consul-mesh"
        service.beta.kubernetes.io/aws-load-balancer-internal: "true"
        service.beta.kubernetes.io/aws-load-balancer-security-groups: "sg-aaa"
      additionalSpec: |-
        loadBalancerSourceRanges:
          - "10.0.0.0/13"
  server:
    replicas: 5
    updatePartition: 5
    storage: 50Gi
    annotations: |
      "prometheus.io/scrape": "true"
      "prometheus.io/port": "8501"
      "prometheus.io/scheme": "https"
    extraConfig: |
      {
        "telemetry": {
          "disable_hostname": true,
          "prometheus_retention_time": "6h"
        },
        "auto_encrypt": {
          "allow_tls": true
        }
      }
  client:
    annotations: |
      "prometheus.io/scrape": "true"
      "prometheus.io/port": "8501"
      "prometheus.io/scheme": "https"
    extraConfig: |
      {
        "telemetry": {
          "disable_hostname": true,
          "prometheus_retention_time": "6h"
        }
      }
    tolerations: |
      - operator: Exists
  syncCatalog:
    enabled: true
    toConsul: true
    toK8S: false
    k8sAllowNamespaces: ['default']
    addK8SNamespaceSuffix: false
    consulPrefix: 'kubernetes-'
Secondary Helm config:
chart:
  repository: https://helm.releases.hashicorp.com
  version: v0.21.0
  name: consul
values:
  global:
    datacenter: secondary
    image: "consul:1.8.8"
    imageK8S: "hashicorp/consul-k8s:0.15.0"
    imageEnvoy: "envoyproxy/envoy:v1.13.0"
    enablePodSecurityPolicies: true
    # enable gossip encryption
    gossipEncryption:
      secretName: hashicorp-consul-federation
      secretKey: gossipEncryptionKey
    tls:
      enabled: true
      verify: true
      httpsOnly: true
      caCert:
        secretName: hashicorp-consul-federation
        secretKey: caCert
      caKey:
        secretName: hashicorp-consul-federation
        secretKey: caKey
    acls:
      manageSystemACLs: true
      replicationToken:
        secretName: hashicorp-consul-federation
        secretKey: replicationToken
    federation:
      enabled: true
  connectInject:
    enabled: true
  meshGateway:
    enabled: true
    imageEnvoy: "envoyproxy/envoy:v1.13.0"
    wanAddress:
      source: Static
      static: 'consul-mesh.b'
    service:
      enabled: true
      type: LoadBalancer
      annotations: |-
        external-dns-enabled: "true"
        external-dns.alpha.kubernetes.io/hostname: "consul-mesh.b"
        service.beta.kubernetes.io/aws-load-balancer-internal: "true"
        service.beta.kubernetes.io/aws-load-balancer-security-groups: "sg-bbbb"
      additionalSpec: |-
        loadBalancerSourceRanges:
          - "10.0.0.0/13"
  ui:
    enabled: true
    service:
      enabled: true
      type: null
  server:
    replicas: 5
    updatePartition: 5
    storage: 50Gi
    annotations: |
      "prometheus.io/scrape": "true"
      "prometheus.io/port": "8501"
      "prometheus.io/scheme": "https"
    extraConfig: |
      {
        "telemetry": {
          "disable_hostname": true,
          "prometheus_retention_time": "6h"
        },
        "auto_encrypt": {
          "allow_tls": true
        }
      }
    extraVolumes:
      - type: secret
        name: hashicorp-consul-federation
        items:
          - key: serverConfigJSON
            path: config.json
        load: true
  client:
    annotations: |
      "prometheus.io/scrape": "true"
      "prometheus.io/port": "8501"
      "prometheus.io/scheme": "https"
    extraConfig: |
      {
        "telemetry": {
          "disable_hostname": true,
          "prometheus_retention_time": "6h"
        }
      }
    tolerations: |
      - operator: Exists
  syncCatalog:
    enabled: true
    toConsul: true
    toK8S: false
    k8sAllowNamespaces: ['default']
    addK8SNamespaceSuffix: false
    consulPrefix: 'kubernetes-'
Our documentation says that this was done when setting up federation:
consul keyring -install <federated-key> -token <bootstrap-token>
consul keyring -use <federated-key> -token <bootstrap-token>
consul keyring -remove <old-key> -token <bootstrap-token>
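On the ACL side we didn’t set any tokens by hand; manageSystemACLs: true let the chart bootstrap them. In case it’s useful, this is roughly how I can pull them back out to compare (the secret names are my best guess: the hashicorp-consul-federation secret above suggests our release is named hashicorp):

# Bootstrap (management) token created by the server-acl-init job in the primary
kubectl get secret hashicorp-consul-bootstrap-acl-token \
  -o jsonpath='{.data.token}' | base64 -d

# Replication token exported into the federation secret
kubectl get secret hashicorp-consul-federation \
  -o jsonpath='{.data.replicationToken}' | base64 -d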
lkysow
November 5, 2021, 6:31pm
Hi, I’m not sure what the issue is, but those logs are about ACLs, not the Consul keyring, which is used for gossip encryption.
I think this error is saying that the secondary DC servers don’t have a valid ACL token set.
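One way to check (just a sketch, assuming TLS on port 8501 as in your config and a management token in CONSUL_HTTP_TOKEN): query the ACL replication status from a server pod in the secondary DC and see whether it reports an error:

# Run from a secondary-DC server pod, or anywhere that can reach a
# secondary server's HTTPS port; -k only because this is a sketch
curl -sk -H "X-Consul-Token: $CONSUL_HTTP_TOKEN" \
  https://localhost:8501/v1/acl/replication

If Running is false or LastError is recent, the replication token the secondary was installed with is the first thing I’d double-check.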
Amier
November 12, 2021, 9:04pm
@duncaan
I can’t find the specific PR, but I think this issue was fixed somewhere between 1.8.8 and the latest version. I would recommend upgrading Consul (upgrading docs linked here). If that doesn’t solve the issue, I can try my hand at replicating it so we can figure out whether this is a bug that needs fixing.
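If you’re using the same Helm wrapper as in your posts, the upgrade itself would look roughly like this (a sketch only: the chart version below is illustrative, so check which Consul version it bundles first, and the release name hashicorp is a guess based on your secret names; also note that with replicas: 5 and updatePartition: 5 the StatefulSet won’t roll any server pods until you lower the partition):

helm repo add hashicorp https://helm.releases.hashicorp.com
helm repo update
# bump global.image in values.yaml first, then:
helm upgrade hashicorp hashicorp/consul --version 0.32.1 --values values.yaml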
Thank you for your patience!