My configuration
I set up Consul on two Kubernetes clusters (let's call them internal and app-non-prod) with Terraform, the Helm provider and this Helm chart. My values.yaml on the internal cluster looks like this:
values.yaml
global:
  name: consul
  enablePodSecurityPolicies: true
  image: consul:1.8.0-beta2
  imageK8S: hashicorp/consul-k8s:0.15.0
  tls:
    enabled: true
    enableAutoEncrypt: true
  acls:
    manageSystemACLs: true
    createReplicationToken: true
  gossipEncryption:
    secretName: consul-gossip
    secretKey: key
  federation:
    enabled: true
    createFederationSecret: true

datadogAnnotations: &datadogAnnotations |
  ad.datadoghq.com/consul.logs: '[{ "source":"consul", "service":"consul" }]'
  ad.datadoghq.com/consul.init_configs: '[{}]'
  ad.datadoghq.com/consul.check_names: '["consul"]'
  ad.datadoghq.com/consul.instances: |
    [{
      "url": "https://%%host%%:8501",
      "acl_token": "ENC[consul_acl_token]",
      "tls_verify": false,
      "tls_ignore_warning": true
    }]

server:
  enabled: true
  extraConfig: |
    {
      "telemetry": {
        "dogstatsd_addr": "127.0.0.1:8125"
      }
    }
  annotations: *datadogAnnotations
  affinity: |
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              app: {{ template "consul.name" . }}
              release: "{{ .Release.Name }}"
              component: server
          topologyKey: kubernetes.io/hostname
  bootstrapExpect: 3
  connect: true
  replicas: 3
  resources: |
    requests:
      cpu: 10m
      memory: 200Mi
    limits:
      cpu: 100m
      memory: 600Mi
  storage: 10Gi

client:
  annotations: *datadogAnnotations
  enabled: true
  extraConfig: |
    {
      "telemetry": {
        "dogstatsd_addr": "127.0.0.1:8125"
      }
    }
  resources: |
    requests:
      cpu: 10m
      memory: 200Mi
    limits:
      cpu: 100m
      memory: 200Mi

dns:
  enabled: true

ui:
  enabled: true

connectInject:
  enabled: true
  centralConfig:
    enabled: true

meshGateway:
  enabled: true
  globalMode: remote
  resources: |
    limits:
      cpu: 100m
      memory: 256Mi
    requests:
      cpu: 10m
      memory: 128Mi

syncCatalog:
  enabled: true
  toConsul: false
  toK8S: false
I also pass additional values via Terraform:
Terraform code
module "consul" {
source = "../helm-release"
helm_release_name = "consul"
helm_chart = "consul"
helm_version = "0.21.0"
helm_repository = "https://helm.releases.hashicorp.com"
namespace = var.namespace
create_namespace = true
wait = true
values = file("${path.module}/values.yaml")
set_values = {
"global.datacenter" = var.datacenter
"global.gossipEncryption.secretName" = kubernetes_secret.consul_gossip.metadata[0].name
"meshGateway.service.annotations" = "external-dns.alpha.kubernetes.io/hostname: ${var.datacenter_mesh_gateway_hostname}"
}
}
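The ../helm-release module above is just a thin wrapper I wrote around the Terraform Helm provider - it is not part of the chart. Roughly (simplified sketch, the real module has a few more variables):

resource "helm_release" "this" {
  name             = var.helm_release_name
  chart            = var.helm_chart
  version          = var.helm_version
  repository       = var.helm_repository
  namespace        = var.namespace
  create_namespace = var.create_namespace
  wait             = var.wait
  values           = [var.values]

  # set_values is a plain map(string) turned into set {} blocks
  dynamic "set" {
    for_each = var.set_values
    content {
      name  = set.key
      value = set.value
    }
  }
}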
I followed these instructions and created:
- static-client on the internal cluster with the "consul.hashicorp.com/connect-service-upstreams": "static-server:1234:app-non-prod,static-server:1235" annotation (see the sketch just below this list)
- static-server on the app-non-prod cluster
- static-server on the internal cluster
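For clarity, the upstreams annotation from the first bullet sits on the static-client pod template roughly like this (simplified sketch - the image and command are placeholders, not my exact manifest):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: static-client
spec:
  replicas: 1
  selector:
    matchLabels:
      app: static-client
  template:
    metadata:
      labels:
        app: static-client
      annotations:
        "consul.hashicorp.com/connect-inject": "true"
        # 1234 -> static-server in the app-non-prod datacenter,
        # 1235 -> static-server in the local (internal) datacenter
        "consul.hashicorp.com/connect-service-upstreams": "static-server:1234:app-non-prod,static-server:1235"
    spec:
      containers:
        - name: static-client
          image: curlimages/curl:7.70.0   # placeholder image with curl
          command: ["/bin/sh", "-c", "sleep 100000"]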
I removed the default (anonymous-token-policy) permissions by changing the policy to an empty string.
I'm able to discover services between these clusters - the consul-federation secret works and I'm able to list all services and nodes in the WAN federation.
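For reference, this is roughly how I check it from one of the server pods (standard Consul CLI, with a valid ACL token in CONSUL_HTTP_TOKEN):

# servers from both datacenters show up in the WAN pool
consul members -wan

# the catalog of the remote datacenter is reachable too
consul catalog services -datacenter app-non-prod
consul catalog nodes -datacenter app-non-prod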
Problem with default ACLs
I found out that local (internal to internal) Consul Connect connections work perfectly fine; however, remote (internal to app-non-prod) connections result in an error:
root@static-client:/# curl localhost:1235 # This is server from local DC
"hello world"
root@static-client:/# curl localhost:1234 # This is server from another DC
curl: (56) Recv failure: Connection reset by peer
I managed to connect successfully from internal to app-non-prod after changing anonymous-token-policy to read-only:
node_prefix "" {
  policy = "read"
}
service_prefix "" {
  policy = "read"
}
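For reference, I applied the rules above to the existing policy roughly like this (anonymous-policy.hcl is just a file with the rules shown above, and a management token is assumed in CONSUL_HTTP_TOKEN):

# look up the policy created by the Helm chart, then update its rules
consul acl policy read -name anonymous-token-policy
consul acl policy update -id <policy-id-from-above> -rules @anonymous-policy.hcl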
Results are good:
root@static-client:/# curl localhost:1235 # This is server from local DC
"hello world"
root@static-client:/# curl localhost:1234 # This is server from another DC
"hello world"
Consul Connect began to work almost immediately (<3 s) after I changed the ACLs to allow anonymous read access to nodes and services.
It makes me think that the containers injected by connect-injector use the anonymous token to obtain information about services running in other clusters. Is this intended?
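As a side note, the anonymous token and its attached policies can be inspected with the standard CLI (00000000-0000-0000-0000-000000000002 is the well-known anonymous token accessor ID - the same one that shows up in the agent logs further down):

consul acl token read -id 00000000-0000-0000-0000-000000000002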
I’ve just began to work with Consul last week and maybe I missed something in docs. I assumed that manageSystemACLs: true
flag sets up all ACLs needed by injector, mesh gateways, default clients & servers and I know that logic hidden behind that flag definitely does the job for servers, clients and mesh gateways.
DNS
I also noticed that setting the default ACL policy to an empty value (as I wrote above) results in no DNS entries being resolved. I followed the Consul DNS - Kubernetes guide and there is no mention of ACLs/tokens needed for DNS. I found this section in the Production ACLs guide, however I'm pretty sure this should be handled "automagically" by the manageSystemACLs: true flag.
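A quick way to see the behaviour I mean (assuming the consul stub-domain from the Consul DNS guide is configured in kube-dns/CoreDNS and the pod image has nslookup available):

# resolves with the read-only anonymous policy, returns nothing with the empty one
kubectl exec -it static-client -- nslookup static-server.service.consul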
Consul Connect ACL Tokens
My last question/problem is a long list of leftover Consul Connect login tokens:
Shouldn't they be deregistered? When I run consul monitor, I see:
2020-05-31T09:23:28.587Z [WARN] agent: Service deregistration blocked by ACLs: service=static-client-static-client accessorID=00000000-0000-0000-0000-000000000002
2020-05-31T09:23:28.589Z [ERROR] agent.client: RPC failed to server: method=Catalog.Deregister server=10.15.5.196:8300 error="rpc error making call: rpc error making call: Permission denied"
2020-05-31T09:23:28.589Z [WARN] agent: Service deregistration blocked by ACLs: service=static-client-static-client-sidecar-proxy accessorID=00000000-0000-0000-0000-000000000002
2020-05-31T09:23:28.590Z [ERROR] agent.client: RPC failed to server: method=Catalog.Deregister server=10.15.6.98:8300 error="rpc error making call: Permission denied"
I bet that as soon as I add write permissions, the tokens will disappear. However, it does not seem right to me - I believe that service deregistration should happen with the accessorID of the service (as on my screenshot) and not anonymously.
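For completeness, the leftover tokens from my screenshot can also be seen and cleaned up by hand with the standard CLI (the delete requires a token with ACL write permissions):

# list all tokens, including the leftover ones created for injected services
consul acl token list

# manual cleanup of a single leftover token (accessor ID taken from the list above)
consul acl token delete -id <accessor-id>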