Consul multi-cluster service discovery

I am trying to set up Consul multi-cluster with two local kind clusters (each with one control-plane node and one worker node). I have Dapr set up on each cluster, with my applications running in each of them. The intention is to get these applications to communicate over gRPC. The first step is to get name resolution working for services running across these clusters, and I have been trying multiple tutorials shared in different learning paths.
During the datacenter setup I am getting an error while creating certificates for the federated datacenter - the -additional-dnsname option is erroring:

learn-consul-get-started-kubernetes % consul tls cert create -server -dc dca -domain consul -additional-dnsname=*.dcb.consul
zsh: no matches found: -additional-dnsname=*.dcb.consul 

I tried to set up multi-cluster with kind based on the instructions at:

1. Consul Service Discovery and Mesh on Kubernetes in Docker (kind) | Consul | HashiCorp Developer

2. Enabling Service-to-service Traffic Across WAN Federated Datacenters | Consul | HashiCorp Developer

This setup deploys the primary datacenter, but the secondary datacenter is not recognised in the 'consul members' output. I have built this setup on two Azure VMs, each hosting a 3-node kind cluster (one control-plane node, two workers).

Help needed:

Is there an interactive lab or a consolidated set of instructions that I could use to set up two kind clusters and test HTTP calls originating from a service in one cluster being routed to the other cluster?

Hi @chetlapalle.akhila,

When running on Kubernetes, TLS cert creation is taken care of by the Helm chart. The command that is erroring for you is only required when you are deploying Consul on VMs. In addition, the error you are seeing happens because zsh is expanding the * in your -additional-dnsname argument; it is not an issue with the CLI.
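Quoting (or escaping) the wildcard stops zsh from trying to expand it, for example:

consul tls cert create -server -dc dca -domain consul -additional-dnsname='*.dcb.consul'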

Consul 1.14.0 introduced Cluster Peering, which makes connecting multiple datacenters easier than WAN Federation. If you don’t have a requirement to use WAN Federation specifically, I recommend you try the tutorial below on Cluster Peering.

The only change you may have to make is to properly expose the mesh gateway (according to the networking options available in your kind cluster) by modifying the corresponding settings in the values file: Helm Chart Reference | Consul | HashiCorp Developer.
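As a rough sketch (this is an assumption on my part, not from the tutorial; the service type and port depend entirely on how traffic can reach your kind nodes from the other cluster), the mesh gateway part of the values file could look something like:

meshGateway:
  enabled: true
  service:
    # kind has no cloud load balancer by default, so NodePort is a common choice
    type: NodePort
    # illustrative port; it must be reachable from the peer cluster
    nodePort: 30443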

Hi @Ranjandas

Thanks for the help. I tried using Cluster Peering with Consul to communicate between two services in two clusters and was able to do it successfully. My next step is to enable service discovery using Dapr, while using the Consul mesh in the background to communicate between the two clusters.

Dapr offers service invocation between two services using its APIs and runs alongside the application container as a sidecar: Service invocation overview | Dapr Docs. For that, it requires annotating the application deployment with
dapr.io/enabled: "true". For the Consul service mesh, the deployment must be annotated with consul.hashicorp.com/connect-inject: "true". The combined deployment file looks like this:

kind: Service
apiVersion: v1
metadata:
  name: nodeapp
  labels:
    app: node
spec:
  selector:
    app: node
  ports:
  - protocol: TCP
    port: 80
    targetPort: 3000
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nodeapp
  labels:
    app: node
spec:
  replicas: 1
  selector:
    matchLabels:
      app: node
  template:
    metadata:
      labels:
        app: node
      annotations:
        consul.hashicorp.com/connect-inject: "true"
        dapr.io/enabled: "true"
        dapr.io/app-id: "nodeapp"
        dapr.io/app-port: "3000"
        dapr.io/enable-api-logging: "true"
        dapr.io/enable-app-health-check: "false"
    spec:
      containers:
      - name: node
        image: ghcr.io/dapr/samples/hello-k8s-node:latest
        env:
        - name: APP_PORT
          value: "3000"
        ports:
        - containerPort: 3000
        imagePullPolicy: Always

When I run the deployment with only one of the two annotations (dapr.io/enabled: "true" or consul.hashicorp.com/connect-inject: "true"), the deployment runs fine. But when I enable both, the pods are always stuck in the PodInitializing state. How can I fix this?

Hi @shivamkm07,

I don’t have any prior exposure to Dapr, nor have I seen any Dapr and Consul integrations so far. With that said, I just did a quick test on my side, and it looks like Dapr creates a Kubernetes Service named <servicename>-dapr, while Consul requires that only one Service selects the pods that are connect-injected.

For example, I tried with the HashiCorp counting example service, and I saw this error in the consul-connect-inject-init container:

2023-01-20T06:31:55.761Z [INFO]  Unable to find registered services; retrying
2023-01-20T06:31:55.761Z [ERROR] There are multiple Consul services registered for this pod when there must only be one. Check if there are multiple Kubernetes services selecting this pod and add the label `consul.hashicorp.com/service-ignore: "true"` to all services except the one used by Consul for handling requests.
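(For reference, that message comes from the consul-connect-inject-init init container of the application pod itself, not from the injector deployment, so it can be read with something like the following, substituting your own pod name:)

kubectl logs counting-b9d79d85f-psq7t -c consul-connect-inject-init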

Applying the label, as shown below, made the connect injection work, but the Dapr container’s readiness and liveness probes were failing, which is due to the use of the transparent proxy.

kubectl label svc counting-dapr consul.hashicorp.com/service-ignore="true"

To get this working, there are two options (a combined annotation example is sketched after this list):

  1. Disable the transparent proxy by setting consul.hashicorp.com/transparent-proxy: "false" (ref: Annotations and Labels | Consul | HashiCorp Developer)

  2. If you want to keep the transparent proxy, exclude the Dapr sidecar's inbound port using consul.hashicorp.com/transparent-proxy-exclude-inbound-ports: '3501' (ref: Annotations and Labels | Consul | HashiCorp Developer)
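
As a rough sketch (the dapr.io/* values here are illustrative placeholders for whatever your app actually uses), the relevant pod-template annotations could look like this:

      annotations:
        consul.hashicorp.com/connect-inject: "true"
        # Option 1: turn the transparent proxy off entirely
        # consul.hashicorp.com/transparent-proxy: "false"
        # Option 2: keep the transparent proxy, but let Dapr's inbound port bypass it
        consul.hashicorp.com/transparent-proxy-exclude-inbound-ports: "3501"
        dapr.io/enabled: "true"
        dapr.io/app-id: "counting"
        dapr.io/app-port: "9001"

With one of these applied, the pod starts all of its containers, as the events below show: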

kubectl get event  --field-selector involvedObject.name=counting-b9d79d85f-psq7t
LAST SEEN   TYPE     REASON      OBJECT                         MESSAGE
13m         Normal   Scheduled   pod/counting-b9d79d85f-psq7t   Successfully assigned default/counting-b9d79d85f-psq7t to k3s
13m         Normal   Pulled      pod/counting-b9d79d85f-psq7t   Container image "hashicorp/consul:1.13.2" already present on machine
13m         Normal   Created     pod/counting-b9d79d85f-psq7t   Created container copy-consul-bin
13m         Normal   Started     pod/counting-b9d79d85f-psq7t   Started container copy-consul-bin
13m         Normal   Pulled      pod/counting-b9d79d85f-psq7t   Container image "hashicorp/consul-k8s-control-plane:0.49.0" already present on machine
13m         Normal   Created     pod/counting-b9d79d85f-psq7t   Created container consul-connect-inject-init
13m         Normal   Started     pod/counting-b9d79d85f-psq7t   Started container consul-connect-inject-init
13m         Normal   Pulled      pod/counting-b9d79d85f-psq7t   Container image "hashicorp/counting-service:0.0.2" already present on machine
13m         Normal   Created     pod/counting-b9d79d85f-psq7t   Created container counting
13m         Normal   Started     pod/counting-b9d79d85f-psq7t   Started container counting
13m         Normal   Pulled      pod/counting-b9d79d85f-psq7t   Container image "envoyproxy/envoy:v1.23.1" already present on machine
13m         Normal   Created     pod/counting-b9d79d85f-psq7t   Created container envoy-sidecar
13m         Normal   Started     pod/counting-b9d79d85f-psq7t   Started container envoy-sidecar
13m         Normal   Pulled      pod/counting-b9d79d85f-psq7t   Container image "docker.io/daprio/daprd:1.9.5" already present on machine
13m         Normal   Created     pod/counting-b9d79d85f-psq7t   Created container daprd
13m         Normal   Started     pod/counting-b9d79d85f-psq7t   Started container daprd

While this helps the pod start fully, I am unsure whether this will allow Dapr to work. I hope you will be able to figure that part out.

I hope this helps.

Hi @Ranjandas, thanks for giving your time to resolving the issue. Unfortunately I couldn’t get the pod started with the steps mentioned above. I am detailing my steps below:

  • git clone git@github.com:hashicorp/demo-consul-101.git && cd demo-consul-101/k8s/04-yaml-connect-envoy

  • Added the following annotations to the pod deployment:

        dapr.io/enabled: "true"
        dapr.io/app-id: "countingapp"
        dapr.io/app-port: "3000"
        dapr.io/enable-api-logging: "true"
        consul.hashicorp.com/transparent-proxy: "false"
  • k apply -f counting-service.yaml

  • k label svc countingapp-dapr consul.hashicorp.com/service-ignore="true"

Even after running the above commands, the counting pod stays in the initializing state:

NAME                        READY   STATUS     RESTARTS      AGE
counting-54cdf66b77-2z75v   0/3     Init:0/1   4 (76s ago)   10m

Also, I don’t see the log There are multiple Consul services registered for this pod when there must only be one. Check... in the consul-connect-injector pod. Where is this log generated? When I added the consul.hashicorp.com/service-ignore="true" label to countingapp-dapr, the following logs were generated in the consul-connect-injector pod:

2023-01-23T12:17:27.275Z	INFO	controller.endpoints	retrieved	{"name": "countingapp-dapr", "ns": "default"}
2023-01-23T12:17:27.275Z	INFO	controller.endpoints	Ignoring endpoint labeled with `consul.hashicorp.com/service-ignore: "true"`	{"name": "countingapp-dapr", "namespace": "default"}
2023-01-23T12:17:27.276Z	INFO	controller.endpoints	deregistering service from consul	{"svc": "counting-54cdf66b77-2z75v-countingapp-dapr"}
2023-01-23T12:17:27.283Z	INFO	controller.endpoints	deregistering service from consul	{"svc": "counting-54cdf66b77-2z75v-countingapp-dapr-sidecar-proxy"}
2023-01-23T12:17:27.287Z	INFO	controller.endpoints	retrieved	{"name": "countingapp-dapr", "ns": "default"}
2023-01-23T12:17:27.288Z	INFO	controller.endpoints	registering service with Consul	{"name": "countingapp-dapr", "id": ""}
2023-01-23T12:17:27.311Z	INFO	controller.endpoints	registering proxy service with Consul	{"name": "countingapp-dapr-sidecar-proxy"}

Seemingly it re-registers the service after deregistering it. Is there any possible reason behind this?
I had Consul installed with the following values:

global:
  name: consul
  image: "hashicorp/consul:1.14.1"
  peering:
    enabled: true
  tls:
    enabled: true
meshGateway:
  enabled: true
connectInject:
  enabled: true
  default: false

Also Dapr was installed in the cluster beforehand.

It would help greatly if you could detail the steps to get the pod working, or let me know what I should do to fix it. Thanks!

Hi @shivamkm07,

From my testing, it looks like the daprd container is constantly restarting, and as mentioned before, I don’t know enough about Dapr to fix the issue.

Here are the annotations that I used from the same example you shared.

ubuntu@k3s:~/demo-consul-101/k8s/04-yaml-connect-envoy$ git diff
diff --git a/k8s/04-yaml-connect-envoy/counting-service.yaml b/k8s/04-yaml-connect-envoy/counting-service.yaml
index 1d5a742..ba76bc4 100644
--- a/k8s/04-yaml-connect-envoy/counting-service.yaml
+++ b/k8s/04-yaml-connect-envoy/counting-service.yaml
@@ -41,6 +41,11 @@ spec:
         service: counting
         app: counting
       annotations:
+        dapr.io/enabled: "true"
+        dapr.io/app-id: "countingapp"
+        dapr.io/app-port: "9001"
+        dapr.io/enable-api-logging: "true"
+        consul.hashicorp.com/kubernetes-service: "counting"
         consul.hashicorp.com/connect-inject: "true"
         consul.hashicorp.com/connect-service-upstreams: "dashboard:9002"
     spec:

I can see the following logs in the daprd container:

time="2023-01-25T09:54:33.743453327Z" level=info msg="application configuration loaded" app_id=countingapp instance=counting-7cbc7c46f6-2swzg scope=dapr.runtime type=log ver=1.9.5
time="2023-01-25T09:54:33.744644668Z" level=info msg="actors: state store is not configured - this is okay for clients but services with hosted actors will fail to initialize!" app_id=countingapp instance=counting-7cbc7c46f6-2swzg scope=dapr.runtime type=log ver=1.9.5
time="2023-01-25T09:54:33.745020907Z" level=info msg="actor runtime started. actor idle timeout: 1h0m0s. actor scan interval: 30s" app_id=countingapp instance=counting-7cbc7c46f6-2swzg scope=dapr.runtime.actor type=log ver=1.9.5
time="2023-01-25T09:54:33.745625067Z" level=info msg="dapr initialized. Status: Running. Init Elapsed 50ms" app_id=countingapp instance=counting-7cbc7c46f6-2swzg scope=dapr.runtime type=log ver=1.9.5
time="2023-01-25T09:54:33.746221736Z" level=debug msg="try to connect to placement service: dns:///dapr-placement-server.dapr-system.svc.cluster.local:50005" app_id=countingapp instance=counting-7cbc7c46f6-2swzg scope=dapr.runtime.actor.internal.placement type=log ver=1.9.5
time="2023-01-25T09:54:33.777206381Z" level=debug msg="established connection to placement service at dns:///dapr-placement-server.dapr-system.svc.cluster.local:50005" app_id=countingapp instance=counting-7cbc7c46f6-2swzg scope=dapr.runtime.actor.internal.placement type=log ver=1.9.5
time="2023-01-25T09:54:33.780328257Z" level=debug msg="placement order received: lock" app_id=countingapp instance=counting-7cbc7c46f6-2swzg scope=dapr.runtime.actor.internal.placement type=log ver=1.9.5
time="2023-01-25T09:54:33.780542195Z" level=debug msg="placement order received: update" app_id=countingapp instance=counting-7cbc7c46f6-2swzg scope=dapr.runtime.actor.internal.placement type=log ver=1.9.5
time="2023-01-25T09:54:33.780732961Z" level=info msg="placement tables updated, version: 0" app_id=countingapp instance=counting-7cbc7c46f6-2swzg scope=dapr.runtime.actor.internal.placement type=log ver=1.9.5
time="2023-01-25T09:54:33.780813163Z" level=debug msg="placement order received: unlock" app_id=countingapp instance=counting-7cbc7c46f6-2swzg scope=dapr.runtime.actor.internal.placement type=log ver=1.9.5
time="2023-01-25T09:54:49.312937795Z" level=info msg="dapr shutting down." app_id=countingapp instance=counting-7cbc7c46f6-2swzg scope=dapr.runtime type=log ver=1.9.5
time="2023-01-25T09:54:49.313134911Z" level=info msg="Stopping PubSub subscribers and input bindings" app_id=countingapp instance=counting-7cbc7c46f6-2swzg scope=dapr.runtime type=log ver=1.9.5
time="2023-01-25T09:54:49.313177471Z" level=info msg="Shutting down actor" app_id=countingapp instance=counting-7cbc7c46f6-2swzg scope=dapr.runtime type=log ver=1.9.5
time="2023-01-25T09:54:49.79355185Z" level=info msg="Stopping Dapr APIs" app_id=countingapp instance=counting-7cbc7c46f6-2swzg scope=dapr.runtime type=log ver=1.9.5
time="2023-01-25T09:54:49.796882232Z" level=info msg="Waiting 5s to finish outstanding operations" app_id=countingapp instance=counting-7cbc7c46f6-2swzg scope=dapr.runtime type=log ver=1.9.5
time="2023-01-25T09:54:54.797623642Z" level=info msg="Shutting down all remaining components" app_id=countingapp instance=counting-7cbc7c46f6-2swzg scope=dapr.runtime type=log ver=1.9.5

I can’t figure out why daprd is shutting down.
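
(Since the container keeps restarting, the previous attempt's logs can also be pulled with --previous in case that surfaces anything extra; the pod name below is taken from the log lines above:)

kubectl logs counting-7cbc7c46f6-2swzg -c daprd --previous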

For additional information, I installed Dapr on Kubernetes using the following command:

ubuntu@k3s:~$ dapr init -k --wait --enable-mtls=false -n dapr-system
⌛  Making the jump to hyperspace...
ℹ️  Note: To install Dapr using Helm, see here: https://docs.dapr.io/getting-started/install-dapr-kubernetes/#install-with-helm-advanced

ℹ️  Container images will be pulled from Docker Hub
✅  Deploying the Dapr control plane to your cluster...
✅  Success! Dapr has been installed to namespace dapr-system. To verify, run `dapr status -k' in your terminal. To get started, go here: https://aka.ms/dapr-getting-started

I looked at the Dapr documentation to find the inbound and outbound ports it uses, so that I could tweak the deployment, but I couldn’t find any details about them.
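
If those ports can be pinned down, the tweak would presumably be additional transparent-proxy exclusions on the pod. Purely as a guess based on what has already come up in this thread (port 3501 for the inbound side, and the placement service port 50005 visible in the daprd logs above for the outbound side), something like the following could be worth experimenting with; the ports would need to be verified against your Dapr install:

        consul.hashicorp.com/transparent-proxy-exclude-inbound-ports: "3501"
        # outbound exclusion so daprd can reach the placement service directly
        consul.hashicorp.com/transparent-proxy-exclude-outbound-ports: "50005"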