Installing Consul on EKS

I have installed the Consul agent only (client-only) on a private EKS cluster on AWS, and for some reason it tries to access the servers via the local IP.
I have deployed 3 Consul servers on AWS EC2 instances.
I can see my EKS nodes in the Consul UI, but I can't register them as services.
I tried with all the values below (as well as with the values that are commented out) and I don't understand what the problem can be.
The sync-catalog pod crashes all the time.
This is the log output:

kubectl logs consul-consul-sync-catalog-65c5b98694-lnclk -n consul
2023-08-02T19:44:52.493Z [INFO]  K8s namespace syncing configuration: k8s namespaces allowed to be synced="Set{*}" k8s namespaces denied from syncing="Set{kube-system, kube-public}"
Listening on ":8080"...
2023-08-02T19:44:52.589Z [INFO]  to-consul/source: starting runner for endpoints
2023-08-02T19:44:52.689Z [INFO]  to-k8s/sink: starting runner for syncing
2023-08-02T19:44:52.890Z [INFO]  to-consul/source: upsert: key=ingress-nginx/ingress-nginx-controller
2023-08-02T19:44:52.890Z [INFO]  to-k8s/sink: upsert: key=consul/consul-consul-dns
2023-08-02T19:44:52.894Z [INFO]  to-consul/source: upsert: key=ingress-nginx/ingress-nginx-controller-admission
2023-08-02T19:44:52.989Z [INFO]  to-consul/source: upsert: key=monitoring/prometheus-prometheus-node-exporter
2023-08-02T19:44:52.994Z [INFO]  to-consul/source: upsert: key=monitoring/prometheus-kube-prometheus-operator
2023-08-02T19:44:52.997Z [INFO]  to-consul/source: upsert: key=default/kubernetes
2023-08-02T19:44:52.997Z [INFO]  to-consul/source: upsert endpoint: key=ingress-nginx/ingress-nginx-controller-admission
2023-08-02T19:44:52.997Z [INFO]  to-consul/source: upsert endpoint: key=monitoring/prometheus-kube-prometheus-operator
2023-08-02T19:44:52.997Z [INFO]  to-consul/source: upsert endpoint: key=default/kubernetes
2023-08-02T19:44:52.997Z [INFO]  to-consul/source: upsert endpoint: key=monitoring/prometheus-prometheus-node-exporter
2023-08-02T19:44:53.003Z [INFO]  to-consul/source: upsert: key=monitoring/prometheus-kube-prometheus-alertmanager
2023-08-02T19:44:53.007Z [INFO]  to-consul/source: upsert: key=monitoring/prometheus-operated
2023-08-02T19:44:53.010Z [INFO]  to-consul/source: upsert: key=monitoring/alertmanager-operated
2023-08-02T19:44:53.014Z [INFO]  to-consul/source: upsert: key=monitoring/prometheus-kube-prometheus-prometheus
2023-08-02T19:44:53.089Z [INFO]  to-consul/source: upsert: key=monitoring/prometheus-grafana
2023-08-02T19:44:53.194Z [INFO]  to-consul/source: upsert: key=monitoring/prometheus-kube-state-metrics
2023-08-02T19:44:53.396Z [INFO]  to-consul/source: upsert: key=consul/consul-consul-dns
[GET /health/ready] Error getting leader status: Get "": dial tcp connect: connection refused
[GET /health/ready] Error getting leader status: Get "": dial tcp connect: connection refused
[GET /health/ready] Error getting leader status: Get "": dial tcp connect: connection refused
[GET /health/ready] Error getting leader status: Get "": dial tcp connect: connection refused

This is my values file content:

global:
  enabled: false
  image: "hashicorp/consul:1.14.0"
  datacenter: opsschool
  gossipEncryption:
    secretName: consul-gossip-encryption-key
    secretKey: key

externalServers:
#  enabled: true
#  hosts: [consul-consul-dns.consul.svc]
#  httpPort: 8500

server:
  enabled: false

client:
  enabled: true
  exposeGossipPorts: true
  join:
    - "provider=aws tag_key=Consul tag_value=server"
#  nodeMeta:
#    pod-name: ${HOSTNAME}
#    host-ip: ${HOST_IP}

dns:
  enabled: true

ui:
  enabled: false

syncCatalog:
  enabled: true
  image: hashicorp/consul-k8s-control-plane:0.49.5

This is what my configmap looks like:

kubectl describe cm coredns -n kube-system
Name:         coredns
Namespace:    kube-system
Annotations:  <none>

.:53 {
    kubernetes cluster.local {
      pods insecure
    }
    prometheus :9153
    forward . /etc/resolv.conf
    cache 30
}
consul {
    cache 30
    forward .
}


Events:  <none>

Hi @ac171,

There are a few things wrong with your values.yaml file. If I understand your setup correctly, your Consul servers are on VMs (outside K8s), and you are trying to do a client-only install on the K8s cluster. Based on this understanding, here is what needs fixing:

  1. You have to set externalServers when the servers are outside K8s. From the commented block, you seem to have tried a K8s DNS name; however, externalServers.hosts should contain the IPs of your VMs.

  2. The option externalServers.httpPort is wrong; it should be externalServers.httpsPort.

After you set the externalServers settings properly and upgrade, your syncCatalog service will start talking to the external servers and syncing.
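As a sketch, the corrected block could look like this (the addresses below are placeholders for your three server VMs, not values from this thread):

```yaml
externalServers:
  enabled: true
  # Placeholder addresses: replace with the private IPs of your Consul server EC2 instances
  hosts: ["10.0.1.10", "10.0.1.11", "10.0.1.12"]
  httpsPort: 8500
```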

One other thing to note: from Consul-K8s 1.0 onward, the Consul client agents are no longer required (or used) on K8s. Unless you have a specific reason to run them, you may want to disable the clients.
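In values terms, disabling the clients is just the client stanza (assuming the standard chart key):

```yaml
client:
  enabled: false
```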

You can read more about this change here: Consul 1.14 Beta: Announcing Simplified Service Mesh Deployments

Ok, thanks, I will check that.
What about the networking side?
I opened specific ports, and then I opened all traffic just to check whether that was the problem, and I still get the connection refused error.
I have 3 Consul servers on AWS and want to deploy the Consul agent on my EKS nodes using Helm.
I have configured the Consul service IP address in the coredns configmap.
Don't I need to use the Consul service DNS name?

Hi @ac171,

I don't know whether you managed to get this working. Based on the setup you are going for, you can simplify the values file to something like the one shown below.

global:
  enabled: false
  datacenter: opsschool

server:
  enabled: false

externalServers:
  enabled: true
  hosts: ["Your ec2 Consul VM IP or DNS names that can resolve to those IPs"]
  httpsPort: 8500

syncCatalog:
  enabled: true

dns:
  enabled: true

When you configure externalServers, your syncCatalog pod will start talking directly to the Consul Servers for syncing services.

Please note that the above is not a production-grade config from a security point of view: it does not enable the ACLs and TLS that are recommended for production deployments.
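For reference, turning those on involves the global stanza, roughly like this (a sketch using the chart's standard keys; the secret wiring for certificates and the ACL bootstrap token is omitted):

```yaml
global:
  tls:
    enabled: true
    verify: true
  acls:
    manageSystemACLs: true
```

With external servers you would additionally need to supply the servers' CA certificate (global.tls.caCert) so the pods can verify the servers' TLS certificates.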

Here is a demo of what it looks like: Consul-K8S Catalog Sync with External Servers - asciinema


I am not sure why it works for you and not for me :frowning_face:
This is the error I get; I have opened ports 53 and 8500-8502 for both the Consul servers and my EKS nodes:

kubectl logs consul-consul-sync-catalog-55b566fd9b-2qvd2 -n consul
2023-08-22T19:53:04.449Z [INFO]  consul-server-connection-manager: trying to connect to a Consul server
2023-08-22T19:53:04.749Z [INFO]  consul-server-connection-manager: discovered Consul servers: addresses=[]
2023-08-22T19:53:04.750Z [INFO]  consul-server-connection-manager: current prioritized list of known Consul servers: addresses=[]
2023-08-22T19:53:14.751Z [ERROR] consul-server-connection-manager: connection error: error="fetching supported dataplane features: rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp i/o timeout\""
2023-08-22T19:53:15.199Z [INFO]  consul-server-connection-manager: trying to connect to a Consul server
2023-08-22T19:53:15.203Z [INFO]  consul-server-connection-manager: discovered Consul servers: addresses=[]
2023-08-22T19:53:15.203Z [INFO]  consul-server-connection-manager: current prioritized list of known Consul servers: addresses=[]

This is what I have configured in the coredns configmap:
consul {
    cache 30
    forward . $(kubectl get svc -n consul consul-consul-dns -o jsonpath={.spec.clusterIP})
}
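One thing to double-check: CoreDNS does not perform shell substitution, so if the $(kubectl ...) expression was pasted into the configmap verbatim rather than expanded by a shell first, that forward line will not work. In the stored Corefile it has to be the literal cluster IP, for example (placeholder address):

```
consul {
    cache 30
    forward . 172.20.45.10   # placeholder for the consul-consul-dns ClusterIP
}
```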

This is the result:
kubectl get svc -n consul consul-consul-dns -o jsonpath={.spec.clusterIP}

Maybe this can help as well:

kubectl describe deployment.apps/consul-consul-sync-catalog -n consul
Name:                   consul-consul-sync-catalog
Namespace:              consul
CreationTimestamp:      Tue, 22 Aug 2023 22:48:18 +0300
Labels:                 app=consul
Annotations:            deployment.kubernetes.io/revision: 1
Selector:               app=consul,chart=consul-helm,component=sync-catalog,release=consul
Replicas:               1 desired | 1 updated | 1 total | 0 available | 1 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  25% max unavailable, 25% max surge
Pod Template:
  Labels:           app=consul
  Annotations:      consul.hashicorp.com/connect-inject: false
  Service Account:  consul-consul-sync-catalog
  Containers:
   sync-catalog:
    Image:      hashicorp/consul-k8s-control-plane:1.2.1
    Port:       <none>
    Host Port:  <none>
    Command:
      consul-k8s-control-plane sync-catalog \
        -log-level=info \
        -log-json=false \
        -k8s-default-sync=true \
        -consul-domain=consul \
        -allow-k8s-namespace="*" \
        -deny-k8s-namespace="kube-system" \
        -deny-k8s-namespace="kube-public" \
        -k8s-write-namespace=${NAMESPACE} \
        -node-port-sync-type=ExternalFirst \
        -consul-node-name=k8s-sync \
        -add-k8s-namespace-suffix \

    Limits:
      cpu:     50m
      memory:  50Mi
    Requests:
      cpu:      50m
      memory:   50Mi
    Liveness:   http-get http://:8080/health/ready delay=30s timeout=5s period=5s #success=1 #failure=3
    Readiness:  http-get http://:8080/health/ready delay=10s timeout=5s period=5s #success=1 #failure=5
    Environment:
      CONSUL_ADDRESSES:    consul-consul-dns.consul.svc
      CONSUL_GRPC_PORT:    8502
      CONSUL_HTTP_PORT:    8501
      CONSUL_DATACENTER:   opsschool
      NAMESPACE:            (v1:metadata.namespace)
    Mounts:                <none>
  Volumes:                 <none>
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Available      False   MinimumReplicasUnavailable
  Progressing    False   ProgressDeadlineExceeded
OldReplicaSets:  <none>
NewReplicaSet:   consul-consul-sync-catalog-55b566fd9b (1/1 replicas created)
Events:
  Type    Reason             Age   From                   Message
  ----    ------             ----  ----                   -------
  Normal  ScalingReplicaSet  20m   deployment-controller  Scaled up replica set consul-consul-sync-catalog-55b566fd9b to 1