Installing Consul on EKS

Hi,
I have installed the Consul agent only (no servers) on a private EKS cluster on AWS, and for some reason it tries to reach the servers via the local IP.
I have deployed 3 Consul servers on AWS EC2 instances.
I can see my EKS nodes in the Consul UI, but I can't register them as services.
I tried with all the values below (as well as with the values that are commented out), and I don't understand what the problem could be.
The sync-catalog pod crashes all the time.
This is the log output:

kubectl logs consul-consul-sync-catalog-65c5b98694-lnclk -n consul
2023-08-02T19:44:52.493Z [INFO]  K8s namespace syncing configuration: k8s namespaces allowed to be synced="Set{*}" k8s namespaces denied from syncing="Set{kube-system, kube-public}"
Listening on ":8080"...
2023-08-02T19:44:52.589Z [INFO]  to-consul/source: starting runner for endpoints
2023-08-02T19:44:52.689Z [INFO]  to-k8s/sink: starting runner for syncing
2023-08-02T19:44:52.890Z [INFO]  to-consul/source: upsert: key=ingress-nginx/ingress-nginx-controller
2023-08-02T19:44:52.890Z [INFO]  to-k8s/sink: upsert: key=consul/consul-consul-dns
2023-08-02T19:44:52.894Z [INFO]  to-consul/source: upsert: key=ingress-nginx/ingress-nginx-controller-admission
2023-08-02T19:44:52.989Z [INFO]  to-consul/source: upsert: key=monitoring/prometheus-prometheus-node-exporter
2023-08-02T19:44:52.994Z [INFO]  to-consul/source: upsert: key=monitoring/prometheus-kube-prometheus-operator
2023-08-02T19:44:52.997Z [INFO]  to-consul/source: upsert: key=default/kubernetes
2023-08-02T19:44:52.997Z [INFO]  to-consul/source: upsert endpoint: key=ingress-nginx/ingress-nginx-controller-admission
2023-08-02T19:44:52.997Z [INFO]  to-consul/source: upsert endpoint: key=monitoring/prometheus-kube-prometheus-operator
2023-08-02T19:44:52.997Z [INFO]  to-consul/source: upsert endpoint: key=default/kubernetes
2023-08-02T19:44:52.997Z [INFO]  to-consul/source: upsert endpoint: key=monitoring/prometheus-prometheus-node-exporter
2023-08-02T19:44:53.003Z [INFO]  to-consul/source: upsert: key=monitoring/prometheus-kube-prometheus-alertmanager
2023-08-02T19:44:53.007Z [INFO]  to-consul/source: upsert: key=monitoring/prometheus-operated
2023-08-02T19:44:53.010Z [INFO]  to-consul/source: upsert: key=monitoring/alertmanager-operated
2023-08-02T19:44:53.014Z [INFO]  to-consul/source: upsert: key=monitoring/prometheus-kube-prometheus-prometheus
2023-08-02T19:44:53.089Z [INFO]  to-consul/source: upsert: key=monitoring/prometheus-grafana
2023-08-02T19:44:53.194Z [INFO]  to-consul/source: upsert: key=monitoring/prometheus-kube-state-metrics
2023-08-02T19:44:53.396Z [INFO]  to-consul/source: upsert: key=consul/consul-consul-dns
[GET /health/ready] Error getting leader status: Get "http://127.0.0.1:8500/v1/status/leader": dial tcp 127.0.0.1:8500: connect: connection refused
[GET /health/ready] Error getting leader status: Get "http://127.0.0.1:8500/v1/status/leader": dial tcp 127.0.0.1:8500: connect: connection refused
[GET /health/ready] Error getting leader status: Get "http://127.0.0.1:8500/v1/status/leader": dial tcp 127.0.0.1:8500: connect: connection refused
[GET /health/ready] Error getting leader status: Get "http://127.0.0.1:8500/v1/status/leader": dial tcp 127.0.0.1:8500: connect: connection refused

This is my values file content:

global:
  enabled: false
  image: "hashicorp/consul:1.14.0"
  datacenter: opsschool
  gossipEncryption:
    secretName: consul-gossip-encryption-key
    secretKey: key
#externalServers:
#  enabled: true
#  hosts: [consul-consul-dns.consul.svc]
#  httpPort: 8500
server:
  enabled: false
client:
  enabled: true
  exposeGossipPorts: true
  join:
    - "provider=aws tag_key=Consul tag_value=server"
#  nodeMeta:
#    pod-name: ${HOSTNAME}
#    host-ip: ${HOST_IP}
dns:
  enabled: true
connectInject:
  enabled: false

syncCatalog:
  enabled: true
  image: hashicorp/consul-k8s-control-plane:0.49.5 

This is what my CoreDNS ConfigMap looks like:

kubectl describe cm coredns -n kube-system
Name:         coredns
Namespace:    kube-system
Labels:       eks.amazonaws.com/component=coredns
              k8s-app=kube-dns
Annotations:  <none>

Data
====
Corefile:
----
.:53 {
    errors
    health
    kubernetes cluster.local in-addr.arpa ip6.arpa {
      pods insecure
      fallthrough in-addr.arpa ip6.arpa
    }
    prometheus :9153
    forward . /etc/resolv.conf
    cache 30
    loop
    reload
    loadbalance
}
consul {
  errors
  cache 30
  forward .  172.20.7.91
}


BinaryData
====

Events:  <none>

Hi @ac171,

There are a few things wrong with your values.yaml file. If I understand your setup correctly, your Consul servers are on VMs (outside Kubernetes), and you are trying to install a client-only Kubernetes cluster. Based on that understanding, the following needs fixing:

  1. You have to set externalServers when the servers are outside Kubernetes. From the commented block, you seem to have tried a Kubernetes DNS name. However, externalServers.hosts should contain the IPs of your VMs (see the sketch just after this list).

  2. The option externalServers.httpPort is wrong; it should be externalServers.httpsPort.
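
For illustration, with placeholder IPs (replace them with your actual EC2 private IPs, or DNS names that resolve to them), the block from point 1 would look roughly like this:

externalServers:
  enabled: true
  hosts: ["10.0.1.10", "10.0.1.11", "10.0.1.12"]
  httpsPort: 8500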

After you set the externalServers settings properly and upgrade, your syncCatalog service will start talking to the external servers and begin syncing.

One other thing to note: from Consul-K8S 1.0 onwards, Consul client agents are no longer required (or used) on Kubernetes. Unless you have a specific reason, you may want to disable the clients.
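
With the chart version you are currently on, that is just (a minimal sketch):

client:
  enabled: false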

You can read more about this change here: Consul 1.14 Beta: Announcing Simplified Service Mesh Deployments

Ok, thanks, I will check that.
What about the networking side?
I opened specific ports, and then I opened all traffic just to check whether that was my problem, and I still get the connection refused error.
I have 3 Consul servers on AWS and want to deploy the Consul agent on my EKS nodes using Helm.
I have configured the Consul service IP address in the CoreDNS ConfigMap.
Don't I need to use the Consul service DNS name?

Hi @ac171,

I don’t know whether you managed to get this working. Based on the setup you are going for, you can simplify the values file to something like the one shown below.

global:
  enabled: false
  datacenter: opsschool
connectInject:
  enabled: false
externalServers:
  enabled: true
  hosts: ["Your ec2 Consul VM IP or DNS names that can resolve to those IPs"]
  httpsPort: 8500
dns:
  enabled: true
syncCatalog:
  enabled: true

When you configure externalServers, your syncCatalog pod will start talking directly to the Consul Servers for syncing services.
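
A quick way to verify after the upgrade (assuming the release name consul and the consul namespace, as in your output):

# Watch the sync-catalog logs; the connection-refused errors should stop
kubectl logs deploy/consul-consul-sync-catalog -n consul -f

# On one of the Consul server VMs, list the catalog; synced Kubernetes
# services typically show up with their namespace appended
consul catalog services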

Please note that the above is not a production-grade config from a security point of view: it does not enable ACLs or TLS, which are recommended for production deployments.
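
As a rough sketch, enabling those starts with the values below (note that using them with external servers needs additional configuration, e.g. CA material for TLS):

global:
  tls:
    enabled: true
  acls:
    manageSystemACLs: true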

Here is a demo of how it looks: Consul-K8S Catalog Sync with External Servers - asciinema

I am not sure why it works for you and not for me :frowning_face:
This is the error I get. I have opened ports 53 and 8500-8502 for both the Consul servers and my EKS nodes:

kubectl logs consul-consul-sync-catalog-55b566fd9b-2qvd2 -n consul
2023-08-22T19:53:04.449Z [INFO]  consul-server-connection-manager: trying to connect to a Consul server
2023-08-22T19:53:04.749Z [INFO]  consul-server-connection-manager: discovered Consul servers: addresses=[172.20.136.155:8502]
2023-08-22T19:53:04.750Z [INFO]  consul-server-connection-manager: current prioritized list of known Consul servers: addresses=[172.20.136.155:8502]
2023-08-22T19:53:14.751Z [ERROR] consul-server-connection-manager: connection error: error="fetching supported dataplane features: rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 172.20.136.155:8502: i/o timeout\""
2023-08-22T19:53:15.199Z [INFO]  consul-server-connection-manager: trying to connect to a Consul server
2023-08-22T19:53:15.203Z [INFO]  consul-server-connection-manager: discovered Consul servers: addresses=[172.20.136.155:8502]
2023-08-22T19:53:15.203Z [INFO]  consul-server-connection-manager: current prioritized list of known Consul servers: addresses=[172.20.136.155:8502]

This is what I have configured in the CoreDNS ConfigMap:
consul {
  errors
  cache 30
  forward . $(kubectl get svc -n consul consul-consul-dns -o jsonpath={.spec.clusterIP})
}

This is the result:
kubectl get svc -n consul consul-consul-dns -o jsonpath={.spec.clusterIP}
172.20.136.155
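
If it helps, this is the kind of reachability check I can run from inside the cluster (a throwaway busybox pod; the pod name and 5-second timeout are arbitrary, and the nc flags depend on the busybox build):

kubectl run nettest --rm -it --restart=Never --image=busybox -- nc -zv -w 5 172.20.136.155 8502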

Maybe this can help as well:

kubectl describe deployment.apps/consul-consul-sync-catalog -n consul
Name:                   consul-consul-sync-catalog
Namespace:              consul
CreationTimestamp:      Tue, 22 Aug 2023 22:48:18 +0300
Labels:                 app=consul
                        app.kubernetes.io/managed-by=Helm
                        chart=consul-helm
                        component=sync-catalog
                        heritage=Helm
                        release=consul
Annotations:            deployment.kubernetes.io/revision: 1
                        meta.helm.sh/release-name: consul
                        meta.helm.sh/release-namespace: consul
Selector:               app=consul,chart=consul-helm,component=sync-catalog,release=consul
Replicas:               1 desired | 1 updated | 1 total | 0 available | 1 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  25% max unavailable, 25% max surge
Pod Template:
  Labels:           app=consul
                    chart=consul-helm
                    component=sync-catalog
                    release=consul
  Annotations:      consul.hashicorp.com/connect-inject: false
  Service Account:  consul-consul-sync-catalog
  Containers:
   sync-catalog:
    Image:      hashicorp/consul-k8s-control-plane:1.2.1
    Port:       <none>
    Host Port:  <none>
    Command:
      /bin/sh
      -ec
      consul-k8s-control-plane sync-catalog \
        -log-level=info \
        -log-json=false \
        -k8s-default-sync=true \
        -consul-domain=consul \
        -allow-k8s-namespace="*" \
        -deny-k8s-namespace="kube-system" \
        -deny-k8s-namespace="kube-public" \
        -k8s-write-namespace=${NAMESPACE} \
        -node-port-sync-type=ExternalFirst \
        -consul-node-name=k8s-sync \
        -add-k8s-namespace-suffix \

    Limits:
      cpu:     50m
      memory:  50Mi
    Requests:
      cpu:      50m
      memory:   50Mi
    Liveness:   http-get http://:8080/health/ready delay=30s timeout=5s period=5s #success=1 #failure=3
    Readiness:  http-get http://:8080/health/ready delay=10s timeout=5s period=5s #success=1 #failure=5
    Environment:
      CONSUL_ADDRESSES:    consul-consul-dns.consul.svc
      CONSUL_GRPC_PORT:    8502
      CONSUL_HTTP_PORT:    8501
      CONSUL_DATACENTER:   opsschool
      CONSUL_API_TIMEOUT:  5s
      NAMESPACE:            (v1:metadata.namespace)
    Mounts:                <none>
  Volumes:                 <none>
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Available      False   MinimumReplicasUnavailable
  Progressing    False   ProgressDeadlineExceeded
OldReplicaSets:  <none>
NewReplicaSet:   consul-consul-sync-catalog-55b566fd9b (1/1 replicas created)
Events:
  Type    Reason             Age   From                   Message
  ----    ------             ----  ----                   -------
  Normal  ScalingReplicaSet  20m   deployment-controller  Scaled up replica set consul-consul-sync-catalog-55b566fd9b to 1