Issue deploying Consul Catalog Sync with Agent Running on the Node

I am currently upgrading from Consul Helm Chart v0.49.6 to v1.0.0.

During this upgrade I noticed the following changes:

-       - name: consul-data
-         emptyDir:
-           medium: "Memory"
        containers:
          - name: sync-catalog
-           image: "hashicorp/consul-k8s-control-plane:0.49.6"
+           image: "hashicorp/consul-k8s-control-plane:1.0.0"
            env:
-             - name: HOST_IP
-               valueFrom:
-                 fieldRef:
-                   fieldPath: status.hostIP
+             - name: CONSUL_ADDRESSES
+               value: consul-catalog-sync-consul-server.infra-system.svc
+             - name: CONSUL_GRPC_PORT
+               value: "8502"
+             - name: CONSUL_HTTP_PORT
+               value: "8500"
+             - name: CONSUL_DATACENTER
+               value: dc1
+             - name: CONSUL_API_TIMEOUT
+               value: 5s
              - name: NAMESPACE
                valueFrom:
                  fieldRef:
                    fieldPath: metadata.namespace
-             - name: CONSUL_HTTP_ADDR
-               value: http://$(HOST_IP):8500
            volumeMounts:
-             - mountPath: /consul/login
-               name: consul-data
-               readOnly: true

This shows that CONSUL_ADDRESSES now points to the consul-server service by default instead of the host IP (HOST_IP).

As a result, my deployment is failing to connect to the Consul agent. I checked the release notes and documentation for the Consul Helm chart but did not find anything that allows me to update the environment variables in Catalog Sync.

I also tried deploying a server service, but the comment in its template clearly states that NODE_IP should be used instead of the service:

+ # Source: consul/templates/server-service.yaml
+ # Headless service for Consul server DNS entries. This service should only
+ # point to Consul servers. For access to an agent, one should assume that
+ # the agent is installed locally on the node and the NODE_IP should be used.
+ # If the node can't run a Consul agent, then this service can be used to
+ # communicate directly to a server agent.

Please help!

Hi @archit-khanna,

Welcome to HashiCorp Forums!

Consul-K8S 1.0.0 introduces Consul Dataplane, and Consul Clients are no longer deployed by default. You can find these changes listed under Breaking Changes in the release notes.

Please include the errors and the steps you took so that it will be easier to understand what is going wrong and offer help.

Hi @Ranjandas

I was running 0.49.6 with no issues, and all I did was update the chart version to 1.0.0. To make sure no new features are enabled, I also updated my values.yaml as follows:

values:
  - global:
      enabled: false
  - syncCatalog:
      enabled: true
      default: false
      toConsul: true
      toK8S: false
      k8sTag: {{ .Environment.Name }}
      k8sDenyNamespaces: ["kube-public"]
      addK8SNamespaceSuffix: false
  - connectInject:
      enabled: false
  - dns:
      enabled: false

I am installing the Consul agent manually on the nodes, so there is no need to enable the client or install the agent through the chart.

On applying the new chart I get the following error from the pods:

consul-server-connection-manager: trying to connect to a Consul server
2023-05-25T12:45:09.254Z [ERROR] consul-server-connection-manager: connection error: error="failed to discover Consul server addresses: failed to resolve DNS name: consul-catalog-sync-consul-server.infra-system.svc: lookup consul-catalog-sync-consul-server.infra-system.svc on 172.20.0.10:53: no such host"
2023-05-25T12:45:09.649Z [INFO]  consul-server-connection-manager: trying to connect to a Consul server
2023-05-25T12:45:09.661Z [ERROR] consul-server-connection-manager: connection error: error="failed to discover Consul server addresses: failed to resolve DNS name: consul-catalog-sync-consul-server.infra-system.svc: lookup consul-catalog-sync-consul-server.infra-system.svc on 172.20.0.10:53: no such host"
2023-05-25T12:45:10.468Z [INFO]  consul-server-connection-manager: trying to connect to a Consul server
2023-05-25T12:45:10.474Z [ERROR] consul-server-connection-manager: connection error: error="failed to discover Consul server addresses: failed to resolve DNS name: consul-catalog-sync-consul-server.infra-system.svc: lookup consul-catalog-sync-consul-server.infra-system.svc on 172.20.0.10:53: no such host"

Hi @archit-khanna,

This is interesting; why would you not use the Helm option to install Consul Clients?

Regardless, this scenario is similar to having servers external to Kubernetes, and you can use the externalServers Helm option so that the syncCatalog pods would use them.

ref: Helm Chart Reference | Consul | HashiCorp Developer

For example, adding the following and running helm upgrade should fix the issue for you.

externalServers:
  enabled: true
  hosts:
    - <IP Address of one of your Consul agents>
  httpsPort: 8500 # use 8500 if your Consul agents serve plain HTTP (the key is named httpsPort either way)
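
Merged into the values list posted earlier in this thread, the change might look roughly like the sketch below. This is untested and only combines the two snippets already shown here; the host entry is a placeholder that you need to replace with the address of one of your agents.

```yaml
values:
  - global:
      enabled: false
  # Added: point the chart at an agent that runs outside the chart.
  - externalServers:
      enabled: true
      hosts:
        - <IP Address of one of your Consul agents>  # placeholder, not a real value
      httpsPort: 8500  # assumes the agent serves plain HTTP on 8500
  # Unchanged from the values posted above.
  - syncCatalog:
      enabled: true
      default: false
      toConsul: true
      toK8S: false
      k8sTag: {{ .Environment.Name }}
      k8sDenyNamespaces: ["kube-public"]
      addK8SNamespaceSuffix: false
  - connectInject:
      enabled: false
  - dns:
      enabled: false
```

Note that a fixed entry under hosts means every syncCatalog pod talks to that one address, whereas the 0.49.6 deployment used status.hostIP so that each pod talked to the agent on its own node.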

Please try this and let me know how it goes.

Hi @Ranjandas, that’s a good suggestion. I will try this out and update here accordingly. Thank you!

@archit-khanna Can you please check and let me know whether you are facing the same issue?

I believe the Consul UI is not showing the details of the node on which the pod is running after the upgrade.

Is this behaviour a result of the agent removal?

Hi Magesh, I am not having the same issue.