Elasticsearch in Kubernetes with Consul

I’m trying to connect Elasticsearch in Kubernetes with Consul. If I install everything without Consul, it all works as expected. If I install with the Consul service mesh enabled, I am unable to connect to the Elasticsearch server from Filebeat. I can connect from Kibana by following the setup described here: Consul Connect and ECK · Issue #2973 · elastic/cloud-on-k8s · GitHub, under "Associating through the mesh".

All services are installed using Helm templates.

Interestingly, Kibana seems to connect to Elasticsearch OK, but as Filebeat is outside of the mesh, there seems to be no way to get it to connect. There also seem to be problems checking the license.

I’m seeing the following error in the Elasticsearch deployment:

Get "http://elastic-search-es-internal-http.elastic-system.svc:9200/_license": EOF

And the following error from Filebeat (whichever URL I try gets the same error):

Failed to connect to backoff(elasticsearch(http://elastic-search-es-http.elastic-system.svc:9200)): Get "http://elastic-search-es-http.elastic-system.svc:9200": EOF

I’ve also found that if I try to set up Elasticsearch with transparent proxy in the same way as I have for Kibana, it won’t deploy at all (I can’t see any useful error messages for this):

consul.hashicorp.com/transparent-proxy: "true"
consul.hashicorp.com/transparent-proxy-exclude-inbound-ports: "9200"

I’m sure I have missed something simple here, but if anyone has any thoughts, please let me know. I’ve spent almost a week trying every option I can think of and got nowhere.

My configuration for Consul is as follows:

global:
  name: consul
  datacenter: dc1
  metrics:
    enabled: true
  tls:
    enabled: true
    enableAutoEncrypt: true
    verify: true
    serverAdditionalDNSSANs:
      ## Add the K8s domain name to the consul server certificate
      - "consul-server.consul-system.svc.cluster.local"
  ## For production turn on ACLs and gossipEncryption:
  # acls:
  #   manageSystemACLs: true
  # gossipEncryption:
  #   secretName: "consul-gossip-encryption-key"
  #   secretKey: "key"
server:
  replicas: 1
  securityContext:
    runAsNonRoot: false
    runAsUser: 0
ui:
  enabled: true
connectInject:
  # This method will inject the sidecar container into Pods:
  enabled: true
  # But not by default, only do this for Pods that have the explicit annotation:
  #        consul.hashicorp.com/connect-inject: "true"
  default: false
controller:
  enabled: true
prometheus:
  enabled: true
grafana:
  enabled: true

syncCatalog:
  # This method will automatically synchronize Kubernetes services to Consul:
  # (No sidecar is injected by this method):
  enabled: true
  # But not by default, only for Services that have the explicit annotation:
  #        consul.hashicorp.com/service-sync: "true"
  default: false
  # Synchronize from Kubernetes to Consul:
  toConsul: true
  # And also from Consul to K8s:
  toK8S: true

Elastic:

apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: elastic-search
spec:
  version: {{ .Values.elastic.version }}
  http:
    tls:
      selfSignedCertificate:
        disabled: true
  nodeSets:
    - name: default
      count: 1
      config:
        node.store.allow_mmap: false
      podTemplate:
        metadata:
          annotations:
            consul.hashicorp.com/connect-service: "elastic-search"
            consul.hashicorp.com/connect-inject: "true"
            consul.hashicorp.com/connect-service-port: "http"
        spec:
          automountServiceAccountToken: true
          serviceAccount: elastic-search

Kibana:

apiVersion: kibana.k8s.elastic.co/v1
kind: Kibana
metadata:
  name: kibana
spec:
  version: {{ .Values.elastic.version }}
  count: 1
  http:
    tls:
      selfSignedCertificate:
        disabled: true
  config:
    elasticsearch.hosts:
      - http://127.0.0.1:9200
    elasticsearch.username: elastic
    elasticsearch.ssl.verificationMode: none
  podTemplate:
    metadata:
      annotations:
        consul.hashicorp.com/connect-service: "kibana"
        consul.hashicorp.com/connect-inject: "true"
        consul.hashicorp.com/connect-service-port: "http"
        consul.hashicorp.com/connect-service-upstreams: "elastic-search:9200"
        consul.hashicorp.com/transparent-proxy: "true"
        consul.hashicorp.com/transparent-proxy-exclude-inbound-ports: "5601,80,443"

    spec:
      automountServiceAccountToken: true
      serviceAccount: kibana
      containers:
        - name: kibana
          env:
            - name: ELASTICSEARCH_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: elastic-search-es-elastic-user
                  key: elastic

Beats:

apiVersion: beat.k8s.elastic.co/v1beta1
kind: Beat
metadata:
  name: filebeat
spec:
  type: filebeat
  version: {{ .Values.elastic.version }}
  elasticsearchRef:
    name: elastic-search
  config:
    output.elasticsearch:
      hosts: ["http://127.0.0.1:9200", "http://elastic-search-es-http:9200"]
      username: "elastic"
      password: "924y07bumdibu20y1JP7b4iI" # "${ELASTICSEARCH_PASSWORD}"
      ssl.verificationMode: none

    filebeat.inputs:
      - type: container
        paths:
          - /var/log/containers/*.log
        multiline.pattern: '^{'
        multiline.negate: true
        multiline.match: after

    processors:
      # flattens the array to a single string
      - script:
          when:
            has_fields: ['error.stack_trace']
          lang: javascript
          id: my_filter
          source: >
            function process(event) {
              event.Put("error.stack_trace", event.Get("error.stack_trace").join("\n"));
            }
      - decode_json_fields:
          fields: ["message"]
          target: ""
          process_array: true
          max_depth: 10
          overwrite_keys: true
      - add_kubernetes_metadata:
          in_cluster: true
  daemonSet:
    podTemplate:
      metadata:
        annotations:
          consul.hashicorp.com/connect-service: "filebeat"
          consul.hashicorp.com/connect-service-upstreams: "elastic-search:9200"

      spec:
        automountServiceAccountToken: true
        serviceAccount: filebeat
        dnsPolicy: ClusterFirstWithHostNet
        hostNetwork: true
        securityContext:
          runAsUser: 0
        containers:
          - name: filebeat
            env:
              - name: ELASTICSEARCH_PASSWORD
                valueFrom:
                  secretKeyRef:
                    name: elastic-search-es-elastic-user
                    key: elastic
            volumeMounts:
              - name: varlogcontainers
                mountPath: /var/log/containers
              - name: varlogpods
                mountPath: /var/log/pods
              - name: varlibdockercontainers
                mountPath: /var/lib/docker/containers
        volumes:
          - name: varlogcontainers
            hostPath:
              path: /var/log/containers
          - name: varlogpods
            hostPath:
              path: /var/log/pods
          - name: varlibdockercontainers
            hostPath:
              path: /var/lib/docker/containers

Hi codex, so filebeat isn’t in the service mesh?

We don’t support non-service-mesh => service-mesh connections without going through a Consul ingress or API gateway.

One workaround would be to expose the service-mesh app’s ports (skipping the sidecar proxy) and then non-service-mesh apps could talk to it (but this would bypass the service mesh security, metrics etc.).

I think you were trying this workaround with the transparent-proxy-exclude-inbound-ports: 9200 annotation. That’s the right way to do it, then filebeat could talk to ES on 9200. If that’s the workaround you want to go with (instead of exposing ES through an ingress/api gateway) then I think we should try and figure out why that’s not deploying.
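As a sketch, the combined annotations on the Elasticsearch podTemplate for this workaround would look something like the following (this assumes the service name and port from the manifests above; note that annotation values must be quoted strings, so "9200" rather than a bare number):

```yaml
podTemplate:
  metadata:
    annotations:
      consul.hashicorp.com/connect-service: "elastic-search"
      consul.hashicorp.com/connect-inject: "true"
      consul.hashicorp.com/connect-service-port: "http"
      consul.hashicorp.com/transparent-proxy: "true"
      # Leave 9200 reachable directly, bypassing the sidecar proxy,
      # so non-mesh clients like Filebeat can still connect:
      consul.hashicorp.com/transparent-proxy-exclude-inbound-ports: "9200"
```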

  1. What is the output of kubectl get pods when you use that annotation?
  2. What is the output of kubectl describe pod <es>?
  3. If the pod is running, what are the logs of the init containers?
  4. What are the logs of the connect-inject deployment around the time you deploy ES?

Filebeat isn’t in the service mesh; I don’t see any way to add it, as it doesn’t have a service. I’m also going to need access to Elasticsearch from outside the mesh.

I’m guessing an ingress or API gateway would be the best approach. I’ve been using Nginx for ingress; this works for Kibana, but I would need to expose the port (which for some reason I don’t seem to be able to do).

Nothing actually seems to deploy at all when I add that annotation. It’s hard to tell what’s happening, as Elasticsearch deploys in an unusual way inside Kubernetes: it uses an operator pattern for deployments.

That means I have no logs to share when the annotation is added; the only interesting thing I saw was in the injector logs:

http: TLS handshake error from 10.13.2.128:53880: EOF

Is it even possible for me to set up the API gateway or any ingress without exposing the port with the annotations?

You can actually put nginx into the service mesh (Ingress Controller Integrations | Consul by HashiCorp) and then use that as your ingress.
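For example (a sketch only; the service name is an assumption based on a typical ingress-nginx install), putting the nginx ingress controller pods into the mesh means annotating the controller Deployment's pod template much like Kibana above:

```yaml
# Patch for the ingress-nginx controller Deployment's pod template
metadata:
  annotations:
    consul.hashicorp.com/connect-inject: "true"
    consul.hashicorp.com/connect-service: "ingress-nginx"
    # Reach Elasticsearch through the sidecar via the declared upstream:
    consul.hashicorp.com/connect-service-upstreams: "elastic-search:9200"
    consul.hashicorp.com/transparent-proxy: "true"
    # Keep the ingress's own listener ports reachable from outside the mesh:
    consul.hashicorp.com/transparent-proxy-exclude-inbound-ports: "80,443"
```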

Is it even possible for me to set up the API gateway or any ingress without exposing the port with the annotations?

Yes. With an API gateway or ingress, the request that comes into the gateway is non-service-mesh (e.g. Filebeat), but the request it makes out is over the service mesh. This means anything behind the gateway (e.g. ES) no longer needs that port exposed.
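A rough sketch of routing Elasticsearch through Consul API Gateway, using the Kubernetes Gateway API resources it consumes (the resource names and listener port are placeholders, and the exact apiVersion and gatewayClassName depend on your Consul API Gateway version):

```yaml
apiVersion: gateway.networking.k8s.io/v1alpha2
kind: Gateway
metadata:
  name: api-gateway
  namespace: consul-system
spec:
  gatewayClassName: consul-api-gateway
  listeners:
    - name: http
      protocol: HTTP
      port: 80
---
apiVersion: gateway.networking.k8s.io/v1alpha2
kind: HTTPRoute
metadata:
  name: elastic-route
  namespace: elastic-system
spec:
  parentRefs:
    - name: api-gateway
      namespace: consul-system
  rules:
    - backendRefs:
        # Traffic from the gateway to Elasticsearch goes over the mesh:
        - name: elastic-search
          port: 9200
```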

Brilliant, and thanks for your help on this. I will look into the API gateway method, as that’s something we’re going to need for our other services, but I can probably get the ingress set up fairly easily. I was really hoping we could use Beats with service discovery, because they are inside the cluster.

One last question (for the moment :wink:): is there any easy way for pods which don’t have a service (like Beats) to access the API gateway or an ingress using something like the DNS entries inside Kubernetes? We will have at least 3 clusters, all needing to connect to the same Elasticsearch instance. Ideally I don’t want to set up external DNS entries.

The API gateway and ingress will have Kube Services, so you can use Kube DNS to access them from the same Kube cluster.

From outside the cluster… does Elasticsearch have a LoadBalancer service? You might be able to use our catalog sync to sync the LB to Consul and then use Consul DNS. It’s getting complicated, though.

Filebeat isn’t in the service mesh, I don’t see any way to add it as it doesn’t have a service.

Just saw this again: you could always create a Service yourself for filebeat?
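A minimal sketch of such a Service (the selector label is an assumption; check the labels ECK actually puts on the filebeat pods. Note that Filebeat doesn’t normally listen on a port, so the port here is only nominal to satisfy service registration; 5066 is Filebeat’s optional HTTP monitoring endpoint):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: filebeat
  namespace: elastic-system
spec:
  selector:
    # Assumed ECK-managed label; verify against your filebeat pods
    beat.k8s.elastic.co/name: filebeat
  ports:
    - name: monitoring
      port: 5066
```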

Brilliant and once again thank you for all your help. I will try all of these options over the weekend and report back.

Interestingly, I created a Service for filebeat, as this would be the ideal solution. Unfortunately, as soon as I added the annotation to inject the filebeat pod into the Consul service mesh, it destroyed the entire Kubernetes cluster.

It appeared to start with the kube CoreDNS pods, but quickly spread to many other pods. Luckily this is a proof of concept, so no problem: I just need to rebuild the cluster. I’ve tried it twice, and both times got the same result.

Unfortunately, with most of the pods destroyed, it was difficult to search the logs. I’ve not seen anything useful so far, but will try again and see what I can find.

OK, this is the most relevant information I have from the CoreDNS pods:

Events:
  Type     Reason                  Age                  From               Message
  ----     ------                  ----                 ----               -------
  Warning  FailedScheduling        59m (x6 over 64m)    default-scheduler  no nodes available to schedule pods
  Warning  FailedScheduling        58m                  default-scheduler  0/1 nodes are available: 1 node(s) had taint {node.cloudprovider.kubernetes.io/uninitialized: true}, that the pod didn't tolerate.
  Warning  FailedScheduling        58m                  default-scheduler  0/2 nodes are available: 2 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.
  Normal   Scheduled               58m                  default-scheduler  Successfully assigned kube-system/coredns-866b7cfc75-wkrg4 to 10.12.109.125
  Warning  FailedCreatePodSandBox  58m                  kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_coredns-866b7cfc75-wkrg4_kube-system_cff1b7d2-ecf1-49c2-8fc3-51b98924bc29_0(4ee34238dbfe0691f18d9237a58fa8fb57fc8954dafa1822a62a27656a2cef35): error adding pod kube-system_coredns-866b7cfc75-wkrg4 to CNI network "crio": open /run/flannel/subnet.env: no such file or directory
  Normal   Pulling                 57m                  kubelet            Pulling image "eu-frankfurt-1.ocir.io/axoxdievda5j/oke-public-coredns@sha256:014bc7d0b5b45b85fec12cc9ff7d0042afe7e2d1ae09ca12531f4dc1cbef3013"
  Normal   Pulled                  57m                  kubelet            Successfully pulled image "eu-frankfurt-1.ocir.io/axoxdievda5j/oke-public-coredns@sha256:014bc7d0b5b45b85fec12cc9ff7d0042afe7e2d1ae09ca12531f4dc1cbef3013" in 5.910172492s
  Normal   Killing                 43m                  kubelet            Container coredns failed liveness probe, will be restarted
  Normal   Created                 43m (x2 over 57m)    kubelet            Created container coredns
  Normal   Pulled                  43m                  kubelet            Container image "eu-frankfurt-1.ocir.io/axoxdievda5j/oke-public-coredns@sha256:014bc7d0b5b45b85fec12cc9ff7d0042afe7e2d1ae09ca12531f4dc1cbef3013" already present on machine
  Normal   Started                 43m (x2 over 57m)    kubelet            Started container coredns
  Warning  Unhealthy               13m (x123 over 43m)  kubelet            Readiness probe failed: Get "http://10.13.0.130:8181/ready": dial tcp 10.13.0.130:8181: connect: connection refused
  Warning  BackOff                 8m1s (x74 over 32m)  kubelet            Back-off restarting failed container
  Warning  Unhealthy               3m8s (x64 over 43m)  kubelet            Liveness probe failed: Get "http://10.13.0.130:8080/health": dial tcp 10.13.0.130:8080: connect: connection refused

Most of the other errors I’m seeing for the other apps look like connection refused.