Consul Gateway API & Cert-Manager (LetsEncrypt) with HTTP01-challenge

I have been trying to get the HTTP01-challenge working with Consul API Gateway & Cert-Manager on Kubernetes, but I cannot get it to work (or even figure out whether it is possible). Does anyone have a working example or some insight to share?

I do the following:

  1. Create a Gateway
  2. Create a Deployment (Connect-Inject)
  3. Create HTTPRoute
  4. Create a Certificate Issuer (LetsEncrypt, using http01 solver)
  5. Create/request a Certificate for Gateway TLS

Everything seems to work as expected. A new service for handling the ACME-Challenge is created, and a new HTTPRoute linking the Gateway to this service is created as well.

However, I cannot see a way to include this “created” service in the Consul catalog, and I get an error from the gateway controller:

Gateway Controller Error

[ERROR] service/resolver.go:272: consul-api-gateway-server.k8s.Reconciler: could not resolve consul service: error="consul service default/cm-acme-http-solver-b747j not found"

…the service exists (it gets created), but it is not in the Consul catalog/mesh.

The DNS/gateway works fine with the “demo” deployment on port 80.

HTTPRoute (generated; invalid state and cannot bind)

apiVersion: gateway.networking.k8s.io/v1beta1
kind: HTTPRoute
metadata:
  creationTimestamp: "2023-05-31T10:32:41Z"
  generateName: cm-acme-http-solver
  generation: 1
  labels:
    acme.cert-manager.io/http-domain: "1028654758"
    acme.cert-manager.io/http-token: "1119030919"
    acme.cert-manager.io/http01-solver: "true"
  name: cm-acme-http-solver87wdp
  namespace: default
  ownerReferences:
  - apiVersion: acme.cert-manager.io/v1
    blockOwnerDeletion: true
    controller: true
    kind: Challenge
    name: gateway-cfk6j-2783327538-4274227894
    uid: 9e70a9e3-e836-44b9-9377-f0e57087d2a2
  resourceVersion: "45993"
  uid: 14653f31-92cb-401b-b781-bac2fb6e0fcb
spec:
  hostnames:
  - <REDACTED>
  parentRefs:
  - group: gateway.networking.k8s.io
    kind: Gateway
    name: example-gateway
  rules:
  - backendRefs:
    - group: ""
      kind: Service
      name: cm-acme-http-solver-b747j
      namespace: default
      port: 8089
      weight: 1
    matches:
    - path:
        type: Exact
        value: /.well-known/acme-challenge/xxxxxxx
status:
  parents:
  - conditions:
    - lastTransitionTime: "2023-05-31T10:33:11Z"
      message: route is in an invalid state and cannot bind
      observedGeneration: 1
      reason: BindError
      status: "False"
      type: Accepted
    - lastTransitionTime: "2023-05-31T10:33:11Z"
      message: consul service default/cm-acme-http-solver-b747j not found
      observedGeneration: 1
      reason: BackendNotFound
      status: "False"
      type: ResolvedRefs
    controllerName: hashicorp.com/consul-api-gateway-controller
    parentRef:
      group: gateway.networking.k8s.io
      kind: Gateway
      name: example-gateway

Config Files

gateway.yaml

apiVersion: gateway.networking.k8s.io/v1beta1
kind: Gateway
metadata:
  name: example-gateway
  #annotations: # https://cert-manager.io/docs/usage/ingress/#supported-annotations
  #  cert-manager.io/issuer: example-issuer
spec:
  gatewayClassName: consul-api-gateway
  listeners:
  - name: web
    port: 80
    protocol: HTTP
    allowedRoutes:
      namespaces:
        from: All
  - name: web-secure
    hostname: <REDACTED>
    port: 443
    protocol: HTTPS
    allowedRoutes:
      namespaces:
        from: All
    tls:
      mode: Terminate
      certificateRefs:
      - name: gateway-example-certificate # Issued from LetsEncrypt
        kind: Secret
        group: ""

issuer.yaml

apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: example-issuer
spec:
  acme:
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      name: acme-issuer-key
    email: <REDACTED>
    solvers:
    - http01:
        gatewayHTTPRoute:
          parentRefs:
          - name: example-gateway # Gateway used for HTTP01 requests
            kind: Gateway

certificate.yaml

apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: gateway
spec:
  secretName: gateway-example-certificate # Gateway references this
  issuerRef:
    name: example-issuer
    kind: Issuer
    group: cert-manager.io
  dnsNames:
  - <REDACTED>

example-route.yaml

apiVersion: gateway.networking.k8s.io/v1alpha2
kind: HTTPRoute
metadata:
  name: example-route
spec:
  parentRefs:
  - name: example-gateway # Gateway to route from
  hostnames: []
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /echo
    backendRefs:
    - kind: Service
      name: echo
      port: 8080

Application Versions:

  • Gateway CRDs: consul-api-gateway/config/crd?ref=v0.5.4
  • Kubernetes (Aks): 1.25.6
  • Consul (w/Helm): v1.15.2
  • Cert-Manager (w/Helm): v1.12.1

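When you say “Create a Deployment (Connect-Inject)”, can you clarify what you mean by that? Is this the deployment of Consul?

Also, you mention that a new service for handling the ACME challenge is created. Do you have the config for this service?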

This may be related to a known issue.

The “Deployment” in this case is just a basic “Echo” webserver used as the endpoint for testing.

auto-generated-service.yaml

apiVersion: v1
kind: Service
metadata:
  annotations:
    auth.istio.io/8089: NONE
  creationTimestamp: "2023-06-01T06:31:03Z"
  generateName: cm-acme-http-solver-
  labels:
    acme.cert-manager.io/http-domain: "1028654758"
    acme.cert-manager.io/http-token: "1026428499"
    acme.cert-manager.io/http01-solver: "true"
  name: cm-acme-http-solver-4vr78
  namespace: default
  ownerReferences:
  - apiVersion: acme.cert-manager.io/v1
    blockOwnerDeletion: true
    controller: true
    kind: Challenge
    name: gateway-sg7qm-2783327538-2680663985
    uid: 209135fa-171a-4167-9306-0f8a81aca214
  resourceVersion: "14809"
  uid: 7bf16f55-22cf-4c77-a243-5e8d99112159
spec:
  clusterIP: 10.0.129.19
  clusterIPs:
  - 10.0.129.19
  externalTrafficPolicy: Cluster
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - name: http
    nodePort: 31821
    port: 8089
    protocol: TCP
    targetPort: 8089
  selector:
    acme.cert-manager.io/http-domain: "1028654758"
    acme.cert-manager.io/http-token: "1026428499"
    acme.cert-manager.io/http01-solver: "true"
  sessionAffinity: None
  type: NodePort
status:
  loadBalancer: {}

auto-generated-pod.yaml

apiVersion: v1
kind: Pod
metadata:
  annotations:
    sidecar.istio.io/inject: "false"
  creationTimestamp: "2023-06-01T06:31:03Z"
  generateName: cm-acme-http-solver-
  labels:
    acme.cert-manager.io/http-domain: "1028654758"
    acme.cert-manager.io/http-token: "1026428499"
    acme.cert-manager.io/http01-solver: "true"
  name: cm-acme-http-solver-k996f
  namespace: default
  ownerReferences:
  - apiVersion: acme.cert-manager.io/v1
    blockOwnerDeletion: true
    controller: true
    kind: Challenge
    name: gateway-sg7qm-2783327538-2680663985
    uid: 209135fa-171a-4167-9306-0f8a81aca214
  resourceVersion: "14836"
  uid: 1b314dd3-8bed-46b2-befb-16ec7ab808da
spec:
  automountServiceAccountToken: false
  containers:
  - args:
    - --listen-port=8089
    - --domain=<REDACTED>
    - --token=fFvbwBhF_6TSV9_meGg493NGp6K4vxmqm3BP7ycbj9M
    - --key=fFvbwBhF_6TSV9_meGg493NGp6K4vxmqm3BP7ycbj9M.0yH3bkxCkS5Ziiwg8OlAtFaxKaQ4RigDscf7H2MDX6U
    image: quay.io/jetstack/cert-manager-acmesolver:v1.12.1
    imagePullPolicy: IfNotPresent
    name: acmesolver
    ports:
    - containerPort: 8089
      name: http
      protocol: TCP
    resources:
      limits:
        cpu: 100m
        memory: 64Mi
      requests:
        cpu: 10m
        memory: 64Mi
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
        - ALL
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  nodeName: aks-agentpool-41738809-vmss000000
  nodeSelector:
    kubernetes.io/os: linux
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: OnFailure
  schedulerName: default-scheduler
  securityContext:
    runAsNonRoot: true
    seccompProfile:
      type: RuntimeDefault
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  - effect: NoSchedule
    key: node.kubernetes.io/memory-pressure
    operator: Exists
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2023-06-01T06:31:03Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2023-06-01T06:31:06Z"
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2023-06-01T06:31:06Z"
    status: "True"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2023-06-01T06:31:03Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: containerd://d8c35b8c4f72027386885e8cb3ac84cb9c603177c1bafd43b248ca76b5e307b9
    image: quay.io/jetstack/cert-manager-acmesolver:v1.12.1
    imageID: quay.io/jetstack/cert-manager-acmesolver@sha256:65bd34918063ae36550a7351e458aff5589463605bc2db08f3043ca7017ed30d
    lastState: {}
    name: acmesolver
    ready: true
    restartCount: 0
    started: true
    state:
      running:
        startedAt: "2023-06-01T06:31:05Z"
  hostIP: 10.224.0.4
  phase: Running
  podIP: 10.244.0.41
  podIPs:
  - ip: 10.244.0.41
  qosClass: Burstable
  startTime: "2023-06-01T06:31:03Z"

As far as I can tell from the referenced issue, the error message is the same, but the case is a bit different.

I don’t think I am able to route to this service, since it is not created as part of the Consul Connect mesh (the annotation consul.hashicorp.com/connect-inject: "true" is missing from the pod).

I should be able to set this, I guess, so I will have another look at the documentation, but as far as I can see from the cert-manager API spec, I can’t set custom annotations on the pod.

Can you set the annotation with the solution here?

No; still not able to get it working.

I was able to repeat the steps above and saw the same error.
The error occurs because the annotation consul.hashicorp.com/connect-inject: "true" is missing from the solver pod that cert-manager creates (as you mentioned in a comment), so the solver service is never registered with Consul.

There are two possible solutions:

Cert-manager needs to enable pass-through of annotations for gatewayHTTPRoute

It seems that cert-manager allows the pass-through of annotations for their HTTP-01 Ingress Solver (HTTP01 - cert-manager Documentation)
It might be worth your time to put in a feature request to allow this on their gateways too.
We did try to see if we could hack it: we realized that the object used for this internally gets merged into the pod, as shown here. However, they do not allow both an ingress and a gatewayHTTPRoute solver on an issuer, so that didn’t work.

### This Issuer won't work because it contains both `ingress` and `gatewayHTTPRoute` types
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: example-issuer
spec:
  acme:
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      name: acme-issuer-key
    email: test@yahoo.com
    solvers:
      - http01:
      # not valid to have both ingress and gatewayHTTPRoute
#          ingress:
#            ingressTemplate:
#              metadata:
#                annotations:
#                  "consul.hashicorp.com/connect-inject": "true"
          gatewayHTTPRoute:
            labels:
              name: example-gateway2
            parentRefs:
              - name: example-gateway # Gateway used for HTTP01 requests
                kind: Gateway

Enable connect-inject by default in Consul (not recommended)

You can use this option to enable connect-inject by default.
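
For reference, a minimal sketch of what this might look like in the values file for the Consul Helm chart. The link target is not reproduced in this thread, so it is an assumption that the option meant here is the chart's `connectInject.default` setting:

```yaml
# values.yaml for the hashicorp/consul Helm chart (sketch only; assumes the
# option referred to above is connectInject.default)
connectInject:
  enabled: true
  # Inject the Connect sidecar into every pod unless the pod opts out with
  # the annotation consul.hashicorp.com/connect-inject: "false"
  default: true
```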

This will make sure that every service in the cluster gets a connect-inject sidecar container. This is not ideal, however, as you may not want every service registered with Consul. You can turn this feature off per deployment, but this will not work out of the box with cert-manager. They would need to add a way for the user to add annotations to their main deployments:

➜  ~ kubectl get pods -n cert-manager
NAME                                       READY   STATUS            RESTARTS   AGE
cert-manager-7bd65658bc-47hrf              0/2     PodInitializing   0          44s
cert-manager-cainjector-867cf7f7cc-khbdh   0/2     Init:0/1          0          44s
cert-manager-startupapicheck-vgrd6         0/2     Init:0/1          0          44s
cert-manager-webhook-55cf6bbd97-rfsd6      0/2     PodInitializing   0          44s

The annotation would be:
consul.hashicorp.com/connect-inject: "false"

Also, the above annotation would need to be added manually to any deployment in the cluster that you do not want registered with Consul.

Hello, and thank you for the response.

I have not been able to try this out any further due to time constraints, but I will hopefully be able to pick it up again at a later time, probably by creating a pull request to the cert-manager repository.

Unfortunately, enabling connect-inject by default is not an option. I also think it didn’t work, due to service-account requirements; I’m not 100% sure I remember correctly, but I do believe I tried it.

I chased this thread as well. It takes more than enabling ‘connect-inject’. I spent a day exploring, and here is what I found (I have ACLs enabled).
Instead of creating a randomly named pod and Service, cert-manager would need to change to:

  1. Create a service account with a random suffix that matches the service name (and not use generateName on the Service and the pod).
  2. Add the service account to the pod.
  3. Create a service defaults entry to set the protocol to http (maybe optional).
  4. Enable connect-inject on the pod.
  5. Create service intentions to allow the gateway to talk to the solver pod.
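
As a rough sketch of what steps 1, 3 and 5 could look like as Kubernetes resources, assuming the Consul CRDs are installed. The name `cm-acme-http-solver-example` is hypothetical (cert-manager currently generates random names), and it is an assumption that the gateway is registered in Consul under its Gateway resource name, `example-gateway`:

```yaml
# Step 1: service account whose name matches the (hypothetical) solver service name
apiVersion: v1
kind: ServiceAccount
metadata:
  name: cm-acme-http-solver-example
  namespace: default
---
# Step 3: tell Consul that the solver speaks HTTP
apiVersion: consul.hashicorp.com/v1alpha1
kind: ServiceDefaults
metadata:
  name: cm-acme-http-solver-example
  namespace: default
spec:
  protocol: http
---
# Step 5: allow the API gateway to reach the solver
apiVersion: consul.hashicorp.com/v1alpha1
kind: ServiceIntentions
metadata:
  name: cm-acme-http-solver-example
  namespace: default
spec:
  destination:
    name: cm-acme-http-solver-example
  sources:
  - name: example-gateway  # assumed Consul service name of the gateway
    action: allow
```

Steps 2 and 4 are changes to the solver pod itself (setting serviceAccountName and the consul.hashicorp.com/connect-inject: "true" annotation), which is the part cert-manager does not currently expose for the gatewayHTTPRoute solver.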

At this point I gave up on possibly creating a pull request for cert-manager. I think it would probably be easier to have Consul API Gateway provide a static-ish http01 solver with an API that allows the challenges to be loaded into it. This would mean not using cert-manager, which does a lot of good work such as watching certificate expirations.
Another answer would be to allow Consul API Gateway to serve non-Consul services, so that what cert-manager does is more compatible with Consul. I looked at a terminating gateway, but it made my head spin, and since the cert-manager services all come up with random names, it just seemed like too much of a task.

Following up on what I looked at for terminating gateways:
From my reading, a terminating gateway adds the Envoy ‘sidecar’ proxy for an existing service that is already registered in Consul.
cert-manager creates a pod and a service with random suffixes in their names (since there might be multiple certificates in play at once). It then adds an HTTPRoute pointing to this randomly named service.
So how do I register an ephemeral service created by cert-manager with Consul? In my Kubernetes YAML I need a Consul service name for the terminating endpoint.
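
To illustrate the naming problem, here is a rough sketch of a terminating gateway config entry as a Kubernetes resource. It assumes the Consul CRDs are installed and that a terminating gateway named `terminating-gateway` has been deployed; the service name is a placeholder. The entry has to list a Consul service name that is known in advance, which the randomly suffixed solver service does not have:

```yaml
apiVersion: consul.hashicorp.com/v1alpha1
kind: TerminatingGateway
metadata:
  name: terminating-gateway  # assumed name of the deployed terminating gateway
spec:
  services:
  # Must match a service already registered in the Consul catalog.
  # Placeholder name; the real solver service gets a new random suffix
  # for every challenge.
  - name: cm-acme-http-solver-example
```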

A more basic question: say I have service xxx on a known DNS name and port, and it is not even on Kubernetes, like an Azure Postgres service. How do I mesh that in?

So I think what is really needed here is for Consul API Gateway to have an ‘out’ that allows an HTTP path to point to a plain Kubernetes service rather than a Consul service.