Can't bootstrap server remotely with gRPC

Hi!

I’m trying to set up Waypoint in my Kubernetes cluster. I’m using an EKS cluster, and I’ve already configured ECR and the required prerequisites for the installation.

I use Argo as my GitOps tool, and I’ve installed Waypoint by deploying the manifests generated by the waypoint install --platform=kubernetes -accept-tos --show-yaml command.

Everything worked as expected. The pods, services and PVCs started running accordingly, but taking a look at the server logs, I noticed that it required a bootstrap command.

At first, I made a simple port-forward, and the bootstrap succeeded, but since I want to integrate this with other tools, I need to create a route for remote access (without the port-forward).

I’m using Traefik Proxy as my ingress controller, and I’ve created an IngressRoute like the following:

---
kind: IngressRoute
apiVersion: traefik.containo.us/v1alpha1
metadata:
  name: waypoint
  labels:
    app.kubernetes.io/name: waypoint
    app.kubernetes.io/version: 1.0.0
    app.kubernetes.io/component: infrastructure
    app.kubernetes.io/part-of: waypoint
spec:
  entryPoints:
  - websecure
  routes:
  - kind: Rule
    match: Host(`registry.getbud.co`)
    priority: 10
    services:
    - name: waypoint
      port: 9702
      scheme: https
  - kind: Rule
    match: Host(`registry.getbud.co`) && Headers(`Content-Type`, `application/grpc`)
    priority: 11
    services:
    - name: waypoint
      port: 9701

I’ve used the same config I’m using for ArgoCD, which also has an HTTP interface for a web UI and a gRPC interface for the CLI.

After that, I removed my PVC and started a new installation from scratch, to set up and bootstrap the server.

Now, I’m facing two issues:

  1. When I try to access the registry URL, https://registry.getbud.co, I receive an Internal Server Error, with the following log:
2020/10/21 01:51:42 http: TLS handshake error from 10.0.3.149:32974: remote error: tls: bad certificate

If I understand correctly, the web UI will only work once my TLS certificate is valid, right? Well, this is a staging environment, so probably after deploying to production it will work. Right?

  2. When I try to interact with the server through gRPC, I also receive an error, but this time the request isn’t even routed to the server. For example, if I run:
waypoint server bootstrap -server-addr=registry.getbud.co:443 --server-tls=false

I receive the following output from the CLI:

! failed to create client: rpc error: code = Unimplemented desc = Not Found: HTTP status code 404; transport: received the unexpected content-type "text/plain; charset=utf-8"

Also, no logs appear in the Waypoint server pod. But in Traefik, the following log appears:

[21/Oct/2020:01:55:11 +0000] "POST /hashicorp.waypoint.Waypoint/GetVersionInfo HTTP/2.0" - - "-" "-" 440 "-" "-" 0ms

I’ve also tried setting h2c as the scheme for the gRPC service, like the following:

      services:
        - name: waypoint
          port: 9701
          scheme: h2c

But the issue persists. I receive a 404 error and Traefik’s logs are the same.

After a few test scenarios, I got the following (strange) behavior: if I try to bootstrap remotely, I get a 502 error, but if I bootstrap using localhost (with port-forward) and then try to bootstrap remotely again, it talks to the server successfully and reports that the server is already bootstrapped. Take a look at the logs:

platform@budproj/terraform/accounts/root on  feature/waypoint [!] using ☁️ devops@bud at ☸️  v1.18.8-eks-7c9bda bud 
❯ waypoint server bootstrap -server-addr=registry.getbud.co:443 -server-tls-skip-verify
! failed to create client: rpc error: code = Unavailable desc = Bad Gateway: HTTP status code 502; transport:
received the unexpected content-type "text/plain; charset=utf-8"

platform@budproj/terraform/accounts/root on  feature/waypoint [!] using ☁️ devops@bud at ☸️  v1.18.8-eks-7c9bda bud 
❯ waypoint server bootstrap -server-addr=localhost:9701 -server-tls-skip-verify
<token>

platform@budproj/terraform/accounts/root on  feature/waypoint [!] using ☁️ devops@bud at ☸️  v1.18.8-eks-7c9bda bud 
❯ waypoint server bootstrap -server-addr=registry.getbud.co:443 -server-tls-skip-verify
! Error bootstrapping the server: server is already bootstrapped

Any ideas how to fix both issues?

Hi @delucca, firstly I have to admit I am not so familiar with Traefik, but I will do my best to help.

Regarding the following error, I am assuming this is coming from the Traefik logs?

2020/10/21 01:51:42 http: TLS handshake error from 10.0.3.149:32974: remote error: tls: bad certificate

My guess here is that since you have defined scheme: https, Traefik is trying to validate the upstream certificate for the Waypoint server. By default, this is using an unsigned certificate. Is it possible for you to set the route to use insecure?
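
One way to do that, as far as I know, is Traefik’s global serversTransport option. A minimal sketch, assuming Traefik v2 with a YAML static configuration file (traefik.yml); note that it applies to every backend rather than to a single route, and I have not verified it against Waypoint:

```yaml
# Traefik static configuration (traefik.yml); a file-based setup is assumed here.
# Disables verification of backend (upstream) TLS certificates globally,
# so the certificate presented by the Waypoint server would be accepted.
serversTransport:
  insecureSkipVerify: true
```

The equivalent CLI argument is --serversTransport.insecureSkipVerify=true. Since it turns off upstream certificate checks for all services, it is probably only suitable for a staging environment.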

On the second error:

! failed to create client: rpc error: code = Unimplemented desc = Not Found: HTTP status code 404; transport: received the unexpected content-type "text/plain; charset=utf-8"

Is Traefik stripping any of the headers or other data from the gRPC request?

I am going to try and set up Traefik on my local server so I can help further.

Regarding the final error where you get the bootstrapped message, I think there might be a bug here where it is ignoring the -server-addr flag when you have a valid context. I just tested the following with an invalid address on my local instance and I get the same message. We will get a bug fix out for this issue.

➜ waypoint server bootstrap -server-addr=registry.getbud.co:4243 -server-tls-skip-verify
! Error bootstrapping the server: server is already bootstrapped

Hi @nic, thanks for your assistance.

I’m also going to divide my comments based on your reply.

(regarding error 1) My guess here is that since you have defined scheme: https, Traefik is trying to validate the upstream certificate for the Waypoint server. By default, this is using an unsigned certificate. Is it possible for you to set the route to use insecure?

Exactly, but if I change the scheme to http (or simply remove it), I get the following error in my browser (after accessing the URL):

Client sent an HTTP request to an HTTPS server.

That is the same error I get if I just do a port-forward and try to access localhost:9702 using the http protocol instead of https.

(regarding error 2) Is Traefik stripping any of the headers or other data from the gRPC request?

It should be. I’ve also tried adding the passHostHeader property to the rule, but it did not solve the issue. It is strange, because ArgoCD uses a similar approach (with gRPC in its CLI) and it works as expected.

Thanks, I am guessing that Traefik is not transparently proxying the requests, which is why you are seeing those errors. I will need to dig into the configuration.

Hi @delucca,

I think my suspicions were correct regarding the way Traefik is trying to connect to upstream services with TLS certificates.

With IngressRoute in Traefik I can’t get this to work correctly. I suspect this is due to the self-signed certificate on the Waypoint server, and I cannot figure out how to tell Traefik to ignore this certificate.

I have, however, managed to get this working with IngressRouteTCP and SNI headers.

---
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRouteTCP
metadata:
  name: waypoint
spec:
  routes:
  - kind: Rule
    match: HostSNI(`waypoint.web.localhost`)
    services:
    - kind: Service
      name: waypoint
      namespace: default
      port: 9702
  - kind: Rule
    match: HostSNI(`waypoint.grpc.localhost`)
    services:
    - kind: Service
      name: waypoint
      namespace: default
      port: 9701
  tls:
    passthrough: true

You can test this with curl by forcing it to send the SNI header:

curl -kiv --resolve waypoint.web.localhost:9080:127.0.0.1 https://waypoint.web.localhost:9080/

If you have a resolvable DNS name, this should work without the need to set --resolve.

Kind regards,

Nic

Hi @nic,

Thanks for your help! I’m going to try using this suggested manifest and check if it works.

I’ll let you know as soon as I finish the tests in my Kubernetes cluster.

@nic

I don’t know if I just wasn’t able to understand how to do it correctly, but it did not work (at least in my Kubernetes cluster).

Here is what I did. I’ve created a new Kubernetes resource in the same namespace where the Waypoint server is installed. Here is the full manifest:

---
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRouteTCP
metadata:
  name: waypoint
  namespace: waypoint
spec:
  routes:
  - kind: Rule
    match: HostSNI(`waypoint.infrastructure.getbud.co`)
    services:
    - kind: Service
      name: waypoint
      namespace: waypoint
      port: 9702
  - kind: Rule
    match: HostSNI(`waypoint.infrastructure.getbud.co`)
    services:
    - kind: Service
      name: waypoint
      namespace: waypoint
      port: 9701
  tls:
    passthrough: true

Note: the Waypoint server is located in the waypoint namespace.

After that, I created a new Waypoint context with the following CLI command:

waypoint context create -server-addr=waypoint.infrastructure.getbud.co -server-auth-token=<secret> -server-tls-skip-verify bud-new

Then, I changed the current context with:

waypoint context use bud-new

And ran:

waypoint context verify

And the following error appeared:

❌ Connecting with context "bud-new"...
! Error connecting with context "bud-new": context deadline exceeded

Maybe I need to install something else in my Kubernetes cluster? Sorry, I’m not too familiar with networking, and I have no idea what a resolvable DNS is.

So in your example you have the same HostSNI for both the HTTP port and the gRPC port.

Traefik is going to accept traffic on the web port and look at the SNI to determine which upstream to route traffic to. In your config, since you have two Rules with the same HostSNI, Traefik will probably use the first Rule, which is the HTTP UI.

If you change the rule to something like:

  - kind: Rule
    match: HostSNI(`waypoint.grpc.getbud.co`)
    services:
    - kind: Service
      name: waypoint
      namespace: waypoint
      port: 9701

Then run:

waypoint context create -server-addr=waypoint.grpc.getbud.co -server-auth-token=<secret> -server-tls-skip-verify bud-new

I think this should work for you.

Hi @nic

Thanks again for taking the time to help me. But I wasn’t able to make it work =/

Here is my IngressRoute:

---
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRouteTCP
metadata:
  name: waypoint
spec:
  routes:
  - kind: Rule
    match: HostSNI(`dashboard.waypoint.infrastructure.getbud.co`)
    services:
    - kind: Service
      name: waypoint
      namespace: waypoint
      port: 9702
  - kind: Rule
    match: HostSNI(`waypoint.infrastructure.getbud.co`) 
    services:
    - kind: Service
      name: waypoint
      namespace: waypoint
      port: 9701
  tls:
    passthrough: true

(as you can see, I’m using dashboard. for HTTP and waypoint. for gRPC)

But when I try to run waypoint context verify, the same error happens (context deadline exceeded).

Any idea how to fix it?

Quick note: I’m using the web to websecure plugin in my Traefik, maybe this is related to the issue?