Boundary worker not connecting and getting odd message in log

Hi

I’m trying to get an on-prem KMS worker to talk to my Boundary cluster. I’ve installed Boundary on the worker and given it the ALB address of the controller cluster. When I start it up, instead of a message saying it can’t connect to the ALB’s address, I get this unexpected error:

{"id":"TnbcSF5ZXh","source":"https://hashicorp.com/boundary/********/worker","specversion":"1.0","type":"error","data":{"error":"unable to write connection nonce: tls: first record does not look like a TLS handshake","error_fields":{},"id":"e_EnE2BCb5dJ","version":"v0.1","op":"worker.(Worker).upstreamDialerFunc"},"datacontentype":"application/cloudevents","time":"2023-04-03T12:39:09.09695831Z"}

Surely it shouldn’t be trying to get to hashicorp.com?

If I instead try to set the worker up as a PKI worker via the Web Console, I get a different error when I start it:

{"id":"w0JjDqrnuV","source":"https://hashicorp.com/boundary/******/worker","specversion":"1.0","type":"error","data":{"error":"(nodeenrollment.protocol.attemptFetch) error base64 decoding fetch response: illegal base64 data at input byte 3","error_fields":{},"id":"e_LfwwqzfO0l","version":"v0.1","op":"worker.(Worker).upstreamDialerFunc"},"datacontentype":"application

Anyone able to help please?

Cheers

Riddle

Nope, it’s the same everywhere. Not sure what they meant to do by adding that https://hashicorp.com/boundary/ prefix; maybe they’re planning to allow other types of workers in the future? My logs show the same thing.

Can you post your workers’ and controllers’ configuration files (with sensitive data redacted)?

The source field comes from the CloudEvents spec: spec/spec.md at v1.0.2 · cloudevents/spec · GitHub

As for your issue, KMS workers cannot talk to HCP Boundary clusters at this time; only PKI workers are supported, apologies! We do want to enable this eventually.

To be clear: please do include configuration information so we can help with the PKI worker. I was simply answering about the KMS one prior to getting more detail :slight_smile:

Ah, failed to notice the cluster is on HCP

The cluster is OSS. I will post my config shortly.

PKI Config:

listener "tcp" {
  address = "0.0.0.0:9202"
  purpose = "proxy"
}

worker {
  public_addr = "*****"
  auth_storage_path = "/etc/boundary.d/worker1"
  tags {
    site = ["blah"]
  }
  initial_upstreams = ["*******"]
}
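(For comparison: the above uses the worker-led flow, where the worker’s auth request is pasted into the Web Console. With the controller-led flow, a one-time activation token goes in the worker stanza instead; a minimal sketch, with placeholder values rather than anything from this thread:)

worker {
  public_addr       = "*****"
  auth_storage_path = "/etc/boundary.d/worker1"
  initial_upstreams = ["*******"]
  # One-time token from `boundary workers create controller-led`;
  # the value here is a placeholder, real tokens are much longer.
  controller_generated_activation_token = "neslat_..."
}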

KMS Config:

disable_mlock = true

kms "awskms" {
  kms_key_id = "************"
  purpose    = "worker-auth"
}


listener "tcp" {
  address     = "*.*.*.*:9202"
  purpose     = "proxy"
  tls_disable = true
}

worker {
#  controllers = ["*******"]
  initial_upstreams = ["*********"]
  name        = "*******"
  public_addr = "*******"
}

Controller config:

controller {
  database {
    url = ".............................."
  }

  name = "controller"
}

disable_mlock = true

kms "awskms" {
  kms_key_id = "......................"
  purpose    = "root"
}

kms "awskms" {
  kms_key_id = "........................"
  purpose    = "worker-auth"
}

listener "tcp" {
  address     = "......:9201"
  purpose     = "cluster"
  tls_disable = true
}

listener "tcp" {
  address     = "......:9200"
  purpose     = "api"
  tls_disable = true
}

I’m a bit confused here. Could you use pre-formatted text for the HCL snippets?

Didn’t even know you could do that. There you go, better?


It seems like you’ll use AWS for your production deployment, so let’s stick to the AWS KMS worker auth for now, shall we?

It could be that the ALB you mentioned in your initial post is interfering with the KMS-based authentication of the worker. You probably need to replace it with an NLB doing TLS passthrough, or else make the cluster interface of the upstreams public and add those addresses to the initial_upstreams list in the worker config.
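For illustration, here is a minimal Terraform sketch of such an NLB doing plain TCP passthrough on the cluster port (the resource names, variables, and subnet/VPC references are assumptions, not taken from your module; target attachments are omitted):

# Hypothetical internal NLB that passes the Boundary cluster port straight through.
resource "aws_lb" "boundary_cluster" {
  name               = "boundary-cluster"     # assumed name
  load_balancer_type = "network"
  internal           = true
  subnets            = var.private_subnet_ids # assumed variable
}

resource "aws_lb_target_group" "cluster" {
  name     = "boundary-cluster-9201"
  port     = 9201
  protocol = "TCP"      # plain TCP, so the LB never terminates TLS
  vpc_id   = var.vpc_id # assumed variable
}

resource "aws_lb_listener" "cluster" {
  load_balancer_arn = aws_lb.boundary_cluster.arn
  port              = 9201
  protocol          = "TCP" # TLS passes through to the controllers untouched
  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.cluster.arn
  }
}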

If I put the DNS name of the ALB in initial_upstreams, then I get the following error:

{"id":"FO49E2otxX","source":"https://hashicorp.com/boundary/........./worker","specversion":"1.0","type":"error","data":{"error":"unable to write connection nonce: x509: certificate is valid for ***************, not lzQilF2utrYRRQJgfhG9","error_fields":{},"id":"e_kzi1VKRkWK","version":"v0.1","op":"worker.(Worker).upstreamDialerFunc"},"datacontentype":"application/cloudevents","time":"2023-04-04T09:36:57.295962831Z"}

The reference architecture for AWS in the documentation shows the controller behind an ALB here.

We are using the following Terraform module, which sets this up for us:
Terraform Registry

The ALB is only used for API access (port 9200).

How are you supposed to handle SSL and autoscaling in AWS if you don’t use the ALB? If a worker dies, how is the config kept up to date without a fixed point, i.e. the ALB?

You can use a load balancer for the cluster port (9201); you just can’t do TLS offloading.
If you read the documentation I shared about KMS auth, you’ll see that the TLS handshake is handled directly between the worker and the controller.
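Concretely, once an NLB (or public controller addresses) is in place, the KMS worker just points initial_upstreams at it on the cluster port; a minimal sketch with placeholder addresses:

worker {
  name        = "worker1"             # placeholder
  public_addr = "worker1.example.com" # placeholder
  # NLB DNS name on the cluster port; TCP passthrough, no TLS offloading
  initial_upstreams = ["boundary-nlb.example.com:9201"]
}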