HA / Private IPs

Hi guys,

I’m trying to set up an HA deployment with Google VMs on fully private subnets, and I’d like a client to be able to connect via an HTTPS Load Balancer.

What I’ve done so far:

  • set up one LB with two different backend services (the only difference here being that the client → worker connection goes via the LB)
  • installed worker & controller on a single VM per zone (using three zones)
  • initialized the database (init step sketched below) and deployed the example Terraform code, which creates a few objects
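For reference, the init step was roughly the following; a minimal sketch, assuming a controller config at a hypothetical path:

# Run once against the shared Postgres before starting any controller
# (the config path is hypothetical)
boundary database init -config=/etc/boundary/controller.hcl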

What I’m experiencing so far:

  • I can connect to the UI with the admin account
  • I cannot connect with a user account
  • The health check for the worker port is not working (HTTP)
  • curl host:9202 returns nothing, and I don’t know if that is the expected behaviour; I believe it is not.
curl http://host01:9202
curl: (52) Empty reply from server
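For context, the port itself is open: a plain TCP probe (sketched here with nc) connects, while only HTTP gets the empty reply:

# TCP-level probe: reports success if the proxy listener is reachable at all
nc -zv host01 9202

# HTTP-level request: empty reply, as shown above
curl http://host01:9202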

My configuration is the following:

controller {
  name = "ctrl-host01-controller"
  description = "Boundary controller 01"
  database {
    url = "postgresql://boundary:boundary@database_ip:5432/boundary"
    max_open_connections = 5
  }
}

listener "tcp" {
  address = "10.98.0.112:9200"
  purpose = "api"
  tls_disable = true
}

listener "tcp" {
  address = "0.0.0.0"
  tls_disable = true
  purpose = "cluster"
}

worker {
  name = "work-node01"
  description = "Worker node 01"

  controllers = [
    "host01:9201",
  ]

  public_addr = "worker.domain.com:443"

  tags {
    zone = ["europe-west6-a"]
  }
}

listener "tcp" {
  purpose = "proxy"
  tls_disable = true
  address = "0.0.0.0"
}

Can you tell me if this is a doable setup?
The service starts fine and shows no errors.

Sep 09 07:26:59 host01 boundary[26956]: ==> Boundary server configuration:
Sep 09 07:26:59 host01 boundary[26956]:                  Transit Address: https://vault.domain.com
Sep 09 07:26:59 host01 boundary[26956]:                 Transit Key Name: boundary_worker-auth
Sep 09 07:26:59 host01 boundary[26956]:               Transit Mount Path: transit-boundary/
Sep 09 07:26:59 host01 boundary[26956]:                              Cgo: disabled
Sep 09 07:26:59 host01 boundary[26956]:   Controller Public Cluster Addr: 0.0.0.0:9201
Sep 09 07:26:59 host01 boundary[26956]:                       Listener 1: tcp (addr: "10.98.0.112:9200", cors_allowed_headers: "[]", cors_allowed_origins: "[*]", cors_enabled: "true", max_request_duration: "1m30s", purpose: "api")
Sep 09 07:26:59 host01 boundary[26956]:                       Listener 2: tcp (addr: "0.0.0.0:9201", max_request_duration: "1m30s", purpose: "cluster")
Sep 09 07:26:59 host01 boundary[26956]:                       Listener 3: tcp (addr: "0.0.0.0:9202", max_request_duration: "1m30s", purpose: "proxy")
Sep 09 07:26:59 host01 boundary[26956]:                        Log Level: debug
Sep 09 07:26:59 host01 boundary[26956]:                            Mlock: supported: true, enabled: true
Sep 09 07:26:59 host01 boundary[26956]:                          Version: Boundary v0.5.1
Sep 09 07:26:59 host01 boundary[26956]:                      Version Sha: 5f88243ddc6182db9c71ba84fd401040de4f5d41
Sep 09 07:26:59 host01 boundary[26956]:         Worker Public Proxy Addr: worker.domain.com:443
Sep 09 07:26:59 host01 boundary[26956]: ==> Boundary server started! Log data will stream in below:

Is host01 the load balancer? Does it give you anything specific about why the health-checks are failing (wrong return code, connection refused, connection timed out, etc.)?

Hello, host01 is one of the hosts behind the load balancer. I actually disabled the two other hosts to test different configurations…


The health check for the worker side fails for HTTP queries. If I switch the load balancer to TCP mode, it says the host is okay, because there is indeed a listener on port 9202.

The HTTP check fails with no return code (“Empty reply from server” with curl).
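For reference, a TCP check like that can be created on GCP roughly as follows (the health-check name is made up):

# TCP health check against the worker proxy port
gcloud compute health-checks create tcp boundary-worker-hc --port=9202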

Seems I’m almost there…

$ BOUNDARY_ADDR='https://controller.domain.com' boundary connect -target-id ttcp_hnzCJIPmsv

Proxy listening information:
  Address:             127.0.0.1
  Connection Limit:    1
  Expiration:          Fri, 10 Sep 2021 20:06:51 UTC
  Port:                36905
  Protocol:            tcp
  Session ID:          s_4qF9cNB1Cf
Error dialing the worker: failed to WebSocket dial: failed to send handshake request: Get "https://worker.domain.com:443/v1/proxy": x509: certificate is valid for worker.domain.com, controller.domain.com, not s_4qF9cNB1Cf

:roll_eyes:

Hi @nevoomatse,

I think it’s because you have a TLS-terminating load balancer in front of your workers, which is currently not supported as far as I know. For the controller API (9200) that’s OK, but for the client-to-worker communication, Boundary uses a unique TLS stack for every session (explained here: Connections/TLS | Boundary by HashiCorp).
If you want to put a load balancer in front of the workers (9202), you need to ensure it’s a TCP load balancer, or split the controllers/workers into different VMs and give the worker VMs a public IP address so the client can connect to the workers directly.
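You can see the mismatch directly by mimicking the handshake the client performs. A minimal sketch, assuming (as the error message suggests) the session ID is sent as the TLS SNI value; the hostname and session ID are taken from your output above:

# Send the session ID as SNI, as the Boundary client does, and inspect
# which certificate actually comes back (here: the LB's certificate,
# valid for worker.domain.com / controller.domain.com)
openssl s_client -connect worker.domain.com:443 \
    -servername s_4qF9cNB1Cf </dev/null 2>/dev/null \
  | openssl x509 -noout -subject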

I’m still learning Boundary myself, so hopefully someone from HashiCorp can confirm that what I’m writing makes sense. :slight_smile:

Hi @jsiebens, the first topics of the linked documentation explain it all…
Thank you…

I’ll try a mix: I’ll put a Network LB in front of new instances serving only the worker function, and see how it goes…
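A minimal sketch of that Network LB idea with gcloud, with all names made up and the region/zone following the thread:

# Target pool of worker-only instances (passthrough, so Boundary's
# session TLS is never terminated at the LB)
gcloud compute target-pools create boundary-workers --region=europe-west6

gcloud compute target-pools add-instances boundary-workers \
    --instances=worker01 --instances-zone=europe-west6-a

# Forwarding rule for the worker proxy port
gcloud compute forwarding-rules create boundary-worker-fr \
    --region=europe-west6 --ports=9202 --target-pool=boundary-workers

The worker’s public_addr would then point at the forwarding rule’s address instead of the HTTPS LB.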

What @jsiebens wrote about TLS termination is correct, to the best of my knowledge. Also, there isn’t currently an HTTP health-check API endpoint for Boundary workers (see the previous thread on this; as far as I know, everything said there is still valid). A TCP health check is the correct way to check whether they’re alive.