How do Workers Discover Controllers in Reference Architecture with Autoscaling Groups?

The reference architecture places workers and controllers in their own autoscaling groups. However, worker configuration requires populating an array of controller IPs. If the autoscaler adds new controller nodes, how will existing workers (or even new workers) become aware of them? Wouldn’t that require an update to the configuration file and a restart of the boundary systemd service?

The reference architecture diagram is an example use case. The examples for AWS don’t use auto-scaling groups, to keep the code as simple as possible. However, if you’re implementing auto-scaling groups and want to discover the instances brought up by that group, you can place them behind a load balancer and then declare that in the auto-scaling configuration: https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/autoscaling_group#load_balancers

Then you can configure the workers against a well-known, pre-configured domain name to access that load balancer and the nodes behind it.
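
A rough Terraform sketch of that pattern for the controllers (the NLB wiring on the cluster port, the resource names, the acme.com record, and the referenced variables and launch template are illustrative assumptions, not code from the reference architecture):

# controllers.tf

resource "aws_lb" "controller" {
  name               = "boundary-controller"
  internal           = true
  load_balancer_type = "network"
  subnets            = var.private_subnet_ids
}

resource "aws_lb_target_group" "controller_cluster" {
  name     = "boundary-controller-cluster"
  port     = 9201
  protocol = "TCP"
  vpc_id   = var.vpc_id
}

resource "aws_lb_listener" "controller_cluster" {
  load_balancer_arn = aws_lb.controller.arn
  port              = 9201
  protocol          = "TCP"

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.controller_cluster.arn
  }
}

# New controller instances register with the target group automatically
resource "aws_autoscaling_group" "controller" {
  name                = "boundary-controller"
  min_size            = 3
  max_size            = 5
  desired_capacity    = 3
  vpc_zone_identifier = var.private_subnet_ids
  target_group_arns   = [aws_lb_target_group.controller_cluster.arn]

  launch_template {
    id      = aws_launch_template.controller.id
    version = "$Latest"
  }
}

# The well-known name the workers are configured against
resource "aws_route53_record" "controller" {
  zone_id = var.zone_id
  name    = "boundary-controller.acme.com"
  type    = "A"

  alias {
    name                   = aws_lb.controller.dns_name
    zone_id                = aws_lb.controller.zone_id
    evaluate_target_health = true
  }
}

With something like this, the workers’ controller list only ever needs the single boundary-controller.acme.com:9201 entry, no matter how many instances the group scales to.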

Thanks for your reply! I have a follow-up question. The reference architecture also recommends using a layer-7 load balancer. However, this did not work for me when I tried it: the load balancer marked the hosts unhealthy when health-checking over HTTPS with path /. I could access them fine individually through their EIPs, so it seems the health check was the problem.

Then I noticed that the example Terraform code creates a network load balancer instead, and indeed that works. Are the custom TLS implementations the reason for these health check failures?

Thanks!

Sorry, I completely misread and misspoke in my previous response. Since workers proxy the connection directly between the client and the target, they can’t reside behind a load balancer; they need to be directly accessible by the client. Controllers can reside behind a load balancer.

If you want to run your workers in an auto-scaling group, the best workaround is to assign an EIP to an ENI in advance, and then have a script execute through user data that attaches this ENI to the new instance using the AWS CLI. You could also pre-seed this script on a pre-built AMI instead of using user data scripting.
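
A rough Terraform sketch of that workaround (the resource names, device index, AMI/region/subnet variables are assumptions; the instance profile would also need permission for ec2:AttachNetworkInterface, and a size-one ASG is implied since the ENI can only attach to one instance at a time):

# worker-eip.tf

# ENI and EIP created ahead of time, so the worker keeps a stable public address
resource "aws_network_interface" "worker" {
  subnet_id = var.public_subnet_id
}

resource "aws_eip" "worker" {
  domain            = "vpc"
  network_interface = aws_network_interface.worker.id
}

resource "aws_launch_template" "worker" {
  name_prefix   = "boundary-worker-"
  image_id      = var.worker_ami_id
  instance_type = "t3.small"

  # On boot, attach the pre-built ENI to whichever instance the ASG launched
  user_data = base64encode(<<-EOF
    #!/bin/bash
    INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
    aws ec2 attach-network-interface \
      --region ${var.region} \
      --network-interface-id ${aws_network_interface.worker.id} \
      --instance-id "$INSTANCE_ID" \
      --device-index 1
  EOF
  )
}

Because the public address is the pre-created EIP, the public_addr in the worker config can stay fixed even as the ASG replaces the instance behind it.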

I’ll update our reference architecture diagram to remove the auto-scaling group for workers; sorry about the confusion there!

You could also run a single ASG per worker, with a single LB per worker ASG. LBs are reasonably inexpensive, so this only increases cost slightly, and you’ll be able to get a well-known domain name created for each worker with this setup.
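
Sketched out with the same caveats (worker names, subnets, the acme.com zone, the launch template, and the default 9202 proxy port are all assumptions), one size-one ASG plus one NLB and DNS record per worker might look like:

# workers.tf

locals {
  workers = toset(["worker-0", "worker-1"])
}

resource "aws_lb" "worker" {
  for_each           = local.workers
  name               = "boundary-${each.key}"
  load_balancer_type = "network"
  subnets            = var.public_subnet_ids
}

resource "aws_lb_target_group" "worker" {
  for_each = local.workers
  name     = "boundary-${each.key}"
  port     = 9202
  protocol = "TCP"
  vpc_id   = var.vpc_id
}

resource "aws_lb_listener" "worker" {
  for_each          = local.workers
  load_balancer_arn = aws_lb.worker[each.key].arn
  port              = 9202
  protocol          = "TCP"

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.worker[each.key].arn
  }
}

# One ASG of exactly one instance per worker
resource "aws_autoscaling_group" "worker" {
  for_each            = local.workers
  name                = "boundary-${each.key}"
  min_size            = 1
  max_size            = 1
  desired_capacity    = 1
  vpc_zone_identifier = var.public_subnet_ids
  target_group_arns   = [aws_lb_target_group.worker[each.key].arn]

  launch_template {
    id      = aws_launch_template.worker.id
    version = "$Latest"
  }
}

# Per-worker well-known name, usable as that worker's public_addr
resource "aws_route53_record" "worker" {
  for_each = local.workers
  zone_id  = var.zone_id
  name     = "${each.key}.boundary.acme.com"
  type     = "A"

  alias {
    name                   = aws_lb.worker[each.key].dns_name
    zone_id                = aws_lb.worker[each.key].zone_id
    evaluate_target_health = true
  }
}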

@micchickenburger I’m still evaluating Boundary (successfully), but is it possible for you to use the deterministic FQDN of your controller? My controllers autoscale, and the reconnect logic of the workers seems to work well.

For example, my setup uses Consul for service discovery.

# worker.hcl

listener "tcp" {
  address         = "0.0.0.0:{{ env "NOMAD_PORT_proxy" }}"
  purpose         = "proxy"
  tls_disable     = true
  // All tls parameters are valid only for the api listener.
  // Cluster and proxy connections use their own ephemeral TLS stacks.
  // For more information, see the connections security concepts page.
  // https://www.boundaryproject.io/docs/concepts/security/connections-tls
}

worker {
  name        = "worker-external-{{ env "attr.unique.consul.name" }}"
  description = "Externally routed worker"
  controllers = [
    "boundary-controller-cluster.service.consul:9201"
  ]
  public_addr = "boundary-worker.acme.com:443"
}

kms "transit" {
  purpose              = "worker-auth"
  address              = "https://vault.acme.com"
  disable_renewal      = "false"

  // Key configuration
  key_name        = "boundary-worker-auth"
  mount_path      = "transit/"
  namespace       = "ns1/"
}
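
For completeness: boundary-controller-cluster.service.consul resolves because each controller’s cluster port is registered as a Consul service (in my setup that happens through the Nomad job, but a standalone agent service definition along these lines would do the same; the health check details here are just an example):

# controller-cluster-service.hcl -- Consul agent service definition
service {
  name = "boundary-controller-cluster"
  port = 9201

  check {
    name     = "boundary controller cluster port"
    tcp      = "localhost:9201"
    interval = "10s"
    timeout  = "2s"
  }
}

With that registered on every controller node, controllers added by the autoscaler show up in DNS automatically, and workers pick them up when they reconnect.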