I'm trying to set up a simple two-node Boundary deployment inside Vagrant boxes on Ubuntu. Below are the worker and controller configs, respectively. The setup checks out fine, except that the worker's behaviour seems a little odd: the worker systemd service starts and authenticates to the controller successfully, but then keeps looking for a local listener on the default controller cluster port (0.0.0.0:9201) forever, even though the worker runs on a dedicated node.
Details below:
Worker:
disable_mlock = true

# listener denoting this is a worker proxy
listener "tcp" {
  address     = "172.16.1.112:9202"
  tls_disable = "true"
  purpose     = "proxy"
}

# worker block for configuring the specifics of the
# worker service
worker {
  # name = "worker-1"
  public_addr       = "172.16.1.112"
  initial_upstreams = ["controller-1:9201"]
  address           = "172.16.1.112"
  auth_storage_path = "/etc/boundary.d/test"
  # tags {
  #   type = ["worker-1"]
  # }
}
Controller:
# disable memory from being swapped to disk
disable_mlock = true

# API listener configuration block
listener "tcp" {
  # Should be the address of the NIC that the controller server will be reached on
  # Use 0.0.0.0 to listen on all interfaces
  address = "0.0.0.0:9200"
  # The purpose of this listener block
  purpose = "api"
  # TLS configuration
  tls_disable   = false
  tls_cert_file = "/etc/boundary.d/tls/boundary.crt"
  tls_key_file  = "/etc/boundary.d/tls/boundary.key"
  # Uncomment to enable CORS for the Admin UI. Be sure to set the allowed origin(s)
  # to appropriate values.
  #cors_enabled = true
  #cors_allowed_origins = ["https://yourcorp.yourdomain.com", "serve://boundary"]
}

# Data-plane listener configuration block (used for worker coordination)
listener "tcp" {
  # Should be the IP of the NIC that the workers will connect on
  address = "0.0.0.0:9201"
  # The purpose of this listener
  purpose = "cluster"
  tls_disable   = "false"
  tls_cert_file = "/etc/boundary.d/tls/boundary.crt"
  tls_key_file  = "/etc/boundary.d/tls/boundary.key"
}

# Ops listener for operations like health checks for load balancers
listener "tcp" {
  # Should be the address of the interface where your external systems
  # (e.g. load balancers and metrics collectors) will connect on.
  address = "0.0.0.0:9203"
  # The purpose of this listener block
  purpose = "ops"
  tls_disable   = false
  tls_cert_file = "/etc/boundary.d/tls/boundary.crt"
  tls_key_file  = "/etc/boundary.d/tls/boundary.key"
}
# Controller configuration block
controller {
  # This name attr must be unique across all controller instances if running in HA mode
  name        = "controller-1"
  description = "Boundary controller number one"

  #tls_disable = "true"
  #tls_cert_file = "/etc/boundary.d/tls/boundary.crt"
  #tls_key_file = "/etc/boundary.d/tls/boundary.key"

  # This is the public hostname or IP where the workers can reach the
  # controller. This should typically be a load balancer address
  public_cluster_address = "controller-1"

  # Enterprise license file, can also be the raw value or env:// value
  # license = "file:///path/to/license/file.hclic"

  # After receiving a shutdown signal, Boundary will wait 10s before initiating the shutdown process.
  graceful_shutdown_wait_duration = "10s"

  # Database URL for Postgres. This is set in boundary.env and
  # consumed via the "env://" notation.
  database {
    url = "postgresql://postgres:postgres@127.0.0.1:5432/postgres?sslmode=disable"
  }
}
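For reference, initial_upstreams is only meant to bootstrap the first dial; after authentication the worker is supposed to switch to whatever the controller advertises via public_cluster_address. A minimal sketch of the relevant fragment from the config above, with the port made explicit (whether spelling out the port changes anything here is an assumption, not something I've confirmed):

```hcl
controller {
  name = "controller-1"
  # Advertised to workers for post-auth dials; workers replace their
  # initial_upstreams with whatever the controller reports here.
  public_cluster_address = "controller-1:9201"
}
```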
Worker-led authentication is being used.
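For context, the worker-led flow here was: the worker prints a Worker Auth Registration Request token at startup (it also lands under auth_storage_path), which is then registered from an authenticated CLI session. A sketch, with the token value as a placeholder:

```shell
# On a machine with an authenticated Boundary CLI, register the worker
# using the auth token the worker printed when it started:
boundary workers create worker-led \
  -worker-generated-auth-token="<token from the worker's startup output>"
```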
Issue: despite initial_upstreams pointing at the controller name, the worker continuously polls the local listener address 0.0.0.0:9201 as its controller for some reason, and automatically rewrites the upstream list back to 0.0.0.0.
The first log line below confirms that worker authentication to the controller succeeded:
Jul 26 16:06:20 boundary-2 boundary[12248]: {"id":"J03YG5y5C2","source":"https://hashicorp.com/boundary/boundary-2/worker","specversion":"1.0","type":"system","data":{"version":"v0.1","op":"worker.(Worker).upstreamDialerFunc","data":{"msg":"worker has successfully authenticated"}},"datacontentype":"application/cloudevents","time":"2023-07-26T16:06:20.727256595Z"}
Jul 26 16:06:20 boundary-2 boundary[12248]: {"id":"Yx79mQz1C9","source":"https://hashicorp.com/boundary/boundary-2/worker","specversion":"1.0","type":"system","data":{"version":"v0.1","op":"worker.(Worker).updateAddrs","data":{"msg":"Upstreams after first status set to: [0.0.0.0:9201]"}},"datacontentype":"application/cloudevents","time":"2023-07-26T16:06:20.778388436Z"}
Jul 26 16:06:20 boundary-2 boundary[12248]: {"id":"IJ3tOFcXIq","source":"https://hashicorp.com/boundary/boundary-2/worker","specversion":"1.0","type":"error","data":{"error":"(nodeenrollment.protocol.Dial) unable to dial to server: (nodeenrollment.protocol.Dial) unable to dial to server: dial tcp 0.0.0.0:9201: connect: connection refused","error_fields":{},"id":"e_ju8KOvVuJx","version":"v0.1","op":"worker.(Worker).upstreamDialerFunc"},"datacontentype":"application/cloudevents","time":"2023-07-26T16:06:20.779632794Z"}
Jul 26 16:06:20 boundary-2 boundary[12248]: {"id":"KUWh4ctx2A","source":"https://hashicorp.com/boundary/boundary-2/worker","specversion":"1.0","type":"error","data":{"error":"worker.(Worker).upstreamDialerFunc: unknown, unknown: error #0: (nodeenrollment.protocol.Dial) unable to dial to server: (nodeenrollment.protocol.Dial) unable to dial to server: dial tcp 0.0.0.0:9201: connect: connection refused","error_fields":{"Code":0,"Msg":"","Op":"worker.(Worker).upstreamDialerFunc","Wrapped":{}},"id":"e_O7cDecx1Sy","version":"v0.1","op":"worker.(Worker).upstreamDialerFunc"},"datacontentype":"application/cloudevents","time":"2023-07-26T16:06:20.780113562Z"}
Some more logs, showing the old valid upstream being removed and replaced with the local listener 0.0.0.0:9201:
Jul 26 16:20:06 boundary-2 boundary[12248]: {"id":"udbvJrPV64","source":"https://hashicorp.com/boundary/boundary-2/worker","specversion":"1.0","type":"system","data":{"version":"v0.1","op":"worker.(Worker).updateAddrs","data":{"msg":"Upstreams has changed; old upstreams were: [0.0.0.0:9201], new upstreams are: [0.0.0.0:9201 controller-1:9201]"}},"datacontentype":"application/cloudevents","time":"2023-07-26T16:20:06.643339422Z"}
Jul 26 16:20:06 boundary-2 boundary[12248]: {"id":"G05FlTrvnH","source":"https://hashicorp.com/boundary/boundary-2/worker","specversion":"1.0","type":"system","data":{"version":"v0.1","op":"worker.(Worker).upstreamDialerFunc","data":{"msg":"worker has successfully authenticated"}},"datacontentype":"application/cloudevents","time":"2023-07-26T16:20:06.671962992Z"}
Jul 26 16:20:08 boundary-2 boundary[12248]: {"id":"m47sSaaJmu","source":"https://hashicorp.com/boundary/boundary-2/worker","specversion":"1.0","type":"error","data":{"error":"(nodeenrollment.protocol.Dial) unable to dial to server: (nodeenrollment.protocol.Dial) unable to dial to server: dial tcp 0.0.0.0:9201: connect: connection refused","error_fields":{},"id":"e_bdtJclHeam","version":"v0.1","op":"worker.(Worker).upstreamDialerFunc"},"datacontentype":"application/cloudevents","time":"2023-07-26T16:20:08.198884097Z"}
Jul 26 16:20:08 boundary-2 boundary[12248]: {"id":"0BSEgjgRTT","source":"https://hashicorp.com/boundary/boundary-2/worker","specversion":"1.0","type":"error","data":{"error":"worker.(Worker).upstreamDialerFunc: unknown, unknown: error #0: (nodeenrollment.protocol.Dial) unable to dial to server: (nodeenrollment.protocol.Dial) unable to dial to server: dial tcp 0.0.0.0:9201: connect: connection refused","error_fields":{"Code":0,"Msg":"","Op":"worker.(Worker).upstreamDialerFunc","Wrapped":{}},"id":"e_IYq1iGSYme","version":"v0.1","op":"worker.(Worker).upstreamDialerFunc"},"datacontentype":"application/cloudevents","time":"2023-07-26T16:20:08.199088417Z"}
Jul 26 16:20:09 boundary-2 boundary[12248]: {"id":"Y9e2t4yKhr","source":"https://hashicorp.com/boundary/boundary-2/worker","specversion":"1.0","type":"system","data":{"version":"v0.1","op":"worker.(Worker).updateAddrs","data":{"msg":"Upstreams has changed; old upstreams were: [0.0.0.0:9201 controller-1:9201], new upstreams are: [0.0.0.0:9201]"}},"datacontentype":"application/cloudevents","time":"2023-07-26T16:20:09.069749314Z"}
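(Aside: the journal lines are cloudevents JSON, so the upstream churn is easier to watch with a small jq filter. A sketch, assuming jq is installed and the syslog prefix has already been stripped so each line is a bare JSON object:)

```shell
# Print only the upstream-change messages emitted by worker.(Worker).updateAddrs.
# Input: one cloudevents JSON object per line.
filter_upstreams() {
  jq -r 'select(.data.op == "worker.(Worker).updateAddrs") | .data.data.msg'
}

echo '{"data":{"version":"v0.1","op":"worker.(Worker).updateAddrs","data":{"msg":"Upstreams after first status set to: [0.0.0.0:9201]"}}}' \
  | filter_upstreams
# → Upstreams after first status set to: [0.0.0.0:9201]
```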
The worker shows up fine in the output of the following command:
vagrant@boundary-1:~$ boundary workers list -tls-insecure
Direct usage of BOUNDARY_TOKEN env var is deprecated; please use "-token env://<env var name>" format, e.g. "-token env://BOUNDARY_TOKEN" to specify an env var to use.
Worker information:
ID: w_BHwoctASdf
Type: pki
Version: 1
Authorized Actions:
no-op
read
update
delete
add-worker-tags
set-worker-tags
remove-worker-tags
ID: w_VI2cwvmup9
Type: pki
Version: 1
Address: 172.16.1.112:9202
ReleaseVersion: Boundary v0.13.0
Last Status Time: Wed, 26 Jul 2023 16:15:18 UTC
Authorized Actions:
no-op
read
update
delete
add-worker-tags
set-worker-tags
remove-worker-tags
Any pointers towards solving this would be appreciated. Controller and worker hostnames resolve locally, hence I have used hostnames directly wherever applicable.
Basic sanity checks done:
9200, 9201, 9203 - running on the controller, reachable from the worker
9202 - running on the worker, reachable from the controller
Firewalls disabled.
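(For reproducibility, the reachability checks amount to something like the helper below, using bash's /dev/tcp. The hostnames and ports are the ones from this setup:)

```shell
# check_port HOST PORT -> exit 0 if a TCP connection succeeds within 2 seconds.
check_port() {
  timeout 2 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# From the worker node:
#   check_port controller-1 9200 && check_port controller-1 9201 && check_port controller-1 9203
# From the controller node:
#   check_port boundary-2 9202
```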