Worker works unreliably

My setup consists of an OSS Boundary controller and an OSS Boundary worker, running as two Docker containers on the same host, with a Vault cluster as the secret store.

I configured the controller using Terraform, as a minimal setup to test everything together.

Now, when I connect to a target with the Boundary CLI or Boundary Desktop, it works, but only sometimes. Most of the time I get errors like:

kex_exchange_identification: Connection closed by remote host
Connection closed by 127.0.0.1 port 43785

after a very long timeout.

If I retry often enough, it suddenly works again, but only for a few attempts.

What is the reason for that?

Feels like a pretty harsh letdown after investing so much time into getting everything set up, connected, and working.

Impossible to say with the info provided at present, so let’s try to narrow it down a bit.

This error is technically from OpenSSH, not Boundary. So to troubleshoot this, take Boundary out of the picture and SSH to the target directly, using the same parameters you’ve provided to Boundary. What’s the result?
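If it helps to put a number on how often the direct connection fails, here’s a rough Python sketch (not anything Boundary ships). It opens a plain TCP connection to the target and waits for the SSH identification line, which is the step that kex_exchange_identification complains about. The function names probe_ssh_banner and probe_many are made up for this example, and it assumes the server sends its identification line first, as OpenSSH does.

```python
import socket


def probe_ssh_banner(host, port=22, timeout=5.0):
    """Connect over TCP and return the server's SSH identification line.

    Returns None if the connection fails, times out, or is closed before
    a line starting with "SSH-" arrives.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.settimeout(timeout)
            data = b""
            # Read until the first newline; cap the read at 255 bytes.
            while b"\n" not in data and len(data) < 255:
                chunk = sock.recv(255 - len(data))
                if not chunk:  # peer closed the connection
                    break
                data += chunk
    except OSError:  # refused, reset, unreachable, or timed out
        return None
    line = data.split(b"\n", 1)[0].strip().decode("ascii", errors="replace")
    return line if line.startswith("SSH-") else None


def probe_many(host, port=22, attempts=10, timeout=5.0):
    """Probe repeatedly and return (successes, failures)."""
    results = [probe_ssh_banner(host, port, timeout) for _ in range(attempts)]
    ok = sum(1 for r in results if r is not None)
    return ok, attempts - ok
```

Calling probe_many with the target’s address and SSH port from the host that runs the worker returns a pair like (successes, failures). If failures show up even here, without Boundary involved, the problem is most likely in the network path or on the target rather than in Boundary.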

Next (I’m assuming this is an SSH attempt with boundary connect via the CLI?), try adding the -vvv flag to the end of it. I don’t have immediate access to Boundary, but it should be something like boundary connect <parameters> -- -vvv, where everything after the -- is passed through to the ssh client, so -vvv turns on its most verbose output. You might need to play around with the placement a bit if that doesn’t work.

The fact that it’s intermittent is strange. Are you using a proxy, firewall, VPN, or load balancer anywhere on the path? Maybe you’ve got round-robin balancing set somewhere and not every hop has been updated with the required certs.

Also, if you check your target host’s SSH daemon log, how does it report these failed connection attempts? That’s /var/log/secure on RHEL-based distros or /var/log/auth.log on Debian-based distros.
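To pull just the relevant entries out of that log, a small filter like the one below might save some scrolling. It’s only a sketch: the pattern list covers a handful of messages that OpenSSH’s sshd commonly logs for dropped or rejected connections and is not exhaustive, and failed_ssh_lines and scan_log are names invented for this example.

```python
import re

# Messages sshd commonly logs for dropped or rejected connections.
# Illustrative only; extend as needed for your OpenSSH version.
FAILURE_PATTERNS = re.compile(
    r"Connection closed by|Connection reset by|Failed password|"
    r"Invalid user|kex_exchange_identification|Did not receive identification"
)


def failed_ssh_lines(lines):
    """Yield lines logged by sshd that match one of the failure patterns."""
    for line in lines:
        if "sshd" in line and FAILURE_PATTERNS.search(line):
            yield line.rstrip("\n")


def scan_log(path):
    """Return the matching lines from a log file such as /var/log/auth.log."""
    with open(path, errors="replace") as fh:
        return list(failed_ssh_lines(fh))
```

Reading these files usually requires root. If the failed attempts from your tests don’t appear in the log at all, the connections probably never reached sshd, which again points at something in between.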

If the above doesn’t point you in the right direction, the aforementioned SSH daemon log entries would be useful. It’d also be useful to see things like how you’ve got it configured, what the Terraform code looks like, etc.

You might also want to enable logging for both Boundary’s worker and controller. If the issue does end up being due to Boundary failing at some stage, it’ll likely be something trivial like the worker service flapping due to a misconfiguration or error somewhere. If this is the case, their logs will help paint a better picture of what’s going on. That said, kex_exchange_identification: Connection closed by remote host means the connection was closed before the SSH version exchange completed, which happens before any credentials are sent. Since your ssh client is talking to Boundary’s local proxy on 127.0.0.1, the drop could be happening anywhere between that proxy, the worker, and the target’s sshd, so the network path is worth checking as much as the target itself.

If any of the above doesn’t work, please share the full output.

Yeah, you are right. I just had to restart WireGuard. Now it works as expected.

Oh nice. Glad you got it sorted.