Boundary-worker.service not found after deploying boundary-reference-architecture to AWS

I’m trying to run the boundary-reference-architecture deployment for AWS, and I’ve been struggling with it for days.

  • I guess I was supposed to know how to configure my ~/.aws/credentials file, but I didn’t. I work with multiple AWS accounts and Terraform wasn’t hitting the one I wanted (see the sketch after this list for what I ended up with). If there is documentation about getting that right, I haven’t seen it. I got it working, but wasted a lot of time getting there.
  • I had a problem with line endings when I cloned the repo to my Windows 10 machine (detailed here). Again, it’s good now but took a while.
  • I was using the Windows boundary.exe file instead of the Linux binary (that same issue actually came up in the thread mentioned above, but of course I only found that thread after solving it myself).
  • When terraform apply -target module.aws completes, I get an error that the ACM certificate is not yet valid (its validity period starts in the future). It also fails to create the load balancer, but I assume that is a knock-on effect of the certificate failure. Re-running terraform apply resolves both.
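
For anyone who hits the same multi-account confusion, this is roughly the shape of what I ended up with; the profile name and all values below are placeholders for my real ones:

# ~/.aws/credentials -- one named profile per AWS account (placeholder values)
[default]
aws_access_key_id     = <key for the account Terraform kept picking up>
aws_secret_access_key = <secret for that account>

[boundary-demo]
aws_access_key_id     = <key for the account I actually wanted>
aws_secret_access_key = <secret for that account>

# Then select the profile before running Terraform:
export AWS_PROFILE=boundary-demo
terraform apply -target module.aws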

So now it completes successfully, but it doesn’t look like everything worked. The boundary-controller service is up and running (although the output from the install.sh script reported an "Unable to capture a lock on the database" error; does that matter?). The boundary-worker service does not exist, even though I can see the output from its install step and there are no errors in it. If I try to manually run the install script via SSH, I get an error, but it does at least create the service:

ubuntu@ip-x-x-x-x:~$ sudo systemctl status boundary-worker
Unit boundary-worker.service could not be found.
ubuntu@ip-x-x-x-x:~$ sudo ~/./install.sh worker
The system user `boundary' already exists. Exiting.
chown: cannot access '/etc/boundary-worker.hcl': No such file or directory
Created symlink /etc/systemd/system/multi-user.target.wants/boundary-worker.service → /etc/systemd/system/boundary-worker.service.
ubuntu@ip-x-x-x-x:~$ sudo systemctl status boundary-worker
● boundary-worker.service - boundary worker
     Loaded: loaded (/etc/systemd/system/boundary-worker.service; enabled; vendor preset: enabled)
     Active: failed (Result: exit-code) since Mon 2021-09-13 18:01:07 UTC; 16s ago
    Process: 2795 ExecStart=/usr/local/bin/boundary server -config /etc/boundary-worker.hcl (code=exited, status=3)
   Main PID: 2795 (code=exited, status=3)

Sep 13 18:01:07 ip-x-x-x-x systemd[1]: Started boundary worker.
Sep 13 18:01:07 ip-x-x-x-x boundary[2795]: Error parsing config file: open /etc/boundary-worker.hcl: no such file or directory
Sep 13 18:01:07 ip-x-x-x-x systemd[1]: boundary-worker.service: Main process exited, code=exited, status=3/NOTIMPLEMENTED
Sep 13 18:01:07 ip-x-x-x-x systemd[1]: boundary-worker.service: Failed with result 'exit-code'.

I’m all out of ideas here. Can anyone help me?

A couple of things; I’ll try to take them in order:

  • This repo uses the official Terraform AWS provider to provision its AWS resources. To pass it credentials, you can put them in ~/.aws/credentials, set them as environment variables, or pass them explicitly in the provider config (though that last option is not recommended).
  • As you noticed, some other folks have hit the line-endings issue too. It looks like a problem with how git itself handles line endings between Windows and *nix when cloning; I think I have a fix using .gitattributes (see the sketch after this list) that I’ll be testing tonight. Just to confirm, you are git clone'ing and not downloading the files from the repo as a ZIP file?
  • I’m looking at direct-downloading the boundary binary onto the hosts in question rather than uploading it from the user’s machine in a provisioner block. I still need to think through the details of the right way to do that.
  • I’m not sure what’s up with the ACM certificate validity period; could it be clock drift between your local system and AWS?
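
For the line-endings piece, the .gitattributes change I plan to test looks roughly like this (not tested or merged yet, so treat it as a sketch); it pins LF endings on the shell scripts and Terraform/HCL files regardless of the core.autocrlf setting on the Windows side:

# .gitattributes at the repo root -- force LF endings for files the Linux hosts execute or parse,
# even when the repo is cloned on Windows with core.autocrlf=true
*.sh   text eol=lf
*.hcl  text eol=lf
*.tf   text eol=lf

In the meantime, setting core.autocrlf to false (or input) before cloning on Windows should get you the same result.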

Given the errors you hit at the start, I think it’s likely that various steps did not complete successfully, leaving you with no working database (boundary database init will fail if the controllers already hold a lock on the DB, but the controllers will never start successfully if the DB hasn’t been initialized yet) and no provisioned config files on your Boundary hosts.
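
If you want to poke at that by hand on a controller host, the rough order of operations is: stop the controller so nothing is holding the lock, run the database init, then start the controller again. A sketch follows; I’m assuming the controller config lives at /etc/boundary-controller.hcl to mirror the worker path in your output, so adjust the path to whatever install.sh actually wrote on that host.

# On a controller host (config path is an assumption -- check what install.sh actually created)
sudo systemctl stop boundary-controller
sudo /usr/local/bin/boundary database init -config /etc/boundary-controller.hcl
sudo systemctl start boundary-controller
sudo systemctl status boundary-controller   # should settle into "active (running)"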

I think we need to give our Windows users a little love with some fixes in the ref-arch repo, and I’m working on that tonight; in the meantime, provisioning a Linux VM somewhere just to run Terraform may be a workaround you can use to get started until those fixes get merged.

Yes, I did a git clone. I confirmed that downloading install.sh manually gives the correct line endings, so I assume downloading the full repo as a ZIP file would also work.

I think it should. You’ll also want to download and unzip the 64-bit Linux binary somewhere and point Terraform at it with -var boundary_bin=[the path to the folder containing the Boundary Linux binary].
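
Something like this on the machine you run Terraform from; the 0.6.2 version number is just an example, so substitute whichever release you’re deploying:

# Download and unzip the 64-bit Linux Boundary binary (version is an example)
wget https://releases.hashicorp.com/boundary/0.6.2/boundary_0.6.2_linux_amd64.zip
unzip boundary_0.6.2_linux_amd64.zip -d "$HOME/boundary-bin"

# Point the Terraform run at the folder that now contains the binary
terraform apply -target module.aws -var "boundary_bin=$HOME/boundary-bin"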

I think between those two and setting your TF provider environment variables you’ll be able to get up and running with a fresh install.
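
By "TF provider environment variables" I mean the standard ones the AWS provider reads; the values and region below are placeholders (AWS_PROFILE also works if you’d rather point at a profile in ~/.aws/credentials):

export AWS_ACCESS_KEY_ID=<your access key>
export AWS_SECRET_ACCESS_KEY=<your secret key>
export AWS_DEFAULT_REGION=us-east-1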

I got pulled onto another project, so this went on the back burner for a while, but I’m finally back to it. I set up a Linux VM as suggested to run Terraform. Most of it works, but either it still doesn’t set everything up or I don’t quite understand the directions.

  • terraform apply -target module.aws looks like it runs cleanly, although the controller setup still reports "Unable to capture a lock on the database".
  • I can SSH to the AWS hosts and see that the controller service is up and running, but systemctl still reports "Unit boundary-worker.service could not be found." There is no /etc/boundary-worker.hcl file present.

I can then run terraform apply to configure Boundary and it seems to work. The admin console link only works over HTTP, not HTTPS, but I found this issue that explains it, so that’s not a big deal.

Should I be concerned about those errors from the controller & worker? It looks like boundary is working for me, so I’m not sure what might be missing.

num_workers defaults to 1 and num_controllers to 2, so you should get a single worker and two controllers, all on different instances, with the controller instances fronted by an AWS load balancer. The error message about the lock on the database is normal if a controller tries to come up before the database init is finished; while DB init is running, the controllers can’t get the database lock they need to operate, but if you can reach the admin console UI it means they are eventually coming up once the DB init succeeds.
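
If you ever want a different topology, you should be able to override those the same way you pass boundary_bin, assuming you keep the rest of your command the same (the variable names are the ones above; the counts here are just an example):

# Example: three workers instead of the default single worker
terraform apply -target module.aws -var num_workers=3 -var num_controllers=2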

What does terraform state list in the main terraform directory tell you about module.aws.aws_instance.worker?
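
Roughly like this; the [0] index in the expected output is my assumption for a count-based resource, so the exact address may differ slightly:

# Run from the directory you've been running terraform apply in
terraform state list module.aws.aws_instance.worker

# If the worker instance exists in state, you should see something like:
#   module.aws.aws_instance.worker[0]
# No output means Terraform never created (or has lost track of) the worker instance.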

I am no longer employed in the position where I was using this, so I do not have access to the system to check that for you. Thanks for your help!