Boundary Controller WebUI feels slow

I’ve set up a small Boundary Cluster with 2 Controllers and 2 Workers. However, I’ve noticed that when configuring Boundary through the web interface, the user interface feels slow and laggy. Sometimes it takes more than 2 seconds to load resources. For example, changing the Scope or performing other actions takes longer than I would expect.

It doesn’t seem to be a compute issue: increasing the RAM and CPU resources makes no difference, and there is basically no workload.

I have a separate virtual machine for my Controllers, with 4 dedicated cores and 8 GB of RAM each. My PostgreSQL database is also running on a separate VM. However, the connection doesn’t appear to be the problem. In my initial test cluster, the database was on the same VM as the Controller. I also have a Nomad Server on a similar VM, and the web UI there feels perfect with no significant wait times.

Does anyone else have a similar experience, or is the problem somewhere in my configuration/setup?

Is it just an issue with the user interface? Do delays exist when attempting similar actions via the CLI? Are you accessing the UI from the same machine Boundary controller is running on, or a separate machine? What is the difference between accessing the UI remotely vs. locally?
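
One way to put numbers on the local vs. remote comparison is to time the same HTTP request from both places. The following is only a rough sketch using the Python standard library; the function names are made up for illustration, and the URL you pass in is an assumption you need to replace with your own controller address.

```python
import statistics
import time
import urllib.error
import urllib.request


def time_request(url: str, timeout: float = 10.0) -> float:
    """Return the wall-clock seconds taken to fetch url once."""
    start = time.perf_counter()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            response.read()  # include the body transfer in the measurement
    except urllib.error.HTTPError:
        pass  # an error status is still a completed round trip
    return time.perf_counter() - start


def summarize(samples: list[float]) -> dict[str, float]:
    """Reduce raw timings (seconds) to min / median / max."""
    return {
        "min": min(samples),
        "median": statistics.median(samples),
        "max": max(samples),
    }


def measure(url: str, runs: int = 10) -> dict[str, float]:
    """Fetch url several times and summarize the timings."""
    return summarize([time_request(url) for _ in range(runs)])
```

For example, run `measure("http://127.0.0.1:9200/")` on the controller itself and `measure("http://<controller-address>:9200/")` from your workstation (both addresses are placeholders, and a self-signed HTTPS listener would need extra TLS handling). If the two medians are close, the network path between client and controller is unlikely to be the cause.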

RAM and CPU are fine but other considerations would be things like disk IOPS - what throughput does the machine have?

Debug logs might be useful, but you may be better off starting with the actual system logs and metrics.

Also, what version of Boundary are you using? Please share the config files of the controllers and workers for reference.

Is the machine you’re running the controller on shared with other applications? Are you on the same network?

Do other users have the same issue, or is it just you? If you try the exact same steps from a separate machine, does the issue reproduce?

There are lots of potential causes for an application to have a slow UI, as you can probably deduce from the above.

I’m currently running Boundary 0.13.1 on all machines. The CLI also has a slight delay, regardless of whether it’s used from a remote machine or directly on the machine the controller is running on. The virtual machine runs only the controller and no other processes. The disk IOPS rates are 24000 read and 18000 write on the controller VM, and 22500 read and 15000 write on the database VM. The controller and database communicate over a private network. I should mention that the database is a simple PostgreSQL Docker container, which may be causing the bottleneck. The metrics look fine because there is very little load on the controller.

(I’m sorry, but Ionos doesn’t seem to offer an English interface, so the metrics are in German. You should still be able to see what they show.)

The logs seem fine besides:

Could this be causing the lag? I’m not quite sure what the error is about.

Are you able to see anything that would indicate the Postgres container is causing a bottleneck?
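
A quick first check is how long a plain TCP handshake to the database takes from the controller VM. This is a minimal sketch, assuming Python is available on the controller. The host and port are yours to fill in, and the function names are illustrative only. It measures network round trips, not query time inside PostgreSQL.

```python
import socket
import statistics
import time


def connect_latency(host: str, port: int, timeout: float = 5.0) -> float:
    """Seconds needed to complete one TCP handshake with host:port."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass  # connection established; close it again immediately
    return time.perf_counter() - start


def median_latency_ms(host: str, port: int, runs: int = 20) -> float:
    """Median handshake time in milliseconds over several attempts."""
    samples = [connect_latency(host, port) for _ in range(runs)]
    return statistics.median(samples) * 1000.0
```

Something like `median_latency_ms("<database-private-ip>", 5432)` (5432 being the usual PostgreSQL port, the address a placeholder) should come back in the low single-digit milliseconds on a private network. If it does, the next place to look is inside the container, for example slow queries or storage throttling.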

You’ve mentioned that you are using 0.13.1. Is this definitely the same binary across all machines? I have seen this error pop up when mixing, for example, the HCP Worker binary with the OSS binary.

The error is defined here, I believe. It may also be an issue with how you’ve configured your workers or the controllers themselves.

Might also be worth trying something like ps -eo pid,comm,lstart,etime,time,args on each controller and worker to see if the service uptime is as expected and make sure they aren’t flapping.
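
If you want to automate that check, here is a small sketch that parses `ps` output and lists processes that started recently. It assumes the procps `etime` format (`[[dd-]hh:]mm:ss`) and that the process is named `boundary`; adjust both if your setup differs. The helper names are my own.

```python
import subprocess


def parse_etime(etime: str) -> int:
    """Convert a ps etime value ([[dd-]hh:]mm:ss) to seconds."""
    days = 0
    if "-" in etime:
        day_part, etime = etime.split("-", 1)
        days = int(day_part)
    parts = [int(p) for p in etime.split(":")]
    while len(parts) < 3:
        parts.insert(0, 0)  # pad missing hours with zero
    hours, minutes, seconds = parts
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def recently_started(ps_output: str, name: str, threshold_s: int = 600):
    """Return (pid, uptime_seconds) for processes called name that are
    younger than threshold_s. Expects columns pid, etime, comm."""
    hits = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 2)
        if len(fields) != 3:
            continue
        pid, etime, comm = fields
        uptime = parse_etime(etime)
        if comm.strip() == name and uptime < threshold_s:
            hits.append((int(pid), uptime))
    return hits


def ps_snapshot() -> str:
    """Capture pid, elapsed time and command name for all processes."""
    result = subprocess.run(
        ["ps", "-eo", "pid,etime,comm"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Calling `recently_started(ps_snapshot(), "boundary")` a few times over an hour shows whether the service keeps restarting: a healthy process has a steadily growing uptime and drops out of the list after ten minutes.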

Other than this, some general network/systems troubleshooting - things like ethtool, dropwatch, iostat, perf, vmstat, etc. I would recommend looking not just at the controllers but also the workers.

How did you get on with this? Were you able to make any progress, and if so, could you please share what exactly you did so that others can reference this thread in future if they face similar issues? If not, please let me know the current state and I’ll see if there’s anything else I can think of.

I apologize for the delayed response. I was focused on a different topic for a while and had to set this one aside. I can confirm that I’m using the same binary on all machines, and I’ve recently upgraded to version 0.14.0. I haven’t had a chance to look into any network analysis yet.

Today, I ran a test in which I applied the exact configuration used for my current controllers to my home computer. When accessing the Admin UI from other machines, the user interface felt smooth and responsive. This indicates that the configuration itself is not the issue. The problem likely does not come from Boundary; I think it is probably a problem with my cloud machines.

Cheers for the update! I’m wondering what differences exist between your home computer and the target controllers that may be resulting in symptoms similar to this. It seems like you are on the right track, feel free to reach out if you’re in need of any input.

I also have set up a small Boundary Cluster with 2 Controllers and 2 Workers, and I’ve been experiencing sluggish performance in the web interface. Despite having ample computing resources, the UI feels slow and unresponsive. Tasks like changing the Scope or performing other actions take more than 2 seconds to load, which seems unusually high considering the minimal workload.

It’s hard to know what “ample computing resources” covers without further context. Disk IOPS, network ingress/egress, memory utilisation, disk space, and CPU are the major things to check.
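
For a quick baseline on some of these, a short script can take a snapshot on each machine. The sketch below is Linux-only (it reads `/proc/meminfo`), uses only the Python standard library, and covers memory, disk space, and load average. It does not measure disk IOPS or network throughput, which need tools like iostat. The function names are illustrative.

```python
import os
import shutil


def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo text into {field: value}, values mostly in kB."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        value = rest.split()
        if value:
            info[key.strip()] = int(value[0])
    return info


def memory_used_percent(meminfo: dict[str, int]) -> float:
    """Share of memory not available to new workloads, in percent."""
    total = meminfo["MemTotal"]
    available = meminfo["MemAvailable"]
    return 100.0 * (total - available) / total


def resource_snapshot(path: str = "/") -> dict[str, float]:
    """Collect memory, disk-space, and load figures for this machine."""
    with open("/proc/meminfo") as fh:
        mem = parse_meminfo(fh.read())
    disk = shutil.disk_usage(path)
    load1, _, _ = os.getloadavg()
    return {
        "mem_used_pct": memory_used_percent(mem),
        "disk_used_pct": 100.0 * disk.used / disk.total,
        "load_1min_per_cpu": load1 / (os.cpu_count() or 1),
    }
```

Running `resource_snapshot()` on the client, each controller, each worker, and the database host gives comparable figures to post here, which makes it much easier to spot the odd one out.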

Try figuring out what the performance differences are between the client, controller, and workers. The bottleneck likely exists at one of these points.

As always, verbose logs from each hop (including the client) feed into this. If there’s nothing between these three points, then logs from the target system might also shed some light.

The previous issue appears to have been system or configuration-related, rather than a Boundary problem. I think there are some good troubleshooting steps in the previous thread that I might have missed here.

Essentially, you want to isolate the issue and drill down from there.

Much like going to the mechanic to report a problem - rather than simply saying “my car doesn’t work”, you’d want to say “my car doesn’t work - I’ve identified that there is a rattling sound coming from THIS location, and when entering third gear, a chugging sound coming from THIS direction”, if you get what I mean. Keep in mind that in our context the “mechanics” do not have access to your “car” - just the things you tell us about it :smile: