Boundary UI Error 500 with larger host sets (CLI works fine) – v0.19.3 – RKE2/Ubuntu 24.04

Hi all,

I’m running Boundary v0.19.3 on RKE2 (Kubernetes) / Ubuntu 24.04 LTS.

  • The Boundary UI works fine when the host set is empty or contains just a few hosts.

  • Once the host set grows (typically around 40–50 hosts, definitely before reaching 100), the UI starts returning an error 500 when I try to add more hosts to a host set or view the hosts in it:

    ERROR 500
    Server error
    We ran into a problem and could not continue. You can ask your administrator or try again later.
    
    
  • Important: CLI (boundary.exe host-sets set-hosts etc.) continues to work perfectly; no errors there.
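For comparison with the UI's “Add Existing Host” dialog, the CLI path that keeps working can be scripted from a POSIX shell (for example WSL or Git Bash on Windows). This is a minimal sketch, assuming an authenticated `boundary` CLI and that `host-sets add-hosts` accepts a repeated `-host` flag (check `boundary host-sets add-hosts -h` for your version). The helper name `build_add_hosts_args` is hypothetical and the IDs are placeholders.

```sh
# Hypothetical helper: build the argument list for `boundary host-sets add-hosts`.
# Usage: build_add_hosts_args <host-set-id> <host-id> [<host-id> ...]
build_add_hosts_args() {
  set_id="$1"
  shift
  args="host-sets add-hosts -id $set_id"
  # One -host flag per host ID to add to the set.
  for host_id in "$@"; do
    args="$args -host $host_id"
  done
  printf '%s\n' "$args"
}

# Example (placeholder IDs). Boundary IDs contain no spaces, so the unquoted
# command substitution splits cleanly into separate arguments:
# boundary $(build_add_hosts_args hsst_1234567890 hst_aaaaaaaaaa hst_bbbbbbbbbb)
```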

Environment:

  • Boundary version: 0.19.3

  • Running on: RKE2 Kubernetes, Ubuntu 24.04 LTS (controller and database)

  • CLI client: Windows

  • Number of hosts where the issue appears: typically at 40–50, always below 100

  • No issues with the CLI; only the UI fails.

Questions:

  • Is this a known limitation or bug with the Boundary UI?

  • Are there any server tweaks or configuration settings that could help?

  • Has this issue been resolved in newer Boundary versions or does anyone know a workaround?

Any advice, experience, or suggestions would be greatly appreciated.

Thanks in advance,

Hello,

Thanks for reaching out. From what I can tell, this isn’t something we’ve seen before. It’s interesting that it happens at around the 40–50 host threshold in the UI while the CLI has no issues. I ran a quick test with 60+ hosts added to a host set in various scenarios and wasn’t able to reproduce any issues in the UI.

I have a few follow up questions that might help troubleshoot this further:

  • Does this issue only affect an individual host set once it reaches the 40–50 host threshold, while other host sets below the threshold are unaffected?
  • In Boundary Admin, there are the “create and add” and “add existing” options. Do both workflows fail when the endpoint returns a 500?
  • Can you confirm that, in the network tab of the browser developer tools, the 500 is returned only on POST /v1/host-sets/hsst_1234567890:add-hosts (with hsst_1234567890 replaced by your host set ID)?
  • You also mentioned that you receive a 500 when viewing the hosts in a host set. Is this on the Host Sets page, on the Hosts tab?
  • Once the API starts returning a 500, does every subsequent request fail, or do some succeed?
  • If you add a host via the CLI, does that change anything when you go back and try to add a host from the UI?
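To see the raw response body behind the 500 outside the browser, the same add-hosts request can be replayed with curl. This is a sketch under assumptions: `BOUNDARY_ADDR` points at the controller API, `BOUNDARY_TOKEN` is just a shell variable used by this sketch holding a valid auth token (for example the output of `boundary config get-token`), and the request body carries the host set's current `version` plus `host_ids`. The helper names are hypothetical.

```sh
# Hypothetical helper: JSON body for POST /v1/host-sets/<id>:add-hosts.
# Usage: add_hosts_payload <current-host-set-version> [<host-id> ...]
add_hosts_payload() {
  version="$1"
  shift
  ids=""
  for host_id in "$@"; do
    # Append each quoted ID, comma-separated.
    ids="${ids:+$ids,}\"$host_id\""
  done
  printf '{"version":%s,"host_ids":[%s]}\n' "$version" "$ids"
}

# Hypothetical helper: send the request, printing the response body and then the HTTP status.
# Usage: post_add_hosts <host-set-id> <current-version> <host-id> [<host-id> ...]
# Assumes BOUNDARY_ADDR and BOUNDARY_TOKEN are exported.
post_add_hosts() {
  set_id="$1"
  shift
  curl -sS -X POST \
    -H "Authorization: Bearer ${BOUNDARY_TOKEN}" \
    -H "Content-Type: application/json" \
    -d "$(add_hosts_payload "$@")" \
    -w '\nHTTP %{http_code}\n' \
    "${BOUNDARY_ADDR}/v1/host-sets/${set_id}:add-hosts"
}
```

The current version can be read with `boundary host-sets read -id hsst_1234567890` before calling, e.g., `post_add_hosts hsst_1234567890 5 hst_aaaaaaaaaa`.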

It might also be worth checking the server logs around the time of the failed requests for any insights. Let us know if there’s anything else you think is noteworthy about this particular scenario. Thanks!

Hello,

I am providing a comprehensive overview of the 500-error issue we encounter in the Boundary UI when working with large host sets. This includes a detailed testing history, observed patterns, and answers to all previously asked clarifying questions. At the end, I would appreciate your advice on the best commands/methods for extracting and sharing relevant log data for further debugging.


1. Environments and Scope

  • Only 2 host sets have ever had more than 40 hosts; all the rest remain below this threshold.

  • No UI errors have ever occurred on any host set with fewer than approximately 40 hosts, irrespective of their update/browsing frequency.


2. Detailed Testing History & Error Pattern

Over several test rounds, I have systematically created new host sets and increased their size in order to find a consistent reproduction pattern. Below are the most relevant sequences and observations:

Test Case 1: New Host Set A

  • Created new host set.

  • Added 50 servers: No issues. (“Add Existing Host”)

  • Added 5 more (total 55): No issues. (“Add Existing Host”)

  • Added 5 more (total 60): No issues. (“Add Existing Host”)

  • Added 10 more (total 70) (“Add Existing Host”): UI immediately reported “Added successfully” AND, at the same time, “ERROR 500: Server error. We ran into a problem and could not continue.”

  • When reloading or revisiting the host set page, the 500 error would consistently appear. The list could not be viewed in the UI, despite the correct number of hosts being present.
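The incremental pattern in these test cases (add a batch, then check the Hosts tab) can be driven from the CLI so that every step is identical between runs. A sketch, assuming a POSIX shell and an authenticated `boundary` CLI; `print_batches` and `add_in_batches` are hypothetical helpers, and the pause only exists to give time to reload the UI between batches.

```sh
# Hypothetical helper: print the given host IDs in space-separated batches of <size>.
# Usage: print_batches <size> <host-id> [<host-id> ...]
print_batches() {
  size="$1"
  shift
  line=""
  n=0
  for host_id in "$@"; do
    line="${line:+$line }$host_id"
    n=$((n + 1))
    if [ "$n" -eq "$size" ]; then
      printf '%s\n' "$line"
      line=""
      n=0
    fi
  done
  # Flush a final, partial batch.
  [ -n "$line" ] && printf '%s\n' "$line"
  return 0
}

# Hypothetical driver: add each batch via the CLI, then wait so the UI can be checked.
# Usage: add_in_batches <host-set-id> <size> <host-id> [<host-id> ...]
add_in_batches() {
  set_id="$1"
  size="$2"
  shift 2
  print_batches "$size" "$@" | while read -r batch; do
    flags=""
    for host_id in $batch; do
      flags="$flags -host $host_id"
    done
    # $flags is intentionally unquoted so each -host flag becomes its own argument.
    boundary host-sets add-hosts -id "$set_id" $flags || { echo "add-hosts failed" >&2; break; }
    printf 'Added: %s. Reload the Hosts tab, then press Enter.\n' "$batch"
    # stdin is the pipe here, so read the keypress from the terminal.
    read -r _ < /dev/tty
  done
}
```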

Test Case 2: New Host Set B

  • Deleted previous (defective) host set and recreated a new one.

  • Added 60 servers at once: No error. (“Add Existing Host”)

  • Added 1 more (total 61): No error. (“Add Existing Host”)

  • Added another 9 to reach 70 total (“Add Existing Host”): Success message appeared, but upon revisiting the host set or viewing the list, the UI immediately showed a 500 internal server error.

  • Removing all hosts from the defective set (using the UI) did not resolve the UI error. The host set remained inaccessible from the UI until it was deleted and recreated.

Test Case 3: Additional Host Sets

  • Created several other test host sets and added hosts incrementally in groups of 5–10.

  • In all cases, the UI only began returning consistent 500 errors once the host set exceeded approximately 60–70 hosts.

  • All other host sets remaining under 40 hosts had zero issues, even with repeated add/remove/browse operations.

Observed Unique Error (Duplicate/Add Existing)

Occasionally, when using “Add Existing Host” in the UI on large host sets, I have also received this error instead of or alongside the 500 error:

Error
Invalid request. Request attempted to make second resource with the same field value that must be unique.
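One possible source of this uniqueness error (an assumption, not confirmed) is submitting a host that is already a member of the set, for example when a selection is submitted twice. A sketch of pre-filtering candidate IDs against the current membership before adding them; `filter_new_hosts` is a hypothetical helper.

```sh
# Hypothetical helper: print only candidate host IDs that are not already members.
# Usage: filter_new_hosts "<member IDs, space- or newline-separated>" <candidate-id> [...]
filter_new_hosts() {
  # Normalise newlines to spaces and pad so every ID is surrounded by spaces.
  members=" $(printf '%s' "$1" | tr '\n' ' ') "
  shift
  for host_id in "$@"; do
    case "$members" in
      *" $host_id "*) ;;                # already in the set: skip it
      *) printf '%s\n' "$host_id" ;;    # not yet a member: keep it
    esac
  done
}
```

The member list could come from something like `boundary host-sets read -id hsst_1234567890 -format json | jq -r '.item.host_ids[]?'`, assuming `jq` is installed and the JSON output nests the resource under `item`.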


3. UI Workflows

  • Tested: So far, I have only tested the “Add Existing Host” workflow in the UI. I have not tested the “Create and Add” workflow yet. Please let me know if you would like me to try that scenario.

  • CLI: I have not experienced any issues adding or removing large groups of hosts via the CLI. Even when the UI fails (i.e., reports a 500 error or becomes unusable for a certain host set), the CLI can always be used to read/modify the host set without problem.


4. API Endpoint and Network Details

  • The UI 500 error is observed in two main scenarios:

    1. Immediately after using “Add Existing Host” on a large host set (typically a 500 response to a POST request).

    2. When browsing or reloading the host set view in the UI after the set has grown large (500 response).

  • Occasionally, both a “Success: Added successfully” and an “ERROR 500” message are shown together in the UI.


5. Reproducibility and Consistency

  • Once a host set triggers a UI 500 error, it will consistently fail on all subsequent UI access and is not “recovered” simply by removing hosts via CLI.

  • The rest of the application behaves normally; only the specific large host set is affected.

  • All smaller host sets (below ~40 hosts) remain fully accessible and error-free.


6. Server/Controller Logs

  • I have not yet identified any obvious error entries or stack traces in the Boundary server, controller, or worker logs at the moment of the 500 error.

  • Could you please advise on:

    • Which components and log paths are most relevant for UI 500 errors of this kind?

    • Which keywords, severity levels, or patterns should I search for?

    • Are there specific CLI or kubectl commands you recommend for collecting log output for review (for both Docker and Kubernetes deployments if possible)?
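While waiting for guidance, a time-boxed capture around one reproduction is easy to share. A sketch using standard `kubectl logs` and `docker logs` flags; the namespace and label are the ones used elsewhere in this thread, while the helper names, container name and output files are placeholders.

```sh
# Hypothetical helper: dump recent controller logs from every matching pod in Kubernetes.
# Usage: collect_k8s_logs <namespace> <since, e.g. 15m> <output-file>
collect_k8s_logs() {
  # --prefix tags each line with its pod/container; --timestamps eases correlation with UI clicks.
  kubectl logs -n "$1" -l app.kubernetes.io/name=boundary-controller \
    --all-containers --prefix --timestamps --since="$2" > "$3"
}

# Hypothetical helper: the same for a Docker deployment.
# Usage: collect_docker_logs <container-name> <since, e.g. 15m> <output-file>
collect_docker_logs() {
  # docker logs writes the container's stdout and stderr separately, so capture both.
  docker logs --timestamps --since "$2" "$1" > "$3" 2>&1
}
```

For example, reproduce the error once and then run `collect_k8s_logs boundary 15m controller.log`.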


7. Summary and Next Steps

Summary:

  • The issue is exclusive to the UI when working with large host sets (usually >60 hosts; never with <40).

  • No errors of this kind have been observed using the CLI under any circumstances.

  • Occasional duplicate/uniqueness errors have been seen (“Invalid request. Request attempted to make second resource with the same field value that must be unique.”) when repeatedly adding hosts via “Add Existing Host”.

Next Steps:

  • Please let me know if you would like me to:

    • Test the “Create and Add” workflow in detail

    • Supply specific network trace data, endpoint response bodies, or session logs

    • Try any additional UI, CLI, or API workflows

  • Please provide guidance on recommended log extraction and what to look for in server logs for boundary-related UI 500 errors.

Thank you very much for your support and collaboration.
If there’s anything else you’d like me to test, provide, or clarify, please let me know.

Further clarification:

It is important to make clear that as long as I am on the “Details” tab, I am able to add as many servers as I want (250 or more) without any problem. The Error 500 only appears after I have clicked “Add Host” under the “Add Existing Host” function. Even then, the servers are added to the host set as expected. The 500 error occurs immediately afterward, specifically when the UI tries to display the host set under the “Hosts” tab. In other words, adding hosts works, but the subsequent display of a large host set on the “Hosts” tab fails with a 500 error.

Additional information based on my findings:

  • The CLI can handle large host sets without any issues.

  • The UI fails only when displaying the “Hosts” tab for large sets.

  • I have tested with up to 250 hosts in a single host set.

  • I am using the standard Boundary UI (browsers: Microsoft Edge and Firefox).

The issue occurs specifically when displaying a host set with more than approximately 60–70 servers in the UI. Adding servers works as expected, but once the host set exceeds this threshold, attempting to view it in the “Hosts” tab consistently results in a 500 error. This does not affect host sets with fewer than about 60 servers.

New Findings and Additional Testing

To ensure a completely clean test environment, I have now created an entirely new organization. In this new setup, I created only a new host set and then used the CLI to register 250 arbitrary test servers with commands like:


boundary hosts create static -host-catalog-id hcst_ohIEspNaFp -name "Server002" -address 10.10.10.2 -addr https://boundary.xxxxx.xxx
boundary hosts create static -host-catalog-id hcst_ohIEspNaFp -name "Server003" -address 10.10.10.3 -addr https://boundary.xxxxx.xxx
boundary hosts create static -host-catalog-id hcst_ohIEspNaFp -name "Server004" -address 10.10.10.4 -addr https://boundary.xxxxx.xxx
boundary hosts create static -host-catalog-id hcst_ohIEspNaFp -name "Server005" -address 10.10.10.5 -addr https://boundary.xxxxx.xxx
boundary hosts create static -host-catalog-id hcst_ohIEspNaFp -name "Server006" -address 10.10.10.6 -addr https://boundary.xxxxx.xxx
boundary hosts create static -host-catalog-id hcst_ohIEspNaFp -name "Server007" -address 10.10.10.7 -addr https://boundary.xxxxx.xxx
# ... etc. up to Server250
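The elided part of that list can be generated with a loop. A sketch, assuming the same catalog and that the 10.10.10.N address pattern continues up to Server250, with the controller address supplied through the `BOUNDARY_ADDR` environment variable instead of `-addr`; `create_test_hosts` is a hypothetical helper.

```sh
# Hypothetical helper: create static hosts ServerNNN with address 10.10.10.N.
# Usage: create_test_hosts <host-catalog-id> <first> <last>
create_test_hosts() {
  catalog_id="$1"
  i="$2"
  while [ "$i" -le "$3" ]; do
    # Zero-pad the name to three digits, e.g. Server002.
    name=$(printf 'Server%03d' "$i")
    boundary hosts create static -host-catalog-id "$catalog_id" \
      -name "$name" -address "10.10.10.$i" || return 1
    i=$((i + 1))
  done
}

# Example: create_test_hosts hcst_ohIEspNaFp 2 250
```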

After this, I added 200 of the 250 available test servers to the new host set. I am seeing exactly the same pattern as described in my earlier posts:
Once the host set exceeds approximately 60–80 servers, I immediately encounter a 500 error, either when adding more servers with the “Add Hosts” function or when viewing the host set under the “Hosts” tab in the UI.

Further clarification:
It is important to highlight that as long as I am on the “Details” tab, I am able to add as many servers as I want (200+, in this case) without any problem. The Error 500 only appears right after clicking “Add Host” under the “Add Existing Host” function, and also when I switch to the “Hosts” tab for the host set. All selected servers are still added as expected, but the subsequent display of a large host set fails with a 500 error in the UI.

To rule out any additional factors, I have not created any credentials or configured anything else in this test organization—only the host set and test servers were created.

For reference:

  • The CLI can handle large host sets with no issues at all.

  • The problem is reproducible in a completely clean setup.

  • I have tested with up to 250 hosts in a single host set.

  • I am using the standard Boundary UI.

Additionally, I am using the latest bitnamisecure/postgresql-ha image (tag: latest) for PostgreSQL HA, running with 3 pods. I see no errors or issues on the database side, either in this environment or in other environments using the same PostgreSQL cluster.

This should (I hope) help rule out environmental or database-cluster-specific causes.
Please let me know if further details or logs could be helpful.

postgres=# SELECT version();
version

PostgreSQL 17.6 on x86_64-pc-linux-gnu, compiled by gcc (Debian 12.2.0-14+deb12u1) 12.2.0, 64-bit
(1 row)

pgpool --version
pgpool-II version 4.6.3 (chirikoboshi)

I have the same problem with v0.20.0.

Update — additional log observations comparing organizations with large and small host sets

I’m continuing to investigate the persistent “500 Internal Server Error” issue in the Boundary UI when working with large static host sets.
Below are my latest observations, including relevant controller logs, for both large and small host sets across different organizations.

How I collect logs:

I’m monitoring the boundary-controller logs using the following command:

kubectl logs -n boundary -l app.kubernetes.io/name=boundary-controller -f

Please note that there may be some slight timing mismatch between UI clicks and the exact log output, but the log excerpts below reflect what I consistently see when reproducing the problem.
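Because the timing between UI clicks and log lines can drift, it may help to filter the stream down to the error messages in the excerpts of this post. A sketch; the patterns are simply strings copied from those excerpts (`LookupHost` alone is left out because successful lookups log it too), and `filter_lookup_errors` is a hypothetical helper.

```sh
# Hypothetical helper: keep only log lines matching the errors seen in the excerpts.
filter_lookup_errors() {
  grep -E 'server refused TLS connection|unable to read message kind|SQLSTATE'
}

# Example:
# kubectl logs -n boundary -l app.kubernetes.io/name=boundary-controller -f --timestamps \
#   | filter_lookup_errors
```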


Case 1: UI error with large host sets (approx. 70+ hosts)

  • When I open the “Hosts” tab for a large host set in the UI, I observe logs such as:

    op: static.(Repository).LookupHost
    error: db.LookupById: ... tls error: server refused TLS connection

    op: static.(Repository).LookupHost
    error: server error: ERROR: unable to read message kind (SQLSTATE XX000)

  • The UI instantly returns a 500 Internal Server Error.
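If the Hosts tab issues roughly one host read per member (the repeated `LookupHost` entries are consistent with that, but it is an assumption), the failure might be reproducible without the UI by firing many host reads in parallel. A sketch, assuming `BOUNDARY_ADDR` and a token in the sketch-only variable `BOUNDARY_TOKEN` are exported and that `GET /v1/hosts/<id>` is the read involved; the helper names and concurrency levels are hypothetical.

```sh
# Hypothetical helper: read each host via the API with <parallel> concurrent requests,
# printing one HTTP status code per request.
# Usage: probe_hosts <parallel> <host-id> [<host-id> ...]
# Assumes BOUNDARY_ADDR and BOUNDARY_TOKEN are exported.
probe_hosts() {
  parallel="$1"
  shift
  printf '%s\n' "$@" | xargs -P "$parallel" -I{} \
    curl -s -o /dev/null -w '%{http_code}\n' \
      -H "Authorization: Bearer ${BOUNDARY_TOKEN}" \
      "${BOUNDARY_ADDR}/v1/hosts/{}"
}

# Count successful (200) versus other status codes read from stdin.
summarize_status() {
  awk '{ if ($1 == "200") ok++; else failed++ }
       END { printf "ok=%d failed=%d\n", ok, failed }'
}

# Example: compare low and high concurrency for the same ~70 host IDs in $ids.
# probe_hosts 1 $ids | summarize_status
# probe_hosts 70 $ids | summarize_status
```

If only the high-concurrency run produces failures together with the same TLS/SQLSTATE log lines, that would point at connection handling between the controller and the database rather than at the UI itself.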

Case 2: No errors with small host sets (e.g. 10 hosts) in another organization

  • When switching to an organization/host set with far fewer hosts (e.g., 10 hosts):

    • The “Hosts” tab loads without issue.

    • Controller logs show only standard APIRequest/Audit events, for example:

        op: static.(Repository).LookupHost
        status: 200

  • No errors occur in the UI or controller logs.

Summary:
The 500 UI error and the controller log errors only occur on large host sets. Smaller sets function perfectly, in both the logs and the UI.


Note on pgpool-II:
I’m also considering whether pgpool-II could be involved in any way. However, I don’t see why it would cause issues only when displaying large host sets, especially since everything (including TLS) works flawlessly for small host sets through the same pgpool-II instance.
I wanted to mention it for completeness in case it is relevant, even though nothing I have observed so far clearly points to pgpool-II as the cause.
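One way to test the pgpool-II angle is to compare connection usage against the configured limits while the Hosts tab of a large set is loading. A sketch, assuming `psql` is available inside a pod that can reach pgpool, with connection settings provided via the usual `PGHOST`/`PGUSER`/`PGPASSWORD` variables; `PGPOOL SHOW` requires pgpool-II 3.7 or later, `num_init_children`/`max_pool` are pgpool parameters and `max_connections` is PostgreSQL's. The helper names and pod name are placeholders.

```sh
# Hypothetical helper: run one SQL statement through pgpool from inside a pod.
# Usage: pgpool_sql <namespace> <pod> <sql>
pgpool_sql() {
  # -X skips psqlrc, -A/-t print just the bare value.
  kubectl exec -n "$1" "$2" -- psql -X -A -t -c "$3"
}

# Hypothetical helper: print pgpool and PostgreSQL connection limits and current usage.
# Usage: check_connection_limits <namespace> <pod>
check_connection_limits() {
  ns="$1"
  pod="$2"
  for sql in \
    'PGPOOL SHOW num_init_children' \
    'PGPOOL SHOW max_pool' \
    'SHOW max_connections' \
    'SELECT count(*) FROM pg_stat_activity'; do
    printf '%s: ' "$sql"
    pgpool_sql "$ns" "$pod" "$sql"
  done
}

# Example (placeholder pod name), run while the Hosts tab of a large set is loading:
# check_connection_limits boundary <pgpool-pod>
```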


If further log details or more specific tests would be helpful, please let me know.