Thanks for the reply, @jeff.
I upgraded to 0.14.3, but I am still facing the issue where my workers just stop working. As a workaround, I am restarting the workers every hour.
Any help would be appreciated. Thanks!
Do you have any firewalls that might be dropping persistent connections? Unfortunately, I don’t have much advice to give if you’re not seeing any errors in either the controller or the worker logs.
I don’t have firewalls on my servers. Can you suggest any steps I can take when it hangs that might give you some clue about the error?
Also, just an FYI: I am running Boundary on my k8s setup with 3 workers and 3 controllers.
I’m not sure whether Kube networking plays a part here, but if the worker wedges completely, you can try sending it a SIGQUIT (on the console you can do this with Ctrl-\) and posting a link to the output. We can then see whether there seems to be a deadlock somewhere.
It appears that at least one connection is in the process of being proxied, and nothing seems to have interrupted it.
At the same time, the worker is trying to drain connections, which is what would happen if the worker were in the process of being shut down, but it is stalled there.
There is another stall: a connection being made upstream is stuck in the TLS handshake.
I hate to ask you to upgrade again, but we just saw someone else with some extremely similar behaviors, and it was due to some undocumented gRPC behavior that we have worked around in Fix issue with workers connecting in high latency conditions by jefferai · Pull Request #4535 · hashicorp/boundary · GitHub (despite the title, very high latencies are not needed to trigger this; 25ms or so is enough). I’m not 100% convinced it will help you, but it very well may. If you can upgrade to 0.15.3 and see whether that helps, that would be great.