Session getting disconnected again and again

My connections to Boundary keep getting dropped, and I am seeing this error in the controller logs. Any ideas?

{"id":"uxpVjCWeGo","source":"","specversion":"1.0","type":"error","data":{"error":"server.(WorkerAuthRepositoryStorage).storeNodeInformation: db.Create: duplicate key value violates unique constraint \"worker_auth_authorized_pkey\": unique constraint violation: integrity violation: error #1002","error_fields":{"Code":1002,"Msg":"","Op":"server.(WorkerAuthRepositoryStorage).storeNodeInformation","Wrapped":{"Code":1002,"Msg":"","Op":"db.Create","Wrapped":{"Code":1002,"Msg":"duplicate key value violates unique constraint \"worker_auth_authorized_pkey\"","Op":"","Wrapped":{"Code":1002,"Msg":"unique constraint violation","Op":"","Wrapped":null}}}},"id":"e_JuAjlwWFY8","version":"v0.1","op":"server.(WorkerAuthRepositoryStorage).storeNodeInformation"},"datacontentype":"application/cloudevents","time":"2023-12-06T06:47:22.028584534Z"}

We’ve seen this before, although not in a recent version of Boundary – did you upgrade?

Can you try deleting the worker from the controller and re-authorizing it and see if that helps?

That helps, but after some time I start getting the same issue again. Just an FYI, I am using 0.13.0.

I believe you are actually hitting the bug fixed by Pull Request #3389 on hashicorp/boundary ("bug(workerAuth): allow duplicate workerAuth inserts if records match"), which shipped in 0.13.1 (you may have to remove and re-add the worker after upgrading).

Thanks for the reply @jeff
I upgraded to 0.14.3, but I am still facing an issue where my workers just stop working. As a workaround I am restarting the workers every hour.
Any help would be appreciated, thanks.

Is it the same error message?

The one you specified in the original post would only be an issue at registration or credential rotation time.

What do the logs from your workers look like?

Yup, I don’t see that error right now. In fact I don’t see any error logs at all right now, but for some reason it just stops working.

Do you have any firewalls that might be dropping persistent connections? I unfortunately don’t have a lot of advice to give if you’re not seeing any errors on either the controller or worker logs.

I don’t have firewalls on my servers. Can you suggest any steps I can take when it hangs that might give you some clue about the error?
Also, just an FYI, I am running Boundary on my Kubernetes setup with 3 workers and 3 controllers.

I’m not sure if Kube networking might play some part here, but if the worker totally wedges, you can try sending a SIGQUIT (on the console you can do this with Ctrl-\) and posting a link to the output, and we can see if there seems to be a deadlock somewhere.

This is the output in the logs that I got.

That looks like it’s partial, was that the full output? Did your terminal not have enough scrollback configured to hold the whole output perhaps?

I think this is the full output, I will check once more.

@jeff I think this will have full data

Thanks for your help :pray:

Sorry for the delay, I’ve been on PTO.

In the log I see a few things:

  • It appears that there is at least one connection that is in the process of being proxied; nothing seems to have interrupted it
  • At the same time, the worker is trying to drain connections, which is what would happen if the worker was in the process of being shut down, but is stalled there
  • There is another stall - a connection being made upstream that is stuck in the TLS handshaking process

I hate to ask you to upgrade again, but we just saw someone else with a few extremely similar behaviors, and it turned out to be due to some undocumented gRPC behavior that we worked around in Pull Request #4535 on hashicorp/boundary ("Fix issue with workers connecting in high latency conditions"; despite the title, very high latencies are not needed to see this, 25 ms or so is enough). I’m not 100% convinced it will help you, but it very well may; if you can upgrade to 0.15.3 and see if that helps, that would be great.

One other thing: we’ve identified an issue that could prevent the listener from being closed properly and are working on a fix; based on my analysis, that may also be related to what you’re seeing. Once 0.16.1 is out it may be relevant to your issue.