[boundary 0.7.5] Sessions not being cleaned

fabricef-p · April 25, 2022, 1:12pm

Boundary Version: 0.7.5
Database: Amazon RDS PostgreSQL 11.13

We’ve been experiencing severe slowdowns in Boundary, with calls to the sessions API endpoints taking up to 90s.
After looking at the Boundary database content, we noticed that the session_list view contained 24k+ sessions in the active state and 25k+ sessions in the pending state.

Our only solution was to manually purge the sessions to return to normal performance:
DELETE FROM session WHERE create_time < NOW() - INTERVAL '1 days';

Is there a way to debug this and understand why old sessions are not being cleaned up? From what I see in the code, the controller should try to clean up terminated and expired sessions: boundary/repository_session.go at v0.7.5 · hashicorp/boundary · GitHub

omkensey · April 25, 2022, 6:25pm

I don’t have any debugging help to offer, but if this is easily reproducible, can you test again with 0.7.6? There have been some changes around session handling in that version. I don’t know if any of them would fix your issue, but it would be a useful data point either way.

tmessi · April 25, 2022, 6:50pm

Hi, thanks for reporting this. One thing to note: the cleanup job that you reference does not delete the sessions from the database, but rather transitions the sessions to the terminated state. It looks for:

// * sessions that have exhausted their connection limit and all their connections are closed.
// * sessions that are expired and all their connections are closed.
// * sessions that are canceling and all their connections are closed
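
The three conditions in that comment can be read as a single predicate. Below is a minimal Python sketch of that reading, not Boundary's actual implementation (the real job runs inside the controller against the database). The `Session` fields and the `should_terminate` helper are hypothetical names chosen for illustration, and treating a non-positive connection limit as "unlimited" is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Session:
    # Hypothetical in-memory stand-in for a session row.
    expiration_time: datetime
    connection_limit: int   # assumption: <= 0 means "unlimited"
    connections_made: int
    open_connections: int
    canceling: bool = False


def should_terminate(s: Session, now: datetime) -> bool:
    """Return True if the session matches one of the three cleanup conditions."""
    # Every condition also requires that all connections are closed.
    if s.open_connections > 0:
        return False
    exhausted = s.connection_limit > 0 and s.connections_made >= s.connection_limit
    expired = now >= s.expiration_time
    return exhausted or expired or s.canceling


# A session created 9 hours ago with an 8 hour (28800 s) lifetime and no
# open connections should be picked up by the cleanup.
now = datetime(2022, 4, 25, 12, 0, tzinfo=timezone.utc)
created = now - timedelta(hours=9)
stale = Session(
    expiration_time=created + timedelta(seconds=28800),
    connection_limit=1,
    connections_made=0,
    open_connections=0,
)
print(should_terminate(stale, now))  # True
```

If that reading is right, an expired session with even one connection still recorded as open would never match, which is one thing worth checking when sessions pile up in the active or pending state.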

That said, I would expect sessions that are older than 8 hours to be expired, unless the targets are configured with a longer session_max_seconds, and I would expect those expired sessions to then be transitioned to the terminated state. Can you provide some additional details:

Can you confirm what the session_max_seconds is for the corresponding targets?
Are there any error logs from either the controllers or workers?
What are the session API calls that are being made? Are they from the Desktop Client or from the CLI?

fabricef-p · April 28, 2022, 9:06am

Hi. Sorry for the late follow-up.

@omkensey I will try to see if we can upgrade to 0.7.6. But I seem to remember that every time we upgrade, we have to ask users to generate new passwords.

@tmessi

I checked directly in the database, and session_max_seconds is set to 28800.
These are the error logs I could see on the controller:

Apr 19 13:56:14 ip-172-31-13-19 boundary[23135]: 2022-04-19T13:56:14.410+0700 [ERROR] encountered an error sending an error event:
Apr 19 13:56:14 ip-172-31-13-19 boundary[23135]: error:=
Apr 19 13:56:14 ip-172-31-13-19 boundary[23135]: | 5 errors occurred:
Apr 19 13:56:14 ip-172-31-13-19 boundary[23135]: | * event.(Eventer).retrySend: event not written to enough sinks
Apr 19 13:56:14 ip-172-31-13-19 boundary[23135]: | * event.(Eventer).retrySend: event not written to enough sinks
Apr 19 13:56:14 ip-172-31-13-19 boundary[23135]: | * event.(Eventer).retrySend: event not written to enough sinks
Apr 19 13:56:14 ip-172-31-13-19 boundary[23135]: | * event.(Eventer).retrySend: event not written to enough sinks
Apr 19 13:56:14 ip-172-31-13-19 boundary[23135]: | * event.(Eventer).retrySend: reached max of 3: too many retries
Apr 19 13:56:14 ip-172-31-13-19 boundary[23135]: |
Apr 19 13:56:14 ip-172-31-13-19 boundary[23135]:
Apr 19 13:56:14 ip-172-31-13-19 boundary[23135]: 2022-04-19T13:56:14.410+0700 [ERROR] event.WriteError: event.(Eventer).writeError: 5 errors occurred:
Apr 19 13:56:14 ip-172-31-13-19 boundary[23135]: * event.(Eventer).retrySend: event not written to enough sinks
Apr 19 13:56:14 ip-172-31-13-19 boundary[23135]: * event.(Eventer).retrySend: event not written to enough sinks
Apr 19 13:56:14 ip-172-31-13-19 boundary[23135]: * event.(Eventer).retrySend: event not written to enough sinks
Apr 19 13:56:14 ip-172-31-13-19 boundary[23135]: * event.(Eventer).retrySend: event not written to enough sinks
Apr 19 13:56:14 ip-172-31-13-19 boundary[23135]: * event.(Eventer).retrySend: reached max of 3: too many retries
Apr 19 13:56:14 ip-172-31-13-19 boundary[23135]: 2022-04-19T13:56:14.410+0700 [ERROR] event.WriteError: unable to write error: rpc error: code = DeadlineExceeded desc = context deadline exceeded

The API calls that were being made and taking a very long time were to the sessions endpoints:

Apr 19 15:34:32 ip-172-31-17-193 boundary[37271]: {"id":"OwYjUWmyIa","source":"https://hashicorp.com/boundary/boundary-controller-3","specversion":"1.0","type":"observation","data":{"latency-ms":90134.103818,"request_info":{"id":"gtraceid_EbSc0wROlGgnnYeICpvq","method":"GET","path":"/v1/sessions?filter=(\"%2Fitem%2Fuser_id\"+%3D%3D+\"u_1FGuqi0QLe\")\u0026recursive=true\u0026scope_id=o_ijIHpRpl8t","public_id":"at_0svGug6K3X","client_ip":"xx.xxx.xxx.xxx"},"status":500,"stop":"2022-04-19T15:34:32.896815692+07:00","version":"v0.1"},"datacontentype":"application/cloudevents","time":"2022-04-19T15:34:32.897148626+07:00"}
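
For anyone who wants to find the slow calls in their own controller logs, here is a rough Python sketch that pulls observation events like the one above out of journald-style lines and keeps those above a latency threshold. The field names (`type`, `data`, `latency-ms`, `status`, `request_info`, `method`, `path`) are taken from that event; the `slow_requests` helper and the 10 second default threshold are hypothetical choices for illustration, and the sample line is a trimmed-down stand-in rather than real output.

```python
import json


def slow_requests(lines, threshold_ms=10_000.0):
    """Yield (latency_ms, status, method, path) for slow observation events."""
    for line in lines:
        start = line.find("{")  # skip the syslog prefix before the JSON payload
        if start == -1:
            continue
        try:
            event = json.loads(line[start:])
        except json.JSONDecodeError:
            continue  # not a JSON event line (e.g. a plain-text error log)
        if event.get("type") != "observation":
            continue
        data = event.get("data", {})
        latency = data.get("latency-ms")
        if latency is None or latency < threshold_ms:
            continue
        req = data.get("request_info", {})
        yield latency, data.get("status"), req.get("method"), req.get("path")


# Trimmed-down sample in the same shape as the event above.
sample = (
    'Apr 19 15:34:32 host boundary[37271]: {"type":"observation","data":'
    '{"latency-ms":90134.1,"status":500,"request_info":'
    '{"method":"GET","path":"/v1/sessions"}}}'
)
for row in slow_requests([sample]):
    print(row)
```

Lines that do not carry a JSON payload, such as the plain-text error logs, are simply skipped.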

fabricef-p · May 19, 2022, 5:20pm

Do you have any idea what could be causing the issue? Or do you think we should just try upgrading to the latest Boundary version?

tallessiqueira · September 13, 2022, 11:22pm

I’ve been using version 0.8.1.1, and I see the same errors in the logs. Cleaning the view really helps.

I’m upgrading to the latest stable version to see if it works better.
