Sessions stuck in pending state / Worker drops traffic

Tetha · February 17, 2023, 10:15am

Moin,

I’m currently trying to set up boundary as a means to grant other employees access to databases, for example postgres, since this would be more secure for postgres and much easier for the users.

However, I’m currently stuck on the last leg of the implementation: the actual postgres connection.

I have 3 boundary controllers (version 0.12) using one of our existing postgres clusters as common storage and a load balancer to grant users access, terminate TLS and such. This is working overall, as I can access the boundary UI, configure boundary via terraform and launch boundary desktop.

In the future, I’ll need target-aware workers to route users to workers in the right subnets, but currently I just have one worker. The worker is able to contact the controllers: I can see the message “Worker successfully authenticated” on both a controller and the worker, and the worker also logs “Upstreams after first status set to: [3 internal IPs of the boundary controllers]”. In addition, boundary-desktop creates sessions while the worker is up and complains about “Not having available workers” when I shut the worker down, so the connection between the worker and the controllers seems to exist and work.

Then I can launch boundary desktop, paste in my URL and authenticate via OIDC as usual. Boundary presents me with the targets my user is authorized to see, and I can click “connect”. It then announces that it has opened a local port, and I can try to connect to that port.

However, at this point, things become complicated:

- If I connect to my local port, telnet hangs for a brief but noticeable moment and then closes with “Connection closed by foreign host.” psql also errors out with “server closed the connection unexpectedly”.
- The session stays in the state “pending”.
- Being a bit desperate, I’ve started running traffic dumps on the boundary worker.
- I can see that traffic from my workstation is sent to port 9201 on the worker whenever I launch my test telnet. So the traffic is accepted by boundary-desktop and relayed to the boundary worker.
- However, there is zero traffic outbound to the postgres subnet.
- I’ve tried increasing the log_level of the boundary worker to debug and trace, and also enabled as much event logging as possible. This resulted in zero additional lines being logged by the boundary worker, which is a little frustrating.
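A more precise test than telnet is to send the first packet of the PostgreSQL wire protocol and look at the answer. A postgres client may open a connection with an 8-byte SSLRequest (length 8, code 80877103), and a postgres server (or a pgbouncer in front of it) answers with a single byte: S if it is willing to do TLS, N if not. The minimal Python sketch below sends that packet to a host and port you pass in; the function name probe_ssl_request and the script name pg_probe.py are illustrative only and not part of Boundary. Pointed at the local port that boundary desktop or boundary connect reports, an S or N reply would mean the bytes made it through the worker to postgres, while an immediate close without a reply matches the “Connection closed by foreign host” symptom above.

```python
import socket
import struct
import sys
from typing import Optional

# PostgreSQL SSLRequest: Int32 message length (8) followed by Int32 code 80877103.
SSL_REQUEST = struct.pack("!ii", 8, 80877103)


def probe_ssl_request(host: str, port: int, timeout: float = 5.0) -> Optional[str]:
    """Send an SSLRequest and return the server's one-byte answer.

    Returns "S" (server accepts TLS), "N" (server refuses TLS), or None if the
    peer closed or reset the connection without answering. A timeout while
    waiting for the answer is raised as socket.timeout / TimeoutError.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(SSL_REQUEST)
        try:
            reply = sock.recv(1)
        except ConnectionResetError:
            return None
    if not reply:
        # Orderly close with no data: nothing on the far side answered.
        return None
    return reply.decode("ascii", errors="replace")


if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("usage: pg_probe.py HOST PORT")
    else:
        answer = probe_ssl_request(sys.argv[1], int(sys.argv[2]))
        if answer is None:
            print("no answer, connection closed")
        else:
            print(f"server replied {answer!r}")
```

Running the same script on the worker VM directly against the postgres or pgbouncer address gives a baseline: if that answers but the local Boundary port does not, the bytes are being lost between the Boundary client and the worker rather than between the worker and postgres.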

So now I’m in a state where boundary-desktop creates a session and sends traffic from my workstation to the boundary worker, and the boundary worker apparently drops it on the floor without any indication of why.

How can I get more information on why the sessions are stuck in pending or why the traffic is being dropped? If you need further information or logs, feel free to ask.

Best Regards,
Tetha

omkensey · February 17, 2023, 9:44pm

Do you get the same kind of issue if instead of Boundary Desktop, you use the CLI with boundary connect postgres?

I wouldn’t necessarily expect telnet to work at all but psql should. What’s between the Boundary worker and postgres, if anything? Is postgres set up with TLS or no?

Tetha · February 18, 2023, 10:51am

Hello,

> Do you get the same kind of issue if instead of Boundary Desktop, you use the CLI with boundary connect postgres?

The behavior looks largely the same.

$ boundary connect postgres -target-id=ttcp_ID  
Direct usage of BOUNDARY_TOKEN env var is deprecated; please use "-token env://<env var name>" format, e.g. "-token env://BOUNDARY_TOKEN" to specify an env var to use.
error fetching connection to send session teardown request to worker: Unable to connect to worker at <boundary-worker-url>:9202

Poking around a bit with $PATH and substituting the psql binary with a wrapper that logs its environment and command-line arguments when invoked also suggests that psql is never invoked at all. I can, however, still see some traffic arriving at the boundary worker on port 9202.

> I wouldn’t necessarily expect telnet to work at all but psql should.

I could have been clearer here, sorry. I don’t expect telnet to work in any actual postgres capacity. I mostly expect to get a connection to my postgres instance, so that I can send it some string and it logs “Unexpected TLS handshake”. At that point, I’d know I have a connection to postgres and could work on getting psql or some graphical tool configured to use it.

> What’s between the Boundary worker and postgres, if anything?

There isn’t anything in between. The boundary worker VM is on the same subnet as the postgres VM. Both systems use their local host firewalls, but we don’t block outbound traffic on local firewalls and the boundary worker is allowed inbound on the postgres instance. I’ve tested this by poking the pgbouncer and the postgres behind it using telnet and/or netcat from the boundary worker VM, and here I do get a connection to postgres which eventually leads to a log entry about handshake failures. So if I had psql or another postgres client on the boundary worker VM, I could connect.

> Is postgres set up with TLS or no?

Yes, postgres is set up with mandatory mutual TLS. However, I do have (tested) client certificates that allow me to pass this security layer. And even though boundary’s credential brokering doesn’t support issuing these certs on demand yet, I figured that boundary just creates a TCP tunnel, so if I supply my local psql with these certificates, this should be transparent to boundary, shouldn’t it? I’d just end up with an encrypted boundary connection containing an encrypted postgres connection.
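That reasoning matches how PostgreSQL negotiates TLS: the client first sends a plain-text SSLRequest, and only after the server answers S does the TLS handshake, including the client certificate, start on the same TCP stream. A TCP relay in the middle only sees opaque bytes. The Python sketch below illustrates the client side of that exchange, roughly what libpq does when psql is given sslmode=verify-ca together with sslcert, sslkey and sslrootcert. The function names make_client_context and start_postgres_tls and the certificate file names are hypothetical placeholders. Hostname checking is disabled because the connection goes to a local Boundary port rather than to the real postgres hostname.

```python
import socket
import ssl
import struct

# PostgreSQL SSLRequest: Int32 length (8) followed by Int32 code 80877103.
SSL_REQUEST = struct.pack("!ii", 8, 80877103)


def make_client_context(ca_file: str, cert_file: str, key_file: str) -> ssl.SSLContext:
    """Build a mutual-TLS client context from caller-supplied (hypothetical) paths."""
    ctx = ssl.create_default_context(cafile=ca_file)
    # We connect to a local Boundary port, so the server certificate's hostname
    # will not match; verify the CA chain only (similar to sslmode=verify-ca).
    ctx.check_hostname = False
    ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx


def start_postgres_tls(host: str, port: int, ctx: ssl.SSLContext,
                       timeout: float = 5.0) -> ssl.SSLSocket:
    """Perform PostgreSQL's in-band TLS upgrade and return the TLS socket."""
    sock = socket.create_connection((host, port), timeout=timeout)
    try:
        sock.sendall(SSL_REQUEST)
        answer = sock.recv(1)
        if answer != b"S":
            raise RuntimeError(f"server did not accept SSLRequest: {answer!r}")
        # From here on the stream carries TLS records; a plain TCP relay
        # forwards them without being able to inspect them.
        return ctx.wrap_socket(sock)
    except BaseException:
        sock.close()
        raise
```

A call would look like start_postgres_tls("127.0.0.1", local_port, make_client_context("root.crt", "client.crt", "client.key")), with local_port being the port Boundary reports. With TLS 1.3, a rejected client certificate may only surface on the first read after the handshake, so a successful return is a good sign rather than proof that postgres accepted the certificate.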

– Tetha
