I apologize for the length of this; I’m not entirely sure what’s relevant and what’s not, so I wanted to provide as much context as possible.
I’m working on a Go module that wraps a database connection with ephemeral database roles generated by Vault. It presents as a `database/sql/driver.Connector`, acquiring new credentials from Vault on each `Connect()` call if they’re needed (on first connect, or after lease expiration), or using the existing ones if not. It uses a `LifetimeWatcher` to keep track of when the credentials are expiring. I had to add another channel to trigger getting new credentials when the token the app is using expires and is replaced, which was a bit surprising to me, but it seems to work.
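For concreteness, here’s a minimal sketch of the shape of that `Connector`. The role path, field names, and channel plumbing are placeholders of mine, not the actual code:

```go
package vaultpq

import (
	"context"
	"database/sql/driver"
	"fmt"
	"sync"

	"github.com/hashicorp/vault/api"
	"github.com/lib/pq"
)

// Connector hands out pq connections, fetching fresh Vault credentials
// in Connect() only when needed: on first use, after the lease has
// expired, or after the app's Vault token has been replaced.
type Connector struct {
	mu     sync.Mutex
	client *api.Client
	base   string // DSN minus credentials, e.g. "host=db dbname=app"

	creds *api.Secret // current credential lease; nil before first use

	expiredCh       chan struct{} // fed from the LifetimeWatcher's DoneCh
	tokenReplacedCh chan struct{} // the extra channel for token rotation
}

func (c *Connector) Connect(ctx context.Context) (driver.Conn, error) {
	c.mu.Lock()
	defer c.mu.Unlock()

	need := c.creds == nil
	select {
	case <-c.expiredCh:
		need = true
	case <-c.tokenReplacedCh:
		need = true
	default:
	}

	if need {
		secret, err := c.client.Logical().Read("database/creds/my-role")
		if err != nil {
			return nil, err
		}
		c.creds = secret
		// A LifetimeWatcher is (re)started on the new lease here, with
		// its DoneCh feeding expiredCh.
	}

	dsn := fmt.Sprintf("%s user=%s password=%s", c.base,
		c.creds.Data["username"], c.creds.Data["password"])
	return pq.Open(dsn)
}

func (c *Connector) Driver() driver.Driver { return pq.Driver{} }
```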
I’m now adding support for Postgres’ `LISTEN` command, via the `Listener` type in `github.com/lib/pq`. In the course of this, I discovered that existing database connections were not severed when their credentials were revoked, so I set the `revocation_statements` to the following:

```sql
SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE usename = '{{name}}';
DROP ROLE IF EXISTS "{{name}}";
```

so that the connections would be forcibly severed. For “normal” (non-`LISTEN`) connections, this doesn’t have much effect, since the connections don’t typically stay open for long anyway (the volume of calls is pretty low).
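For reference, attaching those statements to the role looks roughly like this via the Go API; the mount path (`database`), `db_name`, and role name here are illustrative, not the real ones:

```go
package vaultpq

import "github.com/hashicorp/vault/api"

// configureRole attaches the revocation statements to a database
// secrets engine role. The mount ("database"), db_name, and role name
// are placeholders for illustration.
func configureRole(client *api.Client) error {
	_, err := client.Logical().Write("database/roles/my-role", map[string]interface{}{
		"db_name": "my-postgres",
		"revocation_statements": []string{
			`SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE usename = '{{name}}';`,
			`DROP ROLE IF EXISTS "{{name}}";`,
		},
	})
	return err
}
```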
For `Listener` connections, though, I’m getting some odd behavior. If I just grab credentials from Vault on the first connection, and then again every time the `Listener` tries to reconnect and fails (the `pq` code fires an event when that happens, giving me the opportunity to get new creds before reconnecting), then that’s exactly what happens when the credentials’ TTL is reached: the DB backend is killed, the listener is disconnected, it attempts reconnection, fails due to bad credentials, I get new creds, reconnect, and everything is good. The downside is that all of this takes a little time, so there’s a window in which I can miss a notification.
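Concretely, that failure-driven flow looks something like the sketch below; `fetchDSN` (get new creds from Vault and format a DSN) and the channel name are hypothetical stand-ins:

```go
package vaultpq

import (
	"time"

	"github.com/lib/pq"
)

// listenLoop sketches the failure-driven refresh: when a reconnect
// attempt fails (as it does once the role has been dropped), tear the
// listener down and rebuild it with fresh credentials.
func listenLoop(fetchDSN func() (string, error)) error {
	for {
		dsn, err := fetchDSN()
		if err != nil {
			return err
		}

		failed := make(chan struct{}, 1)
		l := pq.NewListener(dsn, time.Second, time.Minute,
			func(ev pq.ListenerEvent, err error) {
				if ev == pq.ListenerEventConnectionAttemptFailed {
					select {
					case failed <- struct{}{}:
					default:
					}
				}
			})
		if err := l.Listen("my_channel"); err != nil {
			l.Close()
			return err
		}

	recv:
		for {
			select {
			case n := <-l.Notify:
				_ = n // handle the notification; n is nil right after a reconnect
			case <-failed:
				// A reconnect attempt failed (presumably bad creds):
				// rebuild the listener with a fresh DSN.
				l.Close()
				break recv
			}
		}
	}
}
```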
Instead, I’d like to set up a new listener prior to expiration, so that I can cover that window. So I changed my listener code to use the same `LifetimeWatcher` code that I’m using for the `Connector` to tell me when that’s going to happen. Everything just worked, until I looked under the covers. I get the first renewal notification (on `watcher.RenewCh()`) immediately, as always (odd behavior, but it hasn’t been a problem), and then again three-quarters of the way through each lease, until the max TTL is reached, all as expected.
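The watcher loop itself is the stock pattern, roughly as follows; the `expired` channel is a placeholder for however the rest of my code gets told to refresh:

```go
package vaultpq

import (
	"log"

	"github.com/hashicorp/vault/api"
)

// watchLease starts a LifetimeWatcher on a credential lease, logs
// renewals, and signals on the (placeholder) expired channel when the
// lease can no longer be renewed.
func watchLease(client *api.Client, secret *api.Secret, expired chan<- struct{}) error {
	watcher, err := client.NewLifetimeWatcher(&api.LifetimeWatcherInput{
		Secret: secret,
	})
	if err != nil {
		return err
	}

	go watcher.Start()
	defer watcher.Stop()

	for {
		select {
		case renewal := <-watcher.RenewCh():
			// Fires immediately on the first renewal, then at roughly
			// three-quarters of each lease duration.
			log.Printf("renewed lease %s", renewal.Secret.LeaseID)
		case err := <-watcher.DoneCh():
			// Renewal has stopped (max TTL reached, or an error):
			// time to arrange for new credentials.
			expired <- struct{}{}
			return err
		}
	}
}
```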
But when I get a notification on `watcher.DoneCh()`, and I set my `Connector` up to fetch new credentials the next time through, my listener just keeps on listening, and its connection is never terminated. The Vault trace logs don’t show the revocation, and the Postgres logs don’t show the revocation SQL statements coming through, either. I put some debugging statements into `handleRevokeRenew()` in `sdk/framework/backend.go` to see what’s going on, and I never see the revocation operation; it’s only ever renew. (And even then, the renews only happen up to the point where the revocation should happen.)
I don’t really understand why having the watcher running would make a difference here. I expect there’s a bug in my code somewhere, or at least a misunderstanding of how to use the Vault Go API interfaces properly, but I’m not sure what to look for. I also have not yet confirmed that the revocations were happening in the non-listener context, but the testing I did do suggested that they were.
The code is not mine to share, but I will ask, if that would be useful; I’d like to open-source this anyway. But I suspect that this problem will either sound familiar to someone who can explain what I must be doing wrong, or I’ll just have to find it myself.
This is all with Vault 1.3.2, running in a test suite with a setup similar to how Vault’s own tests work.
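For the curious, the test setup is roughly this shape (a sketch assuming an in-process core built with Vault’s own test helpers; the details may differ from my actual harness):

```go
package vaultpq

import (
	"testing"

	"github.com/hashicorp/vault/api"
	vaulthttp "github.com/hashicorp/vault/http"
	"github.com/hashicorp/vault/vault"
)

// testClient spins up an in-process, unsealed Vault core and returns an
// API client pointed at it, similar to Vault's own test suites.
func testClient(t *testing.T) *api.Client {
	t.Helper()

	core, _, rootToken := vault.TestCoreUnsealed(t)
	ln, addr := vaulthttp.TestServer(t, core)
	t.Cleanup(func() { ln.Close() })

	conf := api.DefaultConfig()
	conf.Address = addr
	client, err := api.NewClient(conf)
	if err != nil {
		t.Fatal(err)
	}
	client.SetToken(rootToken)
	return client
}
```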
Thanks!
Danek