Vault for job queues

I’m tasked with setting up a job queue system, and I’m evaluating some of the most common options, such as Celery, RQ and TaskTiger.
The catch is that when a worker receives a new job, it needs to fetch a secret from Vault in order to perform its task. After that, the worker carries out its task and needs no further access to Vault.

My question is which of Vault’s authentication methods best suits this scenario. I think AppRole could fit (though I’m still unsure about the exact dynamics), but I’m just starting out with Vault, so I’m still uncertain. Would I configure the role ID in the worker, then send it the secret ID (how?) along with the job to run, so that it can authenticate, get a token and read the secret it needs?
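
As a rough illustration of that flow, here is a minimal sketch of the two HTTP calls the worker would make, using only the Python standard library and Vault’s HTTP API: an AppRole login (`POST /v1/auth/approle/login`) followed by a read with the returned token. The Vault address and helper names are hypothetical, and the requests are only built here; passing them to `urllib.request.urlopen` would send them.

```python
import json
import urllib.request

# Hypothetical Vault address; replace with your own.
VAULT_ADDR = "https://vault.example.internal:8200"


def approle_login_request(role_id: str, secret_id: str) -> urllib.request.Request:
    """AppRole login: the role ID is baked into the worker, the secret ID arrives with the job."""
    body = json.dumps({"role_id": role_id, "secret_id": secret_id}).encode()
    return urllib.request.Request(
        f"{VAULT_ADDR}/v1/auth/approle/login",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def client_token_from_login(response_body: bytes) -> str:
    """Vault returns the new token under auth.client_token."""
    return json.loads(response_body)["auth"]["client_token"]


def read_secret_request(token: str, api_path: str) -> urllib.request.Request:
    """Read the secret with the token obtained from the login call."""
    return urllib.request.Request(
        f"{VAULT_ADDR}/v1/{api_path}",
        headers={"X-Vault-Token": token},
        method="GET",
    )
```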

Any advice or pointers to examples would be very welcome.

Thanks!

There are a few ways this can be done, but it depends a lot on who should access what. For example:

If you trust the worker to have access to all the secrets, then you only need to solve secret introduction for the worker. In this case, the job does not dictate the secret directly but implies it: the job says “read from the database”, and the worker reads the password from Vault and connects to the DB. Ultimately, only the worker needs the token.

At the other end of the spectrum, if the job directly dictates which secret to use, the job submitter may be the one that generates the access token and passes it along with the job through the queue. The worker then “blindly” accesses the secret using the data in the job.

It is very dependent on how you scope your secrets.

The following article, while specific to AppRoles, also has a lot of information on secret introduction, that is, how to access a secret without using another secret.

Once you get through that, try to map out all the “identities” (people, machines and apps) in your system (so far you have the worker, the jobs and the queue admin) and what each of them does in the system.

Thanks. Now I realize I haven’t provided enough detail, so here is more information.
To simplify: here, a secret consists of credentials belonging to a third party that grant access to some service; the third party provides these credentials to our system so that we can access the service periodically on their behalf.
These secrets are stored in Vault by another component of the system that I don’t have access to; for the purpose of this discussion, we can just assume that the secrets are already there under a known path (e.g. ‘secret/customerX/serviceY’).

This brings us back to the original question: a job scheduler enqueues jobs for workers to consume. Each job refers to a customer and a service, so the Vault path of the secret it requires can be computed programmatically.
While I do trust the workers, I’d still like to give them the minimum possible access to Vault, both in time (once the worker has the secret, I don’t want it to retain access to Vault, since it no longer needs it) and in scope (it should access only the specific secret it needs, not others).
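
Because the path follows from the job’s customer and service, the worker-side mapping can be a small pure function. This is a sketch assuming a KV engine mounted at `secret`; with KV version 2 the HTTP API path gains a `data/` segment that the `vault kv` CLI hides, which also matters when writing policies. The function name is hypothetical.

```python
def secret_api_path(customer: str, service: str, mount: str = "secret", kv_version: int = 2) -> str:
    """Map a job's (customer, service) pair to the HTTP API path of its secret.

    Logical path: <mount>/<customer>/<service>. With KV version 2 the API path
    inserts "data/" after the mount (the `vault kv` CLI adds it for you).
    """
    for part in (customer, service):
        # Reject separators and dot segments so a malformed job cannot
        # point the worker at some other customer's secret.
        if not part or "/" in part or part in (".", ".."):
            raise ValueError(f"invalid path component: {part!r}")
    if kv_version == 2:
        return f"{mount}/data/{customer}/{service}"
    return f"{mount}/{customer}/{service}"
```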

So I was thinking of something along these lines (let’s assume we’re talking about workers for customerX-serviceY):

  • create a policy that only gives read access to secret/customerX/serviceY
  • create a role with both the token and the secret ID max uses (token_num_uses and secret_id_num_uses) set to 1
  • include the role ID in the worker code
  • when a new job is scheduled, have the scheduler generate a secret ID for that role (using a policy that only allows that) and send it to the worker along with the job’s actual parameters
  • when the worker receives the job, the first thing it does is use the received secret ID plus the role ID it already knows to authenticate to Vault and get a (single-use) token, then use that token to read the secret it needs.
  • perform the job.
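
The configuration side of these steps might look roughly like the following sketch, which only builds the policy text and the role request body. The role and policy names are hypothetical, the TTL values are placeholders, and the worker policy assumes a KV version 2 mount at `secret` (hence the `data/` segment). The policy text would be uploaded with `PUT /v1/sys/policies/acl/<name>` as `{"policy": ...}`, and the role created with `POST /v1/auth/approle/role/<role_name>`.

```python
def worker_policy_hcl(customer: str, service: str) -> str:
    """Read-only access to exactly one secret (KV v2 API path)."""
    return (
        f'path "secret/data/{customer}/{service}" {{\n'
        '  capabilities = ["read"]\n'
        "}\n"
    )


def scheduler_policy_hcl(role_name: str) -> str:
    """Generating a secret ID is a POST to this endpoint, i.e. the "update" capability."""
    return (
        f'path "auth/approle/role/{role_name}/secret-id" {{\n'
        '  capabilities = ["update"]\n'
        "}\n"
    )


def role_params(policy_name: str) -> dict:
    """Body for POST /v1/auth/approle/role/<role_name>."""
    return {
        "token_policies": [policy_name],
        "secret_id_num_uses": 1,  # each secret ID allows a single login
        "token_num_uses": 1,      # the login token allows a single request: the secret read
        "token_ttl": "5m",        # placeholder: short lifetime in case the read never happens
        "secret_id_ttl": "10m",   # placeholder: bounded by how long a job may wait in the queue
    }
```

With `token_num_uses` set to 1, the login call does not consume a use of the new token; the one allowed request is the secret read, after which the token can no longer be used.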

One thing I don’t like about the above workflow is that a job might not be picked up by a worker immediately, and so might remain queued (along with its data, including the secret ID) in RabbitMQ, Redis, etc. for an unknown period of time. So I was thinking that the scheduler could skip Vault entirely, and instead I could set up a simple secret ID generation service that the worker calls when it actually starts working, so that the secret ID is generated and consumed at exactly the right time. Alternatively, the scheduler could use response wrapping for the secret ID, so that even if it sits in the queues for a while, the actual secret ID is not exposed.
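
Response wrapping keeps the secret ID itself out of the queue: the scheduler requests the secret ID with an `X-Vault-Wrap-TTL` header, Vault returns only a single-use wrapping token, and the worker exchanges that token via `POST /v1/sys/wrapping/unwrap`. A minimal sketch of both sides, with a hypothetical Vault address and helper names; the requests are built but not sent.

```python
import json
import urllib.request

VAULT_ADDR = "https://vault.example.internal:8200"  # hypothetical address


def wrapped_secret_id_request(scheduler_token: str, role_name: str, wrap_ttl: str = "10m") -> urllib.request.Request:
    """Scheduler side: ask for a secret ID, but have Vault wrap the response."""
    return urllib.request.Request(
        f"{VAULT_ADDR}/v1/auth/approle/role/{role_name}/secret-id",
        data=b"{}",
        headers={
            "X-Vault-Token": scheduler_token,
            "X-Vault-Wrap-TTL": wrap_ttl,  # Vault returns a wrapping token instead of the secret ID
            "Content-Type": "application/json",
        },
        method="POST",
    )


def wrapping_token_from_response(response_body: bytes) -> str:
    """The only value the scheduler puts into the job message."""
    return json.loads(response_body)["wrap_info"]["token"]


def unwrap_request(wrapping_token: str) -> urllib.request.Request:
    """Worker side: exchange the single-use wrapping token for the original response."""
    return urllib.request.Request(
        f"{VAULT_ADDR}/v1/sys/wrapping/unwrap",
        data=b"{}",
        headers={"X-Vault-Token": wrapping_token},
        method="POST",
    )


def secret_id_from_unwrap(response_body: bytes) -> str:
    """The unwrapped secret-id response carries the value under data.secret_id."""
    return json.loads(response_body)["data"]["secret_id"]
```

The wrap TTL also caps how long a job can sit in the queue: once it expires, unwrapping fails and the job needs a fresh wrapping token. An unwrap that fails because the token was already used is a sign that it may have been intercepted.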

But again, any further suggestions are welcome, as is any criticism of the above plan.

So you have “done your homework” 🙂

Wrapping would be required if the secret ID sits in the queue, and according to the documentation there are ways to verify that it has not been tampered with.
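
One of the checks described in Vault’s response-wrapping documentation is to look the token up (`POST /v1/sys/wrapping/lookup` with `{"token": ...}`) and confirm that its `creation_path` is the one you expect before unwrapping. A sketch of that check on the lookup response, assuming AppRole is mounted at the default `approle` path; the function names are hypothetical.

```python
import json


def lookup_request_body(wrapping_token: str) -> bytes:
    """Body for POST /v1/sys/wrapping/lookup."""
    return json.dumps({"token": wrapping_token}).encode()


def verify_wrapping_lookup(lookup_body: bytes, expected_role: str) -> None:
    """Raise ValueError unless the token was created by the expected role's secret-id endpoint."""
    data = json.loads(lookup_body)["data"]
    expected = f"auth/approle/role/{expected_role}/secret-id"
    actual = data.get("creation_path")
    if actual != expected:
        # A different path suggests the token was swapped or forged in transit.
        raise ValueError(f"unexpected creation_path {actual!r}, expected {expected!r}")
```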

Instead of setting up the “simple secret ID” generation endpoint (how do you secure that? secret zero again), and instead of having the scheduler pass along a wrapped secret ID, why not have it wrap a token that can generate the secret ID? The worker would use that token to generate its own secret ID (so it would also need to know its role name, not just the role ID), and you could then shorten the secret ID TTL.
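
A rough sketch of that variant: the scheduler creates a short-lived, single-use child token (`POST /v1/auth/token/create`, sent with an `X-Vault-Wrap-TTL` header) whose only policy allows generating secret IDs for the worker’s role, and the job carries just the wrapping token. Names and TTLs are hypothetical; note that a non-root scheduler token can normally only give its children policies it holds itself.

```python
import json


def wrapped_child_token_body(policy_name: str, ttl: str = "15m") -> dict:
    """Body for POST /v1/auth/token/create; the scheduler also sends X-Vault-Wrap-TTL."""
    return {
        "policies": [policy_name],  # assumed to permit only POST to the role's secret-id endpoint
        "ttl": ttl,                 # should cover the time a job may wait in the queue
        "num_uses": 1,              # exactly one call: generating the worker's secret ID
    }


def token_from_unwrapped_create(unwrap_body: bytes) -> str:
    """Unwrapping a token-create response yields the token under auth.client_token."""
    return json.loads(unwrap_body)["auth"]["client_token"]


def worker_call_plan(role_name: str, secret_api_path: str) -> list:
    """Order of Vault calls made by the worker: (method, API path, credential used)."""
    return [
        ("POST", "sys/wrapping/unwrap", "wrapping token carried by the job"),
        ("POST", f"auth/approle/role/{role_name}/secret-id", "unwrapped single-use token"),
        ("POST", "auth/approle/login", "role ID plus the fresh secret ID"),
        ("GET", secret_api_path, "token returned by the login"),
    ]
```

Since the secret ID is generated right before the login, a small `secret_id_ttl` on the role would be enough.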

To me this makes more sense than the service you are suggesting, but I’m not certain; I’m thinking “fast” to answer and give ideas. There is the whole question of child tokens, expiration, wrapping and so on. Also, would it be OK for the worker to be able to generate any secret ID, given that it would get an auth error if it had the wrong role ID? Or would you limit secret ID generation?

I like to use AppRoles with policy templates, and my secret ID generators are pipelines that authenticate with JWT auth, so I can use metadata and policy templates to limit secret ID generation as well.
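
A sketch of what such a templated policy could look like: the JWT role copies a claim into the entity alias metadata through `claim_mappings`, and the ACL template restricts secret ID generation to the AppRole named by that claim. The mount accessor (`auth_jwt_1234abcd`), the claim and metadata key `approle_name`, and the audience are hypothetical placeholders.

```python
def secret_id_generator_policy(jwt_mount_accessor: str, metadata_key: str = "approle_name") -> str:
    """ACL policy template; Vault fills {{...}} from the caller's JWT entity alias metadata."""
    # Plain concatenation so the literal {{ }} of the template survive unchanged.
    template = "{{identity.entity.aliases." + jwt_mount_accessor + ".metadata." + metadata_key + "}}"
    return (
        'path "auth/approle/role/' + template + '/secret-id" {\n'
        '  capabilities = ["update"]\n'
        "}\n"
    )


def jwt_role_body(policy_name: str, bound_audience: str,
                  claim: str = "approle_name", metadata_key: str = "approle_name") -> dict:
    """Body for the JWT auth role used by the secret ID generator pipelines."""
    return {
        "role_type": "jwt",
        "user_claim": "sub",
        "bound_audiences": [bound_audience],
        "claim_mappings": {claim: metadata_key},  # copies the claim into alias metadata
        "token_policies": [policy_name],
    }
```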

Well, after reading the AppRole docs a few more times, it looks like there should always be a “trusted entity” (orchestrator, CI or whatever) in charge of generating the secret ID (or the wrapped response for it) and injecting it into the client, so I suppose the “simple service” I was envisioning could fulfill that role. Obviously it would only be able to create wrapped responses for secret IDs and nothing else, so I think (corrections welcome) it could reasonably be configured with a static token (after all, it is a trusted entity).

This would let the worker get the secret ID just in time (JIT) by asking the “simple service” described above and unwrapping its response; no Vault data (neither secret IDs nor wrapped responses) would ever need to sit in the queues waiting for a worker to pick it up.
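
If the “simple service” is kept, its “only wrapped secret IDs” restriction can be expressed in Vault itself: per the ACL policy documentation, setting `min_wrapping_ttl` on a path effectively makes response wrapping mandatory there. A sketch of such a policy, with hypothetical role names and a placeholder maximum wrap TTL. It limits what the service can do in Vault; it does not authenticate the workers that call it.

```python
def service_policy_hcl(role_names: list, max_wrap: str = "60s") -> str:
    """Allow secret ID generation for the given roles, but only as wrapped responses."""
    blocks = []
    for role in role_names:
        blocks.append(
            f'path "auth/approle/role/{role}/secret-id" {{\n'
            '  capabilities = ["update"]\n'
            '  min_wrapping_ttl = "1s"\n'       # unwrapped requests are rejected
            f'  max_wrapping_ttl = "{max_wrap}"\n'  # wrapped token must be used quickly
            "}\n"
        )
    return "\n".join(blocks)
```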

I still need to give the whole thing more thought though.

Thanks!

It does get a little loopy when you think about this too fast. If your simple service is the one injecting into the worker, then I think it’s fine. If the worker requests from the simple service (as you described it the first time), then you need to authenticate the worker.

By making the worker initiate the request, you invert the trust relationship: it now rests on the worker, not on the simple service. And if I must authenticate the worker anyway, I might as well do it with Vault instead of the simple service.

This is why I mentioned passing permissions (tokens) from the scheduler over the queue to the worker, so that it can perform the JIT Vault request.

For me, when I do a walkthrough, it helps to make it physical and think of it in terms of people, keys, storage lockers and a lot of bureaucracy.

Hm, I had not thought about this inversion of trust, which is indeed relevant as you say. Let me iterate a bit more over this.