Hi everyone, I recently started a new role and part of this role is to manage an existing but fledgling Vault instance. The instance itself is using the open-source version of Vault 1.6.0 and is running on Kubernetes. I’m looking for a little assistance running Vault as a CA.
The cluster is now running well and we have our first application authenticating with it, but this application requires the Vault instance to be a CA. Setting that up isn’t necessarily the problem (we have the tutorials; Build Your Own Certificate Authority (CA) | Vault - HashiCorp Learn), but some of the questions the developers are now asking are a little confusing and I don’t know whether that’s due to my lack of understanding.
They have asked the question: “How do we get the CA certificate into the application, Would you provide it to us securely to inject as a sealed secret to our deployment?” Now, I understood that one of the benefits of using Vault is that they can get this using an API call and we won’t need to provide them with anything. Is that a correct assumption on my part? In fact I presume we use this: PKI - Secrets Engines - HTTP API | Vault by HashiCorp
However, if we generate the root CA in Vault and then manage it outside of Vault, as suggested here: “In general, we recommend maintaining your root CA outside of Vault and providing Vault a signed intermediate CA.” (PKI - Secrets Engines | Vault by HashiCorp), then surely that API call won’t work?
Any help or clarification would be greatly appreciated.
OK, first, the obligatory comment about the version: Vault 1.6.0 is old. Old enough to be considered unsupported for security and bugfixes, and old enough to not be compatible with Kubernetes service account tokens produced by modern versions of Kubernetes. Please start planning an upgrade. Maybe don’t upgrade to 1.10.x just yet, as there was a change to session handling in the web UI, which produced community outcry and is scheduled to be reverted in 1.10.4, but consider upgrading to 1.9.x ASAP.
There are many ways to structure a CA infrastructure.
In particular, can you share more details about:
Does your company currently have any internal CAs at all yet?
For what kinds of operations does the application “require the Vault instance to be a CA”?
In the question you quote (“How do we get the CA certificate into the application, Would you provide it to us securely to inject as a sealed secret to our deployment?”), the added words “securely” and “sealed” confuse me, because:
The public certificate of a CA, which is commonly required by applications in order to trust other certificates issued by the CA, is not confidential
Communicating even non-confidential configuration via the Kubernetes “secret” resource is quite common
Yet the inclusion of “securely” and “sealed” hints that this isn’t what’s being talked about here.
Regarding fetching this kind of thing via the API:
Even though Vault provides an API, it’s fairly common for developers working in Kubernetes to be interested in automation that talks to the Vault API for them and deposits what they need within the environment or filesystem of their Kubernetes pods, so they don’t have to implement talking to the API themselves.
In this case, if you’re talking about fetching a CA certificate from the Vault API, there may be a chicken-and-egg problem… if you access the Vault API over HTTPS, and need to trust the Vault server certificate to do so, but the Vault server certificate is issued by the CA, which you don’t trust yet because you haven’t talked to the API to get it… cyclic dependency.
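To make the API option and the bootstrap problem concrete, here is a minimal Python sketch. It assumes the PKI engine is mounted at `pki` and uses the engine's `ca/pem` endpoint, which, as far as I know, does not require a Vault token. The address and the bootstrap bundle path are hypothetical. Note that the client still needs some CA bundle, delivered out of band, to verify Vault's own HTTPS certificate before it can fetch anything.

```python
import ssl
import urllib.request


def ca_pem_url(vault_addr: str, mount: str = "pki") -> str:
    """Build the URL of the PKI mount's CA certificate endpoint.

    The mount name "pki" is an assumption; use whatever path the
    secrets engine is actually mounted at.
    """
    return f"{vault_addr.rstrip('/')}/v1/{mount.strip('/')}/ca/pem"


def fetch_ca_pem(vault_addr: str, bootstrap_cafile: str, mount: str = "pki") -> str:
    """Fetch the CA certificate in PEM form.

    bootstrap_cafile is a CA bundle distributed out of band (for example
    mounted from a Kubernetes secret). It is what lets the client trust
    Vault's own TLS certificate, which is the chicken-and-egg step.
    """
    ctx = ssl.create_default_context(cafile=bootstrap_cafile)
    url = ca_pem_url(vault_addr, mount)
    with urllib.request.urlopen(url, context=ctx, timeout=10) as resp:
        return resp.read().decode("ascii")


def looks_like_pem_certificate(text: str) -> bool:
    """Cheap sanity check before writing the response into a trust store."""
    stripped = text.strip()
    return stripped.startswith("-----BEGIN CERTIFICATE-----") and stripped.endswith(
        "-----END CERTIFICATE-----"
    )
```

Something like `fetch_ca_pem("https://vault.example.internal:8200", "/etc/bootstrap/ca.pem")` would then return the PEM text. If the bootstrap bundle already contains the CA you are trying to fetch, the API call adds nothing, which is the cyclic dependency described above.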
As for whether your root certificate should be in Vault or not… a lot depends on just how much you care about security, and how you will manage the evolution of your PKI over the 5 to 20 year timeframe.
Two concerns that are easy to overlook for a new PKI administrator are:
Do you care about revocation? What will you do if your CA certificate or individual certificates issued by your CA are compromised?
How are you going to handle rotation? All certificates, including your CA certificate, have an expiry date. Design and test the operational procedures you will use when that expiry date draws near, lest you find yourself accidentally backed into a corner by previous design choices, when it is time to rotate certificates.
Having a root CA that signs an intermediate issuing CA partially addresses some of those issues:
In the event of an intermediate CA compromise, IF you have all your clients set up to do sufficient revocation checking, it may be sufficient to revoke the compromised intermediate and issue another from the root.
As the root is often kept offline, or subject to very high levels of security, impractical for day to day issuance, it is often tolerated to give it a very long validity period.
Another few Vault-specific things to bear in mind about CAs:
Vault stores all of the certificates it issues in its data store (unless configured no_store)
If you configure no_store you give up the ability to revoke certificates
Applications can easily DoS Vault by requesting thousands of certificates in a tight loop
This is particularly bad because, when using Vault with the default Consul backend, Consul is an in-memory database, not intended to store more than a few hundred megabytes, really
Once an application DoSes Vault in this way, it’s difficult for an administrator to do anything, as there are no supported APIs to remove certificates until after they have expired
Purging of expired certificates from the Vault datastore doesn’t happen at all, unless you set up a periodic process to call the pki/tidy endpoint
Vault currently lacks a way to cleanly replace an existing CA certificate with a new one without downtime for the ability to issue certificates, although improvements are currently being worked on and should arrive in a future version of Vault - maybe 1.11
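As a rough illustration of the periodic tidy process mentioned in the list above, the following Python sketch builds the HTTP request for the PKI `tidy` endpoint. It only constructs the request; scheduling it (a Kubernetes CronJob, say) and supplying a token with permission on that path are left to you. The mount name `pki`, the 72h safety buffer, and the token value are assumptions for the example.

```python
import json
import urllib.request


def build_tidy_request(
    vault_addr: str,
    token: str,
    mount: str = "pki",
    safety_buffer: str = "72h",
) -> urllib.request.Request:
    """Build a POST to <mount>/tidy that purges expired certificates.

    tidy_cert_store removes expired certificates from storage,
    tidy_revoked_certs removes expired entries from the revocation list,
    and safety_buffer is extra time past expiry before an entry is eligible.
    """
    body = {
        "tidy_cert_store": True,
        "tidy_revoked_certs": True,
        "safety_buffer": safety_buffer,
    }
    return urllib.request.Request(
        url=f"{vault_addr.rstrip('/')}/v1/{mount.strip('/')}/tidy",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={"X-Vault-Token": token, "Content-Type": "application/json"},
    )
```

Passing the returned object to `urllib.request.urlopen` (with a suitable TLS context) would actually send it. Tidy only helps with certificates that have already expired, so it does not undo the tight-loop problem while those certificates are still valid.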
In short: running a CA is hard. Invest heavily in developing operational procedures for what happens around compromise, revocation and renewal before you start, or run the risk of being unpleasantly surprised later.
Thanks for your reply, this stuff is really useful. Starting with running the old version of Vault, yes I fully understand that it’s an old version and in parallel with the other things I have to do, I’m looking to upgrade before anything gets to the stage of getting into Production, but thanks for the heads up.
In answer to the questions;
Does your company currently have any internal CAs at all yet?
For what kinds of operations does the application “require the Vault instance to be a CA”?
I’ve been through the organisation and I believe that we do not have any internal CAs. We’ve never needed to generate them before; we only have the need now. Therefore we also do not need to use Vault to generate the CA. It was merely something that I knew Vault could do, and I thought it would be better for Vault to generate and manage the CA than to generate and manage it externally and import it into Vault, but I guess that it’s a case of just because it can, doesn’t mean it should?
These questions…
Do you care about revocation? What will you do if your CA certificate or individual certificates issued by your CA are compromised?
How are you going to handle rotation? All certificates, including your CA certificate, have an expiry date. Design and test the operational procedures you will use when that expiry date draws near, lest you find yourself accidentally backed into a corner by previous design choices, when it is time to rotate certificates.
…are also extremely useful. In answer to the first question, I guess I would be stupid if I said we didn’t care about revocation. For the second question, I had initially thought that Vault could do the rotation, but if we’re going to create and manage the CA elsewhere and import it into Vault… then indeed, we are going to have to design something to handle this.
This is just food for thought here and doesn’t really try to add to your existing goal but may help.
Almost everything in Vault is designed around short-term, dynamic credentials. You want access to a database to read something? Here is a set of credentials that you can use for the next 5 minutes, etc. Although you can set up any of the engines and roles to provide long-term access, they were originally designed for short-lived access. PKI is no different. You can generate long-term certs (note that I’m not talking about the CA’s term here) but you’re better off creating short-term certs that do what they need quickly and programmatically and then go away, rather than generating and managing long-lived certs externally via whatever process you’re thinking of.
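For illustration, a short-lived certificate is requested from the PKI engine's `issue/<role>` endpoint with an explicit `ttl`. The sketch below only builds that request in Python. The mount name `pki`, the role name `my-app`, the common name, and the one-hour TTL are all hypothetical, and the role's own `max_ttl` still caps whatever is requested.

```python
import json
import urllib.request


def build_issue_request(
    vault_addr: str,
    token: str,
    role: str,
    common_name: str,
    ttl: str = "1h",
    mount: str = "pki",
) -> urllib.request.Request:
    """Build a POST to <mount>/issue/<role> asking for a short-lived cert.

    The response (not handled here) carries the certificate, its private
    key, and the issuing CA certificate under the "data" key.
    """
    body = {"common_name": common_name, "ttl": ttl}
    return urllib.request.Request(
        url=f"{vault_addr.rstrip('/')}/v1/{mount.strip('/')}/issue/{role}",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={"X-Vault-Token": token, "Content-Type": "application/json"},
    )
```

An application would typically re-request well before the TTL runs out, so the certificate never needs to be stored or managed anywhere else.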
OK, if you’re introducing internal CAs to an organisation that doesn’t currently use them, I think a good starting point is to be asking yourself (and writing down, as a written ADR - architecture decision record):
Which processes need to trust certificates issued by the new CA?
This will be informed by the answer to my question:
For what kinds of operations does the application “require the Vault instance to be a CA”?
which you didn’t really answer yet.
Which processes need to trust the CA, will tell you where you need to configure CA trust stores so that your CA is actually trusted there.
It’ll also let you know whether you need to consider issues like: If a rogue employee gained access to the CA or its private key, could they use it as part of an attack to impersonate either internal or external systems - e.g. to mount a man-in-the-middle attack to capture credentials.
@aram makes a good point here about lifetimes - whilst Vault is capable of issuing more traditional certificates with longer lifetimes, a lot about the design of Vault is based around short-lived credentials. Also, make your certificate lifetimes short enough (whatever that means for your security tradeoffs), and you can just not care about revocation, as bad certificates can be assumed to expire “soon enough”.
You’re right I didn’t answer that question, so I’ll address that now as much as I can.
For the particular needs of our business we are not using Vault as a secrets management tool/credential store in a corporate/enterprise way. We are using it specifically as one element in an in house developed system, which is part of our main product. Having said that, in our particular instance here, we use Vault for two things…
We are using Vault to store some credentials from our customers (Fairly basic functionality which doesn’t require Vault to be a CA).
We need to issue short-lived certificates for mutual TLS, which is what we need Vault to act as a CA (or at least an intermediate CA) for.
I hope that gives a little more information to help?
The key piece of information here is that the things that need to trust the new CA are the things on both ends of the mutual TLS connections.
So, you need to make a list of what these things are, so you can consider how you distribute the CA certificate to them all, to create the trust relationship.
Is it just all the instances of a particular group of deployed services?
What about developers performing debugging tasks, do they need an mTLS identity too?
How hard or easy that distribution process is, will steer your decision-making on several other aspects of the design.
Especially, how rapidly can you push changes to the trusted CA certificates across the entire system?
If you can do so relatively quickly, all concerns about how to renew your CA go away - you just create another one, push it out to all of your trust configurations alongside the old one, and gradually phase out the old.
Similarly, revocation becomes much less interesting, as in the event of a security breach, you then have the option of distrusting your current CA and replacing it with a new one, to cure the breach.
You may even decide that you don’t need to care about having a root CA and an intermediate CA - it’s worth noticing that the root-and-intermediate setup commonly practiced for Internet CAs, is mostly a way of dealing with the fact it’s not realistically possible to update the list of trusted CAs in every web browser (and other TLS client) installation across the globe, other than over the timescale of years.
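The "push the new CA out alongside the old one" step described above boils down to maintaining a trust bundle that contains both certificates for a while. Here is a small Python sketch of that merge. The function names are my own, and how the resulting bundle reaches your pods (a ConfigMap, a secret, an image rebuild) is up to you.

```python
import re

# Matches one PEM-encoded certificate block, including its markers.
_PEM_CERT = re.compile(
    r"-----BEGIN CERTIFICATE-----.*?-----END CERTIFICATE-----", re.DOTALL
)


def split_pem_certs(bundle: str) -> list[str]:
    """Return each certificate block found in a PEM bundle."""
    return [match.group(0).strip() for match in _PEM_CERT.finditer(bundle)]


def merge_trust_bundles(*bundles: str) -> str:
    """Concatenate PEM bundles, keeping first-seen order and dropping duplicates."""
    seen = set()
    merged = []
    for bundle in bundles:
        for cert in split_pem_certs(bundle):
            key = "".join(cert.split())  # ignore whitespace differences
            if key not in seen:
                seen.add(key)
                merged.append(cert)
    return "\n".join(merged) + "\n"
```

During a rollover you would distribute `merge_trust_bundles(old_ca_pem, new_ca_pem)`, switch issuance to the new CA, and once the last certificate issued by the old CA has expired, distribute a bundle containing only the new one.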
After speaking to our architect, I have more answers.
We are going to use an external root CA/intermediate CA setup and import the intermediate CA into Vault.
The mTLS connection will be between one internal app (which we have full control of) and Vault. Nothing has been mentioned about Developers debugging tasks, so I’m going to explicitly exclude this.
The certs will only last for 24 hours so I’ve been told that I don’t need to consider revocation.
This then takes us back to the original question the developer had; “How do we get the CA certificate into the application, Would you provide it to us securely to inject as a sealed secret to our deployment?”
I now understand that using mTLS essentially means we will have to manage the CA in two locations and there is no way Vault can manage this for us. Vault will have an imported CA stored, and the same CA will have to be available to the app (probably stored as a secret in Kubernetes).
Does that sound like I’ve grasped the problem or not?
Wait, you want to use mTLS with Vault? Isn’t that a bit of a circular dependency? As in, you would need your mTLS identity certificate to authenticate to Vault in order to fetch your mTLS identity certificate?
Even supposing that you overcome this somehow, it’s not really full mTLS, as I strongly suspect Vault will only actually care about the certificate when it comes to the login request, returning a Vault token, and for other requests thereafter you probably won’t actually have your certificate being checked - only the token.
In any PKI infrastructure, you always have to distribute a root of trust to your clients.
If you are using a root CA with an intermediate CA, the clients should be pre-configured (e.g. via a Kubernetes secret, indeed) with the root CA’s certificate.
Vault would be configured with an intermediate CA certificate and private key and would return the intermediate CA certificate to clients when issuing their individual certificates - it is then the individual clients’ task to save and re-present the intermediate CA certificate as part of their certificate chain, when they use their certificate, to prove the binding between root and intermediate. This decoupling is how it becomes possible to move to a replacement intermediate CA without service interruption.
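As a sketch of the "save and re-present the intermediate" step, the function below assembles the chain file a client would present from the `data` part of an issue response. It assumes the response fields `certificate`, `issuing_ca` and (when Vault has a chain configured) `ca_chain`. The function name is hypothetical.

```python
def build_chain_pem(issue_data: dict) -> str:
    """Assemble leaf-first chain PEM from the "data" of an issue response.

    Uses ca_chain when Vault returned one, otherwise falls back to the
    single issuing_ca certificate. Depending on what was uploaded along
    with the intermediate, ca_chain may also include the root; peers that
    already trust the root do not need it to be sent.
    """
    leaf = issue_data["certificate"].strip()
    chain = issue_data.get("ca_chain") or [issue_data["issuing_ca"]]
    parts = [leaf] + [cert.strip() for cert in chain]
    return "\n".join(parts) + "\n"
```

The result, written to a file next to the private key, is the kind of input that Python's `ssl.SSLContext.load_cert_chain(certfile=..., keyfile=...)` expects. The peer's trust store needs only the root CA certificate, which is why the intermediate can be replaced without redistributing trust.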