ACLs in Production, Done the Right Way

Hello,

I have spent some time understanding ACLs and have some ideas about how to approach this, but I am curious what other people are doing for ACLs in production.

In general, I’m used to my clusters coming online mostly on their own: I give them some initial bootstrap parameters, maybe generate some TLS certs, and then let the tools do the work. I’d like Nomad and Vault to use Consul ACLs correctly, which means bootstrapping Consul’s ACLs needs to be part of this process too.

I’m seeing 3 main problems:

  1. The need to stop/start/restart Consul on multiple hosts, one at a time, after configs are updated, in the right order, and without proceeding until the previous node is back online.
  2. Capturing tokens from the API to feed into node init on other nodes.
  3. Giving each node a unique token.
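
To make these three problems concrete, here is a rough sketch of the kind of tooling I have in mind, not a tested implementation. The endpoints `PUT /v1/acl/bootstrap` and `PUT /v1/acl/token`, the `X-Consul-Token` header, and the `NodeIdentities` field (Consul 1.8.1+) come from the Consul HTTP API. Everything else is an assumption for illustration: the `CONSUL_ADDR` value, the function names, and the `restart` / `is_healthy` callables, which stand in for whatever restarts an agent and checks that it has rejoined.

```python
import json
import time
import urllib.request

CONSUL_ADDR = "http://127.0.0.1:8500"  # assumption: local agent over plain HTTP


def rolling_restart(nodes, restart, is_healthy, timeout=120.0, interval=2.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Problem 1: restart nodes in order, one at a time.

    `restart` and `is_healthy` are hypothetical callables supplied by the
    caller. The loop does not move on until the current node is healthy
    again, and raises if it does not come back within `timeout` seconds.
    """
    for node in nodes:
        restart(node)
        deadline = clock() + timeout
        while not is_healthy(node):
            if clock() >= deadline:
                raise TimeoutError(f"{node} did not rejoin within {timeout}s")
            sleep(interval)


def consul_request(method, path, token=None, body=None):
    """Build (but do not send) a request against the Consul HTTP API."""
    data = json.dumps(body).encode() if body is not None else None
    req = urllib.request.Request(CONSUL_ADDR + path, data=data, method=method)
    if token:
        req.add_header("X-Consul-Token", token)
    return req


def bootstrap_request():
    """Problem 2: the one-time bootstrap call whose response carries the
    initial management token in its `SecretID` field."""
    return consul_request("PUT", "/v1/acl/bootstrap")


def node_token_request(management_token, node_name, datacenter):
    """Problem 3: one token per node, scoped with a node identity."""
    body = {
        "Description": f"agent token for {node_name}",
        "NodeIdentities": [{"NodeName": node_name, "Datacenter": datacenter}],
    }
    return consul_request("PUT", "/v1/acl/token", management_token, body)


def send(req):
    """Send a prepared request and decode the JSON response."""
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Against a live cluster, usage would be something like `mgmt = send(bootstrap_request())["SecretID"]`, followed by `send(node_token_request(mgmt, name, dc))["SecretID"]` for each node. Keeping the request builders separate from `send` means the orchestration can be tested without a running Consul.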

In my clusters, Vault and Nomad depend on Consul, so Consul is first in the dependency chain and first to bootstrap. However, I do wonder: can we bootstrap Consul agents using tokens from Vault, with the Consul secret backend configured there?

It also seems the bootstrap process is complex enough that I would guess most deployments use simple, open policies with mostly manual deployment/configuration, and/or long-lived tokens that are never reset in the cluster. If anyone has “done it right”, they probably wrote a suite of scripts/tools to do the hard work for them. Maybe even a bot that understands the Vault/Nomad/Consul agents, configs, and APIs to orchestrate ACLs across all of them…

Another question I’ve been wondering about: to what extent can we tell Consul which token to use for the initial bootstrap? Would it make sense to start the Consul cluster with a known token, in a way that reduces the amount of work and makes it easy to “rotate out” the token used for init?
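
As far as I can tell from the agent configuration reference, the known-token idea maps onto the `acl.tokens.initial_management` setting (named `master` before Consul 1.11). Below is a minimal sketch of generating such a server config fragment. The helper name `acl_server_config` is made up for illustration, and the choice of `default_policy = "deny"` with token persistence enabled is my assumption about a sensible starting point, not a recommendation from the docs.

```python
import json
import uuid


def acl_server_config(initial_token=None):
    """Return (token, JSON config text) for a Consul server.

    If no token is passed in, a random UUID is generated so that the
    provisioning tool knows the management token before Consul starts.
    """
    token = initial_token or str(uuid.uuid4())
    config = {
        "acl": {
            "enabled": True,
            "default_policy": "deny",          # assumption: deny by default
            "enable_token_persistence": True,  # assumption: persist agent tokens
            "tokens": {"initial_management": token},
        }
    }
    return token, json.dumps(config, indent=2)


if __name__ == "__main__":
    token, text = acl_server_config()
    print(text)
```

Rotating it out would then presumably mean creating a replacement management token through the API, removing the seeded value from the config, and deleting the original token, but I have not verified that sequence end to end.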

And yet another question: if Vault has been configured with the Consul secret backend and can issue Consul tokens, can the Nomad agent use one of those tokens, or do we need to give the Nomad agent a more long-lived token?
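
For reference, this is roughly how I picture the Vault-to-Nomad hand-off, sketched under several assumptions: the Consul secret backend is mounted at `consul/`, a role already exists for the Nomad agents (the role name is up to the caller), and `VAULT_ADDR` points at a reachable Vault. The `GET /v1/consul/creds/<role>` endpoint, the `X-Vault-Token` header, and the `lease_id` / `lease_duration` / `data.token` response fields are from the Vault HTTP API, and `consul.token` is the Nomad agent setting. The function names are hypothetical.

```python
import json
import urllib.request

VAULT_ADDR = "http://127.0.0.1:8200"  # assumption: local Vault over plain HTTP


def consul_creds_request(vault_token, role, mount="consul"):
    """Build (but do not send) the request for a Consul token from Vault."""
    req = urllib.request.Request(f"{VAULT_ADDR}/v1/{mount}/creds/{role}")
    req.add_header("X-Vault-Token", vault_token)
    return req


def parse_consul_creds(payload):
    """Pull the Consul token and its lease details out of Vault's response."""
    return {
        "token": payload["data"]["token"],
        "lease_id": payload["lease_id"],
        "lease_seconds": payload["lease_duration"],
    }


def nomad_consul_config(consul_token):
    """Render a JSON fragment for the Nomad agent's `consul` block."""
    return json.dumps({"consul": {"token": consul_token}}, indent=2)
```

The catch is the lease: Vault revokes the Consul token when the lease expires, so something would have to renew the lease or re-issue the token and update the Nomad config before then. That is the part I am unsure how people handle in practice.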

Maybe you have thoughts, insights, or ideas you would like to share?