Trying to set up Vault for secret management in a cluster already running Nomad and Consul

Hi! I’m currently trying to set up a minimal non-dev cluster with Nomad, Consul and Vault. For now it’s a learning cluster, so we don’t care about the topology: all Nomad servers/clients, Consul and Vault nodes run on the same three instances. I have the Nomad and Consul configuration working and some dockerized test jobs deployed, and the next step I wanted to try is to have those jobs receive secrets as environment variables read from Vault.

We want to do the minimal setup first before getting into ACLs and token roles: basically, confirm that we can put a secret in the Vault KV store and read it back in the Nomad job template. Until we see that working, we’re OK with just using the Vault root token. The thing is, I’m having trouble getting all the pieces to work together from system boot, and I’m not sure if I’m missing some steps or misinterpreting the tutorials.

I have this in our task definition:

task "postgres" {
  driver = "docker"

  template {
    data = <<EOF
POSTGRES_PASSWORD = "postgres"
POSTGRES_USER = "{{with secret "secret/data/postgres"}}{{}}{{end}}"
EOF

    destination = "secrets/file.env"
    env         = true
  }
}
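
A well-formed version of this task would look roughly like the sketch below. It assumes the value wanted from Vault is the password field written by the vault kv put command in step 2 further down, and that it belongs in POSTGRES_PASSWORD rather than POSTGRES_USER. Both are assumptions, as is the literal user name.

```hcl
task "postgres" {
  driver = "docker"

  # The docker config block (image etc.) is omitted here, as in the snippet above.

  template {
    # Assumption: KV v2 is mounted at "secret", so the API path is
    # secret/data/postgres and the stored fields sit under .Data.data.
    # Assumption: the field is called "password", matching the
    # `vault kv put -mount=secret postgres password=...` command.
    data = <<EOF
POSTGRES_USER=postgres
POSTGRES_PASSWORD={{ with secret "secret/data/postgres" }}{{ .Data.data.password }}{{ end }}
EOF

    # Rendered into the task's secrets directory and loaded as environment variables.
    destination = "secrets/file.env"
    env         = true
  }
}
```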

My Vault config looks something like this on each node:

storage "consul" {
  address = ""
  path    = "vault/"
}

listener "tcp" {
  address     = ""
  tls_disable = 1
}

api_addr     = ""
cluster_addr = ""

My Nomad client and server configs both have something like:

vault {
  address = "http://vault.service.consul:8200"
  enabled = true
}

I tried supplying the root token both ways: as VAULT_TOKEN in nomad.env, and as the token field of the vault stanza.
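
The token-field variant looks roughly like the sketch below. The token value is a placeholder, not a real token, and using the root token here is only for this learning setup.

```hcl
vault {
  enabled = true
  address = "http://vault.service.consul:8200"

  # Placeholder value (assumption): substitute the actual root token.
  # The alternative tried was leaving this field out and putting
  # VAULT_TOKEN=<same value> in nomad.env instead.
  token = "<vault-root-token>"
}
```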

What I have working is:

  1. I unseal Vault on the three instances of the cluster. One becomes active, the rest standby.
  2. I can enable the secrets engine with vault secrets enable -version=2 -path=secret kv and write a key with vault kv put -mount=secret postgres password=something
  3. If I run my Nomad job at this point, it seems to work.

The problem I’m facing is that this process only seems to work after I manually run vault operator unseal. It’s not clear to me what the flow is for Nomad to pick this up on its own, e.g. after a server reboot.

What I see now is that, on reboot, my Nomad jobs fail to start with:

Get "http://vault.service.consul:8200/v1/secret/data/postgres": dial tcp: lookup vault.service.consul: no such host

and particularly

$ host vault.service.consul 
Host vault.service.consul not found: 3(NXDOMAIN)

This seems to be the case because Consul DNS resolution for Vault doesn’t start working until I manually unseal Vault. Is there a way for this setup to work without manually unsealing Vault? Do I have to run some ad hoc script on boot for that? Does this problem go away if I set up the token roles instead of trying to use the root token?

Or perhaps I’m looking at this the wrong way, and the idea is that the cluster will never be rebooted all at once; rather, the HA setup assumes that a Vault standby instance takes over whenever another is down?