How to build images with separation between dev and prod for a build / test / deploy workflow in Vault?

What are some ideas or options to consider in this scenario? Normally I’m used to keeping encrypted values in the repo, so life is a bit different now.

I want to determine how best to set up a Vault structure between deployments in a dev, prod_green, and prod_blue type scenario. It seems like the simple build / test / deploy paradigm breaks down a bit if the values used to produce AMIs are not in the repository (and are in Vault instead).

That’s because if we build AMIs in dev, with Vault vars stored in a /dev mount, test them, and release the AMIs into production, the values used to produce the images could later be destroyed, making the AMIs potentially difficult to reproduce. The values also aren’t tied to the repo commit.

I’m going to share some ideas for possible solutions. One I don’t like:

We could:

  • Build AMIs in dev and test them. If they pass…
  • Transfer the vars into a /prod_green or /prod_blue mount, build again, and deploy.

This would mean the values used to produce the current green or blue production deployment would be available to reproduce the result, but it’s a bit shabby because we’d be building the AMIs twice to ensure consistency, when ideally we should only build once where possible. It also delays the ability to iterate and release to production.

One idea I do like, with a question:

  • Can Vault do snapshots? Perhaps we could take a snapshot and replicate all values under some kind of unique root name. We could commit a Vault snapshot of a tree (keyed by a random pet name / hash / id) into a repository. Each snapshot would contain everything needed for both dev and prod green/blue. We might have a structure in the mount as follows:

snapshot-random-pet/dev or base (most vars, even for prod, are actually contained here)
snapshot-random-pet/green (some prod vars only, like overrides)
snapshot-random-pet/blue (some prod vars only, like overrides)

I like this approach because it produces version-controlled results that can be immutable.

  • When building, all variables would be pulled from the snapshot. When deploying to green or blue, only some of the values would be picked up from an override path contained in that snapshot (see the sketch after this list).
  • If dealing with a bug in production, a new dev environment for testing would be replicated from the snapshot used for that commit.
  • Some exceptions to immutability might occur, like passwords that are rotated in the KV store.
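For what it’s worth, here is a rough sketch of how the snapshot copy might look with the hvac Python client against a KV v2 mount. It’s only a sketch: the mount name, the source tree names, and the snapshot naming scheme are assumptions for illustration, not something Vault provides natively.

```python
# Rough sketch, not production code: copy every key under a source tree into a
# uniquely named snapshot prefix in a KV v2 mount, so a build can be pinned to it.
# The mount name ("secret"), the source tree names, and the snapshot naming scheme
# are assumptions for illustration.
import os
import uuid

import hvac  # pip install hvac

client = hvac.Client(url=os.environ["VAULT_ADDR"], token=os.environ["VAULT_TOKEN"])
MOUNT = "secret"


def copy_tree(src: str, dst: str) -> None:
    """Recursively copy KV v2 secrets from the src path to the dst path."""
    keys = client.secrets.kv.v2.list_secrets(path=src, mount_point=MOUNT)["data"]["keys"]
    for key in keys:
        if key.endswith("/"):  # sub-folder: recurse into it
            copy_tree(f"{src}/{key.rstrip('/')}", f"{dst}/{key.rstrip('/')}")
        else:
            data = client.secrets.kv.v2.read_secret_version(
                path=f"{src}/{key}", mount_point=MOUNT
            )["data"]["data"]
            client.secrets.kv.v2.create_or_update_secret(
                path=f"{dst}/{key}", secret=data, mount_point=MOUNT
            )


snapshot_id = f"snapshot-{uuid.uuid4().hex[:8]}"  # stand-in for a random pet name
for tree in ("dev", "green", "blue"):  # assumed source trees
    copy_tree(tree, f"{snapshot_id}/{tree}")
print(f"values pinned under {MOUNT}/{snapshot_id}; record this id against the git commit")
```

At deploy time, a green or blue rollout would read the base/dev tree from the snapshot and then layer the green or blue overrides on top (e.g. merged = {**base_vars, **green_overrides}).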

I like this idea; hopefully it’s not too complicated to do in reality. Snapshots could be automatically trashed after some time unless deployed, keeping junk to a minimum. Still, the values aren’t in the repo, so over time a commit hash may not reproduce a deployment. Perhaps it sounds silly, but would there be any value in AES-encrypting the value store and keeping it in the repo (as a storage backend?), then restoring it back to Vault to use? It’s a bit like the old Ansible encryption workflow I’m used to, but at least Vault can manage the keys and would be the only thing decrypting the data.

Another idea, not so good:

  • Any Vault values that need to be used for an AMI are copied into a separate mount (e.g. /dev/random_pet).
  • An AMI Packer build can only read from that mount, nowhere else (see the policy sketch after this list).
  • Once the AMIs are produced, they are tagged with the random pet name or some other id.
    This is just OK; it doesn’t provide any guarantees. Not a fan of this, really.
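If something like this were used anyway, the “can only read from that mount” part could be enforced with a dedicated read-only policy and a short-lived token handed to the Packer build. A minimal sketch with the hvac Python client; the mount name, prefix, policy name, and TTL are all assumptions for illustration.

```python
# Rough sketch: create a read-only policy scoped to a single per-build prefix and
# mint a short-lived token for the Packer run. The mount name ("dev"), the prefix,
# the policy name, and the TTL are assumptions for illustration.
import os

import hvac

client = hvac.Client(url=os.environ["VAULT_ADDR"], token=os.environ["VAULT_TOKEN"])

build_prefix = "random_pet"  # hypothetical per-build prefix
policy_name = f"packer-build-{build_prefix}"

# KV v2 data is read via <mount>/data/<path> and listed via <mount>/metadata/<path>,
# so grant only that subtree.
policy = f"""
path "dev/data/{build_prefix}/*" {{
  capabilities = ["read"]
}}
path "dev/metadata/{build_prefix}/*" {{
  capabilities = ["list"]
}}
"""
client.sys.create_or_update_policy(name=policy_name, policy=policy)

# Short-lived, non-renewable token handed to the Packer build and nothing else.
token = client.auth.token.create(policies=[policy_name], ttl="30m", renewable=False)
print(token["auth"]["client_token"])
```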

I’d like to know what others out there do, though; it’s a bit of a hairy problem, and it would help me to establish a good workflow early.

I’m not sure I quite understand the build process you are describing.

I wouldn’t expect the AMI to contain anything that is specific to a particular environment. Those values would be added when the EC2 instance starts up. Normally this would be done via a combination of values passed via user_data and secrets fetched from Vault. A useful tool is the Vault Agent, which can run at system startup (and optionally at other times, for periodic secret rotation) to populate config files, etc. with both static and dynamic values from Vault. Authentication can be handled in a number of ways, including AppRole or AWS auth.
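Vault Agent itself is configured with HCL templates, but to make the idea concrete, here’s a minimal Python sketch of the equivalent boot-time fetch using hvac with AppRole auth. The role credentials, environment variable names, secret path, and output file are all assumptions for illustration.

```python
# Minimal sketch of fetching environment-specific values at instance boot rather
# than baking them into the AMI. The AppRole credentials would typically arrive via
# user_data or instance metadata; the env var names, secret path, and output file
# are assumptions for illustration.
import json
import os

import hvac

client = hvac.Client(url=os.environ["VAULT_ADDR"])

# Authenticate with AppRole (AWS IAM auth is another common choice on EC2).
client.auth.approle.login(
    role_id=os.environ["VAULT_ROLE_ID"],
    secret_id=os.environ["VAULT_SECRET_ID"],
)

# Pull the per-environment values (e.g. the green overrides) and render a config file.
colour = os.environ.get("DEPLOY_COLOUR", "green")
values = client.secrets.kv.v2.read_secret_version(
    path=f"myapp/{colour}", mount_point="secret"
)["data"]["data"]

with open("/etc/myapp/config.json", "w") as fh:
    json.dump(values, fh)
```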

It’s true that an AMI probably shouldn’t contain anything specific to a particular environment most of the time, but how do you safeguard against that? You can’t avoid the fact that the entire Vault KV state contributed to some output configuration, and that state might not be restorable in the future. Perhaps the path of some variable got changed and the old one was blown away, as an example. This should be linked to the repository in some way.

Versioning individual keys doesn’t seem very useful compared to versioning a whole tree, or to git, or to an older workflow of storing all vars in an AES-encrypted file with Ansible, where the version is part of the version-controlled repo.

Instead of assigning a version per key, it would be more useful to version an entire Vault path tree (copy-on-write snapshot style) and produce builds from that, or perhaps to treat a private git repo as a storage backend.

These sound more like general SDLC-related questions, which would be handled by good engineering processes: code reviews, automated tests, documentation.

Vault can’t have a git repository as a storage backend, and it wouldn’t really make sense. Vault is a database system, quite different in usage from a general-purpose version control system. In general, version control systems are better for textual content rather than binary content (which Vault data is), and they have more complex usages, such as multiple branches and merging, which don’t fit at all with a standard database design.

In terms of versioning, the closest you probably have are things like the S3 storage backend, which could point at a bucket with versioning switched on, but this is more useful for backup-type processes. Unless you understand Vault internals well, the underlying files (which are encrypted) don’t really mean much and can’t easily be changed piecemeal.

I would suggest that for each AMI you need good documentation which includes details of the environment requirements; alongside any VPC or EBS requirements, that would also include details of the user_data or Vault settings needed.

I would say the big advantage that Vault brings compared to, say, Ansible Vault within git repositories is using it as the single source of truth for all secrets, making good use of dynamic secret engines as well as static secrets. Equally, access control is much easier and more secure. With Ansible Vault you have to distribute a single key, and you quickly lose all visibility of who has access, which you can’t audit or easily revoke. Also, having secrets in Vault allows easier decoupling and sharing of secrets: not having a strong link between the underlying code and the environmental parameters can be really useful, either because of organisational setup (different teams owning different secrets) or deployment mechanisms (being able to change orchestration tooling or strategy more easily without having to update application-type code).

As you mention, however, that additional flexibility does introduce new risks. If changes are made to Vault without proper review of existing usage, you can cause breakage. Good levels of access control to Vault, automation of some Vault configuration (generally not the secrets, but things like Vault permissions and dynamic secret engine settings), documentation, and peer reviews help mitigate much of that risk.