Does container image creation belong in IaC?

I’m interested in understanding best practices around managing deployments of a containerized application with IaC (i.e. Terraform).

To give some context, my application consists of containerized microservices running on ECS and a shared RDS cluster. The Docker images are stored in a private ECR repository.

I think that it’s intuitive to include Docker image creation/pushing within IaC, but after a quick search online, this opinion doesn’t seem to be shared. My thinking is that, when we’re deploying from scratch (i.e. new AWS account or new dev branch), a single “terraform apply” should provision an ECR repository, create and push Docker images, then provision ECS services referencing the newly-created images.

The alternative approach (which most people online seem to recommend) is leaving Docker image creation/pushing to a separate CI/CD pipeline. To me, this seems like it will slow down the whole process. We would need to do a one-off deployment of just the ECR repositories, then trigger CI/CD to create and push Docker images, then go back and provision the rest of the infrastructure (i.e. ECS).

Am I looking at this from the wrong angle? Any thoughts on best practices for such a scenario would be very insightful!

Hi @mo-lukecarr,

Your observations so far have focused on what happens when creating the infrastructure from nothing, and it’s true that the model you’re critiquing does add an extra step in that case.

Arguments in favor of a separate image building process are typically concerned more with ongoing maintenance of the system: in most cases we update a system many more times than we create it from scratch, so it can be worth some extra complexity in initial bring-up to get more maintenance flexibility later.

So what is that flexibility? The main concern I think about in this area is what I would do if the newly-built image is somehow defective. In that case I’ve typically enjoyed the ability to quickly switch back to the previous known-good image so that I can debug the new problem without so much time pressure. The best way I’ve found to achieve that is for the Terraform configuration to expect an image to already exist, and to fetch the id/location of that image from a configuration store I can modify when needed.

In the normal case of rolling forward to a new image, I have the image build process update that configuration store automatically when the build is successful and then I run Terraform to react to that change by switching to the new image.

If I need to temporarily roll back, I change the configuration store back to the previous good value and run Terraform to switch back to the old image until I’m ready to try again.
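As a rough illustration of the pattern described above, here is a minimal Terraform sketch. It assumes the configuration store is AWS SSM Parameter Store and that the image is consumed by an ECS task definition; the parameter name `/myapp/current-image`, the `myapp` family, and the container name are hypothetical, and IAM roles, networking, and the ECS service itself are omitted.

```hcl
# Assumption: the image build pipeline writes the full image URI to this
# (hypothetical) parameter after each successful build.
data "aws_ssm_parameter" "app_image" {
  name = "/myapp/current-image"
}

resource "aws_ecs_task_definition" "app" {
  family = "myapp" # hypothetical name
  cpu    = 256
  memory = 512

  # The image is read from the configuration store rather than built here.
  # The provider marks the parameter value as sensitive, so the plan will
  # show container_definitions as sensitive too.
  container_definitions = jsonencode([
    {
      name      = "app"
      image     = data.aws_ssm_parameter.app_image.value
      essential = true
    }
  ])
}
```

With something like this in place, rolling forward or back would amount to changing the stored value (for example with `aws ssm put-parameter --overwrite`) and running `terraform apply` again.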

For me the flexibility outweighed the cost, but of course you are free to prioritize things differently if your situation is different.

I don’t intend the above to say that I’m right and you are wrong; as with most things in system architecture, it’s a bunch of tradeoffs and you need to decide what is most important to you and design accordingly. Building images in Terraform is not wrong, but it has some drawbacks that some folks (including me) decided were not appropriate for our systems.

(There are probably other reasons for and against, beyond what I’ve shared here. This was just the concern most on my mind when I was making a similar decision.)


Really appreciate this lengthy reply, @apparentlymart! :grin:

I think this is the perspective I’ve been missing, so thank you for offering your thoughts.

If you don’t mind, could I pick your brain over some of the specifics? I realise that we may not be using the same cloud tech, but I’ll try and keep my questions cloud-agnostic!

  • You might already have answered this by mentioning a “configuration store”, but are you hardcoding image tags/versions in Terraform, or are you fetching this from some remote config (Parameter Store in AWS, for example)?
  • From your reply, I think one of the things I overlooked was having a single, central container image registry. Right now, we’ve been looking at having a registry per-branch/stack, which I think made us then assume that image creation needed to be part of the IaC process. Am I right in thinking that, if the approach is to move it out of the IaC process like you suggested, this would lend itself to having one central private container registry (which, because it’s central, means it doesn’t need to be created/managed by the actual application’s Terraform)?
  • Do you use/have any thoughts on using “latest” or other tags/versions where the actual underlying image can change? This sort of ties into my question about hardcoding tags in Terraform.

Thank you in advance! It’s a really useful conversation to inform my decision making!

To be explicit (sorry I wasn’t the first time), I was describing some tradeoffs I made in my previous job, rather than a system I currently maintain, since currently I work on Terraform at HashiCorp and so I’m not directly maintaining infrastructure. I was being general in my answer because indeed the system I was working on had some different details. In particular, I was dealing with AMIs in Amazon EC2 rather than with Docker images, but I don’t think that difference really matters for the tradeoff I was describing.

But with that clarification out of the way, I’ll try to answer your followup questions:

  • Indeed, when I said “configuration store” I was meaning things like AWS SSM Parameter Store. This can really be anything that lets you store a small string and then retrieve it using a data source in a Terraform provider, but AWS’s solution and its equivalents in other platforms are the most obvious choice today, I think.

    I was going to say more here but I think what I was going to say is the answer to your third question, so I’ll answer that below instead. :grin:

  • Since I wasn’t working with Docker I don’t think I can really comment on specifics such as whether you should have one or multiple Docker image registries. However, thinking more generally I do think of the system that stores the images as something separate from the system the images are used for.

    This for me is actually a separate tradeoff from the one we were originally discussing: I like to make sure that the infrastructure that Terraform uses is separated from the infrastructure that Terraform manages, so that I can maintain the two independently and in particular there’s little risk that a problem with the infrastructure Terraform manages will prevent me from running Terraform to fix it.

    AMIs don’t have a “registry” concept and so that specific question didn’t arise for me, but to me the image registry belongs to the same collection of things as the CI system running the build pipelines, the software I’m using to automate running Terraform, etc. For me those things are in a separate space from the “real” infrastructure. (I did actually also manage them with Terraform, but did that outside of the Terraform automation so that I wouldn’t be risking making changes to the very infrastructure that my Terraform process was running on.)

  • In a system like Docker which has mutable tags, you could indeed use a Docker tag in place of the explicit “configuration store” I mentioned. I think of Docker tags as being a highly specialized configuration store that can only store one type of value: a Docker image ID.

    I think it would be very reasonable in your case to decide that a specific tag in your repository represents “the current image” and then implement the rollback scenario I described by modifying that tag to point back to an older image.

    One thing to keep in mind is that you will need some way to know what the “last known good” image is if you want to follow something like the process I described. I don’t know if Docker registries tend to track the historical values for tags or if you’d need to invent your own way to do that, but as long as you have a record somewhere of what was previously selected (and so, can document how someone should find it when they need it) then that’s good enough.
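As one possible illustration of treating a tag as “the current image”, the sketch below assumes an existing ECR repository named `myapp` and a mutable tag named `current`; both names are hypothetical. It resolves the tag to an immutable digest at plan time, so the digest that was actually selected is visible in the plan and state, which is one way to keep a record of what was previously deployed.

```hcl
# Assumption: the repository already exists and is managed elsewhere.
data "aws_ecr_repository" "app" {
  name = "myapp" # hypothetical repository name
}

# Look up whichever image the mutable "current" tag points at right now.
data "aws_ecr_image" "current" {
  repository_name = data.aws_ecr_repository.app.name
  image_tag       = "current" # hypothetical tag acting as the "configuration store"
}

locals {
  # Pin by digest so the running tasks do not change if the tag moves
  # before the next Terraform run.
  app_image = "${data.aws_ecr_repository.app.repository_url}@${data.aws_ecr_image.current.image_digest}"
}

output "app_image" {
  value = local.app_image
}
```

Rolling back would then mean re-pointing the `current` tag at the older image and running Terraform again; this relies on the repository allowing mutable tags.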


I don’t really have anything more to share: I agree with all of your points, and just wanted to say thanks for another super in-depth answer.

I think this topic has given me a better understanding of the problem space, and I’ve definitely got a clearer picture of how I can be using IaC tools to their fullest.

While I appreciate the modesty in reasoning that there’s no right or wrong approach, there definitely are some approaches that have more merit than others! Thanks once again for offering your thoughts, definitely got something to take away and work on now. :slight_smile:

Really like this abstraction and way of thinking: I’m definitely going to steal this one!

Why do you believe integrating Docker image creation/pushing into Terraform is intuitive, despite differing opinions online?