Terraform multi-project scenario: good practices?

Hello

I have the following “logical” problem.

We have, let’s say, around 5 pipelines that build applications, create images, and push them to a registry.

We have a separate repository with Terraform configs that builds our whole environment: it sets up GKE and dependent services, and on top of that it configures our application stack with the Helm provider, based on a .tfvars config file.

Our .tfvars files store the image configuration for all Helm charts. For now, when a new image is built, we add it to those configs manually, but we want to automate this. When, e.g., a few application pipelines build images at the same time, we need to update our tfvars files with all of the newly created images. Even if we trigger the Terraform pipeline from an application pipeline, we can update only one image, and it needs to be recorded somewhere, because the application pipelines are not aware of one another.
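To make it more concrete, here is a simplified sketch of what one of those .tfvars files looks like (chart and registry names are only illustrative); the Terraform configuration declares a matching `variable "image_tags"` of type `map(string)`:

```hcl
# images.auto.tfvars -- one entry per Helm chart, currently updated by hand
image_tags = {
  frontend = "europe-docker.pkg.dev/our-project/apps/frontend:1.4.2"
  backend  = "europe-docker.pkg.dev/our-project/apps/backend:2.0.7"
  worker   = "europe-docker.pkg.dev/our-project/apps/worker:0.9.1"
}
```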

So the problem is: how do I connect all of this together?

I was thinking about creating a separate repository for the tfvars files, adding it as a Git submodule to each repository, and updating and committing new image versions from a single pipeline job, but that feels more like a workaround than a proper solution.

What are good practices for such a scenario? Maybe there is some option in Terraform (or Terragrunt?) to include tfvars from another repository, or some other, more “native” solution?

Hi @robert-ingeniousio,

The details of this tend to vary depending on what you’re using to orchestrate your pipeline, but the general idea I’ve seen many times is to have early pipeline steps publish their results somewhere the later pipeline steps can retrieve them from, and to use that to create the necessary dataflow through the steps.

Some automation systems have an explicit way to attach metadata or files to a job and to retrieve it downstream. If yours does then I’d suggest starting with that, because it’d then give the best visibility via that system’s UI as to how the data is flowing.

For systems that don’t have such an explicit mechanism, you can often “fake it” by e.g. making the build job publish the latest image location in some well-known location (e.g. in a key/value store with a known key) and the deploy job then fetch that value. If your deploy step is running Terraform then indeed it will likely pass the value it retrieved into the Terraform configuration as an input variable, using one of the various mechanisms.
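For example, if the deploy step retrieved an image reference from that well-known location, and the root module declared a variable for it (the variable name here is just an example, not something Terraform prescribes), it could be passed in on the command line or through the environment:

```hcl
variable "release_image" {
  type        = string
  description = "Full image reference retrieved from the build job's published metadata"
}

# The deploy step can then set it with either of:
#   terraform apply -var="release_image=registry.example.com/app:1.4.2"
#   TF_VAR_release_image="registry.example.com/app:1.4.2" terraform apply
```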

Another thing to consider here is what the “rollback” approach will be, if any. If you’ve taken the approach of publishing the latest image location somewhere then one way to roll back would be to reset that back to the old image and then run only the “deploy” step. Some automation systems have more prescriptive answers to this, particularly if they have an explicit idea of build vs. deploy rather than just treating all pipeline steps as generic scripts to run.

With all of that said, Terraform itself doesn’t include a solution to this because it’s outside of Terraform’s scope. Generally we expect that pipeline/automation tools are the better layer to handle this sort of connectivity, and so Terraform is intended to be a thing that the automation runs rather than the automation itself.

Thank you for the reply.

I’m using GitLab CI/CD. It is possible to pass variables or artifacts to a “downstream” pipeline, and that is fine if only one pipeline is passing its images to the Terraform pipeline. But my problem is that I have 5 or more application pipelines, each with its own separate CI/CD that runs at a different time, and when triggering the “downstream” pipeline each can only pass its own last-built Docker image; it isn’t aware that there are any others.

In that case, do I understand correctly that I should just add a step that updates the images in some external database, from which the CI/CD will fetch them to build the Terraform config? Or maybe that Git submodule storing them in a file would be enough?

Yes, in your case it seems like the requirement would be for each of the build steps to publish their results somewhere and then have your deploy step, prior to running Terraform, fetch the values from that same location and pass them in to Terraform as one or more input variables.

The key result of that design is that this external data store will “remember” the most recent result from each of the jobs, and so you can run the deployment job at any time and it will, if nothing changed upstream, just pass the same values to Terraform again and then ideally the relevant provider will notice that nothing changed and so Terraform will propose no changes. If you rebuild one particular image then the Terraform plan should only include the changes related to that one image, because Terraform can see that all of the other images are the same as recorded in the previous state snapshot.
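As a rough sketch of the Terraform side of that (resource and chart names are invented), each release would read only its own entry from a map variable, so rebuilding one image can only ever change that one release:

```hcl
# Sketch only: one helm_release per application, each reading its own entry
# from an image_tags map that the deploy job passes in as an input variable.
resource "helm_release" "frontend" {
  name      = "frontend"
  chart     = "./charts/frontend"
  namespace = "apps"

  set {
    name  = "image.tag" # assumes the chart exposes an image.tag value
    value = var.image_tags["frontend"]
  }
}

# Re-running the deploy job with unchanged values should produce an empty plan;
# rebuilding only the frontend image changes only this resource.
```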

I’m not familiar with GitLab CI/CD in particular, but I did quickly refer to its documentation and it seems to use terminology I’m familiar with, so hopefully I understood correctly how it works and so the following would make sense:

I think I would try to model each of your build steps as a separate “pipeline” in GitLab, and then represent the single multi-image deployment as its own separate “pipeline”. I think that means you could use Multi-project pipelines to configure it so that if any of the build pipelines run they will each trigger a run of the same downstream deployment pipeline.

I think in your comment you were referring to the fact that when you have one pipeline trigger another it can pass variables down to the downstream pipeline, but it can only pass its own data in, and so, as you say, with that strategy there wouldn’t be any way for the deployment project to find the images from the other pipelines. I think I would try to address that by taking a pull rather than a push strategy: you mentioned that you’re pushing images to a registry, which suggests that you could design your deployment pipeline to directly access the registry to find the current/latest image for each component and then pass those ids in to Terraform as one or more variables.
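One possible variation on that (just a sketch; I’m assuming the kreuzwerker/docker provider and inventing the image name) is to let Terraform itself resolve the moving tag by querying the registry with the docker_registry_image data source, rather than scripting the lookup in the pipeline:

```hcl
terraform {
  required_providers {
    docker = {
      source = "kreuzwerker/docker"
    }
  }
}

# Sketch only: resolve the mutable "latest" tag to a concrete digest at plan time.
data "docker_registry_image" "frontend" {
  name = "registry.example.com/frontend:latest"
}

# data.docker_registry_image.frontend.sha256_digest can then be passed into the
# Helm chart values in place of a hand-maintained tag.
```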

This is, therefore, using the package registry as the “external data store” in what I described: the build steps write to it and the deploy step reads from it. If we’re talking about Docker container images, I’d think about using a specific mutable tag like latest to represent the “current version” of each image, and then if you find you need to roll back to an earlier version of the image then you can use some other process to change latest to refer back to an existing image and then re-run the deployment pipeline without re-running any of the build pipelines.

I hope that’s useful! I don’t think I’d be able to go into any more detail on this because I’m already well past my limit of knowledge about GitLab, but I hope I at least got enough of the terminology right here that you can see what I’m talking about and think about how to adapt it into a real solution using the GitLab building blocks.

Thank you for your detailed description; I will look into that idea of how to tag images properly with both the application version and a “latest” tag. As we need to have all images tagged by application version, I think I will proceed with my idea of a separate repository that is a submodule of the other Git repositories, where the newest image versions will be stored and updated. Thank you again for helping and sharing ideas.