TF Cloud / CLI-driven .terraformignore question

Hi,

I believe everything is now set up to use TF Cloud in CLI-driven mode; however, terraform plan takes a long time to do anything. I believe this is because it’s uploading the contents of the .terraform and .git directories due to an issue with .terraformignore. Either I have it in the wrong place, or I am using the wrong slashes.

An issue was reported on GitHub here; however, I am not clear on what was fixed:
a) .terraformignore supports / on Windows
b) Terraform supports \

I have tried both forward and back slashes. At the end of the day, as this is running on TF Cloud, it’s running on Linux, so I would think that / makes the most sense. .gitignore doesn’t care.

My local file structure looks like:

Root
    │   .terraformignore
    │
    └───PRODUCTS
        ├───arc.product.az
        │   │   .gitignore
        │   │   README.md
        │   │
        │   ├───.git
        │   └───src
        │       │   .terraform.lock.hcl
        │       │   data_resources.auto.tfvars.json
        │       │   product_main.tf
        │       │   product_outputs.tf
        │       │   product_variables.tf
        │       │   tfcloud_integration.tf
        │       │
        │       └───.terraform
        │           │   environment
        │           │   terraform.tfstate
        │           │
        │           ├───modules
        │           │       modules.json
        │           │
        │           └───providers
        │               └───registry.terraform.io
        │                   └───hashicorp
        │                       └───azurerm
        │                           └───2.99.0
        │                               └───windows_amd64
        │                                       terraform-provider-azurerm_v2.99.0_x5.exe
        │
        ├───terraform-azurerm-resource_group
        │   │   main.tf
        │   │   outputs.tf
        │   │   README.md
        │   │   variables.tf
        │   │   versions.tf
        │   │
        │   └───.git
        ├───terraform-azurerm-virtual_network
        │   │   .gitignore
        │   │   main.tf
        │   │   outputs.tf
        │   │   README.md
        │   │   variables.tf
        │   │   versions.tf
        │   │
        │   └───.git
        ├───terraform-meta-namer
        │   │   .gitignore
        │   │   azure-pipelines.yml
        │   │   outputs.tf
        │   │   README.md
        │   │   variables.tf
        │   │   versions.tf
        │   │
        │   └───.git
        ├───terraform-meta-tagger
        │   │   .gitignore
        │   │   outputs.tf
        │   │   README.md
        │   │   variables.tf
        │   │   versions.tf
        │   │
        │   └───.git
        └───terraform-model-az_landing_zone
            │   .gitignore
            │   main.tf
            │   outputs.tf
            │   README.md
            │   variables.tf
            │   versions.tf
            │
            └───.git

Each module is in its own repo.

When running terraform plan, the output seems to suggest it wants .terraformignore in the Root directory (which is odd, as it’s outside source control). I have tried it under PRODUCTS too.

PS C:\Root\PRODUCTS\arc.product.az\src> terraform plan
Running plan in Terraform Cloud. The output will stream here. Pressing Ctrl-C
will stop streaming the logs, but will not stop the plan from running remotely.

Preparing the remote plan...

The remote workspace is configured to work with the configuration at
PRODUCTS/arc.product.az/src relative to the target repository.

Terraform will upload the contents of the following directory,
excluding files or directories as defined by a .terraformignore file
at C:\Root/.terraformignore (if it is present),
in order to capture the filesystem context the remote workspace expects:
    C:\Root\

My workspace’s Terraform Working Directory is set to: PRODUCTS/arc.product.az/src.

This is what my .terraformignore looks like:

# Git
**/.git/**
# Local .terraform directories
**/.terraform/*
!**/.terraform/modules/**

# .tfstate files
*.tfstate
*.tfstate.*

# Crash log files
crash.log
crash.*.log

# Exclude all .tfvars files, which are likely to contain sensitive data, such as
# passwords, private keys, and other secrets. These should not be part of version
# control as they are data points that are potentially sensitive and subject
# to change depending on the environment.
#*.tfvars
#*.tfvars.json

# Ignore override files as they are usually used to override resources locally and so
# are not checked in
override.tf
override.tf.json
*_override.tf
*_override.tf.json

# Include override files you do wish to add to version control using negated pattern
# !example_override.tf

# Include tfplan files to ignore the plan output of command: terraform plan -out=tfplan
# example: *tfplan*

# Ignore CLI configuration files
.terraformrc
terraform.rc

The plan does eventually work. It just takes about 10 minutes, which isn’t much use.

If anyone has any ideas, I’d be grateful if they could share.

Terraform documentation link.

Thanks

W.

Windows
TF v1.1.7

Still hoping for any ideas on .terraformignore

I’ve done a little test. I’ve copied the above structure to C:\Temp and deleted all the .git directories. Running terraform plan is much, much faster. A few seconds.

To me this suggests .terraformignore is being ignored.

I’ve been thinking about what is happening with the CLI-driven approach. In my .terraformignore file, I’ve specified **/.terraform/*. This directory holds a bunch of stuff but I’ll pick on the providers (e.g. azurerm).

In the CLI-driven approach, what role does TF Cloud play? My understanding is that it:

  1. Executes the code against the target environment.
  2. Stores some variables (i.e. env vars for access to cloud environments).
  3. Stores the remote state.

Therefore, why are the providers being downloaded to my local machine? Surely they’re not needed here, as the execution is done in TF Cloud? And when specifying TF Cloud, shouldn’t Terraform be smart enough not to download the providers to the development machine, and if they are still required, shouldn’t the TF Cloud upload automatically ignore them?
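
One partial mitigation I’ve spotted in the CLI configuration docs is the provider plugin cache, which avoids re-downloading providers for every configuration (an assumption on my part that it helps here; it only affects downloads, not uploads). Something like this in %APPDATA%\terraform.rc, with the cache path being my own choice:

# CLI configuration file (%APPDATA%\terraform.rc on Windows)
# Providers are downloaded once into this directory and symlinked/copied
# into each configuration's .terraform directory by terraform init.
plugin_cache_dir = "C:/Users/woter/.terraform.d/plugin-cache"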

Maybe this is a difference between CLI-driven and VCS-driven runs.

Incidentally, is there any way to view what has been uploaded to TF Cloud? Some secret API call perhaps?
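
Having dug around the API docs, there does appear to be a configuration versions endpoint that lists (and lets you download) what was uploaded. A rough sketch in bash syntax, with the ws-/cv- IDs and $TFC_TOKEN as placeholders:

# List the configuration versions uploaded to a workspace
curl -H "Authorization: Bearer $TFC_TOKEN" https://app.terraform.io/api/v2/workspaces/ws-XXXXXXXX/configuration-versions

# Download one configuration version as a tarball to inspect its contents
curl -L -H "Authorization: Bearer $TFC_TOKEN" -o config.tar.gz https://app.terraform.io/api/v2/configuration-versions/cv-XXXXXXXX/download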

Thanks

W.

I’ve done some more testing. I had a look inside the .git directories and, as these are fairly young repos, there aren’t many files. In older repos the .git directory can contain thousands of files, and although they’re small, read/write times can be massive. Here, though, I’m talking about 50 1KB files; this shouldn’t take ~10 minutes to upload.

As I said, I’d tested terraform plan against a copy of the code in C:\Temp\tf-test\root... which worked as expected. A few seconds.

I created a 400MB file and re-ran the plan. Again, no noticeable delay. Mmmm…

I wanted to know what was causing the apparent slow-down and I think I’ve found it.

My code’s root is: C:\Users\woter\source\Root\PRODUCTS\ and the TF code is in the directories below that (as shown above).

In the Root directory, I have loads of files for different projects that have nothing to do with Terraform (7GB).

Terraform Working Directory is set to Root/arc.product.az/src.

Reading the output of terraform plan:

Preparing the remote plan...

The remote workspace is configured to work with configuration at
Root/arc.product.az/src relative to the target repository.

Terraform will upload the contents of the following directory,
excluding files or directories as defined by a .terraformignore file
at C:\Users\woter\source\Root\PRODUCTS/.terraformignore (if it is present),
in order to capture the filesystem context the remote workspace expects:
    C:\Users\woter\source\Root\PRODUCTS

What Terraform must be doing is uploading the whole of C:\Users\woter\source\Root. At 7GB, no wonder it takes so long!

Solution 1

I have added a directory so the parent only has relevant files:

Terraform Working Directory: Azure/arc.product.az/src.

The absolute path is now: C:\Users\woter\source\Root\PRODUCTS\Azure, which contains only the Terraform files at 175MB. Still large, but that’s because of the contents of .terraform.

Solution 2

Modify .terraformignore to ignore everything in C:\Users\woter\source\Root. This is much faster, but not as fast as Solution 1; I guess it’s still enumerating the files to ignore. Possibly the .terraformignore file can be refined. It does save having to mess around relocating Git repos, though.

The .terraformignore file is located in C:\Users\woter\source\Root and looks like:

# Initially ignore everything
*

# Don't ignore directories, so we can recurse into them
!*/

# Don't ignore .terraformignore
!.terraformignore

# Don't ignore config files:
!*.tfvars
!*.tfvars.json
!*.tf

# Git
**/.git/**
# Local .terraform directories
**/.terraform/*
!**/.terraform/modules/**

# .tfstate files
*.tfstate
*.tfstate.*

# Crash log files
crash.log
crash.*.log

# Exclude all .tfvars files, which are likely to contain sensitive data, such as
# passwords, private keys, and other secrets. These should not be part of version
# control as they are data points that are potentially sensitive and subject
# to change depending on the environment.
#*.tfvars
#*.tfvars.json

# Ignore override files as they are usually used to override resources locally and so
# are not checked in
override.tf
override.tf.json
*_override.tf
*_override.tf.json

# Include override files you do wish to add to version control using negated pattern
# !example_override.tf

# Include tfplan files to ignore the plan output of command: terraform plan -out=tfplan
# example: *tfplan*

# Ignore CLI configuration files
.terraformrc
terraform.rc

Personally, I don’t see why Terraform has to nose around in the parent directory of the Terraform Working Directory. I would have thought I could set the Terraform Working Directory to PRODUCTS and it would look inside it, not in the parent. .terraformignore should be under change control, and having to put it in Root means it’s not.

Anyway, problem solved, although I’d be delighted if someone can chip in and school me on what’s going wrong. I’m sure I’m not the first to encounter this.

Thanks.

Hi @woter1832,

I think the surprising part of what you described for me is that you have the “working directory” of your remote workspace set in such a way that it is relative to a directory that isn’t under version control. That suggests to me that perhaps there’s a solution in choosing a different working directory, but I’m not sure so I will describe what that setting is intended to represent and then hopefully you can decide if the way you are using it matches that meaning.

Firstly, that setting is intended primarily for those who are using Terraform Cloud in a way where it runs automatically in response to changes in a version control system. In particular, for situations where the version control repository has the intended Terraform configuration in a subdirectory rather than in the root, in which case Terraform Cloud would check out the entire tree at the head of the selected repository, cd into the designated directory, and run Terraform CLI in there.

When using a mixture of the VCS-driven and CLI-driven workflows (typically: using “terraform plan” to preview the effect of a change before opening a PR), Terraform CLI must upload the same repository contents that would eventually be in the VCS repository. That leads to the behavior you saw of uploading additional directories above the working directory, to ensure that the speculative plan’s execution environment is a realistic approximation of what a VCS-triggered plan would see.

If you aren’t using Terraform Cloud’s version control integration then you may not need to set the “working directory” at all. The only reason to set it in that case would be if your configuration refers to other files outside of its root module prefix, such as a module call with source = "../modules/example". If you do have references to outside files then you can set the working directory to be relative to whatever directory you imagine as the “root” of what that particular configuration will refer to. The fewer levels of subdirectory you include in the working directory, the fewer levels of parent directory Terraform CLI will need to upload in order to create the full execution context.
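
For example, in a hypothetical layout like the following, a working directory of stacks/prod would tell Terraform CLI to upload repo-root, so that the relative module path still resolves during the remote run:

repo-root
├───modules
│   └───example
└───stacks
    └───prod          <- run terraform plan here; Working Directory = stacks/prod
            main.tf   <- contains: module "example" { source = "../../modules/example" }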

In particular, I would not typically expect to see the working directory set relative to a root that’s outside of the VCS repository you’re working with, since that suggests a directory layout that Terraform Cloud’s VCS mode could not support. You can set it up that way in principle if you are not using the VCS integration mode, but then the effect will be what you saw: Terraform will understand it as a requirement to upload the entire directory structure, so that Terraform code can refer to other directories under that root.

Thanks @apparentlymart. I knew you’d reply. :slight_smile:

To be clear in my head about your comment “When using a mixture of VCS-driven and CLI-driven workflow”: when setting up a workspace, I have the option of VCS-driven or CLI-driven. If I try to configure VCS on a CLI-driven workspace, I get:

[screenshot: workspace settings with no VCS option]

No option to select VCS as I would in a VCS-driven workspace.

I have modules in the registry that are associated with version control. Is this what you mean by a mix of VCS-driven and CLI-driven: the actual “configuration” that consumes the modules? The configuration is under (GitHub) source control, but there is no option to have a VCS-controlled CLI-driven workspace, so TF Cloud doesn’t know anything about it.

The file structure is the same as if I was using Terraform CLI without TF Cloud. It appears TF Cloud is acting as the “runner” and backend state storage, which saves a large amount of setup.

My understanding of Terraform’s best practice is to have one module per repo, and this is certainly promoted by TF Cloud. Having to follow your naming convention is slightly annoying, but one I can live with. Therefore, my file structure looks like:
[screenshot: the directory structure, entries numbered 1-6]
1 = configuration
2-5 = modules (there are many more).
6 = model, which is basically a module of modules.

Each one of these directories is a Git repository. Other examples of models might be terraform-model-compute or terraform-model-web, which would contain collections of modules to deploy compute or websites, respectively.

I believed that lumping modules under one configuration, in one repo, was not recommended, mainly because the modules should be reusable. This was the way I did it around version 0.11.

I think my setup is following what you describe in paragraph 3. I call terraform from inside arc.product.az\src and I set the source relative to arc.product.az\src, so in the model I have source = ..\terraform-azurerm-resource_group.

I will say that my code was written based on pre 1.0 versions, so I could probably refactor now that we can iterate at the module level.

Each module directory used to have a src directory to separate .git, .gitignore, README.md, etc. from the *.tf and *.tfvars files; however, this didn’t work with TF Cloud.

Considering the structure, I see no place for .terraformignore as it has to sit above these directories, which is not source controlled. How would you suggest I structure the directories?

The other thing that surprises me is that I need to have these modules on my machine (apart from for development). My expectation would be:

  1. On a commit to a module, the change is pushed to the TF Cloud’s registry.
  2. On a terraform plan / terraform apply (on local machine) the TF Cloud “runner” acquires the modules from the registry specified in the configuration.
  3. The plan is applied to the target environment using connection variables set in TF Cloud.

With this thought in mind, I changed the source of the modules to point to app.terraform.io/{some_workspace}/resource_group/azurerm as opposed to ../terraform-azurerm-resource_group; however, this doesn’t work with the CLI-driven workspace, which needs the modules to be local. Maybe what I am looking for is offered by the API-driven workspace (something I am yet to try).

Thanks again.

Hi @woter1832,

There are a few different decision points here that you are discussing all at once, but they are actually mostly independent of one another (though admittedly not entirely) so I think it would help to visit each one separately.

CLI-driven vs. VCS-driven workflow

As you saw, Terraform Cloud’s UI for activating a VCS-driven workflow presents it as mutually exclusive with the CLI-driven workflow.

The reason it does so is that if you enable the VCS-driven workflow then Terraform Cloud will reject attempts to run the terraform apply command as a remote operation, because in VCS-driven mode it is new commits on the designated VCS branch that indirectly create applyable plans for you to approve.

The subtlety that the decision UI doesn’t mention is that in VCS-driven mode you can still use the terraform plan CLI command to create speculative plans, which are plans created for human review only, that cannot ever be applied.

A typical reason to do that is to shorten the development feedback loop when working on a change by previewing the likely effect of a change before pushing the work in progress to the version control system.

In that case, the terraform plan CLI command needs to recreate a filesystem structure compatible with what the VCS import would’ve produced, so that the plan can be realistic. To achieve that, it uses the “Working Directory” configured in Terraform Cloud to understand where the root of the VCS repository ought to be relative to the configuration directory where you ran terraform plan: it traverses upwards in the directory tree once for each path segment of the configured Working Directory, then uploads the full contents of whichever ancestor directory that selects, paying attention to any .terraformignore file that might be present in that directory.
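
To make that concrete with the paths from earlier in this thread:

Working Directory:  PRODUCTS/arc.product.az/src    (three path segments)
Run directory:      C:\Root\PRODUCTS\arc.product.az\src
Upload root:        C:\Root                        (three levels up)
Ignore file read:   C:\Root\.terraformignore

which matches the plan output you quoted.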

“Working Directory” when not in VCS workflow mode

If you aren’t using the VCS-driven workflow then the “Working Directory” is not so important, because there’s no real need for Terraform to be aware of where the VCS repository root directory is. However, as I was describing in my original comment you can in principle still use it to encourage Terraform CLI to upload an ancestor directory when starting a remote operation, if the configuration you are planning or applying expects to find sibling directories that otherwise wouldn’t be available in the remote execution context.

If you don’t want to upload any of the parent directories though – that is, if you want Terraform to only upload the directory where you ran terraform apply and its descendants – you should leave the Working Directory unset. In that case, Terraform CLI will treat the directory where you ran terraform apply as the root directory to be uploaded, and look for a .terraformignore file in that directory.

Repository-per-module vs. One big repository

You’re correct that, at least for the purpose of modules shared in a module registry, the documentation recommends more or less one module per repository. In practice (since we’re talking about module registries) that means one module per module package, in the registry’s terms, since the registry automatically creates a new versioned package per tag in the associated repository.
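
For example, assuming the registry is watching the repository’s tags in the usual way, publishing a new version of a shared module is just a matter of pushing a new semver tag:

git tag v1.2.0
git push origin v1.2.0

and the registry will then offer it as version 1.2.0.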

This decision is almost entirely separate from everything we’ve discussed so far, though. The key difference between these two situations is how you would refer to the modules in the source argument in a module block:

  • Separate repository/package per module: source contains a module registry address, like source = "app.terraform.io/organization/name/target-system". terraform init will access the registry to find the location of the designated package, and download that package into a local cache directory so that other Terraform commands can use it.
  • One big repository: source contains a relative path starting with either ./ or ../ that directly refers to the filesystem directory containing the intended module. terraform init doesn’t need to download anything extra in this case, and instead just remembers in Terraform’s hidden module manifest that module.foo refers to ../modules/foo, or whatever. (Both styles are sketched below.)
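
As a sketch, here is what the two styles look like in a module block (the organization and module names are placeholders, and the two blocks are alternatives rather than used together):

# Registry-style source: terraform init asks the registry and downloads the package
module "resource_group" {
  source  = "app.terraform.io/my-org/resource_group/azurerm"
  version = "~> 1.0"
}

# Local-path source: terraform init just records a reference to the sibling directory
module "resource_group" {
  source = "../terraform-azurerm-resource_group"
}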

If you choose the separate repository/package per module solution then you must publish each module separately to the private registry and then use the registry-style source addresses like I showed above. That then gives Terraform the information it needs to obtain the module source code automatically during terraform init, which avoids the need to include all of the modules in the big bundle that terraform apply would upload in order to start a remote operation.

The one way in which this decision about how to split your modules across repositories/packages is connected to the decision about “Working Directory” in Terraform Cloud is that if you choose the “one big repository” option, and therefore specify your module source addresses as relative paths, then terraform apply must upload your full set of modules as part of the initial package to start the remote operation, or else Terraform Cloud would have no way to access the source code of the other modules. Setting the “Working Directory” in the remote workspace would then let Terraform CLI understand your directory layout enough to upload the correct root directory.

A recommendation, then?

I’ve done my best above to describe three different decisions you’ll need to make about how you want to use Terraform Cloud. There’s still a lot about what you are working on that I can’t possibly know, and so I hesitate to make a specific recommendation about what you should do lest I fail to consider something that is obvious to you but invisible to me.

With that said, I do have a sense of what would be the smallest change compared to what you seem to currently have:

  • Leave the VCS workflow disabled, and rely on the CLI workflow only for now.
  • Leave “Working Directory” on all of your workspaces unset, so Terraform will treat each of your root modules as self-contained rather than trying to upload some common root directory each time.
  • For your non-root modules that you intend to share across many workspaces, publish them in Terraform Cloud’s private module registry so that they’ll be accessible both to your local runs and to remote runs in Terraform Cloud. When you declare calls to your shared modules using module blocks, use the registry-style source addresses instead of local-filesystem relative paths, so Terraform CLI can see that it needs to ask the registry to get the source code for each module.

If you’re using remote operations in Terraform Cloud with Terraform CLI then you must already have an authentication token configured for the host app.terraform.io. Those same credentials should grant access to the private module registry also on app.terraform.io, so you wouldn’t need to add any extra credentials in order to fetch those module packages from the registry when you are working locally.
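
If you haven’t stored a token yet, the terraform login command will walk you through obtaining one interactively and save it to the CLI credentials file (it targets app.terraform.io by default):

terraform login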

I hope that helps! I can’t really give any further personalized advice here because ultimately you’re the only one who knows the full story about your system, but if you need help with specific details related to your account in particular then you could contact HashiCorp support who, unlike me, can (with your permission) look directly at how you have your workspaces configured, and thus make recommendations more personalized to your situation.