Terraform Enterprise plans are killed with extremely verbose trace/debug logs

Recently a workspace has been erroring out at the plan stage. We do not have TF_LOG set to anything, but what we see is a long pause during planning, followed by a massive amount of TRACE/DEBUG logs and, at the end, "Killed". Looking through the logs I don't see anything that explains what's wrong, but there are a lot of them, so I may be missing something. Is there some sort of cap on how many resources a workspace can manage, and ours hit that limit and was killed? Wild guess.

Here are some snippets:

Terraform v0.14.10
Configuring remote state backend...
Initializing Terraform configuration...
2022/06/23 20:22:15 [INFO] Terraform version: 0.14.10  
2022/06/23 20:22:15 [INFO] Go runtime version: go1.15.6
2022/06/23 20:22:15 [INFO] CLI args: []string{"terraform", "plan", "-out", "/terraform/terraform.tfplan", "-detailed-exitcode", "-var-file=/terraform/user-config/prod/terraform.tfvars", "-lock=false", "-parallelism=10"}
2022/06/23 20:22:15 [DEBUG] Attempting to open CLI config file: /tmp/cli.tfrc
2022/06/23 20:22:15 Loading CLI configuration from /tmp/cli.tfrc
2022/06/23 20:22:15 [DEBUG] Not reading CLI config directory because config location is overridden by environment variable
2022/06/23 20:22:15 [DEBUG] Using modified User-Agent: HashiCorp Terraform/0.14.10 (+https://www.terraform.io) TFE/v202206-1
2022/06/23 20:22:15 [DEBUG] ignoring non-existing provider search directory terraform.d/plugins
2022/06/23 20:22:15 [DEBUG] ignoring non-existing provider search directory /root/.terraform.d/plugins
2022/06/23 20:22:15 [DEBUG] ignoring non-existing provider search directory /root/.local/share/terraform/plugins
2022/06/23 20:22:15 [DEBUG] will search for provider plugins in /usr/local/share/terraform/plugins
2022/06/23 20:22:15 [TRACE] getproviders.SearchLocalDirectory: /usr/local/share/terraform/plugins is a symlink to /usr/local/share/terraform/plugins
2022/06/23 20:22:15 [DEBUG] ignoring non-existing provider search directory /usr/share/terraform/plugins
2022/06/23 20:22:15 [INFO] CLI command args: []string{"plan", "-out", "/terraform/terraform.tfplan", "-detailed-exitcode", "-var-file=/terraform/user-config/prod/terraform.tfvars", "-lock=false", "-parallelism=10"}

Followed by a lot of

2022/06/23 20:23:19 [TRACE] dag/walk: vertex "module.horizon.module.test_security_groups.module.security_group.module.namer (close)" is waiting for "module.horizon.module.test_security_groups.module.security_group.module.namer.output.name (expand)"
2022/06/23 20:23:19 [TRACE] dag/walk: vertex "module.data_scaling.module.prod_security_groups.module.tagger.local.name_suffix (expand)" is waiting for "module.data_scaling.module.prod_security_groups.module.tagger.var.name_suffix (expand)"
2022/06/23 20:23:19 [TRACE] dag/walk: vertex "module.academics.module.test_security_groups_policies.var.system (expand)" is waiting for "module.academics.module.test_security_groups_policies (expand)"
2022/06/23 20:23:19 [TRACE] dag/walk: vertex "provider[\"registry.terraform.io/hashicorp/helm\"].wario_test (close)" is waiting for "module.data_scaling.module.test_security_groups_policies.helm_release.security_group_policy"
2022/06/23 20:23:19 [TRACE] dag/walk: vertex "module.academics-platform.module.dev_security_groups_policies.var.system (expand)" is waiting for "module.academics-platform.module.dev_security_groups_policies (expand)"
2022/06/23 20:23:19 [TRACE] dag/walk: vertex "module.academics.module.test_service_accounts (expand)" is waiting for "module.academics.var.service_accounts (expand)"
2022/06/23 20:23:19 [TRACE] dag/walk: vertex "module.helios.module.dev_security_groups.var.team (expand)" is waiting for "module.helios.module.dev_security_groups (expand)"
2022/06/23 20:23:19 [TRACE] dag/walk: vertex "module.pap.module.staging_security_groups.var.contact (expand)" is waiting for "module.pap.var.contact (expand)"
2022/06/23 20:23:19 [TRACE] dag/walk: vertex "module.academics.module.staging_security_groups_policies.module.tagger.var.compliance-hipaa (expand)" is waiting for "module.academics.module.staging_security_groups_policies.module.tagger (expand)"
2022/06/23 20:23:19 [TRACE] dag/walk: vertex "module.dataconnect.module.prod_service_accounts.var.kubernetes_annotations (expand)" is waiting for "module.dataconnect.module.prod_service_accounts (expand)"
2022/06/23 20:23:20 [TRACE] dag/walk: vertex "module.academics.provider[\"registry.terraform.io/hashicorp/helm\"].test_cluster (close)" is waiting for "module.academics.module.test_security_groups_policies.helm_release.security_group_policy"
2022/06/23 20:23:20 [TRACE] dag/walk: vertex "module.horizon.module.dev_security_groups.module.security_group.local.ingress (expand)" is waiting for "module.horizon.module.dev_security_groups.module.security_group.var.enable_consul_ingress (expand)"
2022/06/23 20:23:20 [TRACE] dag/walk: vertex "module.pap.module.prod_service_accounts.local.role_arn (expand)" is waiting for "module.pap.module.prod_service_accounts.var.iam_role (expand)"
2022/06/23 20:23:20 [TRACE] dag/walk: vertex "module.clinicalgenomics.module.staging_security_groups.var.security_group_description (expand)" is waiting for "module.clinicalgenomics.module.staging_security_groups (expand)"
2022/06/23 20:23:24 [TRACE] dag/walk: vertex "module.dataconnect.module.prod_security_groups.module.security_group.module.namer.local.name_format_default (expand)" is waiting for "module.dataconnect.module.prod_security_groups.module.security_group.module.namer.var.use_num_suffix (expand)"
2022/06/23 20:23:24 [TRACE] dag/walk: vertex "module.academics-platform.module.test_security_groups.module.security_group.output.ingress (expand)" is waiting for "module.academics-platform.module.test_security_groups.module.security_group.local.this_ingress (expand)"
2022/06/23 20:23:24 [TRACE] dag/walk: vertex "module.hc_signal.module.staging_security_groups_policies.module.tagger.local.env_test (expand)" is waiting for "module.hc_signal.module.staging_security_groups_policies.module.tagger.local.envs (expand)"
2022/06/23 20:23:24 [TRACE] dag/walk: vertex "module.hc_signal.module.staging_namespaces.local.role_binding_name (expand)" is waiting for "module.hc_signal.module.staging_namespaces (expand)"
2022/06/23 20:23:24 [TRACE] dag/walk: vertex "module.practice-management.module.staging_security_groups.module.tagger (close)" is waiting for "module.practice-management.module.staging_security_groups.module.tagger.local.env_test (expand)"
2022/06/23 20:23:24 [TRACE] dag/walk: vertex "module.pap.module.prod_security_groups.module.security_group.local.vpn_east (expand)" is waiting for "module.pap.module.prod_security_groups.module.security_group (expand)"
2022/06/23 20:23:24 [TRACE] dag/walk: vertex "module.data_scaling.module.test_security_groups_policies.module.tagger.output.name (expand)" is waiting for "module.data_scaling.module.test_security_groups_policies.module.tagger.local.name (expand)"
2022/06/23 20:23:24 [TRACE] dag/walk: vertex "module.pap.module.prod_security_groups_policies.module.tagger.local.base_tags (expand)" is waiting for "module.pap.module.prod_security_groups_policies.module.tagger.local.contact (expand)"

Followed by a final: Killed

Hi @munali,

I think the log output you are seeing here may be caused by the process being killed, rather than the other way around: in some cases, when Terraform CLI has to abort for an unusual reason, it emits its internal logs as a last-resort attempt to explain why.

I think the main question, then, is why the Terraform Enterprise execution environment killed that process. Unfortunately we cannot determine that from the Terraform CLI logs alone, because that component only knows that it was "killed"; it doesn't know what killed it or why.

Given that, I think this is a question better asked of HashiCorp Support, since they should be able to show you how to collect the relevant diagnostic information from Terraform Enterprise in order to understand why this is happening.

https://support.hashicorp.com/hc/en-us/requests/new

Although we can sometimes help with Terraform Cloud and Terraform Enterprise issues here, the support team is better equipped for this sort of detailed debugging of particular situations.

Thanks!

Thanks @apparentlymart - I will reach out to support. I suspect the container was using too much memory and was killed off.
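If it was an out-of-memory kill, the kernel log on the TFE host will usually say so explicitly. A minimal sketch of what to look for, assuming shell access to the host running the worker containers (the sample log line below is invented for illustration, not taken from this incident):

```shell
# On a real TFE host, search the kernel ring buffer or journal for
# OOM-killer activity around the time the plan died:
#   dmesg -T | grep -iE 'out of memory|killed process'
#   journalctl -k | grep -iE 'out of memory|killed process'
#
# A hypothetical matching line looks like this (process name and sizes invented):
sample='[Thu Jun 23 20:25:07 2022] Out of memory: Killed process 4321 (terraform) total-vm:2097152kB, anon-rss:1843200kB'
echo "$sample" | grep -iE 'out of memory|killed process'
```

If an OOM kill does show up, I believe the per-run memory limit on a Replicated-based TFE install is the `capacity_memory` application setting (in MB), which support can help you raise for large workspaces.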