Attempting to debug the terraform_provider_azurerm but failing to run the plan

Introduction

Hi guys, I am trying to get started with contributing to the terraform-provider-azurerm. I have noticed a problem with the azurerm_firewall_network_rule_collection I have reported it here. I thought I should take this as an opportunity to see if I can chip in and try to fix the problem I have reported! If I can debug the provider and see how the code works, then not only will I learn go, I will also be able to figure out and help with bug fixes/feature enhancements.
Any Help you guys can provide me and guide me towards the right direction will be much appreciated. I know your time is precious, and I did my best before coming to ask for help.


Summary of what I have tried to do the folllowing:

  • Fork the terraform-provider-azurerm repository.

  • Clone the forked repo to my local. I am using Ubuntu 20.04 in WSL.

  • Running vscode version 1.54.3

  • Install golang and related go tools in vscode. I am on go version go1.16.2 linux/amd64. Used linuxbrew to install go.

  • Install gcc5. Used linuxbrew to install.

  • Install delve debugger. Used linuxbrew to install.

  • Create a very simple terraform main.tf which just deploys a resource group to my Azure subscription.

  • Create a .terraformrc file ( set it in the location /home/rahul/.terraformrc ) to supply dev_overrides path to the provider.

  • Modify the main.go ( location /usr/local/go/src/github.com/terraform-providers/terraform-provider-azurerm/main.go) to use debug statements.

  • Compile the provider with the debug code and create new binaries.

  • Attach dlv in headless mode to this binary

  • Use the data from the previous step to set it in the variable TF_REATTACH_PROVIDERS

  • Run Terraform plan - hoping that have a successful plan runs.

  • Turn off debug mode and compile the provider and then run terraform plan.

  • Plan runs successfully using the local provider.

The Problem

  • The plan crashes with

    
    Terraform crashed! This is always indicative of a bug within Terraform.
    
    A crash log has been placed at "crash.log" relative to your current
    
    working directory. It would be immensely helpful if you could please
    
    report the crash with Terraform[1] so that we can fix this.
    
    
  • Content from log output


  2021/04/05 06:09:50 [INFO] Terraform version: 0.14.9  

  2021/04/05 06:09:50 [INFO] Go runtime version: go1.15.6

  2021/04/05 06:09:50 [INFO] CLI args: []string{"/home/rahul/.tfenv/versions/0.14.9/terraform", "plan"}

  2021/04/05 06:09:50 [DEBUG] Attempting to open CLI config file: /home/rahul/.terraformrc

  2021/04/05 06:09:50 Loading CLI configuration from /home/rahul/.terraformrc

  2021/04/05 06:09:50 [DEBUG] Explicit provider installation configuration is set

  2021/04/05 06:09:50 [TRACE] Selected provider installation method cliconfig.ProviderInstallationDirect with includes [] and     excludes []

  2021/04/05 06:09:50 [INFO] CLI command args: []string{"plan"}

  2021/04/05 06:09:50 [TRACE] Meta.Backend: built configuration for "local" backend with hash value 666019178

  2021/04/05 06:09:50 [TRACE] Preserving existing state lineage "8773571c-7605-2553-b418-08cbb37374a4"

  2021/04/05 06:09:50 [TRACE] Preserving existing state lineage "8773571c-7605-2553-b418-08cbb37374a4"

  2021/04/05 06:09:50 [TRACE] Meta.Backend: working directory was previously initialized for "local" backend

  2021/04/05 06:09:50 [TRACE] Meta.Backend: using already-initialized, unchanged "local" backend configuration

  2021/04/05 06:09:50 [TRACE] Meta.Backend: instantiated backend of type *local.Local

  2021/04/05 06:09:50 [TRACE] providercache.fillMetaCache: scanning directory .terraform/providers

  2021/04/05 06:09:50 [TRACE] getproviders.SearchLocalDirectory: .terraform/providers is a symlink to .terraform/providers

  2021/04/05 06:09:50 [TRACE] getproviders.SearchLocalDirectory: found registry.terraform.io/hashicorp/azurerm v2.53.0 for    linux_amd64 at .terraform/providers/registry.terraform.io/hashicorp/azurerm/2.53.0/linux_amd64

  2021/04/05 06:09:50 [TRACE] providercache.fillMetaCache: including .terraform/providers/registry.terraform.io/hashicorp/    azurerm/2.53.0/linux_amd64 as a candidate package for registry.terraform.io/hashicorp/azurerm 2.53.0

  2021/04/05 06:09:50 [DEBUG] Provider registry.terraform.io/hashicorp/azurerm is overridden to load from /usr/local/go/bin

  2021/04/05 06:09:50 [DEBUG] checking for provisioner in "."

  2021/04/05 06:09:50 [DEBUG] checking for provisioner in "/home/rahul/.tfenv/versions/0.14.9"

  2021/04/05 06:09:50 [INFO] Failed to read plugin lock file .terraform/plugins/linux_amd64/lock.json: open .terraform/   plugins/linux_amd64/lock.json: no such file or directory

  2021/04/05 06:09:50 [TRACE] Meta.Backend: backend *local.Local supports operations

  2021/04/05 06:09:50 [INFO] backend/local: starting Plan operation

  2021/04/05 06:09:50 [TRACE] backend/local: requesting state manager for workspace "default"

  2021/04/05 06:09:50 [TRACE] backend/local: state manager for workspace "default" will:

  - read initial snapshot from terraform.tfstate

  - write new snapshots to terraform.tfstate

  - create any backup at terraform.tfstate.backup

  2021/04/05 06:09:50 [TRACE] backend/local: requesting state lock for workspace "default"

  2021/04/05 06:09:50 [TRACE] statemgr.Filesystem: preparing to manage state snapshots at terraform.tfstate

  2021/04/05 06:09:50 [TRACE] statemgr.Filesystem: no previously-stored snapshot exists

  2021/04/05 06:09:50 [TRACE] statemgr.Filesystem: locking terraform.tfstate using fcntl flock

  2021/04/05 06:09:50 [TRACE] statemgr.Filesystem: writing lock metadata to .terraform.tfstate.lock.info

  2021/04/05 06:09:50 [TRACE] backend/local: reading remote state for workspace "default"

  2021/04/05 06:09:50 [TRACE] statemgr.Filesystem: reading latest snapshot from terraform.tfstate

  2021/04/05 06:09:50 [TRACE] statemgr.Filesystem: snapshot file has nil snapshot, but that's okay

  2021/04/05 06:09:50 [TRACE] statemgr.Filesystem: read nil snapshot

  2021/04/05 06:09:50 [TRACE] backend/local: retrieving local state snapshot for workspace "default"

  2021/04/05 06:09:50 [TRACE] backend/local: building context for current working directory

  2021/04/05 06:09:50 [TRACE] terraform.NewContext: starting

  2021/04/05 06:09:50 [TRACE] terraform.NewContext: loading provider schemas

  2021/04/05 06:09:50 [TRACE] LoadSchemas: retrieving schema for provider type "registry.terraform.io/hashicorp/azurerm"

  panic: runtime error: invalid memory address or nil pointer dereference

  [signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x20f82d2]

  goroutine 52 [running]:

  github.com/hashicorp/terraform/plugin.(*GRPCProviderPlugin).Client(0xc00054ef60, 0xc00054e560, 0xc0003321e0,    0x7f73af739388, 0xc00094a180, 0x0, 0xc000817a6c)

      <autogenerated>:1 +0x32

  github.com/hashicorp/go-plugin.(*RPCClient).Dispense(0xc00073ce40, 0x289bfc5, 0x8, 0x0, 0x0, 0x29cd9d8, 0xc0003a6b30)

      /go/pkg/mod/github.com/hashicorp/go-plugin@v1.3.0/rpc_client.go:159 +0x254

  github.com/hashicorp/terraform/command.unmanagedProviderFactory.func1(0x24b0360, 0xc000575950, 0xc0003a6c08, 0xc00000b270)

      /home/circleci/project/project/command/meta_providers.go:448 +0x248

  github.com/hashicorp/terraform/terraform.(*basicComponentFactory).ResourceProvider(0xc00017d4b0, 0xc000578d8a, 0x7,     0xc000578d80, 0x9, 0x28bc2ba, 0x15, 0x2892906, 0x1, 0xc000578d8a, ...)

      /home/circleci/project/project/terraform/context_components.go:55 +0x351

  github.com/hashicorp/terraform/terraform.loadProviderSchemas.func1(0xc000578d8a, 0x7, 0xc000578d80, 0x9, 0x28bc2ba, 0x15)

      /home/circleci/project/project/terraform/schemas.go:103 +0x36e

  github.com/hashicorp/terraform/terraform.loadProviderSchemas(0xc000731ec0, 0xc0002c11e0, 0x0, 0x2ce93a0, 0xc00017d4b0,  0xc0000f2080, 0x3dcafe0, 0xc0000f2050)

      /home/circleci/project/project/terraform/schemas.go:174 +0x250

  github.com/hashicorp/terraform/terraform.LoadSchemas(0xc0002c11e0, 0x0, 0x2ce93a0, 0xc00017d4b0, 0x0, 0x2, 0x7f73afb6b400)

      /home/circleci/project/project/terraform/schemas.go:84 +0xc6

  github.com/hashicorp/terraform/terraform.NewContext(0xc0003a79d0, 0x0, 0x0, 0xc00098b900, 0x1)

      /home/circleci/project/project/terraform/context.go:205 +0x2ea

  github.com/hashicorp/terraform/backend/local.(*Local).contextDirect(0xc00015ef70, 0xc0008e2700, 0xc0002c11e0, 0x0, 0x0,     0x0, 0x0, 0x0, 0xc000731d10, 0xc000421120, ...)

      /home/circleci/project/project/backend/local/backend_local.go:178 +0x2ff

  github.com/hashicorp/terraform/backend/local.(*Local).context(0xc00015ef70, 0xc0008e2700, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)

      /home/circleci/project/project/backend/local/backend_local.go:111 +0xf18

  github.com/hashicorp/terraform/backend/local.(*Local).opPlan(0xc00015ef70, 0x2ce83a0, 0xc00073cb40, 0x2ce83a0,  0xc00073cb80, 0xc0008e2700, 0xc00073cb00)

      /home/circleci/project/project/backend/local/backend_plan.go:70 +0x2cb

  github.com/hashicorp/terraform/backend/local.(*Local).Operation.func1(0xc00017d130, 0xc00017d140, 0xc00017d150,     0xc00015ef70, 0xc00017d120, 0x2ce83a0, 0xc00073cb40, 0x2ce83a0, 0xc00073cb80, 0xc0008e2700, ...)

      /home/circleci/project/project/backend/local/backend.go:340 +0xe8

  created by github.com/hashicorp/terraform/backend/local.(*Local).Operation

      /home/circleci/project/project/backend/local/backend.go:334 +0x36d

Detailed steps including source code.

I have followed the steps documented in Terraform Docs - Debugger based debugging

  • Go Installation path => /usr/local/go

  • terraform-provider-azurerm cloned to => /usr/local/go/src/github.com/terraform-providers

  • terraform-provider-azurerm source path => /usr/local/go/src/github.com/terraform-providers/terraform-provider-azurerm

  • Terraform main.tf source path => ~/Code/terraform

  • My terraform code

    
      locals {
    
      location      = "australiaeast"
    
      resourcegroup = "rg-azfw"
    
    }
    
    terraform {
    
      backend "local" {}
    
    }
    
    terraform {
    
      required_providers {
    
        azurerm = {
    
          source  = "hashicorp/azurerm"
    
          version = "2.53.0"
    
        }
    
      }
    
    }
    
    provider "azurerm" {
    
      tenant_id                  = "xxxxxx-xxxxx-xxxxx-xxxxx-xxxx"
    
      skip_provider_registration = true
    
      features {}
    
    }
    
    resource "azurerm_resource_group" "resourcegroup" {
    
      name     = local.resourcegroup
    
      lo
    
    
  • My .terraformrc

    
      provider_installation {
    
      # Use /home/developer/go/bin as an overridden package directory
    
      # for the hashicorp/azurerm provider. This disables the version and checksum
    
      # verifications for this provider and forces Terraform to look for the
    
      # azurerm provider plugin in the given directory.
    
      dev_overrides {
    
        "registry.terraform.io/hashicorp/azurerm" = "/usr/local/go/bin"
    
      }
    
      # For all other providers, install them directly from their origin provider
    
      # registries as normal. If you omit this, Terraform will _only_ use
    
      # the dev_overrides block, and so no other providers will be available.
    
       direct {}
    
      }
    
    
  • Modified the main.go file in terraform-provider-azurerm so that the provider binary can be launched in debugger mode. Now the main function looks like :

    
    func main() {
    
        // remove date and time stamp from log output as the plugin SDK already adds its own
    
        log.SetFlags(log.Flags() &^ (log.Ldate | log.Ltime))
    
        var debugMode bool
    
        flag.BoolVar(&debugMode, "debug", true, "set to true to run the provider with support for debuggers like delve")
    
        flag.Parse()
    
        opts := &plugin.ServeOpts{ProviderFunc: azurerm.Provider}
    
        if debugMode {
    
            // TODO: update this string with the full name of your provider as used in your configs
    
            err := plugin.Debug(context.Background(), "registry.terraform.io/hashicorp/azurerm", opts)
    
            if err != nil {
    
                log.Fatal(err.Error())
    
            }
    
            return
    
        }
    
        plugin.Serve(opts)
    
        // plugin.Serve(&plugin.ServeOpts{
    
        //  ProviderFunc: azurerm.Provider,
    
        // })
    
    }
    
    
  • The vscode debugger launch config looks like :

    
        {
    
            // Use IntelliSense to learn about possible attributes.
    
            // Hover to view descriptions of existing attributes.
    
            // For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387
    
            "version": "0.2.0",
    
            "configurations": [
    
                {
    
                    "name": "Connect to server",
    
                    "type": "go",
    
                    "request": "attach",
    
                    "mode": "remote",
    
                    "remotePath": "${workspaceFolder}",
    
                    "port": 36283,
    
                    "host": "127.0.0.1",
    
                    "apiVersion": 1
    
                }
    
            ]
    
        }
    
    
  • Ran make build to build the binaries. It copies the binary to $GOPATH/bin/ location.

  • Launching the delv headless debugger with the command dlv exec --listen=127.0.0.1:36283 --api-version=1 --headless $GOPATH/bin/terraform-provider-azurerm -- --debug

    • The --listen=127.0.0.1:36283 is an additional parameter I googled and figured out - not mentioned in terraform docs. This helps me in not changing the ports for debugging everytime I am launching the debugger.
  • That produces an output like :

    
     API server listening at: 127.0.0.1:36283
    
     {"@level":"debug","@message":"plugin address","@timestamp":"2021-04-05T06:08:38.863169+10:00","address":"/tmp/  plugin594405865","network":"unix"}
    
     Provider server started; to attach Terraform, set TF_REATTACH_PROVIDERS to the following:
    
     {"registry.terraform.io/hashicorp/azurerm":{"Protocol":"netrpc","Pid":16362,"Test":true,"Addr":{"Network":"unix","String":"/    tmp/plugin594405865"}}}
    
    
  • Set the TF_REATTACH_PROVIDERS value

    
        export TF_REATTACH_PROVIDERS='{"registry.terraform.io/hashicorp/azurerm":{"Protocol":"netrpc","Pid":16362,"Test":true,"Addr":{"Network":"unix","String":"/tmp/plugin594405865"}}}'
    
    
  • Run the terraform plan in the folder where the tf source code is there… And all goes boom.

Hi @miniGweek1,

I’m not certain where the bug is yet offhand, as this method for running providers is not often used outside of the provider’s internal tests, and hence tends to be automated.

The first thing I see is that the TF_REATTACH_PROVIDERS value has "Protocol":"netrpc", which doesn’t look correct as terraform no longer uses the netrpc protocol for plugins. The documentation does show "grpc" as the protocol in the connection string, but because the provider is outputting netrpc, I suspect there is either a bug in the terraform-plugin-sdk itself, or the old version of the SDK used by the azurerm provider does not fully support this functionality.

1 Like

I’m probably not remembering all of the details correctly here, but in this special mode the plugin process is not running as a child process of terraform as we typically expect, and so it can’t see the environment variables that Terraform would normally set to tell the child process that it supports protocol version 5 and grpc, and so I assume it’s running through the normal fallback behavior.

For a provider that is still compatible with Terraform v0.11 the normal correct default is to choose netrpc (for protocol version 4), but this TF_REATTACH_PROVIDERS variable isn’t supported in Terraform v0.11 and so it’s not really sensible to choose that default here. :confused: (I expect other providers are working in spite of this because they’ve dropped Terraform v0.11 support already and so they treat protocol 5 and grpc as the baseline.)

With all of this said, I wonder if you’d get a better result if you set the following environment variable before launching that dlv exec command:

PLUGIN_PROTOCOL_VERSIONS=5

If my theory is correct that the debug-mode provider is trying to use the same environment variable handshake parameters that would normally be used in non-debug mode then hopefully that’ll cause the provider to select protocol version 5, and thus start a grpc server that current versions of Terraform know how to connect to. (If this works then you should see that TF_REATTACH_PROVIDERS value say "Protocol":"grpc" instead of "Protocol":"netrpc".)

1 Like

THIS !!!
Worked!

I got to see the expected grpc
This was the output after running dlv exec

{"registry.terraform.io/hashicorp/azurerm":{"Protocol":"grpc","Pid":12258,"Test":true,"Addr":{"Network":"unix","String":"/tmp/plugin477314353"}}}

Thank you so much! Apologies for replying back late. I didn’t really expect anyone to reply. Imagine my surprise today morning when I check back to see 2 good thought out replies :slight_smile:

I can attach the breakpoint to a function and see debugger hit the breakpoint in vscode. Whew!

You sir, are a legend. Thanks!!!

You are spot on :slight_smile: Thanks for taking out the time to go through it.
I will now debug it thanks to reply from Profile - apparentlymart - HashiCorp Discuss