Framework migration test produces non-empty plan

alexhung · June 2, 2023, 7:54pm

We discovered that our migrated resource (from SDKv2 to framework) is generating changes when practitioners run ‘terraform plan’ after upgrading the provider.

We realized we did not add the migration test as described here: Testing Migration: Migrating from SDKv2 to the Framework | Terraform | HashiCorp Developer

Once we added the test, we worked through all the attributes that were incorrectly coded.

However I’m now left with this final error where the test fails with just a “update in-place” error for the entire resource:

resource_artifactory_group_test.go:27: Step 2/2 error: After applying this test step, the plan was not empty.
  stdout:

  Terraform used the selected providers to generate the following execution
  plan. Resource actions are indicated with the following symbols:
    ~ update in-place

  Terraform will perform the following actions:

    # artifactory_group.test-group-upgrade-3509098 will be updated in-place
    ~ resource "artifactory_group" "test-group-upgrade-3509098" {
          id               = "test-group-upgrade-3509098"
          name             = "test-group-upgrade-3509098"
          # (7 unchanged attributes hidden)
      }

  Plan: 0 to add, 1 to change, 0 to destroy.

I’ve verified that the provider produces empty plan. I install the old version of the provider, run ‘terraform apply’, install new provider, run ‘terraform plan’, and get “No changes.” message.

I’ve turned on debug logging and I don’t see any clues in there to tell me why the test fails.

Is there other tools/ways to find out why the update in-place for the test, but not in actual use?

alexhung · June 5, 2023, 11:39pm

Another resource that we migrated (out of 4 in total so far) also has this issue.

Any help in figuring this out will be very useful. And I loaf to continue investing time to migrate to the new framework until I can solve this issue.

austin.valle · June 6, 2023, 2:01pm

Hey @alexhung !

Specifically for the testing framework, if you’re trying to get more insight on something happening in the plan, you may be able to write a custom plan check to debug that information. Plan checks are relatively new and were released in 1.2.0 of terraform-plugin-testing.

There is a plan check example on the developer site I linked, or you could reference one of the terraform-plugin-testing built-in plan checks here.

Plan checks should have all of the information passed from the Terraform CLI about resource changes (with the raw before/after values), prior state, etc. in the CheckPlanRequest.Plan struct. (if you’re familiar with the terraform show command, it’s equivalent to that)

If you do go the plan check route, you’ll most likely want to set the plan check on the TestStep.ConfigPlanChecks.PostApplyPreRefresh step, as that runs right before the empty plan check logic you mentioned in your error.

Let us know if that gets you closer to finding the root cause!

alexhung · June 6, 2023, 10:22pm

Thanks @austin.valle for the link to the custom plan check documentation. I missed that part of the doc!

I applied the ‘PostApplyPreRefresh’ check at the step 1 of the migration test and extracted the following from ‘req.Plan.ResourceChanges’:

{
    "address": "artifactory_group.test-group-upgrade-7970241",
    "mode": "managed",
    "type": "artifactory_group",
    "name": "test-group-upgrade-7970241",
    "provider_name": "registry.terraform.io/jfrog/artifactory",
    "change": {
        "actions": [
            "no-op"
        ],
        "before": {
            "admin_privileges": null,
            "auto_join": null,
            "description": null,
            "detach_all_users": null,
            "external_id": null,
            "id": "test-group-upgrade-7970241",
            "name": "test-group-upgrade-7970241",
            "policy_manager": false,
            "realm": null,
            "realm_attributes": null,
            "reports_manager": false,
            "users_names": null,
            "watch_manager": false
        },
        "after": {
            "admin_privileges": null,
            "auto_join": null,
            "description": null,
            "detach_all_users": null,
            "external_id": null,
            "id": "test-group-upgrade-7970241",
            "name": "test-group-upgrade-7970241",
            "policy_manager": false,
            "realm": null,
            "realm_attributes": null,
            "reports_manager": false,
            "users_names": null,
            "watch_manager": false
        },
        "after_unknown": {},
        "before_sensitive": {},
        "after_sensitive": {}
    }
}

The ‘actions’ has no-op so it should mean no change but the test still fails with non-empty plan. I don’t see any difference between the before and after data though.

The req.Plan.ResourceChanges.ResourceDrift is empty array so I assume that means no drift?

alexhung · June 6, 2023, 10:28pm

I notice that when I apply the plan check for ‘PostApplyPostRefresh’ I got this similar data.

{
    "address": "artifactory_group.test-group-upgrade-5874087",
    "mode": "managed",
    "type": "artifactory_group",
    "name": "test-group-upgrade-5874087",
    "provider_name": "registry.terraform.io/jfrog/artifactory",
    "change": {
        "actions": [
            "no-op"
        ],
        "before": {
            "admin_privileges": false,
            "auto_join": false,
            "description": "",
            "detach_all_users": null,
            "external_id": "",
            "id": "test-group-upgrade-5874087",
            "name": "test-group-upgrade-5874087",
            "policy_manager": false,
            "realm": "internal",
            "realm_attributes": "",
            "reports_manager": false,
            "users_names": [],
            "watch_manager": false
        },
        "after": {
            "admin_privileges": false,
            "auto_join": false,
            "description": "",
            "detach_all_users": null,
            "external_id": "",
            "id": "test-group-upgrade-5874087",
            "name": "test-group-upgrade-5874087",
            "policy_manager": false,
            "realm": "internal",
            "realm_attributes": "",
            "reports_manager": false,
            "users_names": [],
            "watch_manager": false
        },
        "after_unknown": {},
        "before_sensitive": {
            "users_names": []
        },
        "after_sensitive": {
            "users_names": []
        }
    }
}

But notice the users_names attribute changes from null to [].

Would this be the reason the test fails? If so, why does Terraform thinks the attribute is unchanged but marks the entire resource as needing ‘update in-place’?

alexhung · June 6, 2023, 11:19pm

@austin.valle Is it possible that this is due to the fact the resource created by SDKv2 provider allows this attribute (and the other troublesome ones) to be null in the state? And that framework schema doesn’t allow null state and assign an equivalent (e.g. [] for SetAttribute and "" for StringAttribute), thus causing state drift.

Then I applied ‘hacks’ to avoid it by assigning null values back (e.g. types.SetNull() or types.StringValue("")) and thus confuse Terraform further?

If so, what’s the suggestion to maintain backward compatibility? Use a custom type and apply some plan modification ‘magic’?

austin.valle · June 8, 2023, 2:28pm

It looks like you’re running into two known issues:

As you noted, migrating from SDKv2 to Framework has changed the handling of null values and Terraform is detecting a diff correctly (I believe you have identified the affected attribute here with user_names). That still needs to be documented in our migration guide and there is an open issue for it here: Document Mitigation for Empty String to Null Plans for SDK Migrations · Issue #510 · hashicorp/terraform-plugin-framework · GitHub
While Terraform is correctly detecting a diff, it is not correctly displaying the output to the CLI, as you’d expect a diff to be displayed with user_names, hence the confusing error you ran into with your acceptance tests: CLI Plan Output Not Showing null to "" Differences for Non-Legacy Type System Providers · Issue #31887 · hashicorp/terraform · GitHub

As for potential solutions, a plan modifier on the affected attribute looks like the best option here, however the logic needed depends on your situation:

If the prior state and config is null, plan null
If the prior state is zero-value and config is null, plan zero-value
If the prior state is zero-value and config is zero-value, plan zero-value

There also a consideration to be made with how the API responds with the affected attribute (if it returns null back, which you would have to handle in the Read function, or Terraform will detect drift during a refresh)

Another solution would be to use a custom type (which during writing this, I have seen your comment in the other discuss post lol). The recently released semantic equality functionality for custom types would allow you to fully prevent SDK → Framework differences without implementing custom code in your Read method.

Hopefully this information is useful, and let me know if I missed any details! (also leave a on those issues )

alexhung · June 8, 2023, 3:54pm

@austin.valle Thanks for the explanation! I’ll give the plan modifier a try. We do have some API consideration so I’ll have to keep the logic in the Read function. I’ll let you know how this works out.

alexhung · June 8, 2023, 4:57pm

@austin.valle Quick update.

The plan modifier approach works as far as replacing this ‘hack’ of setting null default value I have on the attribute.

Default: setdefault.StaticValue(types.SetNull(types.StringType))

However, both issues remains. Meaning the debug output from ConfigPlanCheck shows the users_names was null in PostApplyPreRefresh and [] in PostApplyPostRefresh. Thus still triggering the second GitHub issue where CLI still shows ‘update in-place’ but no attribute marked.

I’ve verified that the r.UsersNames value is set to null in the Read func so in theory the refresh should contain null, not . But it isn’t.

I’ll need to try the custom type with semantic equality to see if that really solve this issue, if at all.

alexhung · June 8, 2023, 5:20pm

@austin.valle Given this statement in the Semantic Equality documentation:

The framework will only call semantic equality logic if both the prior and new values are known. Null or unknown values are unnecessary to check.

I don’t think this will work in our case where the prior state (created by previous provider) is null.

alexhung · June 8, 2023, 10:24pm

@austin.valle @bflad I have a feeling that we don’t have workarounds until those 2 known issues are resolved in the framework. Or if there is, someone smarter than me may be able to suggest a solution.

Per previous message, plan modifier doesn’t resolve the second known issue.

I implemented custom type for the users_names attribute which is a SetAttribute. Since SetType is a bit tricky I tried to follow the doc carefully and translate each step from string to set.

The same migration test fails with the following error which I have to say, very frustrating at this point of the migration:

resource_artifactory_group_test.go:28: Step 2/2 error: Error running pre-apply refresh: exit status 1

    Error: Value Conversion Error

      with artifactory_group.test-group-upgrade-8659464,
      on terraform_plugin_test.tf line 2, in resource "artifactory_group" "test-group-upgrade-8659464":
        2: 		resource "artifactory_group" "test-group-upgrade-8659464" {

    An unexpected error was encountered while verifying an attribute value
    matched its expected type to prevent unexpected behavior or panics. This is
    always an error in the provider. Please report the following to the provider
    developer:

    Expected type: CustomSetType
    Value type: CustomSetType
    Path: users_names

Expected type is same as Value type but that’s invalid?

Then I moved onto the other problematic resource which has similar issues with 2 string attributes, both of them read-only/computed. I was hoping that since these are string type I couldn’t mess up the custom type implement. Just copy and paste the example from the doc and done, right?

Well, I got similar but worse(?) error from the test:

resource_artifactory_scoped_token_test.go:38: Step 2/2 error: Error running pre-apply refresh: exit status 1

    Error: Value Conversion Error

      with artifactory_scoped_token.test-access-token2208902,
      on terraform_plugin_test.tf line 10, in resource "artifactory_scoped_token" "test-access-token2208902":
      10: 		resource "artifactory_scoped_token" "test-access-token2208902" {

    An unexpected error was encountered while verifying an attribute value
    matched its expected type to prevent unexpected behavior or panics. This is
    always an error in the provider. Please report the following to the provider
    developer:

    Expected type: CustomStringType
    Value type: basetypes.StringType
    Path: refresh_token

Turns out the plan modifier (stringplanmodifier.UseStateForUnknown()) is the cause. Does this mean I need to create a plan modifier for the custom type?

Without the plan modifier, of course the plan has state drift:

+ refresh_token           = (known after apply)

At this point, after spending a large amount of time troubleshooting these issues, I am ready to roll back all the migration and wait until the framework can work with state from SDKv2. IMO, making the framework GA without extensively testing against handling states from SDKv2 is unfortunate. Granted SDKv2 has all kind of weird behaviors which framework tries to correct but without a reliable migration path forward, I can’t recommend any provider developers to implement framework for existing resources/data sources.

As for rolling back, I don’t even know how to proceed. Revert all the migrated code and bump major version again?

austin.valle · June 13, 2023, 7:46pm

Some of this was discussed out of band, so posting some findings for future travelers that may have similar issues.

Problem

It’s common to run into a problem while migrating from SDKv2 to Plugin Framework as Terraform core handles SDKv2 data differently in certain situations compared to Plugin Framework. This problem is briefly summarized here: Document Mitigation for Empty String to Null Plans for SDK Migrations · Issue #510 · hashicorp/terraform-plugin-framework · GitHub. Running into this problem can prompt Terraform to detect changes to a resource due to the proper data handling of null by Plugin Framework.

Amplifying the confusion when experiencing this migration issue, is a bug with Terraform CLI that hides the attributes that are causing a resource to be updated, that looks like below:

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  ~ update in-place

Terraform will perform the following actions:

  # random_password.test will be updated in-place
  ~ resource "random_password" "test" {
        id          = "none"
        # (12 unchanged attributes hidden)
    }

Plan: 0 to add, 1 to change, 0 to destroy.

In the example above, there is actually a change being made (due to the migration now handling null values properly, however Terraform is not correctly displaying the output to the human-readable plan output.

Debugging this problem

In the acceptance tests for your provider, you can use a Plan Check to output the machine-readable plan output, which will have the correct information about the attributes that are planned for change (before and after).

A custom debug plan check can be added to your tests that will output the machine readable plan output, which can be useful in determining what values are changing during migration. The documentation of this machine-readable plan output is here.

Implementing a simple debug plan check and using it

package plancheck_test

import (
	"context"
	"encoding/json"
	"fmt"
	"testing"

	r "github.com/hashicorp/terraform-plugin-testing/helper/resource"
	"github.com/hashicorp/terraform-plugin-testing/plancheck"
)

var _ plancheck.PlanCheck = debugPlan{}

type debugPlan struct{}

func (e debugPlan) CheckPlan(ctx context.Context, req plancheck.CheckPlanRequest, resp *plancheck.CheckPlanResponse) {
	rd, err := json.Marshal(req.Plan)
	if err != nil {
		fmt.Println("error marshalling machine-readable plan output:", err)
	}
	fmt.Printf("req.Plan - %s\n", string(rd))
}

func DebugPlan() plancheck.PlanCheck {
	return debugPlan{}
}

func Test_DebugPlan(t *testing.T) {
	t.Parallel()

	r.Test(t, r.TestCase{
		ExternalProviders: map[string]r.ExternalProvider{
			"random": {
				Source: "registry.terraform.io/hashicorp/random",
			},
		},
		Steps: []r.TestStep{
			{
				Config: `resource "random_string" "one" {
					length = 16
				}`,
				PlanOnly: true,
				ConfigPlanChecks: r.ConfigPlanChecks{
					PostApplyPreRefresh: []plancheck.PlanCheck{
						DebugPlan(),
					},
				},
			},
		},
	})
}

Output

req.Plan - {"format_version":"1.1","terraform_version":"1.4.1","planned_values":{"root_module":{"resources":[{"address":"random_string.one","mode":"managed","type":"random_string","name":"one","provider_name":"registry.terraform.io/hashicorp/random","schema_version":2,"values":{"keepers":null,"length":16,"lower":true,"min_lower":0,"min_numeric":0,"min_special":0,"min_upper":0,"number":true,"numeric":true,"override_special":null,"special":true,"upper":true},"sensitive_values":{}}]}},"resource_changes":[{"address":"random_string.one","mode":"managed","type":"random_string","name":"one","provider_name":"registry.terraform.io/hashicorp/random","change":{"actions":["create"],"before":null,"after":{"keepers":null,"length":16,"lower":true,"min_lower":0,"min_numeric":0,"min_special":0,"min_upper":0,"number":true,"numeric":true,"override_special":null,"special":true,"upper":true},"after_unknown":{"id":true,"result":true},"before_sensitive":false,"after_sensitive":{}}}],"configuration":{"provider_config":{"random":{"name":"random","full_name":"registry.terraform.io/hashicorp/random"}},"root_module":{"resources":[{"address":"random_string.one","mode":"managed","type":"random_string","name":"one","provider_config_key":"random","expressions":{"length":{"constant_value":16}},"schema_version":2}]}}}

In this specific post, we found a string attribute that was being changed from "" in SDKv2, to null with Plugin Framework. This information was hidden from the Terraform CLI due to the bug, but was present in the machine readable output.

Topic		Replies	Views
After applying this test step and performing a `terraform refresh`, the plan was not empty Terraform Providers	1	1808	August 4, 2022
Terraform framework migration tests with non empty plan due to optional attributes Plugin Development	3	268	August 17, 2023
Testing for State upgrades in Framework Terraform Providers	0	98	December 13, 2023
Acceptance Testing using a local External Provider Plugin Development	2	298	January 30, 2024
Terraform Framework migration testing Plugin Development	3	234	July 6, 2023

Framework migration test produces non-empty plan

Problem

Debugging this problem

Implementing a simple debug plan check and using it

Output

Related topics