Normalization vs. config drift, again

Hello community.
I would like to reopen this topic, as the solutions I have found so far do not solve my problem.

I am in the middle of migrating the resources of a custom provider from SDKv2 to Plugin Framework.
The remote system supports explicitly unsetting a value by sending the magic value "none" for the field, and it returns an empty string "" on subsequent reads.
In SDKv2 this was handled with a StateFunc on the resource field:

StateFunc: func(i interface{}) string {
	v := i.(string)
	if v == "none" {
		return ""
	}

	return v
},

However, Plugin Framework has no equivalent, so I tried to mutate the state before saving it in the resource's Update method. This is just one of the use cases: the resource is created with an empty string in the field and then updated to "none", which should produce the same state:

func (r *exampleResource) Update(ctx context.Context, req resource.UpdateRequest, resp *resource.UpdateResponse) {
	var terraformPlan terraformModel

	resp.Diagnostics.Append(req.Plan.Get(ctx, &terraformPlan)...)
	if resp.Diagnostics.HasError() {
		return
	}

	// ... call the remote API with the planned value ...

	// Normalize "none" to "" before saving, as StateFunc did in SDKv2.
	if terraformPlan.Field.ValueString() == "none" {
		terraformPlan.Field = types.StringValue("")
	}
	resp.Diagnostics.Append(resp.State.Set(ctx, &terraformPlan)...)
}

but this approach results in an error during tests:

value was "none" but now ""

I will try to implement a custom type to handle this issue, but it seems like overkill just to silence the normalization of a string.

I would appreciate any link to a reference or similar implementation for this case.

Hey there @maksym-nazarenko :wave:, welcome to the Discuss forum and sorry you're running into issues here. I'll start with some background and context:

Background

The usage of StateFunc you're describing, which sets a planned value of "none" into state as "", has been considered invalid since Terraform 0.12, and we typically refer to these types of problems as "legacy SDK data consistency". Terraform's resource lifecycle documentation has a full description, but the applicable rule violation I believe you're running into is:

  • Resources should always set an attribute state value to the exact configuration value or prior state value, if not null.

Note: Terraform Plugin Framework cannot bypass these data consistency rules; violating them will always result in an error.

When Terraform encounters an unexpected data handling behavior from terraform-plugin-sdk resources during planning or applying operations, instead of immediately raising an error diagnostic to practitioners (or provider developers during acceptance testing), it will generate a warning log entry. If a problematic attribute value is referenced by another resource in the same Terraform configuration, Terraform will raise an error diagnostic for the downstream resource due to the unexpected value behavior caused by the upstream resource.

Checking for this warning log

Enable Terraform logging. When running Terraform commands such as terraform apply, set the TF_LOG=TRACE environment variable, for example TF_LOG=TRACE terraform apply. If there are data consistency errors, Terraform will write warning logs containing the phrase "legacy plugin SDK" that look something like this:

TIMESTAMP [WARN]  Provider "TYPE" produced an invalid plan for ADDRESS, but we are tolerating it because it is using the legacy plugin SDK.
    The following problems may be the cause of any confusing errors from downstream operations:
      - <description of error, something related to planned value not being final applied value>

Solution

As you mentioned, the solution is to ensure that your provider code always preserves the config value of "none", or any prior state value that exists, like "". This can be done with a custom type and semantic equality, as you mentioned, and the custom type could then be shared between all string attributes that need this behavior.

You could also implement this in your CRUD functions by always preserving the config value of "none", as that will satisfy Terraform's data consistency rules.
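To make the semantic equality idea concrete, here is a minimal stdlib-only sketch of the comparison such a custom type would perform. It assumes that "" and "none" are the only two spellings of "unset"; in Plugin Framework this logic would live in the `StringSemanticEquals` method of a custom string value type (the `basetypes.StringValuableWithSemanticEquals` interface), which is omitted here so the example runs on its own. The helper names `normalize` and `semanticallyEqual` are hypothetical.

```go
package main

import "fmt"

// unsetSentinel is the magic value the remote API (as described in this
// thread) accepts to explicitly unset the field.
const unsetSentinel = "none"

// normalize maps both spellings of "unset" to one canonical form ("").
func normalize(v string) string {
	if v == unsetSentinel {
		return ""
	}
	return v
}

// semanticallyEqual reports whether two values mean the same thing to the
// remote API. In a real provider this body would sit inside the custom
// value type's StringSemanticEquals method.
func semanticallyEqual(a, b string) bool {
	return normalize(a) == normalize(b)
}

func main() {
	fmt.Println(semanticallyEqual("", "none"))       // true
	fmt.Println(semanticallyEqual("none", "none"))   // true
	fmt.Println(semanticallyEqual("pool-a", "none")) // false
}
```

Keeping the comparison in one small function makes it easy to share between every string attribute that needs this behavior, which is the main argument for the custom type approach.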

Future docs + debugging

We currently have a PR out to allow SDKv2 developers to control if this legacy SDK issue is logged or throws an error, which may aid in discovering these data consistency issues in larger providers.

This PR also contains documentation about the overall problem of "legacy SDK data consistency", with use cases and solutions, which I will look to expand with this problem once merged.

Thank you @austin.valle for the exhaustive answer, and no worries about

sorry you're running into issues here

:wink:

I implemented semantic equality for a custom type and it seems to be a working solution (there are minor issues due to reflection magic in my code, but that is out of scope for this conversation).

However, this approach adds code that exists only to silence this normalization (I would rather have 5 lines in a CRUD function than 2 extra types for semantic equality: *Type and *Value).

I'm still trying to implement it in my CRUD functions, but apparently I don't understand where the state comes from and at which phase Plugin Framework complains about the value changing from "none" to "".

To be more specific, I have a failing test that:

  1. Creates the resource with "" as the initial value for the field
  2. Updates the resource with "none" as the new value

In this case, the remote API considers "none" to be equal to "", but Plugin Framework complains about this change.

Do you maybe have a working example of such normalization without a custom type?

Update:

According to the Update method documentation:

An error is returned unless every null or known value in the request plan is saved exactly as-is into the response state. Only unknown plan values can be modified.

So, in this case:

Previous state: `""`
Planned value: `"none"`
Remote API returned: `""`

If I set the field to "" and call resp.State.Set(...), it violates the data consistency rules and I get this error:

was cty.StringVal("none"), but now cty.StringVal("").
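The rule quoted from the Update documentation can be modeled as a tiny check, which shows why the three values above fail. This is a simplified sketch, not Terraform's or the framework's actual code: `plannedValue` and `checkApplied` are hypothetical names, null values are ignored, and the error text is illustrative only.

```go
package main

import "fmt"

// plannedValue is a simplified stand-in for a planned string attribute:
// it is either unknown (to be computed during apply) or a known value.
// Null handling is left out of this model.
type plannedValue struct {
	Unknown bool
	Value   string
}

// checkApplied models the rule: a known planned value must be saved into
// the new state exactly as-is; only unknown values may be changed.
func checkApplied(planned plannedValue, applied string) error {
	if planned.Unknown {
		return nil
	}
	if planned.Value != applied {
		return fmt.Errorf("inconsistent result: planned %q, but state has %q", planned.Value, applied)
	}
	return nil
}

func main() {
	// Values from the failing test: the plan says "none", the API read back "".
	fmt.Println(checkApplied(plannedValue{Value: "none"}, ""))
	// Writing the planned value back unchanged passes the check.
	fmt.Println(checkApplied(plannedValue{Value: "none"}, "none"))
}
```

Since "none" is a known value in the plan, nothing written in Update can turn it into "" without tripping this rule; the value has to be either preserved or never planned as "none" in the first place.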

If the actual desired and achieved remote value is "", and "none" is just a detail of the remote API used to trigger the transition to the value "", then I suggest the correct way to model this in Terraform would be to use "" throughout the code working with Terraform, and translate to "none" only in the code that actually sends the remote API request.

In this way there will be no data consistency issue, as "" will be used throughout the code interacting with Terraform. Would that work?
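The suggestion above can be sketched as a translation step at the API boundary. This is a stdlib-only sketch under stated assumptions: `fakeAPI` is a hypothetical in-memory stand-in for the remote service (rejects "", treats "none" as unset, reads back "" afterwards), and `toAPIValue` / `applyField` are hypothetical helper names. Terraform plan and state only ever see "".

```go
package main

import (
	"errors"
	"fmt"
)

// fakeAPI is a hypothetical in-memory stand-in for the remote service
// described in this thread.
type fakeAPI struct {
	field string
}

// SetField rejects "", treats "none" as "unset", and stores anything else.
func (a *fakeAPI) SetField(v string) error {
	if v == "" {
		return errors.New("remote API: empty string is not a valid value")
	}
	if v == "none" {
		a.field = "" // an unset field is reported as "" on read
		return nil
	}
	a.field = v
	return nil
}

// GetField returns the stored value ("" when unset).
func (a *fakeAPI) GetField() string { return a.field }

// toAPIValue translates the Terraform-facing value into the API's wire
// representation: "" is the only "unset" value Terraform ever sees.
func toAPIValue(v string) string {
	if v == "" {
		return "none"
	}
	return v
}

// applyField sends the planned value and returns the value to save in state.
func applyField(api *fakeAPI, planned string) (string, error) {
	if err := api.SetField(toAPIValue(planned)); err != nil {
		return "", err
	}
	// Read back: the API returns "" for an unset field, matching the plan.
	return api.GetField(), nil
}

func main() {
	api := &fakeAPI{field: "pool-a"}
	state, err := applyField(api, "")
	fmt.Printf("state=%q err=%v\n", state, err)
}
```

Because the planned value ("") and the value read back ("") are identical, the data consistency rule is satisfied without a custom type.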

Hey @maxb, thanks for the hint!
Unfortunately, the remote API service does not accept "" as a valid value; that's why "none" was introduced, I suppose.

Hi @maksym-nazarenko,
I think that @maxb is suggesting that your provider implementation could send "none" to the remote API service, but when dealing with the Terraform plan, state, and configuration, this "none" value would be represented as "".

For example, the user can update an existing attribute value to "" in their Terraform configuration to indicate an unset of the value, and the Update() method in your resource could send "none" to the remote API service and set the attribute value to "" in the Terraform state. Since the remote API service will return "" for that attribute in subsequent reads, this should not violate Terraform's data consistency rules.

I would recommend this route as well, if you do not want to go the route of implementing a custom type with semantic equality.

Ah, that makes sense @SBGoods.
Indeed, I misunderstood @maxb's suggestion.
I implemented the Update method so that on a transition from any value to "", I send "none".
The reason I couldn't implement it before was that I wanted to keep "none" as a valid value, but I can fully rely on the empty string "" here.
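The "send none only on a transition to empty" logic can be isolated into a small decision function. A sketch under assumptions: `fieldUpdate` and `planFieldUpdate` are hypothetical names, and it assumes the remote API lets the field be omitted from a request when nothing changes (a PATCH-style update), which may not hold for every API.

```go
package main

import "fmt"

// fieldUpdate describes what, if anything, to send for the field.
// Send == false means the field is left out of the request entirely.
type fieldUpdate struct {
	Send  bool
	Value string
}

// planFieldUpdate decides what to send given the prior state and the plan.
// Only a transition from a non-empty value to "" needs the "none" sentinel;
// if the field was already "", there is nothing to unset.
func planFieldUpdate(prior, planned string) fieldUpdate {
	switch {
	case planned == prior:
		return fieldUpdate{} // no change, nothing to send
	case planned == "":
		return fieldUpdate{Send: true, Value: "none"}
	default:
		return fieldUpdate{Send: true, Value: planned}
	}
}

func main() {
	fmt.Printf("%+v\n", planFieldUpdate("pool-a", "")) // {Send:true Value:none}
	fmt.Printf("%+v\n", planFieldUpdate("", ""))       // {Send:false Value:}
	fmt.Printf("%+v\n", planFieldUpdate("", "pool-b")) // {Send:true Value:pool-b}
}
```

In every case the value written to state stays the planned one, so "none" never leaks into Terraform.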

Although my case can be mitigated this way, there is still an issue if one wants to implement a full 1-to-1 mapping between the Terraform provider and the API client and use "none" as a valid value. In that case, I believe the only way would be to implement a custom type...

I think this issue is solved for me.
Thank you all!

I see that this problem is solved, but I just wanted to add an extra note in case it's useful to future readers who might find this topic.

When you implement the "semantic equality" concept in your provider, you effectively tell Terraform that two or more values that it would normally consider non-equal by its own rules should actually be treated as equal when making decisions about the resource change lifecycle, such as whether a particular change requires updating the remote object via a real API call.

An important detail of that behavior is that Terraform will preserve the original value that the user wrote in the configuration. If you tell the framework that "" and "none" are equivalent, then whichever form the module author has chosen when they first plan and apply with that argument will be "locked in", and if they later switch to the other form, Terraform will essentially ignore the change, preserving the value whichever way it was originally written. Terraform will only update to a new value if the new value is not "semantically equal" to the previous value, as defined by the provider.
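The "locked in" behavior described above can be illustrated with a deliberately simplified model. This is not Terraform's or the framework's actual planning code; `normalize` and `nextValue` are hypothetical helpers that only capture the described outcome: a new value replaces the prior one only when it is not semantically equal to it.

```go
package main

import "fmt"

// normalize treats "none" and "" as the same "unset" value.
func normalize(v string) string {
	if v == "none" {
		return ""
	}
	return v
}

// nextValue models the described outcome: if the new configuration value is
// semantically equal to the prior value, the prior value is kept as-is.
func nextValue(prior, config string) string {
	if normalize(prior) == normalize(config) {
		return prior
	}
	return config
}

func main() {
	// First applied with "", later edited to "none": the edit is ignored.
	fmt.Printf("%q\n", nextValue("", "none")) // ""
	// First applied with "none", later edited to "": also ignored.
	fmt.Printf("%q\n", nextValue("none", "")) // "none"
	// A genuinely different value is taken.
	fmt.Printf("%q\n", nextValue("", "pool-a")) // "pool-a"
}
```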

That behavior can be a little surprising from the perspective of an individual provider developer, but it's important for overall configuration consistency, particularly when data flows between different providers. If something else in the configuration interpolates the value of the attribute, then the first attribute value and the derived attribute value must always change together: the semantic equality rules of two resource types might not always agree, and it'd be disruptive if the change to the upstream resource were ignored due to semantic equality while the downstream resource still got updated.
