Custom create-before-destroy strategy

What would be the starting point for implementing a custom create-before-destroy strategy? Is there a ready-to-use API exposed to providers, or would this require a deeper change in Terraform itself?

Use case: Changing an aws_network_acl_rule results in a destroy+recreate operation, which can cause a brief network outage. Forcing create-before-destroy won’t work, since the old and new rules cannot share the same rule number. A custom strategy could copy the old rule to a temporary one (with rule number + 1, say), destroy the old rule, add the new one in its place, and then delete the temporary rule. There are probably other resources that could benefit from custom strategies, too.

Hi @sveniu!

The two “replace” operations (destroy old then create replacement, or create replacement and then destroy old) are managed by Terraform Core itself for situations where the provider (or, in turn, the remote system) isn’t able to model a change as an in-place update. By declaring that a change “requires replacement” (ForceNew in the SDK), the provider is asking Terraform Core to split the change into a destroy and a create. From the provider’s perspective these appear as two separate operations, and it handles each one independently.

For that reason, there isn’t any concept of a provider handling replacement itself. The closest thing possible in the current lifecycle model is for the provider to present the change as an in-place update, populate the planned result appropriately using CustomizeDiff to reflect everything that might change as part of the action, and then do the steps you’re describing in its Update implementation.

That would not be appropriate if the change in question could cause an interruption of service or disturb the state of other objects, but it is acceptable if the overall effect is indistinguishable from a true in-place update. I’m not familiar enough with Network ACL rules to know for certain if that is true here, but from what you described it sounds like downtime is avoidable by careful sequencing.

However, it seems like aws_network_acl_rule is a particularly tricky case because the rule number is part of the configuration rather than something the provider or remote API chooses. Automatically choosing the given rule number +1 as you suggested could work, but presumably that number could be in conflict too.
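
To make the sequencing and the conflict concern concrete, here is a minimal sketch in Go. It is not AWS provider code: `Rule`, `ACL`, `ErrConflict` and `ReplaceRule` are hypothetical names, and an in-memory map stands in for the remote API. It only illustrates the order of steps you described, with a check that the temporary number is free before anything is changed.

```go
package main

import (
	"errors"
	"fmt"
)

// Rule is a simplified, hypothetical stand-in for a network ACL entry.
type Rule struct {
	Action string // "allow" or "deny"
	CIDR   string
}

// ACL maps rule numbers to rules. It stands in for the remote API.
type ACL map[int]Rule

// ErrConflict is returned when the temporary rule number cannot be used.
var ErrConflict = errors.New("temporary rule number already in use")

// ReplaceRule swaps the rule at number for updated. A copy of the old rule
// is kept at tempNumber while the original number is briefly empty.
func ReplaceRule(acl ACL, number, tempNumber int, updated Rule) error {
	old, ok := acl[number]
	if !ok {
		return fmt.Errorf("rule %d does not exist", number)
	}
	// Refuse to start if the temporary number would collide with any rule.
	if tempNumber == number {
		return ErrConflict
	}
	if _, taken := acl[tempNumber]; taken {
		return ErrConflict
	}

	acl[tempNumber] = old   // 1. copy the old rule to the temporary number
	delete(acl, number)     // 2. destroy the old rule
	acl[number] = updated   // 3. create the new rule in its place
	delete(acl, tempNumber) // 4. delete the temporary rule
	return nil
}

func main() {
	acl := ACL{
		100: {Action: "allow", CIDR: "10.0.0.0/16"},
		101: {Action: "deny", CIDR: "0.0.0.0/0"},
	}
	updated := Rule{Action: "allow", CIDR: "10.1.0.0/16"}

	// Rule number + 1 is already taken here, so this attempt is rejected.
	fmt.Println(ReplaceRule(acl, 100, 101, updated))

	// A free temporary number lets the swap go through.
	fmt.Println(ReplaceRule(acl, 100, 99, updated))
	fmt.Println(acl[100])
}
```

In a real provider each numbered step would be a remote API call that can fail partway through, so the Update implementation would also need to decide how to recover if, say, step 3 fails after step 2 has succeeded.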

Perhaps a reasonable compromise would be to add an optional new argument for a temporary replacement rule number, and then make the provider present the change as an in-place update only if that argument is set. The user would then be opting in to this behavior that might otherwise be surprising, and can select for themselves an appropriate temporary number that isn’t already in use by another rule. If that new argument were not set then the provider would retain the current behavior of presenting it as requiring replacement.
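
As a rough illustration of that opt-in, the decision could look something like the following Go sketch. Everything here is hypothetical: `RuleConfig`, `TempRuleNumber`, `PlanAction` and `PlanRuleChange` are names I made up for the example, and this is plain Go rather than SDK code. In a real provider the equivalent decision would presumably live in CustomizeDiff.

```go
package main

import "fmt"

// PlanAction describes how the provider presents a change to Terraform Core.
type PlanAction string

const (
	NoOp                PlanAction = "no-op"
	UpdateInPlace       PlanAction = "update in-place"
	RequiresReplacement PlanAction = "requires replacement"
)

// RuleConfig is a hypothetical, simplified view of the resource arguments.
// TempRuleNumber is nil when the optional argument is not set.
type RuleConfig struct {
	RuleNumber     int
	Action         string
	CIDR           string
	TempRuleNumber *int
}

// PlanRuleChange decides how a change should be presented. For simplicity it
// ignores changes to TempRuleNumber alone, since those have no remote effect.
func PlanRuleChange(prior, planned RuleConfig) PlanAction {
	unchanged := prior.RuleNumber == planned.RuleNumber &&
		prior.Action == planned.Action &&
		prior.CIDR == planned.CIDR
	if unchanged {
		return NoOp
	}
	// The user opted in by choosing a temporary number, so the provider
	// presents the change as an in-place update.
	if planned.TempRuleNumber != nil {
		return UpdateInPlace
	}
	// Without the opt-in, retain the current behavior.
	return RequiresReplacement
}

func main() {
	prior := RuleConfig{RuleNumber: 100, Action: "allow", CIDR: "10.0.0.0/16"}
	planned := RuleConfig{RuleNumber: 100, Action: "allow", CIDR: "10.1.0.0/16"}
	fmt.Println(PlanRuleChange(prior, planned))

	temp := 32000 // arbitrary number chosen by the user for this example
	planned.TempRuleNumber = &temp
	fmt.Println(PlanRuleChange(prior, planned))
}
```

Using a pointer for the optional argument keeps "not set" distinct from any particular number, which matters because the default behavior must stay exactly as it is today.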

If that seems like a reasonable compromise to you, I think a good next step would be to open a Feature Request on the AWS provider repository to propose the idea and see what the provider development team thinks of it. There may well be extenuating circumstances with this particular resource type that make this strategy inappropriate, but the provider development team are the best folks to weigh in on that.

That’s great info, thank you for the detailed answer! I’ve filed a feature request for now.