A short, simplified description of my setup:
Let’s say I have an SFN state machine with 2 states, A and B. Each of them uses an AWS Batch job with a certain revision, like so:
A -> jobA:3
B -> jobB:4
This State machine is being run on a cron schedule and might take a few hours to complete.
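For readers who want something concrete, the setup above might look roughly like this in Terraform. This is only a sketch: the resource names, image tags, job names, and the two variables are assumptions made for illustration, not taken from the actual configuration.

```hcl
# Sketch only: every name, image tag, and variable here is hypothetical.
variable "sfn_role_arn" { type = string }
variable "job_queue_arn" { type = string }

resource "aws_batch_job_definition" "job_a" {
  name = "jobA"
  type = "container"

  container_properties = jsonencode({
    image   = "example.com/job-a:1.0"
    command = ["run"]
    resourceRequirements = [
      { type = "VCPU", value = "1" },
      { type = "MEMORY", value = "512" },
    ]
  })
}

resource "aws_batch_job_definition" "job_b" {
  name = "jobB"
  type = "container"

  # Changing the image changes container_properties,
  # which results in a new revision of jobB.
  container_properties = jsonencode({
    image   = "example.com/job-b:1.0"
    command = ["run"]
    resourceRequirements = [
      { type = "VCPU", value = "1" },
      { type = "MEMORY", value = "512" },
    ]
  })
}

resource "aws_sfn_state_machine" "pipeline" {
  name     = "pipeline"
  role_arn = var.sfn_role_arn

  definition = jsonencode({
    StartAt = "A"
    States = {
      A = {
        Type     = "Task"
        Resource = "arn:aws:states:::batch:submitJob.sync"
        Parameters = {
          JobName  = "job-a"
          JobQueue = var.job_queue_arn
          # The arn attribute includes the revision (e.g. ...:job-definition/jobA:3),
          # so the revision is written into the state machine definition.
          JobDefinition = aws_batch_job_definition.job_a.arn
        }
        Next = "B"
      }
      B = {
        Type     = "Task"
        Resource = "arn:aws:states:::batch:submitJob.sync"
        Parameters = {
          JobName       = "job-b"
          JobQueue      = var.job_queue_arn
          JobDefinition = aws_batch_job_definition.job_b.arn
        }
        End = true
      }
    }
  })
}
```

Because each state refers to a job definition ARN that carries a revision number, a new revision of `jobB` also changes the state machine definition.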
When a deployment via CI/CD happens, the following things may occur:
- the Docker image for `jobB` changes, thus the `container_properties` for the job change
- Terraform will force a new resource, causing `jobB:4` to be marked as INACTIVE and a new revision `jobB:5` to be submitted
- the SFN state machine now is:
A -> jobA:3
B -> jobB:5
If an SFN execution is running while this deployment is being made:
- the running execution still uses the definition with the old revision of `jobB`
- once state A finishes, SFN will try to queue `jobB:4`, causing an error
Is there a way of preventing this? I know that Terraform’s default behaviour when forcing a new resource is “delete the old resource, create a new one”, but in the case of AWS Batch, where you have revisions, it would be nice if there were a way to preserve the old revisions of a job without marking them as inactive.
Run `terraform apply` to change resources when you want them changed. This sounds like a glib answer, but how can Terraform choose behavior at a logical level above resources, the so-called orchestration logic?
I wasn’t saying it should be aware of logical-level resources. I was suggesting that, for resources that have revisions (like AWS Batch), it could have a flag to not automatically deregister a revision when “forcing” a new resource.
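As far as I know, newer versions of the AWS provider document an argument along these lines on `aws_batch_job_definition`, called `deregister_on_new_revision`. The sketch below assumes a provider version that supports it. The resource name, image, and resource requirements are hypothetical.

```hcl
# Sketch only: assumes the installed AWS provider supports
# deregister_on_new_revision; names and image are hypothetical.
resource "aws_batch_job_definition" "job_b" {
  name = "jobB"
  type = "container"

  # With false, the previous revision (e.g. jobB:4) should stay ACTIVE
  # when a new revision (e.g. jobB:5) is registered.
  deregister_on_new_revision = false

  container_properties = jsonencode({
    image   = "example.com/job-b:1.1"
    command = ["run"]
    resourceRequirements = [
      { type = "VCPU", value = "1" },
      { type = "MEMORY", value = "512" },
    ]
  })
}
```

Another option, with the same caveats, is to pass the job definition name without a revision when submitting the job. AWS Batch then uses the latest active revision, so an execution that is already running would pick up `jobB:5` rather than fail on `jobB:4`.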
All resources may still be in use according to a user’s assessment. For example, if I update the value of `ami` for an `aws_instance`, then run `terraform apply`, this change destroys any existing instance and creates a new one from the new image. Had important processes been running on the instance that I wished to keep running, I should have waited to apply the change until those processes had finished. A process running in an `aws_sfn_state_machine` is the same idea.
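A minimal sketch of the `aws_instance` case, with a placeholder AMI ID and a hypothetical resource name:

```hcl
# Sketch only: the AMI ID and resource name are placeholders.
resource "aws_instance" "worker" {
  # Editing this value makes terraform plan report a replacement:
  # the existing instance is destroyed and a new one is created.
  ami           = "ami-0123456789abcdef0"
  instance_type = "t3.micro"
}
```

Anything still running on the old instance is lost at apply time, which is why the timing of the apply is left to the operator.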
`terraform plan` shows the changes to be applied to the resources to converge the state of those resources to the semantic descriptions expressed in the template. The change to the Docker image, in this case, requires a new resource to be created in order to achieve convergence. Terraform reports this constraint in the output of `terraform plan`. The action of `terraform apply` converges the state of the resources as planned.