Your edit is correct: the state attribute is both Computed and Optional, and your picture of how user-supplied states and intermediate states work is also pretty much correct.
I didn’t want to get too far into the weeds of my particular use case, but I’m writing a new Docker provider and would like to support as many of the scenarios covered by an existing provider as is feasible.
Possible Docker container states are created, running, paused, restarting, exited, removing, and dead (I’ve been unable to find a state diagram that includes every one of these). I’d like the user to be able to:
- Start the container after creating it, and let Terraform ensure it stays running
- Create the container but not start it, and use ignore_changes to allow the container to be started/stopped externally without Terraform interfering
- Tie the container state to some other data source or resource so that Terraform can keep them synchronized
- Create the container, start it, and wait for it to exit, and then retrieve the exit code and/or logs, and not run it again unless the resource is replaced
- Create the container, start it, wait for it to exit, and then retrieve the exit code and/or logs and remove the container, and not create it again unless the resource is replaced
We can’t necessarily let the user directly set the Docker API state value though. For instance, what would the correct behavior be if they specified state = "created" and the container already exists in the running state? The only way to get back to the created state would be to stop the existing container, remove it, and create a new one. Similarly, if the user specified state = "removing", and the container did not exist, should the provider create a new container and then remove it to put it in the removing state? Probably not.
With these scenarios in mind, I decided to map the Docker API’s states to my own set of possible states: stopped, running, paused, exited, and removed.
stopped would map to both Docker’s created and exited states. If the user specifies state = "stopped" and the container does not exist, it will be created and assume the created state in Docker. If the container already exists in the created or exited state, the provider will do nothing. If the container already exists in the running state, the provider will stop it and it will assume the exited state in Docker.
running would map straightforwardly to Docker’s running state. If the container does not exist, it will be created and started; if it exists in the created or exited Docker state, it will be started; if it is in the paused Docker state, it will be unpaused; and so on. The paused state would behave similarly.
exited would map to Docker’s exited state, but will only wait for the container to exit as opposed to stopping it. If the user specifies state = "exited" and the container does not exist, it will be created, started, and the provider will wait for it to assume the exited state in Docker. If it exists in the created state, the provider will start it and wait for it to exit. If it exists in the running state, the provider will only wait for it to exit.
removed would behave similarly to exited, but would also remove the Docker container after it exits. If the container doesn’t exist in Docker but does exist in the Terraform state, the provider will do nothing (under the assumption that it has already run and been removed). If it doesn’t exist in Docker or in the Terraform state, the provider will create it, start it, wait for it to exit, then remove it.
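The transitions described above could be written down as a lookup table. This is only a rough sketch in Go: the action labels (create, start, stop, unpause, wait, remove), the DockerState type, and the planActions function are hypothetical names, not part of any real provider or of the Docker API. Only the combinations spelled out above are listed (plus the obvious no-ops), and anything else returns an error rather than guessing.

```go
package main

import "fmt"

// DockerState is the status string reported by Docker.
// Assumption: the empty string stands for "container does not exist".
type DockerState string

const Absent DockerState = ""

// transition pairs the user-facing state with the observed Docker state.
type transition struct {
	desired string
	current DockerState
}

// steps lists the (hypothetical) actions for each described combination.
var steps = map[transition][]string{
	{"stopped", Absent}:    {"create"},
	{"stopped", "created"}: {},
	{"stopped", "exited"}:  {},
	{"stopped", "running"}: {"stop"},

	{"running", Absent}:    {"create", "start"},
	{"running", "created"}: {"start"},
	{"running", "exited"}:  {"start"},
	{"running", "paused"}:  {"unpause"},
	{"running", "running"}: {}, // implied no-op

	{"exited", Absent}:    {"create", "start", "wait"},
	{"exited", "created"}: {"start", "wait"},
	{"exited", "running"}: {"wait"},
	{"exited", "exited"}:  {}, // implied no-op
}

// planActions returns the ordered actions needed to reach the desired state.
// inTerraformState only matters for "removed" when the container is absent.
func planActions(desired string, current DockerState, inTerraformState bool) ([]string, error) {
	if desired == "removed" && current == Absent {
		if inTerraformState {
			// Assumed to have already run and been removed.
			return nil, nil
		}
		return []string{"create", "start", "wait", "remove"}, nil
	}
	s, ok := steps[transition{desired, current}]
	if !ok {
		return nil, fmt.Errorf("no transition described for desired=%q, docker=%q", desired, current)
	}
	return s, nil
}

func main() {
	for _, c := range []DockerState{Absent, "created", "running", "exited"} {
		s, err := planActions("exited", c, false)
		fmt.Printf("docker=%q -> %v (err: %v)\n", c, s, err)
	}
}
```

Keeping the table explicit makes the unspecified cases (for example, stopped requested while the container is paused) fail loudly instead of silently picking a behavior.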
To deal with intermediate states like restarting and removing, I think the correct thing to do is treat them as if they were their successor states (each of these only has one possible successor state: running and removed, respectively). So if Read() sees a container in the removing state, it will call RemoveResource() and cause Terraform to create a new container.
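A minimal sketch of that successor-state idea, in Go. The normalize and goneAfterSettling helpers are hypothetical names, and "removed" here simply means the container no longer exists in Docker. In a real Read() the second helper would be the point at which the resource is dropped from the Terraform state.

```go
package main

import "fmt"

// successor maps each transient Docker status to the single state
// it is expected to settle into, as described above.
var successor = map[string]string{
	"restarting": "running",
	"removing":   "removed",
}

// normalize returns the state to act on and whether the input was transient.
func normalize(dockerStatus string) (state string, transient bool) {
	if next, ok := successor[dockerStatus]; ok {
		return next, true
	}
	return dockerStatus, false
}

// goneAfterSettling reports whether Read() should treat the container as
// no longer existing, i.e. where RemoveResource() would be called.
func goneAfterSettling(dockerStatus string) bool {
	state, _ := normalize(dockerStatus)
	return state == "removed"
}

func main() {
	for _, s := range []string{"restarting", "removing", "paused"} {
		state, transient := normalize(s)
		fmt.Printf("%s -> %s (transient=%v, gone=%v)\n", s, state, transient, goneAfterSettling(s))
	}
}
```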
The main question for me is whether to return intermediate states from Read(), or pretend they don’t exist and only return their successor states; and if the latter, should the provider wait until the container has actually entered the successor state during the Read()?
Returning intermediate states from Read() and then waiting for the successor state in Update() seems to make the most sense to me, but then if the Update() fails for whatever reason, the Terraform state will contain an intermediate state value, which may not be a valid user-provided value. I’m just not sure whether this is a problem in practice.
If we instead waited for the container to transition out of intermediate states during the Read(), then it could possibly take a long time, and it feels unintuitive as a user. But Terraform would never see a state value that isn’t one of the valid possibilities specified in the OneOf validator, and the state reported by Terraform would always be correct at the time Read() finishes (and should remain correct, since it will never be a transient state).
We could instead take a middle road, where intermediate states are treated identically to their successor state during Read(), but we don’t wait for them to actually transition. If we see a container in the restarting state, then we write state = "running" to the Terraform state, and proceed accordingly. If an Update() is required because, for instance, the configuration changed from state = "running" to state = "paused", then Update() would wait for the container to leave the restarting state, and then pause the container. However, this would mean that the state of the container as reported by Terraform may not always be accurate, and Update() would need to re-read from the API to see if it needs to wait for a container in the restarting state to finish.