Should OneOf validators apply to Read results?

I’ve run into a somewhat philosophical conundrum that I don’t have a good answer to: If an attribute uses a OneOf validator, should the value returned by its Read() method always be one of the allowed OneOf values? Or is it okay for Read() to return an attribute value that wouldn’t be valid for the user to specify themselves in the Terraform configuration?

To be more precise, I’m writing a resource with a state attribute. The underlying API has a fixed list of possible state values it can return for the resource. Several of these states are irrelevant from the resource’s point of view, because they’re transient; the only thing that can be done to move them out of these states is to wait. For example, there is a removing state, and it doesn’t really make sense for the Terraform resource to permit users to set state = "removing". Accordingly, I use the OneOf validator to restrict the possible values users can set state to in their Terraform configuration.

Should Read() return the actual underlying API value for state, and then let Update() handle taking the appropriate action for any given current state (such as waiting for a resource in the removing state to be removed)? Or should Read() only ever return a value for state that would be valid in the Terraform configuration, and wait for the resource to enter one of these valid states if it’s not currently in one?

Another example is the restarting state. It doesn’t make sense for the provider to allow the user to set state = "restarting", since it is also transient; supporting this would require restarting the resource on every terraform apply. If we Read() a resource with this state, should we return state = "restarting", allow Terraform to see that the configured state doesn’t match thus triggering an Update(), and then have Update() wait for the resource to move out of the restarting state? Or should Read() itself wait for the resource to move out of the restarting state and return the new state (running, for instance)?

A state attribute doesn’t sound like it would be set up for configuration by the user. Is it flagged with Optional: true or Required: true?

None of my Computed-only (non-user configurable) attributes have validators.

edit: I’ve read more carefully. It sounds like you have this attribute set for both Computed and Optional?

Allowing the user to configure a value which the API is going to change seems problematic even without factoring in validation.

I’m imagining valid user input states like running and paused with intermediate states init, starting, pausing and restarting.

Am I close?

What happens in Update() if Read() finds the attribute in one of the intermediate states?

Your edit is correct, the state attribute is both Computed and Optional, and your vision of how user input states and intermediate states work is also pretty much correct.

I didn’t want to get too into the weeds of my particular use case, but I’m writing a new Docker provider and would like to support as many of the scenarios that another existing provider supports as is feasible.

Possible Docker container states are created, running, paused, restarting, exited, removing, and dead (I’ve been unable to find a state diagram that includes every one of these). I’d like the user to be able to:

  • Start the container after creating it, and let Terraform ensure it stays running
  • Create the container but not start it, and use ignore_changes to allow the container to be started/stopped externally without Terraform interfering
  • Tie the container state to some other data source or resource so that Terraform can keep them synchronized
  • Create the container, start it, and wait for it to exit, and then retrieve the exit code and/or logs, and not run it again unless the resource is replaced
  • Create the container, start it, wait for it to exit, and then retrieve the exit code and/or logs and remove the container, and not create it again unless the resource is replaced

We can’t necessarily let the user directly set the Docker API state value though. For instance, what would the correct behavior be if they specified state = "created" and the container already exists in the running state? The only way to get back to the created state would be to stop the existing container, remove it, and create a new one. Similarly, if the user specified state = "removing", and the container did not exist, should the provider create a new container and then remove it to put it in the removing state? Probably not.

With these scenarios in mind, I decided to map the Docker API’s states to my own set of possible states: stopped, running, paused, exited, and removed.

stopped would map to both Docker’s created and exited states. If the user specifies state = "stopped" and the container does not exist, it will be created and assume the created state in Docker. If the container already exists in the created or exited state, the provider will do nothing. If the container already exists in the running state, the provider will stop it and it will assume the exited state in Docker.

running would map straightforwardly to Docker’s running state. If the container does not exist, it will be created and started; if it exists in the created or exited Docker state, it will be started; if it is in the paused Docker state, it will be unpaused; and so on. The paused state would behave similarly.

exited would map to Docker’s exited state, but will only wait for the container to exit as opposed to stopping it. If the user specifies state = "exited" and the container does not exist, it will be created, started, and the provider will wait for it to assume the exited state in Docker. If it exists in the created state, the provider will start it and wait for it to exit. If it exists in the running state, the provider will only wait for it to exit.

removed would behave similarly to exited, but would also remove the Docker container after it exits. If the container doesn’t exist in Docker but does exist in the Terraform state, the provider will do nothing (under the assumption that it has already run and been removed). If it doesn’t exist in Docker or in the Terraform state, the provider will create it, start it, wait for it to exit, then remove it.
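
The stopped, running, and exited rules above can be sketched as a small transition table. This is only an illustration in plain Go, not provider code: actionsFor and the action names ("create", "start", "stop", "unpause", "wait") are hypothetical, an empty dockerState stands for "container does not exist", and the exited-to-exited no-op is an assumption. The paused and removed cases are left out for brevity.

```go
package main

import (
	"fmt"
	"strings"
)

// actionsFor returns the ordered (hypothetical) Docker operations needed to
// move a container from its current Docker state to the desired provider
// state. An empty dockerState means the container does not exist.
func actionsFor(dockerState, desired string) ([]string, error) {
	switch desired {
	case "stopped":
		switch dockerState {
		case "":
			return []string{"create"}, nil
		case "created", "exited":
			return nil, nil // already stopped, nothing to do
		case "running":
			return []string{"stop"}, nil
		}
	case "running":
		switch dockerState {
		case "":
			return []string{"create", "start"}, nil
		case "created", "exited":
			return []string{"start"}, nil
		case "paused":
			return []string{"unpause"}, nil
		case "running":
			return nil, nil
		}
	case "exited":
		switch dockerState {
		case "":
			return []string{"create", "start", "wait"}, nil
		case "created":
			return []string{"start", "wait"}, nil
		case "running":
			return []string{"wait"}, nil // wait only, never stop
		case "exited":
			return nil, nil // assumption: already exited is a no-op
		}
	}
	// Anything not listed (including transient Docker states) is unhandled here.
	return nil, fmt.Errorf("unhandled transition %q -> %q", dockerState, desired)
}

func main() {
	for _, from := range []string{"", "created", "running"} {
		steps, err := actionsFor(from, "exited")
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%q -> exited: [%s]\n", from, strings.Join(steps, ", "))
	}
}
```

Note that transient states such as restarting fall through to the error, which is exactly the gap the rest of this post is about.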

To deal with intermediate states like restarting and removing, I think the correct thing to do is treat them as if they were their successor states (each of these only has one possible successor state: running and removed, respectively). So if Read() sees a container in the removing state, it will call RemoveResource() and cause Terraform to create a new container.

The main question for me is whether to return intermediate states from Read(), or pretend they don’t exist and only return their successor states; and if the latter, should the provider wait until the container has actually entered the successor state during the Read()?

Returning intermediate states from Read() and then waiting for the successor state in Update() seems to make the most sense to me, but then if the Update() fails for whatever reason, the Terraform state will contain an intermediate state value, which may not be a valid user-provided value. I’m just not sure whether this is a problem in practice.

If we instead waited for the container to transition out of intermediate states during the Read(), then it could possibly take a long time, and it feels unintuitive as a user. But Terraform would never see a state value that isn’t one of the valid possibilities specified in the OneOf validator, and the state reported by Terraform would always be correct at the time Read() finishes (and should remain correct, since it will never be a transient state).
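
A rough sketch of what waiting inside Read() could look like, using only the Go standard library. The names waitForStable, readState, and fakeInspect are hypothetical, the inspect callback stands in for whatever Docker API call the provider makes, and the set of transient states is taken from the discussion above. The timeout bounds how long Read() can block.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// isTransient reports whether a Docker state is one the provider would
// rather not expose (assumed set: restarting and removing).
func isTransient(s string) bool { return s == "restarting" || s == "removing" }

// waitForStable polls inspect until it reports a non-transient state, the
// context is cancelled, or inspect fails.
func waitForStable(ctx context.Context, inspect func(context.Context) (string, error), interval time.Duration) (string, error) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		state, err := inspect(ctx)
		if err != nil {
			return "", err
		}
		if !isTransient(state) {
			return state, nil
		}
		select {
		case <-ctx.Done():
			return state, fmt.Errorf("container still %q: %w", state, ctx.Err())
		case <-ticker.C:
			// Poll again.
		}
	}
}

// readState wraps waitForStable with a timeout, roughly as Read() might.
func readState(inspect func(context.Context) (string, error), timeout, interval time.Duration) (string, error) {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	return waitForStable(ctx, inspect, interval)
}

// fakeInspect replays the given states in order, repeating the last one.
// It only exists to make the sketch runnable without Docker.
func fakeInspect(states ...string) func(context.Context) (string, error) {
	i := 0
	return func(context.Context) (string, error) {
		s := states[i]
		if i < len(states)-1 {
			i++
		}
		return s, nil
	}
}

func main() {
	inspect := fakeInspect("restarting", "restarting", "running")
	state, err := readState(inspect, time.Second, 10*time.Millisecond)
	fmt.Println(state, err)
}
```

If the timeout expires while the container is still transient, the sketch returns an error rather than guessing a state, which is one possible answer to the "it could take a long time" concern.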

We could instead take a middle road, where intermediate states are treated identically to their successor state during Read(), but we don’t wait for them to actually transition. If we see a container in the restarting state, then we write state = "running" to the Terraform state, and proceed accordingly. If an Update() is required because, for instance, the configuration changed from state = "running" to state = "paused", then Update() would wait for the container to leave the restarting state, and then pause the container. However, this would mean that the state of the container as reported by Terraform may not always be accurate, and Update() would need to re-read from the API to see if it needs to wait for a container in the restarting state to finish.
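
The middle road might look something like this in plain Go. The successors table and reportedState are hypothetical names, and the single-successor assumption (restarting leads to running, removing leads to removed) comes from the description above. Stable states are passed through unchanged to keep the sketch short.

```go
package main

import "fmt"

// successors maps each transient Docker state to the stable state it is
// assumed to reach next.
var successors = map[string]string{
	"restarting": "running",
	"removing":   "removed",
}

// reportedState returns the value Read() would write to the Terraform state.
// pending is true when the container has not actually reached that state
// yet, so Update() or Delete() would have to re-read and wait before acting.
func reportedState(dockerState string) (state string, pending bool) {
	if next, ok := successors[dockerState]; ok {
		return next, true
	}
	return dockerState, false
}

func main() {
	for _, s := range []string{"restarting", "removing", "paused"} {
		state, pending := reportedState(s)
		fmt.Printf("docker=%s reported=%s pending=%t\n", s, state, pending)
	}
}
```

The pending flag is not something Terraform would ever see; it only makes explicit that the reported value can be ahead of reality.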

You said you want “the user to be able to” and then listed a bunch of imperative operations: create, start, etc…

In my adventures in writing a Terraform provider for a not-very-CRUD-y API, I’ve found it very helpful to be deliberate about using declarative language: “Allow the user to define a container which will be started after creation and restarted as necessary”, etc…

Maybe this was already obvious to you? I had a bunch of false starts until I became a stickler about it.

Anyway…

Do you envision the container state as an attribute of the container resource, or will you make a separate container_state resource?

I’ve not explored what happens when a Computed attribute returns a value which would be rejected by a OneOf validator. Do you happen to know what happens here?

If the user specifies state = "stopped" and the container does not exist, it will be created and assume the created state in Docker. If the container already exists in the created or exited state, the provider will do nothing. If the container already exists in the running state, the provider will stop it and it will assume the exited state in Docker.

Which phase of Terraform operation are we talking about here? The only time Terraform should be creating anything is in Create(), and by definition “If the container already exists” cannot happen here.

Regarding intermediate states: I definitely wouldn’t return from Create(), Update() or Delete() until the final state is achieved (or an error is encountered).

If Read() returns the successor state, then Update() and Delete() will be required to handle those states gracefully (wait to do their work, etc…)

I’d be inclined to delay Read()'s return until things settle down.

Yes, I wasn’t very precise in my language, but you’re correct that I’m aiming to map these imperative ideas onto a strictly declarative resource, and the impedance mismatch here is substantial.

From my experiments, it seems like the OneOf validator (and likely validators in general) applies only to the configured attribute value, not to the read attribute value. This means that a user wouldn’t be able to set state = "restarting", but the resource could Read() a container and set its state to restarting itself. However, this would cause Terraform to trigger an Update() of the resource, since the read state wouldn’t match the configured/planned state. If any container resources depended on the state of another container resource (e.g. state = docker_container.example.state), they would only ever see its post-Update() value, which should always be a valid OneOf value, thus avoiding any problems. This makes me think that my initial idea was the right one: return the intermediate state from Read(), and then handle it in Update() or Delete() by waiting for the container to transition to the next state. The only issue I can see with this is that the semantics are a bit strange: if a container is restarting but its configured state is running, then Terraform will generate a plan which amounts to “do nothing until the container is running.”
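
To make that behavior concrete, here is a toy model in plain Go. It does not use the plugin framework at all: validateConfig and needsUpdate are hypothetical stand-ins for "the validator runs on configuration only" and "a mismatch between the read and configured values produces a planned update", based purely on the experiment described above.

```go
package main

import "fmt"

// allowed is the set of values the user may configure (the OneOf list).
var allowed = []string{"stopped", "running", "paused", "exited", "removed"}

// validateConfig models the validator: it is only ever applied to the
// value written in the Terraform configuration.
func validateConfig(v string) error {
	for _, a := range allowed {
		if v == a {
			return nil
		}
	}
	return fmt.Errorf("state must be one of %v, got %q", allowed, v)
}

// needsUpdate models the diff: a read value that differs from the
// configured value leads to a planned Update(), even if the read value
// would never pass validateConfig.
func needsUpdate(configured, read string) bool {
	return configured != read
}

func main() {
	fmt.Println("config restarting:", validateConfig("restarting"))
	fmt.Println("config running:", validateConfig("running"))
	fmt.Println("update needed:", needsUpdate("running", "restarting"))
}
```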

In the quoted section, I was referring to “at terraform apply time,” regardless of whether that requires a Create() or an Update().

The main downside I see to returning the successor state in Read() is that Update() and Delete() may need to re-read the resource, to determine if they need to wait for the container to actually reach the successor state (since the Docker API will throw an error if one attempts to, for instance, pause a container that’s in the restarting state, but Terraform needs to be able to handle transitioning between any two states).

The upside is that Terraform will never generate a “do nothing and wait” plan; if the container is restarting, and the configured state is running, then Read() would just return running (the successor state of restarting), causing Terraform to report no changes needed, which intuitively feels more correct. However, it may be possible for a restarting container to transition directly to the exited state, if it reaches the maximum number of restarts defined in its restart policy, so we may not always be able to predict the successor state.

I suppose what it comes down to is whether Terraform should see intermediate states at all, or if the resource should only ever expose non-intermediate state values to the outside world, and treat intermediate state values as an implementation detail.

Thanks for sharing the results of your experiment with the OneOf validator behavior. It’s what I’d have expected, but didn’t want to speculate.

I hadn’t considered that returning an intermediate state necessarily results in generation of a plan: Terraform will believe it needs to do some work.

I’d be leaning toward never returning an intermediate state, actually waiting until the resource progresses through intermediate states to stable states in Create, Read and Update. Probably Delete too.