I’m experimenting with writing an ephemeral resource which exposes a secret string with a limited shelf life (think: API token).
I’d assumed that “renewing” an ephemeral value meant refreshing it. We’d respond with a new/valid value, but the Renew() method cannot update the ephemeral values:
Renew cannot return new result data for the ephemeral resource instance, so this logic is only appropriate for remote objects like HashiCorp Vault leases, which can be renewed without changing their data.
I discovered this only after failing to find a Result struct in ephemeral.RenewResponse. Bummer.
Since I cannot “renew” an expired API token value into a not-expired state, I’m tempted to use Renew() just for raising diagnostics.
Open() will set RenewAt to a time shortly before expiration, and stash the true expiration time in private state.
If Renew() is called before the true expiration time, it will raise a diagnostic warning (“value expires soon!”) and then set RenewAt to the actual expiration time.
If Renew() is called after the true expiration time, it will raise a diagnostic error (“value has expired”).
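The timing logic in the steps above can be sketched in plain Go. This is only an illustration of the decision logic, not provider code: `Severity`, `RenewOutcome`, `InitialRenewAt`, and `CheckAtRenew` are hypothetical names invented here, and nothing in it uses the real plugin framework types. In an actual provider the expiration time would come from private state and the outcome would be translated into diagnostics and the RenewAt field of the response.

```go
package main

import (
	"fmt"
	"time"
)

// Severity is a hypothetical stand-in for a provider diagnostic level.
type Severity int

const (
	None Severity = iota
	Warning
	Error
)

// RenewOutcome is a hypothetical result type, not part of any Terraform SDK.
type RenewOutcome struct {
	Severity Severity
	Message  string
	RenewAt  time.Time // zero value means "do not schedule another renewal"
}

// InitialRenewAt models what Open() would request: a renewal shortly
// before the true expiration time.
func InitialRenewAt(expiresAt time.Time, warnWindow time.Duration) time.Time {
	return expiresAt.Add(-warnWindow)
}

// CheckAtRenew models what Renew() would do. It cannot produce a new value;
// it only reports on the existing one.
func CheckAtRenew(now, expiresAt time.Time) RenewOutcome {
	if now.Before(expiresAt) {
		// Still valid: warn, and ask to be called again at the true expiry.
		return RenewOutcome{Severity: Warning, Message: "value expires soon!", RenewAt: expiresAt}
	}
	// At or past expiry: report an error and schedule nothing further.
	return RenewOutcome{Severity: Error, Message: "value has expired"}
}

func main() {
	opened := time.Date(2025, 1, 1, 12, 0, 0, 0, time.UTC)
	expiresAt := opened.Add(15 * time.Minute)

	renewAt := InitialRenewAt(expiresAt, 2*time.Minute)
	first := CheckAtRenew(renewAt, expiresAt)
	fmt.Println(first.Message, "next check at", first.RenewAt.Format(time.RFC3339))

	second := CheckAtRenew(first.RenewAt, expiresAt)
	fmt.Println(second.Message)
}
```

Keeping the decision in a pure function like this makes it easy to unit test with fixed timestamps, independent of when Terraform actually calls Renew().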
My concern here revolves around a very long terraform apply which makes use of the ephemeral value twice: Once very early in the apply, and then again after many minutes (hours?) have elapsed.
The idea is that while the provider cannot correct the expired secret, it can provide feedback and encourage the user to create multiple instances of the ephemeral resource so that each instance’s Open() is called shortly before it is needed, making expiration less likely.
So…
1. Am I on a reasonable path? Other/better suggestions for handling this situation would be great.
2. Does the strategy of raising diagnostic Warnings/Errors via Renew() sound reasonable?
3. Is my understanding correct about Open()? Is it called immediately (ish) before invoking the thing which needs the ephemeral value?
Thanks!
edit: Testing indicates that #3 does not work out the way I’d expected. The docs:
when an ephemeral resource’s data is needed, Terraform calls the provider OpenEphemeralResource RPC
I’d been leaning on that “when”, thinking that it meant that Open() would be invoked only when the result was actionable. In reality, it’s called immediately, and it’s called regardless of whether the ephemeral value is referenced anywhere in the configuration, even if it’s never “needed”.
From Terraform’s perspective, the OpenEphemeralResource call is made at the moment the value is needed, but you may be misunderstanding when that value is needed by the configuration. Terraform doesn’t create a new instance for each reference to the resource in the configuration. The ephemeral resource must be evaluated only once during each phase of operation, and the result must remain consistent throughout that operation.
Terraform expects the data within a single operation to be internally consistent, so if an ephemeral resource first returned "Token A", but then a later dependency saw "Token B", the inconsistent data could lead to evaluation errors. Changing tokens would also inherently produce logical race conditions: Terraform cannot predict whether a token will stay valid long enough for any particular operation, so it could expire and change during the apply operation of dependent resources.
RenewEphemeralResource is always called as close to the requested renewal time as possible, and the provider should account for any variation in call times and system delays by incorporating whatever extra buffer is reasonable for the type of resource. Diagnostics will be returned at the end of execution, so if this is during apply, you then have to contend with changes that have already been made. Asking the user to make structural changes to the configuration at that point is riskier, since that could affect things which were already applied. You also have to contend with the fact that you don’t know what is an acceptable structure for that user, and the user may not either: a config which is close to “timing out” every time won’t be noticed until it passes that unknown threshold.
If the actual token itself cannot be renewed, but rather a different value is required, then a more complex system might need to be built around it. That would require a secure data store where a stable token could be substituted for the current updated token. This still falls under the same race-condition problems, but at least here you have prior knowledge about how the systems work and could incorporate overlapping expirations that are adequate for your use cases.
Would you mind expanding a bit on “the OpenEphemeralResource call is made at the moment the value is needed”?
When is that moment, exactly?
I’m surprised to find that a Terraform project containing only an ephemeral resource calls OpenEphemeralResource and CloseEphemeralResource during terraform plan. Was the value ever needed with nothing referring to it?
Consider this configuration:
ephemeral "rapidly_expiring_credential" "a" {
  id = "a"
}
resource "takes_forever_to_provision" "b" {
  credential = ephemeral.rapidly_expiring_credential.a.value
}
ephemeral "rapidly_expiring_credential" "c" {
  id = "c"
}
resource "some_other_thing" "d" {
  credential = ephemeral.rapidly_expiring_credential.c.value
  some_attr  = takes_forever_to_provision.b.some_attr
}
In my experiments OpenEphemeralResource is called immediately for both a and c, even though c is only needed by d, which is blocked until creation of b completes.
The docs (“when the data is needed”) and your comment (“the moment the value is needed”) suggest that this is not the intended behavior.
Or I have misunderstood.
Perhaps “From Terraform’s perspective” is the critical element here? There’s some need for the value on the other side of the plugin API that I’m not taking into account?
Terraform determines the evaluation order based on dependencies, so what it’s technically doing is evaluating resources as soon as possible, which is when all their dependencies have been fulfilled. In this case a and c have no dependencies, so both are evaluated immediately in this small example. Because we’re walking through the graph based on dependencies, there’s no way to signal that c is required by d, but “not right away, wait for this other thing to almost be done”.
With a simple chain of dependencies “as soon as possible” and “only when needed” are really the same thing. It’s when there are cross-chain dependencies like in your example that timing may not be optimal, but optimal timing is not Terraform’s goal, correctness is (plus since managed resources consume some non-zero amount of time, “as soon as possible” tends to be the most efficient in aggregate).
In practice, though, Terraform operates in a highly concurrent environment with lots of external factors. Evaluation timing can vary, and actual execution times may or may not differ hugely, so you always need to account for these delays regardless.
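To make the “as soon as possible” ordering concrete, here is a small Go sketch of a dependency-driven schedule for the a/b/c/d example above. It is a deliberately simplified model and not Terraform’s actual graph walker: the `StartWaves` function and the idea of discrete “waves” are assumptions made for illustration, and real execution is concurrent with variable durations.

```go
package main

import (
	"fmt"
	"sort"
)

// StartWaves returns, for each node, the earliest "wave" in which it can
// start when every node begins as soon as all of its dependencies are done.
// Nodes without dependencies start in wave 0. The input must be acyclic.
func StartWaves(deps map[string][]string) map[string]int {
	waves := make(map[string]int, len(deps))
	var visit func(n string) int
	visit = func(n string) int {
		if w, ok := waves[n]; ok {
			return w
		}
		w := 0
		for _, d := range deps[n] {
			// A node starts one wave after its latest dependency.
			if dw := visit(d) + 1; dw > w {
				w = dw
			}
		}
		waves[n] = w
		return w
	}
	for n := range deps {
		visit(n)
	}
	return waves
}

func main() {
	// Hypothetical encoding of the configuration discussed in the thread.
	deps := map[string][]string{
		"a": nil,        // ephemeral credential a
		"b": {"a"},      // slow managed resource using a
		"c": nil,        // ephemeral credential c
		"d": {"b", "c"}, // resource using c and an attribute of b
	}
	waves := StartWaves(deps)

	names := make([]string, 0, len(waves))
	for n := range waves {
		names = append(names, n)
	}
	sort.Strings(names)
	for _, n := range names {
		fmt.Printf("%s starts in wave %d\n", n, waves[n])
	}
}
```

In this model c lands in the same wave as a because nothing blocks it, even though its only consumer d cannot start until b has finished. That is the gap in which a short-lived credential can age.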