Improving output in automation

jorhett · January 30, 2021, 3:52am

This PR has been around for a while. Any chance it could be considered for 0.15?

github.com/hashicorp/terraform

Skip 10-second periodic updates in automated workflows

hashicorp:main ← jorhett:minimize_output_in_automation

opened 08:30PM - 10 Dec 20 UTC

jorhett

+71 -0

Makes use of the existing TF_IN_AUTOMATION environment variable to suppress 10-second updates during command execution added in #6163 which create extremely long, unhelpful logs in automated CI/CD workflows.

Provides relief for needs of #18317 without adding any new configuration options, a goal mentioned by @apparentlymart
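For readers who have not looked at the PR, the idea can be sketched roughly as follows. This is an illustration in Python, not the PR's actual Go code; the function name `periodic_updates_enabled` is hypothetical, and the only thing taken from Terraform itself is the `TF_IN_AUTOMATION` variable, which is treated as "on" when set to any non-empty value.

```python
import os


def periodic_updates_enabled(environ=None):
    """Decide whether the periodic "still working" lines should be printed.

    Hypothetical helper: returns False when TF_IN_AUTOMATION is set to a
    non-empty value, True otherwise.
    """
    env = os.environ if environ is None else environ
    return env.get("TF_IN_AUTOMATION", "") == ""
```

With this gate, an interactive run keeps the reassurance messages, while a CI job that already exports `TF_IN_AUTOMATION` would get none.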

jorhett · February 26, 2021, 9:12pm

TF dev team has rejected it again… along with every other PR that attempts to make the output more useful. I guess nothing is more important than forcing humans to store and read through thousands of lines of content-less garbage

apparentlymart · February 27, 2021, 1:07am

Hi @jorhett,

For my part I’m sorry that in my earlier communications I was unclear as to what we’d require to make a change here. I’m going to try to elaborate here just for the sake of explaining myself, but I understand that you’re frustrated and thus may not be willing to engage with this further, and so I’m not meaning this to suggest that you ought to do anything or ought to have done anything, just to give some more context.

When I say that something like this should be implemented without adding any additional options, I don’t mean that we should find another existing option to attach it to but rather that we should design a new behavior that will unconditionally replace the old behavior and will better meet the design goals, by attempting to strike a better compromise between the different constraints.

In this case, it seems like the most significant conflicting requirements are:

- I infer from the PR that originally implemented these periodic announcements that this was intended to address a situation where Terraform can appear stuck/hung for particularly long-running operations such as creating an Amazon RDS instance. That PR was from before I joined the Terraform team and so I don’t know what specifically prompted it, but I have to trust that Mitchell was doing this in response to some feedback that a long silence from Terraform during an operation was disconcerting, possibly leading to folks killing/interrupting the process.
- From your feedback here (and other similar feedback) we can see that there are situations where a notification every 10 seconds per resource instance can produce an excessive amount of notification messages, which can be particularly frustrating if you are already familiar with how long operations take for a particular configuration and so don’t need the reassurance that Terraform is “still working”.

The design challenge here, then, is to find a compromise that makes things tolerable (though likely not ideal) for both of these constituents, so that we can at least partially address the various problems without creating competing codepaths that would be a drag on future maintenance.

One simple change I could imagine here to start would be to extend the notification timeout to happen less often than every 10 seconds. I don’t know what prompted Mitchell to select 10 seconds in his original proposal here, but I suspect it was a somewhat-arbitrary number chosen based on experience with how long commonly-used operations of that time took. With the benefit of a few more years of Terraform experience, we can try to determine what is a “typical” length of time for an operation and try to set this timeout so that it’s more likely that it will appear only for the most egregious cases (like RDS) and even in those cases will generate less output, while still producing new output somewhat often to provide the comfort that Terraform hasn’t got stuck.
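To put rough numbers on that trade-off, here is a small sketch. It assumes an update fires at every multiple of the interval that falls strictly before the operation completes; the 600-second duration and the candidate intervals are made-up values for illustration, not measurements.

```python
def count_updates(duration_s, interval_s):
    """Count the periodic update lines printed for one operation.

    Assumption: an update is printed at every multiple of interval_s
    that is strictly less than duration_s (whole seconds).
    """
    if duration_s <= 0 or interval_s <= 0:
        return 0
    return (duration_s - 1) // interval_s


# Hypothetical 600-second operation at three candidate intervals.
for interval in (10, 30, 60):
    print(f"every {interval}s -> {count_updates(600, interval)} lines")
```

Under these assumptions a single 600-second operation produces 59 lines at a 10-second interval, 19 at 30 seconds, and 9 at 60 seconds, and that count is multiplied by the number of resource instances running concurrently.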

I can also imagine some other designs that would likely strike a better compromise between the two goals but at the expense of a more complicated implementation. For example, the inferred requirement of periodically generating some output for reassurance doesn’t seem like it requires per-operation status updates. Perhaps the UI layer could roll up all of the pending operations into a single periodic message like “3 operations still in progress…”, and thus reduce the total number of bytes written out while still giving the feedback that I expect the original change was aiming for.
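A minimal sketch of what such a roll-up might look like, purely as an illustration: the class name `PendingOperations`, its methods, and the `[Ns elapsed]` suffix on the summary are assumptions made for this example, not anything from Terraform's UI layer.

```python
class PendingOperations:
    """Tracks in-flight operations and renders one summary line per tick."""

    def __init__(self):
        # Maps a resource address to the time (in seconds) it started.
        self._started = {}

    def start(self, address, now):
        self._started[address] = now

    def finish(self, address):
        self._started.pop(address, None)

    def summary(self, now):
        """Return a single roll-up line, or None when nothing is pending."""
        if not self._started:
            return None
        count = len(self._started)
        # Report the age of the longest-running operation.
        longest = max(now - t for t in self._started.values())
        noun = "operation" if count == 1 else "operations"
        return f"{count} {noun} still in progress... [{longest}s elapsed]"


ops = PendingOperations()
ops.start("aws_db_instance.a", 0)  # hypothetical resource addresses
ops.start("aws_db_instance.b", 5)
ops.start("aws_db_instance.c", 8)
print(ops.summary(30))
```

However many operations are pending, each tick writes one line instead of one line per resource instance, and writes nothing once everything has finished.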

Typically we start on something like this by doing research and development work rather than immediately writing code. Research can of course involve writing prototype code, but we typically don’t engage with final implementation until we’ve arrived at a design proposal that we believe strikes the best compromise between the competing goals. In my earlier responses on this topic I was intending to communicate that we’d be interested in that research but wouldn’t be able to prioritize it yet.

This sort of open-ended iterative research prior to having a concrete proposal is, in our experience, not something that external contributors can typically commit to, and frankly we don’t really have a great process for collaborating with folks outside of HashiCorp on design work of this kind even if they did want to. It is something I personally have wanted to improve for some time, but I’m sorry to say that it’s just another example of there only being so many hours in a day, and so we have to make some tough decisions about what to work on first.

I would love to spend time on researching and discussing different behaviors that better meet these needs, but right now everyone on the team is focused elsewhere and so regrettably this is a problem that needs to wait for later.

With all of that said, I understand and recognize your frustration, and I understand that the above will likely not make you any less frustrated, but I hope it will at least give you some context to understand the Terraform team behaviors you’ve observed.

jorhett · March 5, 2021, 11:05pm

Frustrated but desiring of a fix and willing to work on it.

Context that is greatly appreciated. I think the problem here is that trying to read the tea leaves of the various “we won’t do this because” responses kind of runs in a circle, and leaves one without a direction forward. In fact, it’s even more confusing when you say you could do something that you previously said you’re not willing to do:

For my purposes, and probably for all CI purposes with after-the-fact logs, displaying these useless-if-you-aren’t-watching-the-screen messages is never useful.

It’s really and truly unclear exactly what goals compete with --dont-spam-me. It would really help if you (== TF dev team) could explain why that’s so abhorrent. Because I can assure you, nobody else knows why this is a problem. It would be an incredibly useful flag that solves a huge problem.

You say that people won’t commit to design-oriented work, but I don’t think that’s true. I can assure you I’ve lost more than 300 hours just writing (and fixing) wraparound filters for each version of Terraform. In contrast, designing a proper fix would be a lot less work.
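For anyone who hasn't had to write one, a wraparound filter of this kind can be sketched in a few lines. This is not my actual filter; it assumes uncolored output and progress lines shaped like `aws_db_instance.example: Still creating... [10s elapsed]`, and the exact wording has shifted between Terraform versions, which is why these things keep needing fixes.

```python
import re

# Assumed shape of the periodic lines, for example:
#   aws_db_instance.example: Still creating... [10s elapsed]
#   aws_db_instance.example: Still destroying... [id=abc, 20s elapsed]
STILL_RUNNING = re.compile(
    r": Still (creating|modifying|destroying|reading)\.\.\. \[.*elapsed\]\s*$"
)


def filter_lines(lines):
    """Yield every line except the periodic "Still ..." progress updates."""
    for line in lines:
        if not STILL_RUNNING.search(line):
            yield line


# To use it as a pipe filter, wire it to stdin/stdout:
#   import sys; sys.stdout.writelines(filter_lines(sys.stdin))
```

A filter like this only hides the symptom after the fact, and it breaks as soon as the message wording changes.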

I believe the real problem is that you are holding your cards to your chest. All of these proposals aren’t good enough, but you don’t say what good enough is. You say there are competing goals, but you don’t say what those are either. All we get are these conflicting refusals with no details upon which to do a design.

What makes even less sense is that you (still TF dev team) did this complete reimplementation of the output views which seems to me from the outside totally capable of tackling the need for different output displays… but you don’t think it can be solved by that and you aren’t telling us why.

Seriously @apparentlymart, if you were to explain the needs and the constraints, you might be shocked to find one or more of the thousands of people coping with this just might give you enough of a design proposal to make it worth your while.
