CLI: get command: Is this idempotent? Why does it rebuild from unaltered source?

Is the CDKTF CLI’s get command intended to be idempotent? If so, then why does it rebuild already-built components regardless of whether their source code has changed?

I have novice-level familiarity with Terraform and am doing some evaluation of use of its CDK with Go.

Initially, upon running cdktf get on a sample project, I believed it to be failing, as the “downloading and generating modules and providers…” message would spin for several minutes before I gave up and terminated it.

Curious what was happening, and seeing only very terse output, I tried running with --verbose, which was ignored. I looked at the inbuilt help, saw the --log-level option, and suspected it might log to standard out or standard error, so I tried --log-level=INFO, --log-level=debug, and the like, to no avail. Reading more closely, I saw the note, “Only supported via setting the env CDKTF_LOG_LEVEL”. I set that environment variable, discovered that something was in fact happening, and gave it time to complete. This operation, even with a simple project on a very powerful machine with 64 GB of RAM, required six minutes.
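For anyone else who trips over the same thing, here is a minimal sketch of launching the command with logging switched on through the environment rather than a flag. It assumes cdktf is on the PATH and that “debug” is an accepted value for CDKTF_LOG_LEVEL; the function names are my own and are not part of CDKTF.

```python
import os
import subprocess


def build_get_invocation(log_level="debug", base_env=None):
    """Return (command, environment) for a `cdktf get` run with logging enabled.

    `log_level` is an assumed value; check the CLI help for the accepted levels.
    """
    env = dict(os.environ if base_env is None else base_env)
    # Per the CLI help note, the level is read from this variable, not from a flag.
    env["CDKTF_LOG_LEVEL"] = log_level
    return ["cdktf", "get"], env


def run_get(log_level="debug"):
    """Run `cdktf get`, letting its output stream straight to the terminal."""
    command, env = build_get_invocation(log_level)
    return subprocess.run(command, env=env, check=False).returncode
```

Calling run_get() from the project directory should behave like exporting the variable in the shell and then running cdktf get by hand.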

I later found discussion on this forum indicating that this operation, even to build a single provider, might take an hour or two for Go.

Once I understood that “get” in this context is a bit of a misnomer, as it results in building/compiling/creating one or more components rather than simply fetching/retrieving/getting them, I expected that a subsequent run with the same source would complete much more quickly. In my experience, a build operation is typically idempotent: repeat runs under the same circumstances do not typically result in any changes. Typically, something like make is used to recognize that a given component has already been built, and to refrain from rebuilding it if its source has not changed. Such is apparently not the case with cdktf get, as my repeat runs, seconds after a previous one, take roughly the same amount of time to complete as the first run took, and I assume that whatever source code the utility may retrieve from remote repositories is not changing at this frequency.

I understand that CDKTF users will likely someday be able to avoid the need for building Go providers locally. Until then, I hope to better understand what is happening, and ultimately how to determine in an automated fashion if and when to “re-get” (to regenerate CDK Constructs for Terraform providers and modules).

I did some related web searches and searched this forum, but I’ve found no specifically related documentation or discussion. I read the CDK constructs and cdktf get command docs. I’ve not looked at cdktf’s source code.

As you’ve observed, cdktf get does download and regenerate code every time it is run. There is an open issue that aims to improve this so that the process runs only when needed.

There is also an open issue about improving the performance of the operation.

Finally, the naming of the command has also been discussed.

The team has spent some time on these issues, but has been more focused on functional pieces. I believe more time will be spent on this topic in the next couple of releases as it is definitely a pain point for users.

Thank you for the pointers to related information, @jsteinich. I reviewed them and find that they provide useful context.

I am still left to wonder 1) if the current behavior indicates a bug or that cdktf get works as designed, 2) if cdktf get works as designed then what is the rationale for rebuilding every time it is run, and 3) how a machine can determine on any given run after the first if get-ing is or is not warranted.

My current theory is that cdktf rebuilds every time simply to avoid the need to determine if rebuilding is warranted or not, and thus that it is not currently feasible for a machine to determine if re-running the get operation, as one might hope to arrange as part of an automated build pipeline, is warranted.
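As a stopgap for a pipeline, one could approximate make-style behavior outside of cdktf. The following is only a sketch under assumptions: it treats cdktf.json as the sole input to the get operation, the stamp file is a hypothetical file of my own invention, and the output directory (.gen or imports, depending on version and configuration) is passed in by the caller. It cannot detect that an unpinned version constraint would now resolve to a newer provider.

```python
import hashlib
from pathlib import Path


def config_fingerprint(config_path):
    """SHA-256 of the raw bytes of the project configuration file."""
    return hashlib.sha256(Path(config_path).read_bytes()).hexdigest()


def needs_get(config_path, stamp_path, output_dir):
    """True when generated code is missing or the configuration has changed
    since the stamp file was last written."""
    stamp = Path(stamp_path)
    if not Path(output_dir).is_dir() or not stamp.is_file():
        return True
    return stamp.read_text().strip() != config_fingerprint(config_path)


def record_get(config_path, stamp_path):
    """Call after a successful `cdktf get` to remember which configuration it used."""
    Path(stamp_path).write_text(config_fingerprint(config_path) + "\n")
```

A pipeline step would call needs_get first, run cdktf get only when it returns True, and call record_get afterward. Hashing the raw bytes means that even a whitespace-only edit triggers a rebuild, which errs on the side of rebuilding too often rather than too rarely.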

Upon initial consideration, this suggests to me that factors outside of one’s control (e.g., by pinning provider versions) may affect the outcome of the get operation. I’m still learning CDKTF, but that seems concerning.

Summary of related findings from @jsteinich’s references:

#791 - Improve performance of cdktf get

In enhancement request #791, opened 2021-06-23, the requester asserts that under some circumstances, at least, an automated build pipeline using CDKTF must perform the get operation on every run:

When using cdktf synth as part of a CI/CD pipeline, it is currently required to either run cdktf get on every pipeline run, or cache the imports folder in some way.

This person also provides information relevant to the question of why the result of cdktf get is not cached (automatically by cdktf or as a workaround for cdktf’s overzealous rebuilding?):

the build environment might not be identical and there doesn’t seem to be an efficient way to check for updated providers at this point

There is a fair amount of discussion in response, including a suggested alternative by @jsteinich, which is available for some languages but not for others, for cases in which one is using only a small number of providers: use pre-built provider packages. None of it refutes the assertions I quoted above.

#578 - add “update” option to cdktf get

Implementation of enhancement request #578, opened 2021-02-20, seems wise.

The request is:

Currently cdktf get wipes the “imports” directory, and re-downloads everything. It would be good if it instead would only get needed packages (and version updates). It might make sense to still allow a complete “wipe-and-download” option, for example by adding a -force parameter or similar.

It is restated in a comment by the requester as follows:

The behavior today is: regardless of the module versions specified in cdktf.json matching or not matching whatever’s installed in the imports directory, the contents of imports gets wiped every time cdktf get is run. I wish cdktf was smarter so that it only performed the “wipe/update” if needed.

#209 - Rename get to import and .gen to imports to align with cdk8s and AWS CDK

Enhancement request #209, opened 2020-07-19, is, in short, just what the issue title indicates. I support linguistic parity with related or adjacent utilities, but setting aside cdk8s’ and AWS CDK’s use of the terms, I think that both “get” and “import” imply the retrieval, acquisition, transfer, or copying of one or more things, not the resolution of dependencies followed by retrieval and compilation of source code.

I would say that it does work as initially designed.

I believe you are correct there. It’s non-trivial to know whether or not rebuilding is necessary.

Not necessarily a complete list, but some of the pieces that go into it:

  • cdktf versions (current and last used to generate)
  • provider version (current and last used to generate)

I don’t believe the cdktf version is anywhere in the generated code, but that seems relatively straightforward to add.
The exact provider version used isn’t currently being output into the generated code (unless an exact version is requested). There is an open issue about this, but what Terraform has cached also plays into this. Generated Terraform lock files could be used to pin the version even if it is not directly pinned.
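To make the “current and last used to generate” comparison concrete, here is a rough sketch of what such a check might look like if both pieces of information were recorded next to the generated code. Nothing like this exists in cdktf today as far as I know; the record shape and the function name are purely illustrative, and the caller is assumed to supply the versions.

```python
def stale_reasons(current, last):
    """Compare the inputs of this run with those recorded at the last generation.

    Both arguments are dicts shaped like
    {"cdktf": "<cli version>", "providers": {"<name>": "<version>"}}.
    Returns a list of human-readable reasons; an empty list means nothing changed.
    """
    if last is None:
        return ["no record of a previous generation"]
    reasons = []
    if current["cdktf"] != last.get("cdktf"):
        reasons.append(f"cdktf version: {last.get('cdktf')} -> {current['cdktf']}")
    old = last.get("providers", {})
    new = current["providers"]
    # Walk the union of names so added and removed providers are both reported.
    for name in sorted(set(old) | set(new)):
        if old.get(name) != new.get(name):
            reasons.append(f"provider {name}: {old.get(name)} -> {new.get(name)}")
    return reasons
```

Regeneration would be skipped only when the returned list is empty. The hard part remains obtaining the resolved provider version for the current run, which is where lock files might come in.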

When pinning an exact provider version I believe everything should match up.
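If you want a pipeline to verify that everything is in fact pinned, something along these lines could flag the entries that are not. This is a sketch that assumes terraformProviders entries are written as “name@constraint” strings and that an exact pin is either a bare version or one prefixed with “=”; object-style entries and modules are skipped, and the function name is made up for illustration.

```python
import re

# An exact pin is a bare version or "= version"; ranges such as "~> 3.0" are not.
_EXACT = re.compile(r"^(=\s*)?\d+\.\d+\.\d+(-[0-9A-Za-z.-]+)?$")


def unpinned_providers(config):
    """Return the provider entries of a parsed cdktf.json that are not exact pins."""
    loose = []
    for entry in config.get("terraformProviders", []):
        if not isinstance(entry, str):
            continue  # object-style entries are out of scope for this sketch
        _, sep, constraint = entry.partition("@")
        # No "@" means no constraint at all, which is as loose as it gets.
        if not sep or not _EXACT.match(constraint.strip()):
            loose.append(entry)
    return loose
```

Feeding it the result of json.load on cdktf.json and failing the build when the list is non-empty would at least make unpinned providers visible.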

I completely agree. Feel free to give it a thumbs up.

Is there a different verb that you feel better represents the command? Perhaps “generate”?

A meta issue for these topics is tentatively planned for the next release. Feel free to leave comments there.

The inbuilt help for CDKTF’s CLI command describes its get subcommand as, “Generate CDK Constructs for Terraform providers and modules.” Assuming that summary is accurate, “generate-constructs” comes immediately to mind as a potential name for a subcommand that does this. Just “generate” by itself would work well if cdktf was generally used for creating and/or working with constructs, or maybe if constructs are the only thing that gets generated in the context of CDKTF.

Elsewhere in the CDKTF docs, under Concepts >> Modules >> Generate Module Bindings, I find indication that the get operation “creates the appropriate module bindings in the ./.gen directory for you to use in your application.” Are module bindings a subset of constructs for providers and modules?

And elsewhere, under Create and Deploy >> Configuration File >> Declare Providers and Modules, the get command is described as follows: “You must declare all of the providers and modules that require code bindings in your cdktf.json file. CDKTF generates these code bindings from cdktf.json when you run cdktf get.” Are code bindings constructs for providers and modules?

Most of cdktf’s other subcommands (7 of 12) appear to apply to stacks: deploy, destroy, diff, list, synth, watch, and output. This might lead one to suspect that a subcommand “generate” would generate a stack, potentially from the project that was created/initialized via cdktf init.

Presumably, this decision was made long ago, but a multi-level command structure might work well. Are there other actions that might affect constructs? There could be cdktf constructs build, cdktf constructs list, etc.