Parse HCL treating variables or functions as raw strings hashicorp/hcl

Hi,

I’m using the official HCL Go module https://pkg.go.dev/github.com/hashicorp/hcl/v2@v2.3.0/gohcl to parse some Terraform files that contain Terraform GitHub Provider configuration. I didn’t find a way to parse attributes as raw strings without being forced to provide the decoder with a Context that knows how to resolve any function or variable referenced.

I was wondering if it is possible to use https://pkg.go.dev/github.com/hashicorp/hcl/v2@v2.3.0/gohcl?tab=doc#DecodeBody with a nil Context and therefore be able to use raw values instead of having to interpret things like ${locals.example.[count.index]}.

The goal is to be able to rewrite the .tf files from a script, removing any resource that matches a given pattern.

Sorry for posting this in the Terraform subforum; I didn’t find a specific one for HCL questions.
Kind regards!

Hi @jimen0,

Since your goal is to edit configuration, I think the best way to get this done is using the hclwrite package, which is designed to allow you to work directly with HCL native syntax constructs. gohcl, by contrast, is for decoding HCL data into normal Go values, which requires expression evaluation as you’ve seen. Also, decoding data with gohcl and then writing it back out would be lossy, because the Go data structures can’t preserve details like the ordering of the arguments in the input, any comments that are present, etc.

The hclwrite package isn’t as mature as HCL’s main parser/decoder, because none of the major applications using HCL 2 have needed it yet, but I did personally use it in a side-project related to Terraform and it seems to be broadly working, notwithstanding some edge cases you can see in the issues in that repository at the time of writing.

I think you could do what you want here with the following steps using hclwrite (there’s a code sketch of these steps after the list):

  • Read one .tf file using hclwrite.ParseConfig, producing an hclwrite.File object.
  • Use the Body method on that object to obtain the root body of the file. (That is, the construct containing the top-level blocks.)
  • Iterate over the result of the Blocks method to visit each of the blocks in the file in turn:
    • Use Block.Type() to get the block type, and continue if the result is anything other than "resource", since you said you want to remove resources.
    • Use Block.Labels() to get the labels for the block, which for a Terraform resource block will be a pair of strings: the resource type name and the resource name. (If you want to be robust against invalid input here, you should check first to make sure the result has length 2.)
    • Check whether the two labels match the pattern you are looking for. If not, continue.
    • Call Body.RemoveBlock on the main body, passing in the current block, to remove it.
  • Finally, call File.Bytes() on the original file (now modified in-place by the work so far) to obtain the modified source code. You could perhaps check whether it’s different from what you originally loaded and overwrite the original file with the new content if so.
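
For illustration, here’s a minimal Go sketch of those steps. The removeMatchingResources and shouldRemove names, the hard-coded filename and matching rule in main, and the file-handling details are just placeholders for your own logic, not anything HCL prescribes:

```go
package main

import (
	"fmt"
	"os"

	"github.com/hashicorp/hcl/v2"
	"github.com/hashicorp/hcl/v2/hclwrite"
)

// removeMatchingResources parses one .tf file, removes any resource block
// whose two labels match according to shouldRemove, and writes the file
// back out if anything changed.
func removeMatchingResources(filename string, shouldRemove func(resourceType, resourceName string) bool) error {
	src, err := os.ReadFile(filename)
	if err != nil {
		return err
	}

	f, diags := hclwrite.ParseConfig(src, filename, hcl.InitialPos)
	if diags.HasErrors() {
		return fmt.Errorf("parsing %s: %s", filename, diags.Error())
	}

	body := f.Body()
	for _, block := range body.Blocks() {
		if block.Type() != "resource" {
			continue
		}
		labels := block.Labels()
		if len(labels) != 2 {
			continue // not a valid resource block, so leave it alone
		}
		if shouldRemove(labels[0], labels[1]) {
			body.RemoveBlock(block)
		}
	}

	newSrc := f.Bytes()
	if string(newSrc) == string(src) {
		return nil // nothing changed, so don't rewrite the file
	}
	return os.WriteFile(filename, newSrc, 0o644)
}

func main() {
	// Example usage: remove aws_instance.example from main.tf.
	err := removeMatchingResources("main.tf", func(resType, resName string) bool {
		return resType == "aws_instance" && resName == "example"
	})
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```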

If you want to process an entire module, you can repeat the above for each .tf file in a directory. The side-project I mentioned earlier might serve as a good starting-point, because it also works with all of the .tf files in a particular directory and so maybe you can just replace the cleanBody function with your own logic like the above to get something working quickly.
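
Driving that per-file function over a whole module directory might look something like this, reusing the hypothetical removeMatchingResources function from the previous snippet (this also needs "path/filepath" added to the imports):

```go
// removeFromModule applies the same removal logic to every .tf file in a
// module directory, reusing removeMatchingResources from the sketch above.
func removeFromModule(dir string, shouldRemove func(resourceType, resourceName string) bool) error {
	filenames, err := filepath.Glob(filepath.Join(dir, "*.tf"))
	if err != nil {
		return err
	}
	for _, filename := range filenames {
		if err := removeMatchingResources(filename, shouldRemove); err != nil {
			return err
		}
	}
	return nil
}
```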

The hclwrite package’s types encapsulate the raw tokens that an input file is built from and provide a higher-level API for manipulating it, which has the advantage that if you don’t modify a particular body at all then it will be preserved exactly, aside from whitespace changes caused by the pretty-printer. In particular, it will preserve any comments in the unmodified regions, and retain the ordering of attributes within the blocks, both of which would usually be discarded during normal HCL parsing.


Just built a small proof of concept using hclwrite in like 20 minutes, and it is simpler and less error-prone than my previous approach.
Thank you so much, @apparentlymart for the detailed explanation, the link to a real world example and the recommendations.

Thank you! We can close this thread as resolved now :smiley:

@apparentlymart thank you for the detailed write up here.

My team is trying to compute a deterministic hash of the HCL attributes of a Block.
One issue we’re running into is that the structs in the hclsyntax package track the source range, which changes if we add a new line above a block.

We would like to get the name of each attribute and some deterministic undecoded value.

Is this possible with functions exported in any of the hcl packages?

Hi @sourcec0de,

HCL doesn’t have such a capability natively, but I imagine you could derive the result you want by writing your hash function to ignore the source ranges.

Hashing of the contents of a block is not a capability HCL was designed to offer, and not something I would expect to generalize well across different use-cases. By writing the hashing logic yourself, you can tailor your implementation to make your hash sensitive to changes you deem significant and insensitive to changes you don’t deem significant, which will likely include making some decisions that make sense for your current problem but would not make sense for another problem, even if both problems are ultimately handled by hashing.

hclwrite may again be the best starting point here, if it’s the expression source code you are intending to hash, rather than the results of evaluating the expressions. For example, you could write a function that takes an hclwrite.Body called body and performs the following pseudocode steps (a Go sketch follows the list):

  • For each element attr in the map returned by body.Attributes():
    • Call attr.Expr().BuildTokens() to obtain a representation of the expression as tokens, in a slice called tokens.
    • For each token in tokens, take the Type and Bytes fields, and discard SpacesBefore. Use the result to compute a hash h of the attribute’s expression.
    • Record the expression hash against the attribute’s name, which is the element key in the attributes map, to describe the attribute for hashing purposes.
  • For each nested block block in the slice returned by body.Blocks():
    • Recursively run this process on the body returned by block.Body() to obtain a hash of the nested body.
    • Record that hash along with the block.Type() and block.Labels() results to describe the block for hashing purposes.
  • Hash your unordered set of attributes and ordered sequence of blocks to produce a final single hash covering the entire body.
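
To make that concrete, here’s a minimal Go sketch of that pseudocode using SHA-256. The hclhash package name, the hashBody function name, and the exact way tokens are fed into the hash are my own choices rather than anything HCL provides, and a real implementation might want length prefixes or separators between fields to avoid concatenation ambiguities:

```go
package hclhash

import (
	"crypto/sha256"
	"encoding/binary"
	"sort"

	"github.com/hashicorp/hcl/v2/hclwrite"
)

// hashBody hashes the attributes of a body as an unordered set (by sorting
// their names) and its nested blocks in their original order, recursing
// into nested bodies, following the pseudocode above.
func hashBody(body *hclwrite.Body) []byte {
	h := sha256.New()

	// Attributes: unordered, so sort by name for determinism.
	attrs := body.Attributes()
	names := make([]string, 0, len(attrs))
	for name := range attrs {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		h.Write([]byte(name))
		for _, tok := range attrs[name].Expr().BuildTokens(nil) {
			// Hash each token's type and bytes, ignoring SpacesBefore so
			// that whitespace-only changes don't affect the result.
			var tokType [4]byte
			binary.BigEndian.PutUint32(tokType[:], uint32(tok.Type))
			h.Write(tokType[:])
			h.Write(tok.Bytes)
		}
	}

	// Nested blocks: ordered, so hash them in the order they appear,
	// recursing into each nested body.
	for _, block := range body.Blocks() {
		h.Write([]byte(block.Type()))
		for _, label := range block.Labels() {
			h.Write([]byte(label))
		}
		h.Write(hashBody(block.Body()))
	}

	return h.Sum(nil)
}
```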

The above makes a few assumptions:

  • It works only for HCL native syntax: if you intend to support JSON syntax too then there can be no robust hashing solution unless your hashing function knows the schema of the body it’s attempting to hash, because the JSON syntax is ambiguous without a schema.
  • As noted above, it uses the raw tokens of an expression as the data to hash. For example, that means that changing 1 + 1 to 1 + (1) would cause a hash change even though the result of the expression is unchanged.
  • It treats attributes as unordered and blocks as ordered, which is the same assumption made by HCL’s low-level decoding API. Your application may prefer to consider certain block types as not having significant block order, such as in Terraform’s case where a top-level resource block is identified only by its labels and not by its position in relation to other resource blocks.

@apparentlymart this is an awesome explanation.
It was access to the HCL source and a bit of guidance that we were hoping for.

This is perfect.
Thank you! :pray: