Get variable output from external data in terraform

Hello,

I’m trying to get the result of a consul bootstraping as such:

data "external" "bootstrap_consul" {
  program = ["ssh", "-o", "StrictHostKeyChecking=no", "root@${var.consul_servers.consul-0["ip_address"]}", <<EOF
  sleep 5 # give consul enough time to start
  token=$(consul acl bootstrap | grep -i secretid | awk '{print $2}')
  jq -n --arg token "$token" '{"token":$token}'
  EOF
  ]

  depends_on = [proxmox_vm_qemu.consul_sv]
}

output "consul_bootstrap_token" {
  value       = data.external.bootstrap_consul.result.token
  description = "Consul Bootstrap Token. Save it for later stages."
}

The issue is that the output is empty and I end up with this (log level trace):

Outputs:

consul_bootstrap_token = tomap({
  "token" = ""
})

I’m not really sure how I should go about doing this. I know there’s a very different solution using vault, but, on the one hand, I’m not using vault just yet, so I’m trying to provision the cluster as is so that I can test other things and then I can go back to a more proper automation and, on the other hand, I would really like to understand the parsing logic here.

Thanks!

Hi @lethargosapatheia,

In your example your output value is referring to data.external.bootstrap_consul.result.token, which must be a string, but the output value you showed is a map, so I think you generated that output with value = data.external.bootstrap_consul.result instead. Is that right?

If my assumption above explains why you have that map result then unfortunately I don’t think this is a Terraform question so much as a Consul, Grep, or Awk question, because all of the Terraform-related parts of your solution seem to be working: Terraform is clearly running the designated program and parsing its output, because otherwise it would not have known that the result contains an attribute named token.

Therefore I think the problem is likely to be in your subshell command line:

consul acl bootstrap | grep -i secretid | awk '{print $2}'

It seems like either grep is filtering out all of the lines of output from consul acl bootstrap, or the line it’s finding doesn’t have a $2 position so awk is printing the empty string.

I would suggest debugging this first outside of Terraform so you are not constrained by the external provider’s requirement to produce only valid JSON. That way you can potentially turn on some additional Bash options like -o pipefail (assuming you’re using Bash as your shell) and print out additional information to understand more about which part of this command line isn’t working.

I’m unfortunately not familiar enough with this part of Consul to know what sort of output is expected from consul acl bootstrap, so I don’t have any specific suggestions about what might be wrong with that subshell command line.

Hi, @apparentlymart,

Sorry for the mix-up. You’re seeing this, of course, perfectly correctly :slight_smile: I edited my post afterwards to include the .token part too, but I haven’t added the corresponding output, which is now:

Outputs:

consul_bootstrap_token = ""

Yeah, when running the command as such in the consul server, it outputs what I presume is what the terraform variable requires:

root@consul-0:/tmp# token=$(consul acl bootstrap | grep -i secretid | awk '{ print $2}')
root@consul-0:/tmp# jq -n --arg token "$token" '{"token":$token}'
{
  "token": "00349d32-34bb-0633-4fac-462975db69cc"
}

Can you confirm that this is what terraform expected by data.external.bootstrap_consul.result.token?
If that’s correct, I guess the issue is related more to the the ssh part and what this actually outputs. Rather difficult to get the syntax right, but I’ll have to work in that direction in any case.

The main problem, though, is that terraform simply won’t execute the ssh command, even if I know it does check and connect over ssh. I’ve simplified the command as such:

data "external" "bootstrap_consul" {
  program = ["ssh", "-r", "StrictHostKeyChecking=no", "root@10.16.16.100", "consul", "acl", "bootstrap"]

but the acl is not being bootstrapped. So the synta seems to be wrong somehow.

Hi @lethargosapatheia,

In your original example you were passing the entire script to run as an argument to the ssh client, which includes the step of running jq to produce the JSON output, so I think we can conclude that Terraform is running that command and the SSH request itself was successful, because otherwise the resulting output would not be valid JSON at all, because jq wouldn’t have run.

I think my best theory right now (keeping in mind that I’m not a Consul expert) is that despite your sleep 5 Consul still isn’t “ready enough” to run that command when Terraform is running it, but when you test manually on the server there has been more time for Consul to warm up and so it works.

If that is true, I think my next step would be to try to find a way to more reliably block until Consul is ready, or to write the script to politely retry until the acl bootstrap succeeds.

You may get a more detailed answer if you ask about this in the Consul category, where there are more Consul experts watching. They might not also be experts on Terraform, but hopefully if you can tell them that effectively what is happening is that Terraform runs the command shown immediately after starting the VM whose IP address is shown then that will be enough for them to understand what state Consul is likely to be in at that point, since I expect it’s immaterial whether it is Terraform or some other system that would be running that command, for the sake of debugging the behavior.

1 Like

I rewrote this reply I think around 3-4 times, as I kept testing it and changing my mind.

You were right :slight_smile: It was related to consul. Because I couldn’t see any actual output, I didn’t see where the problem was. Although in all other contexts I’ve used them, in the terraform context I forgot about setting the environmental variables which allowed me to access consul. These environmental variables I needed to set explicitly, because I’ve changed several settings in consul (disallowed http and the http port, setting the certificate paths and so on).
So this is the result:

data "external" "bootstrap_consul" {
  program = ["ssh", "-o", "StrictHostKeyChecking=no", "root@${var.consul_servers.consul-0["ip_address"]}", <<EOF
  export CONSUL_HTTP_ADDR=127.0.0.1:8501
  export CONSUL_HTTP_SSL=true
  export CONSUL_HTTP_SSL_VERIFY=true
  export CONSUL_CLIENT_CERT=/etc/consul.d/certs/host.crt
  export CONSUL_CLIENT_KEY=/etc/consul.d/certs/host.key
  export CONSUL_CAPATH=/etc/consul.d/certs/ca.crt
  token=$(consul acl bootstrap | grep -i secretid | awk '{print $2}')
  jq -n --arg token "$token" '{"token":$token}'
  EOF
  ]

  depends_on = [proxmox_vm_qemu.consul_sv]
}

Thank you, you’ve been of real help, because I kept looking in the right direction :slight_smile:

One more question: is there any way I could have seen the raw output of the command as such, without having match and format it in json?

The external data source is designed to integrate with specially-designed external software which generates JSON, so any output that isn’t JSON would be considered invalid.

However, there is a specific protocol for the external program to signal failure, which will then allow you to see an arbitrary failure message:

  • Print some messages to stderr
  • Exit with a non-successful exit status

Bash’s default behavior when encountering an error is to keep running subsequent commands and hope for the best, so unfortunately it’s not the most ideal language to implement integrations for the external provider, but in our docs example Processing JSON in shell scripts the script intentionally starts with set -e to cause Bash to fail immediately with an unsuccessful status if any intermediate command fails.

In your case I don’t think set -e alone would be sufficient because the command that might fail is in a pipeline, so you’d need to also add -x pipefail to get that one to fail in a useful way:

set -ex pipefail

With both of those options enabled, bash should fail if any of the intermediate commands fail. As long as those commands also print error messages to their stderr when they fail, these should come together to implement the error reporting protocol that the external data source is expecting.

Your pipeline is also in a subshell $( ... ) and so I think – but may be misremembering – that Bash will not actually raise an error if that pipeline fails by default. To catch that one it might be necessary to actually manually test the exit status variable $? using an if statement, at which point this is of course getting rather messy and may be better to switch to a different programming language to implement the integration.

A long, long time ago, before I worked on Terraform at HashiCorp and when I ran Terraform and Consul together in production, our variation of what you’re doing here was to include in the VM image a ready-to-run script for bootstrapping the Consul Agent, which then meant the command line to run it was relatively simple and the script could be written in a more robust programming language where it was easier to handle errors.

Note that if consul acl bootstrap is actually modifying Consul in some way then this is technically a bit of a misuse of the external data source, because data sources are supposed to be for reading data only, not for performing side-effects. If it’s working for you then I’m not going to tell you not to do it, but please do keep in mind that Terraform will re-run that script on every new plan because Terraform does not expect a data resource to do anything other than return some data.

1 Like

Yes, in $(...) bash opens a new shell, as far as I know. Getting a correct exit status from it can be an issue sometimes. I’m more or less familiar with that though.

If it’s working for you then I’m not going to tell you not to do it

I would actually prefer to be told what I should do, it would make things so much easier!
I didn’t know that that data external is supposed to be used just for reading data, but I knew it was a less than elegant hack.

If I were to embed the bootstrapping the script in the image (actually I could do it in cloud-init through terraform, it makes things easier and I don’t need to burn it into the image - I’m using packer if that’s of any relevance), then I’d have the token somehwere in the virtual machine and the read it with data external - I guess that would be your approach more or less?
And after everything is working and I’ve set the proper tokens, I can revoke the root token.

I’m sure there are numerous ways to do this with different tradeoffs, but I can at least tell you the way we had it working in the previous employer I mentioned:

  • Our VMs all ran custom machine images built using HashiCorp Packer. We started from a distribution-provided base image and added a few special things to it, one of which was a script which we could run as a command line something like /usr/local/bin/join-consul --servers="10.1.2.1 10.1.2.2 10.1.2.3" (and I think there were some other arguments too, like “datacenter”, but I don’t remember all of the details and I don’t think they are super important for this story anyway).
  • In any Terraform configuration that declared a VM that needed to join the Consul cluster, we used a data-only module to retrieve the list of Consul servers for a particular target datacenter.
  • When declaring the VM (in our case, aws_instance or aws_autoscaling_group depending on the situation) we set user_data to cause cloud-init (which was built into our VM images) to run a simple script that just ran the preinstalled join-consul command I showed above, interpolating in the Consul server addresses and other arguments.
  • Once a machine launched, cloud-init would run during its boot process and thereby run the join-consul script.

Now I think in your case you are bootstrapping Consul servers rather than agents (clients), so this pattern above isn’t exactly right for you, but I think the general idea of including a script already in the VM and then making it run on boot is still relevant.

In your case, as you mentioned, there is the additional challenge of retrieving the token for use in Terraform. If you absolutely need the token in your Terraform configuration then there isn’t really any way to avoid using something like a data source to achieve that, but perhaps you could separate the bootstrapping step (done using cloud-init on boot) from the token retrieval step (done using a data source) so that the token retrieval is side-effect-free. That does, however, assume that there is some way to retrieve a token from Consul after it’s done “acl bootstrap”, which I don’t remember enough to confirm or deny! (Or, as you say, save the token to disk somewhere on the Consul server, but that makes me a bit nervous.)

If possible I would prefer to avoid Terraform handling that token at all, and instead arrange for the Consul servers to self-bootstrap on boot and directly register the token into an external shared secret store so that other system can retrieve it asynchronously later. In that case, you can use access control offered by the secret store to control who can retrieve the token, and for some secret stores (such as HashiCorp Vault) you can separate the long-lived “root token” from shorter-lived client tokens, so that each client will retrieve its own private token.

I’m describing something I haven’t thought about for nearly six years so please take this with an appropriate amount of skepticism! I’m not suggesting that you should just take exactly what I’ve described here and follow it exactly, but hopefully it helps to think about some other approaches.

1 Like