I have a module that launches our AWS hosts and configures them via cloud-init (CI).
We've been experiencing intermittent CI failures, so we initially added a remote-exec provisioner directly to the instance to wait for CI to finish; the provisioner's exit status then produced a pass/fail in Terraform (TF).
Unfortunately that caused a deadlock: our CI needed the volumes attached before it could complete, but TF couldn't attach the volumes while the provisioner was still waiting on CI.
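For reference, the original approach looked roughly like this (a minimal sketch; the connection details are assumed to match the ones used in the null_resource further down):

resource "aws_instance" "instance" {
  count         = var.instance_count
  ami           = var.ami_id
  instance_type = var.instance_type
  user_data     = var.user_data

  # TF does not consider the instance created until this provisioner succeeds,
  # but cloud-init was itself blocked waiting for volumes that TF could only
  # attach after creation, which is where the deadlock came from.
  provisioner "remote-exec" {
    inline = [
      "sudo cloud-init status --wait > /dev/null"
    ]

    connection {
      user         = "bot"
      host         = self.private_ip
      timeout      = var.ssh_timeout
      private_key  = var.bot_key_pem
      bastion_host = var.bastion_host
    }
  }
}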
Our solution was to move the provisioner onto a null_resource instead, and this works for all of our use cases, with one side effect.
If the null_resource fails, it doesn't taint the instance. So on the next TF run the null_resource is re-created and fails again, because the [broken] host still exists. This causes problems in our pipeline.
Is there a way I can taint the instance if the null_resource fails? Here's the relevant part of the module:
resource "aws_instance" "instance" {
count = var.instance_count
ami = var.ami_id
instance_type = var.instance_type
user_data = var.user_data
iam_instance_profile = var.iam_instance_profile
// omitted many attributes to save space
}
resource "aws_volume_attachment" "volume_attachment" {
count = var.volume_ids == null ? 0 : length(var.volume_ids)
skip_destroy = true
instance_id = element(aws_instance.instance.*.id, count.index)
volume_id = element(var.volume_ids, count.index)
device_name = var.device_name
}
resource "null_resource" "cloud_init_status" {
count = var.bot_key_pem != null ? var.instance_count : 0
triggers = {
instance_id = element(aws_instance.instance.*.id, count.index)
}
provisioner "remote-exec" {
inline = [
"echo \"Running cloud-init status --wait > /dev/null\"",
"sudo cloud-init status --wait > /dev/null",
"sudo cloud-init status --long"
]
connection {
user = "bot"
host = element(aws_instance.instance.*.private_ip, count.index)
timeout = var.ssh_timeout
private_key = var.bot_key_pem
bastion_host = var.bastion_host
}
}
}