Terraform gets stuck when instance_count is more than 2 while using the remote-exec provisioner

  • I am trying to provision multiple Windows EC2 instances with Terraform’s remote-exec provisioner using null_resource.
    $ terraform -v
    Terraform v0.12.6
    provider.aws v2.23.0
    provider.null v2.1.2

  • Originally, I was working with three remote-exec provisioners (two of them involved rebooting the instance) without null_resource, and for a single instance everything worked absolutely fine.

  • I then needed to increase the count and, based on several links, ended up using null_resource. I have since reduced the issue to the point where I cannot run even one remote-exec provisioner for more than 2 Windows EC2 instances using null_resource.

//VARIABLES

variable "aws_access_key" {
  default = "AK"
}
variable "aws_secret_key" {
  default = "SAK"
}
variable "instance_count" {
  default = "3"
}
variable "username" {
  default = "Administrator"
}
variable "admin_password" {
  default = "Password"
}
variable "instance_name" {
  default = "Testing"
}
variable "vpc_id" {
  default = "vpc-id"
}

//PROVIDERS
provider "aws" {
  access_key = "${var.aws_access_key}"
  secret_key = "${var.aws_secret_key}"
  region     = "ap-southeast-2"
}

//RESOURCES
resource "aws_instance" "ec2instance" {
  count         = "${var.instance_count}"
  ami           = "Windows AMI"
  instance_type = "t2.xlarge"
  key_name      = "ec2_key"
  subnet_id     = "subnet-id"
  vpc_security_group_ids = ["${aws_security_group.ec2instance-sg.id}"]
  tags = {
    Name = "${var.instance_name}-${count.index}"
  }
}

resource "null_resource" "nullresource" {
  count = "${var.instance_count}"
  connection {
    type     = "winrm"
    host     = "${element(aws_instance.ec2instance.*.private_ip, count.index)}"
    user     = "${var.username}"
    password = "${var.admin_password}"
    timeout  = "10m"
  }
   provisioner "remote-exec" {
     inline = [
       "powershell.exe Write-Host Instance_No=${count.index}"
     ]
   }
//   provisioner "local-exec" {
//     command = "powershell.exe Write-Host Instance_No=${count.index}"
//   }
//   provisioner "file" {
//       source      = "testscript"
//       destination = "D:/testscript"
//   }
}
resource "aws_security_group" "ec2instance-sg" {
  name        = "${var.instance_name}-sg"
  vpc_id      = "${var.vpc_id}"


  // RDP
  ingress {
    from_port   = 3389
    to_port     = 3389
    protocol    = "tcp"
    cidr_blocks = ["CIDR"]
  }

  // WinRM access from the machine running TF to the instance
  ingress {
    from_port   = 5985
    to_port     = 5985
    protocol    = "tcp"
    cidr_blocks = ["CIDR"]
  }

  tags = {
    Name        = "${var.instance_name}-sg"
  }

}
//OUTPUTS
output "private_ip" {
  value = "${aws_instance.ec2instance.*.private_ip}"
}
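
For reference, here is a minimal sketch of the same null_resource written with Terraform 0.12 expressions instead of interpolation strings and element(). It assumes the variables and the aws_instance.ec2instance resource from the config above and would replace the existing null_resource block rather than sit next to it. The triggers map and its instance_id key are an addition for illustration and are not part of the original config. This is only a restatement of the config, not a known fix for the hang.

```hcl
resource "null_resource" "nullresource" {
  # The string default "3" is converted to a number by Terraform 0.12.
  count = var.instance_count

  # Assumption (not in the original config): re-run the provisioner
  # whenever the matching instance is replaced.
  triggers = {
    instance_id = aws_instance.ec2instance[count.index].id
  }

  connection {
    type     = "winrm"
    host     = aws_instance.ec2instance[count.index].private_ip
    user     = var.username
    password = var.admin_password
    timeout  = "10m"
  }

  provisioner "remote-exec" {
    inline = [
      "powershell.exe Write-Host Instance_No=${count.index}",
    ]
  }
}
```

Indexing with [count.index] pairs each null_resource directly with the instance of the same index, which is the same pairing that element(..., count.index) produces in the original.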
  • If I set the instance count to 1, all the above provisioning steps work fine.

However, if I set the instance count to anything other than 1 (e.g. 2), Terraform consistently runs all the provisioning steps on both instances but runs the LAST step (Write-Host THIRD) on ONLY ONE of them.

Observations:

  • With one remote-exec provisioner, it works fine if count is set to 1 or 2. With count 3, the provisioner does not reliably run on all the instances every time. What is certain is that Terraform never completes and never shows the output variables. It keeps showing “null_resource.nullresource[count.index]: Still creating…”
  • For the local-exec provisioner, everything works fine. Tested with count values of 1, 2 and 7.
  • For the file provisioner, it works fine for 1, 2 and 3. With 7 it does not finish, although the file was copied to all 7 instances. It keeps showing “null_resource.nullresource[count.index]: Still creating…”
  • Also, in every attempt, the remote-exec provisioner is able to connect to the instances irrespective of the count value. It just doesn’t trigger the inline command on some instances, seemingly at random, and then starts showing the “Still creating…” message.
  • I have been stuck with this issue for quite some time now and couldn’t find anything significant in the debug logs either. I know Terraform is not recommended as a config management tool. However, everything works fine, even with complex provisioning scripts, if the instance count is just 1 (even without null_resource), which suggests that Terraform should be able to handle such a basic provisioning requirement.

TF_DEBUG Logs:

Any pointers will be greatly appreciated!

I downgraded Terraform to v0.11.14 and that magically worked. Seems like a bug in v0.12.6.
Please refer to the comments here for more information.
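
If you go the downgrade route, one way to keep the configuration from being applied by accident with a newer binary is to pin the Terraform version in the config itself. A minimal sketch, assuming the release referred to above is 0.11.14:

```hcl
terraform {
  # Assumption: exact pin to the 0.11.x release that worked for me.
  # Terraform refuses to run this config with any other version.
  required_version = "= 0.11.14"
}
```

A looser constraint such as "~> 0.11.14" would also allow later 0.11 patch releases, if there are any you want to pick up.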