Timeout waiting for IP address

I’m having limited success deploying Linux (RHEL 7.9) VMs with the Terraform vSphere provider. Terraform reports a ‘timeout waiting for IP’ error, and it generally affects only a single VM when I deploy two or more. If I deploy one VM it works fine, but if I deploy more than one, one of them fails. Watching vCenter, the typical behavior for a VM goes something like this:

  1. The new VMs all get provisioned in vSphere as expected using a simple RHEL 7.9 template (with the latest open-vm-tools package installed). At this time I also see events in vCenter saying that the machines are being created, powered on, customized and
  2. During the very first boot of each machine, I get a boot message (on the VM’s console) that the hostname is being set. Once this happens the VM(s) reboot.
    3a. For machines that work as expected, the reboot happens and the VM boots back up and is ready to go.
    -OR-
    3b. For the one machine that exhibits this issue, the reboot fails. vCenter reports that the machine is still powered on, but the VM’s console is ‘dead’ (black screen, unresponsive to keyboard). At this point the machine must be manually reset (vCenter Actions -> Power -> Reset) before anything else will happen. Once reset, the machine boots up completely as expected.
  4. Terraform reports that the machines are built. If the failed machine in step 3b above is not reset before the build timeout, Terraform reports the ‘Timeout waiting for ip’ error.
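
For reference, the timeout in the final step is, as far as I can tell, governed by a couple of arguments on the `vsphere_virtual_machine` resource. The fragment below is only a sketch of where they live: the resource name `rhel`, every `var.*` input, the host name pattern, the domain, and the sizes are hypothetical placeholders, and the DHCP-style empty `network_interface` block is an assumption about the setup. Raising these values does not fix the hung reboot; it only leaves more time to reset the stuck VM before Terraform gives up.

```hcl
# Sketch only: "rhel" and all var.* inputs are hypothetical placeholders.
resource "vsphere_virtual_machine" "rhel" {
  count            = 2
  name             = "rhel79-${count.index + 1}"
  resource_pool_id = var.resource_pool_id
  datastore_id     = var.datastore_id
  num_cpus         = 2
  memory           = 4096
  guest_id         = "rhel7_64Guest"

  # Minutes Terraform waits for the guest to report an IP address
  # (the provider default is 5).
  wait_for_guest_net_timeout = 15

  network_interface {
    network_id = var.network_id
  }

  disk {
    label = "disk0"
    size  = 40 # must be at least as large as the template's disk
  }

  clone {
    template_uuid = var.template_uuid

    customize {
      # Minutes Terraform waits for guest customization to finish
      # (the provider default is 10).
      timeout = 20

      linux_options {
        host_name = "rhel79-${count.index + 1}"
        domain    = "example.internal"
      }

      # An empty block asks for DHCP; a static setup would set
      # ipv4_address and ipv4_netmask here instead.
      network_interface {}
    }
  }
}
```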

How do I troubleshoot this? It seems strange that exactly one machine in each batch fails to reboot. I do not have any similar issues building Windows VMs.

Update, in the hope that somebody else is facing a similar issue. What I’ve discovered is that the VM customization step works and the IP address is assigned, but for some reason the VM was not rebooting correctly (specifically, not catching SIGTERM, signal 15, and shutting down completely). Since the machine never booted back up with the new IP address, the ‘timeout waiting for IP address’ error was issued. I worked around this by adding a self-disabling service that detects whether the signal was missed and, if so, reboots the machine. Since then I haven’t had a system fail to reboot. I think this is most likely a timing issue between vSphere and open-vm-tools running on the machine. I hope this helps somebody!