I am trying to set up servers in multiple zones, and I need to wait until they are all online and have completed their cloud-init before continuing with additional setup using remote-exec. I can find nothing in the documentation about waiting for events or checking for them.
Any suggestions on how I can do this? The machines all need to be online before I start running the remote-exec steps on them, because they must communicate with each other while the configuration is happening.
Terraform has no built-in primitive to react to events emitted by objects as they initialize. Since cloud-init and other boot-time actions are asynchronous and not exposed in a way Terraform can directly interrogate, this sort of setup will always require some custom solutions on your end to glue these parts together.
In an ideal world it’s best to avoid using remote-exec at all and instead set up the machines to somehow self-cluster during their own boot processes. The details of such a thing would of course vary depending on what software these servers are running, but the common general solution to achieve that is to pass to the machine via user_data some information that allows the boot scripts within the instance to automatically discover their neighbors somehow, such as by querying the DigitalOcean API to look for instances tagged in a particular way, or by having the machines explicitly register themselves in a service discovery system.
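As a rough sketch of that discovery step, a boot script could list droplets by tag and pull out their private addresses. This assumes the DigitalOcean v2 API's droplet listing filtered by `tag_name`, and an API token made available to the instance somehow. The tag `cluster-node` and both function names are hypothetical, and pagination is ignored for brevity.

```python
import json
import urllib.parse
import urllib.request

# Droplet listing endpoint of the DigitalOcean v2 API.
API_URL = "https://api.digitalocean.com/v2/droplets"


def private_ips(payload):
    """Extract private IPv4 addresses from a decoded droplet-list response."""
    ips = []
    for droplet in payload.get("droplets", []):
        for net in droplet.get("networks", {}).get("v4", []):
            if net.get("type") == "private":
                ips.append(net["ip_address"])
    return ips


def discover_neighbors(token, tag="cluster-node"):
    """Ask the API for droplets carrying `tag` (a hypothetical tag name)."""
    query = urllib.parse.urlencode({"tag_name": tag})
    request = urllib.request.Request(
        f"{API_URL}?{query}",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return private_ips(json.load(response))
```

Keeping the parsing separate from the HTTP call makes it easy to check the logic against a saved response before running it on a real instance.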
If that isn’t sufficient, the next best thing would be to have your individual servers still register themselves somewhere (which could just again be the normal DigitalOcean API and a predictable tagging scheme) and then have your final remote-exec step poll that location until everything is started up, failing after a certain amount of time if some servers never appear.
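The polling step might look something like the sketch below. It is deliberately generic: `list_registered` is a hypothetical callable you would supply (for example one that queries the DigitalOcean API by tag, or a service discovery system) and that returns the names of the servers that have registered so far. The default timeout and interval are arbitrary.

```python
import time


def wait_for_servers(expected, list_registered, timeout=600.0, interval=10.0,
                     clock=time.monotonic, sleep=time.sleep):
    """Block until every name in `expected` is registered, or raise on timeout."""
    expected = set(expected)
    deadline = clock() + timeout
    while True:
        # Names we are still waiting for on this polling round.
        missing = expected - set(list_registered())
        if not missing:
            return
        if clock() >= deadline:
            raise TimeoutError(f"servers never appeared: {sorted(missing)}")
        sleep(interval)
```

Run from the final remote-exec step, an uncaught `TimeoutError` makes the script exit non-zero, which in turn should make Terraform report the provisioner as failed rather than hang forever.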
My final idea is a lot more brute-force, and that is to configure a remote-exec provisioner on each droplet that runs some script on the instance that just blocks until it detects that all of the asynchronous boot and initialization tasks are complete, using some logic written by you that is specialized to the system you’re deploying. Terraform will not consider a particular instance to be “complete” until all of its provisioners have concluded, so if you then have a downstream resource that depends on all of the droplets Terraform will wait until they have all completed their own provisioning before beginning work on that resource.
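For the cloud-init part specifically, one possible blocking check is to wait for the marker file that cloud-init writes when its final stage completes. The sketch below assumes that marker lives at `/var/lib/cloud/instance/boot-finished` and that Python is available on the droplet; the function name and default timings are made up for illustration. Any other readiness checks specific to your system would need to be added alongside it.

```python
import os
import time

# Marker file assumed to be written by cloud-init once its final stage is done.
BOOT_FINISHED = "/var/lib/cloud/instance/boot-finished"


def wait_for_file(path, timeout=900.0, interval=5.0):
    """Return True once `path` exists, or False if `timeout` seconds pass first."""
    deadline = time.monotonic() + timeout
    while not os.path.exists(path):
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
    return True
```

A script run by the provisioner could call `wait_for_file(BOOT_FINISHED)` and exit non-zero when it returns False, so that a droplet whose boot never finishes fails the apply instead of blocking it indefinitely. Recent cloud-init versions also offer `cloud-init status --wait`, which may be simpler if it is available on your image.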
Yeah, that's what I was afraid of. I'm already using cloud-init to install some packages on all of the machines, but I need a way to have the master wait until all the other nodes finish before it continues. I like that service discovery idea; I think I will investigate Consul.