I’m running terraform apply, which destroys an EC2 instance and creates a new one.
The instance being destroyed has a shutdown script that takes several minutes to complete in order to gracefully shut down the software running on it.
Normal reboots and power cycles seem to trigger the script properly. However, when the instance is destroyed and re-created, there are signs that the machine did not shut down properly during destruction.
My shutdown script listens for these events:
WantedBy=halt.target reboot.target shutdown.target
Does Terraform fire these events and wait for a graceful EC2 shutdown before destruction? How can I make sure terraform apply allows my machine to shut itself down gracefully before it is destroyed?
I looked at those properties, but they seem like timeouts that Terraform uses for its own purposes (to know when an operation has failed), as opposed to giving the instance itself a set amount of time to shut down.
Terraform’s AWS provider implements destroying an individual aws_instance by calling ec2:TerminateInstances and then polling periodically until the EC2 API reports the instance status as “terminated”.
Terraform has no direct control over how EC2 implements that shutdown, how the software inside the EC2 instance responds to being asked to shut down, or how long EC2 will wait for the shutdown to complete.
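For reference, that destroy step is roughly equivalent to the following AWS CLI sequence. This is just a sketch to illustrate the behavior (the provider uses the AWS SDK rather than the CLI, and the instance ID here is a made-up placeholder):

#!/bin/bash
# Roughly what the provider does when destroying an aws_instance:
# request termination, then poll until EC2 reports "terminated".
# i-0123456789abcdef0 is a hypothetical instance ID.
aws ec2 terminate-instances --instance-ids i-0123456789abcdef0
aws ec2 wait instance-terminated --instance-ids i-0123456789abcdef0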
Elsewhere in the EC2 docs there is another section, “What Happens When You Terminate an Instance”, which explains that TerminateInstances causes the EC2 system to send an ACPI Shutdown event (similar to what happens when you press a power button on a physical computer), which software in the instance must listen for and respond to. In your case it sounds like you are using systemd, in which case it’s systemd that would respond to that event, as you described. Although it’s impossible to say for certain what’s going on with your system from here, my first theory would be that the systemd configuration isn’t quite right and so systemd is not running the script as you intended.
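One way to test that theory is to confirm the unit is actually enabled and hooked into shutdown.target, and to read the shutdown logs from the previous boot. A quick sketch, assuming the unit file is named node_shutdown.service (that name is a guess) and the journal is persistent:

#!/bin/bash
# Is the unit enabled at all?
systemctl is-enabled node_shutdown.service
# Is it wired into shutdown.target as the WantedBy= line intends?
systemctl list-dependencies shutdown.target | grep node_shutdown
# Read the previous boot's log for the unit (requires a persistent
# journal, i.e. Storage=persistent in /etc/systemd/journald.conf).
journalctl -b -1 -u node_shutdown.service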
While not directly related to your question, I want to note that I’d recommend using aws_autoscaling_group to launch EC2 instances from Terraform rather than aws_instance directly. In that case, Terraform simply configures EC2 autoscaling and then autoscaling in turn manages your instances. This is helpful in many situations because EC2 autoscaling can then constantly monitor your instances and replace them if any fail, whereas Terraform can only react to changing infrastructure when you explicitly run it.
“Although it’s impossible to say for certain what’s going on with your system from here, my first theory would be that the systemd configuration isn’t quite right and so systemd is not running the script as you intended.”
Can you elaborate on this any further? Maybe with a link to a proper implementation example?
This is my shutdown service:
[Unit]
Description=Gracefully shut down remnode to avoid database dirty flag
DefaultDependencies=no
Before=shutdown.target reboot.target halt.target
[Service]
Type=oneshot
ExecStart=/root/node_shutdown.sh
[Install]
WantedBy=halt.target reboot.target shutdown.target
And this is the script it calls:
#!/bin/bash
# Ask remnode to shut down cleanly, then wait for the process to exit.
remnode_pid=$(pgrep remnode)
if [ -n "$remnode_pid" ]; then
kill -SIGINT "$remnode_pid"
# Poll until the process disappears from the process table.
while [ -n "$(ps -p "$remnode_pid" -o pid=)" ]
do
sleep 1
done
fi
I’m not knowledgeable enough about systemd to give a definitive answer here, but some quick searching turned up various examples of services with ExecStop set, pointing at a script that systemd runs when it shuts that service down. It looks like you can set ExecStop without also setting ExecStart. I don’t know whether that will work for your case, but hopefully it’s relatively easy to try and see!
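For example, here is a minimal sketch of that pattern, reusing the description and script path from the unit above (untested against your setup). As I read the systemd.service docs, a service with no ExecStart= must set RemainAfterExit=yes; the unit is then started once at boot, counts as active from then on, and systemd runs its ExecStop= as part of every shutdown. Since your script takes several minutes, you would also want to raise TimeoutStopSec=, because the default stop timeout (normally 90 seconds) would kill it:

[Unit]
Description=Gracefully shut down remnode to avoid database dirty flag

[Service]
Type=oneshot
# With no ExecStart=, systemd requires RemainAfterExit=yes; the unit
# stays "active" after boot, so stopping it at shutdown runs ExecStop.
RemainAfterExit=yes
ExecStop=/root/node_shutdown.sh
# The default stop timeout (DefaultTimeoutStopSec, usually 90s) would
# kill a script that needs several minutes, so raise it explicitly.
TimeoutStopSec=10min

[Install]
WantedBy=multi-user.target

You would then enable and start it once (for example with systemctl enable --now node_shutdown.service, if that’s what you name the file) so that there is a running unit for systemd to stop during shutdown.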