Terraform AWS ec2 doesn't gracefully shut down

sergchernata · October 8, 2019, 4:20pm

I’m running terraform apply, which destroys an EC2 instance and creates a new one.

The instance that gets destroyed has a shutdown script that takes several minutes to complete in order to gracefully shut down running software.

Normal machine reboots and power cycles seem to trigger the script properly. However, when the instance gets destroyed and re-created, there are signs that the machine did not shut down properly during destruction.

My shutdown script listens for these events:
WantedBy=halt.target reboot.target shutdown.target

Does terraform fire these events and wait for a graceful EC2 shutdown before destruction? How can I make sure terraform apply allows my machine to gracefully shut itself down before destruction?

kirklatslalom · October 9, 2019, 7:45pm

Have you tried adjusting the timeouts?

Timeouts

The timeouts block allows you to specify timeouts for certain actions:

create - (Defaults to 10 mins) Used when launching the instance (until it reaches the initial running state)
update - (Defaults to 10 mins) Used when stopping and starting the instance when necessary during update - e.g. when changing instance type
delete - (Defaults to 20 mins) Used when terminating the instance

sergchernata · October 9, 2019, 7:58pm

I looked at those properties, but they seem like timeouts that terraform uses for its own purposes, to know when an operation has failed, as opposed to giving the instance itself a set amount of time to shut down.

apparentlymart · October 9, 2019, 10:09pm

Terraform’s AWS provider implements destroying an individual aws_instance instance by calling ec2:TerminateInstances and then polling periodically until the instance status shows as “terminated” as far as the EC2 API is concerned.

Terraform has no direct control over how EC2 implements that shutdown, how the software inside the EC2 instance responds to being asked to shut down, or how long EC2 will wait for the shutdown to complete.

The EC2 guide Troubleshooting Terminating (Shutting Down) Your Instance suggests that EC2 will give the instance an opportunity to run shutdown scripts before the instance is finally forcefully terminated.

Elsewhere in the EC2 docs, there is another section What Happens When You Terminate an Instance, which explains that TerminateInstances causes the EC2 system to send an ACPI Shutdown event (similar to what happens when you press a power button on a physical computer) which software in the instance must listen for and respond to. In your case it sounds like you are using systemd, in which case it’s systemd that would respond to that event, as you described. Although it’s impossible to say for certain what’s going on with your system from here, my first theory would be that the systemd configuration isn’t quite right and so systemd is not running the script as you intended.

While not directly related to your question, I want to note that I’d recommend using aws_autoscaling_group to launch EC2 instances from Terraform rather than aws_instance directly. In that case, Terraform simply configures EC2 autoscaling and then autoscaling in turn manages your instances. This is helpful in many situations because EC2 autoscaling can then constantly monitor your instances and replace them if any fail, whereas Terraform can only react to changing infrastructure when you explicitly run it.

sergchernata · October 9, 2019, 10:20pm

Although it’s impossible to say for certain what’s going on with your system from here, my first theory would be that the systemd configuration isn’t quite right and so systemd is not running the script as you intended.

Can you elaborate on this any further? Maybe with a link to proper implementation example?

This is my shutdown service:

[Unit]
Description=Gracefully shut down remnode to avoid database dirty flag
DefaultDependencies=no
Before=shutdown.target reboot.target halt.target

[Service]
Type=oneshot
ExecStart=/root/node_shutdown.sh

[Install]
WantedBy=halt.target reboot.target shutdown.target

and this is the script it calls

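#!/bin/bash
# Look up the remnode PID; -o selects the oldest match if there are several.
remnode_pid=$(pgrep -o remnode)

# Nothing to do if remnode is not running.
if [ -z "$remnode_pid" ]; then
    exit 0
fi

# Ask remnode to shut down cleanly.
kill -SIGINT "$remnode_pid"

# Block until the process has exited.
while [ -n "$(ps -p "$remnode_pid" -o pid=)" ]
do
    sleep 1
done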

apparentlymart · October 9, 2019, 10:27pm

I’m not knowledgeable enough about systemd to give a definitive answer here, but some quick searching showed various examples of using services with ExecStop set on them pointing to a script that systemd would run when shutting down that service. It looks like you can just set ExecStop without also setting ExecStart. I don’t know if that will work, but hopefully it’s relatively easy to try and see!
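
A minimal sketch of the ExecStop approach described above, written as a bash function that prints the unit file. It assumes the existing /root/node_shutdown.sh script; the unit name, the After=network.target ordering, and the 600-second TimeoutStopSec are illustrative assumptions, not values from this thread. The idea is that the unit starts with a no-op and remains active, so systemd runs ExecStop when it stops the unit during shutdown.

```shell
#!/bin/bash
# Hypothetical unit name; adjust for your system.
UNIT_NAME="remnode-graceful-stop.service"

# Print a unit that stays "active" after a no-op start, so that systemd
# runs ExecStop when it stops the unit during shutdown.
render_unit() {
    cat <<'EOF'
[Unit]
Description=Gracefully shut down remnode to avoid database dirty flag
# Ordered units are stopped in the reverse of their start order, so this
# unit's ExecStop runs before network.target is taken down (assumed ordering).
After=network.target

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/bin/true
ExecStop=/root/node_shutdown.sh
# Assumed value: give the script up to 10 minutes before systemd gives up.
TimeoutStopSec=600

[Install]
WantedBy=multi-user.target
EOF
}
```

To try it, the output of render_unit could be written to /etc/systemd/system/ under the chosen unit name, followed by systemctl daemon-reload and systemctl enable --now for that unit. The unit has to be active for ExecStop to fire, which is why it is started right away rather than only enabled. TimeoutStopSec matters here because systemd's default stop timeout is typically 90 seconds, which is shorter than a script that takes several minutes.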
