Hi everyone,
I have an AWS infrastructure defined with Terraform. However, when the tags or the AMI of its Batch compute environment are updated, Terraform replaces (i.e. destroys and recreates) the compute environment, which causes downtime in Batch. Details:
Context
- An AWS infrastructure defined by Terraform.
- The infrastructure includes a Batch compute environment and a job queue that is mapped to the compute environment.
- Default tags are applied to both the compute environment and the job queue.
- The image of the compute environment is set to a specific AMI defined in a tfvars.json file.
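For reference, the setup looks roughly like the following sketch. All resource, variable, and role names here are hypothetical placeholders, not the actual configuration:

```hcl
# Hypothetical sketch of the current setup; names and values are illustrative.
provider "aws" {
  region = "us-east-1"
  default_tags {
    tags = { Project = "example" } # default tags applied to both resources
  }
}

variable "batch_ami_id" {
  type = string # supplied via terraform.tfvars.json
}

resource "aws_batch_compute_environment" "main" {
  compute_environment_name = "main-ce"
  type                     = "MANAGED"
  service_role             = aws_iam_role.batch_service.arn

  compute_resources {
    type               = "EC2"
    image_id           = var.batch_ami_id # changing this forces replacement
    min_vcpus          = 0
    max_vcpus          = 16
    instance_type      = ["optimal"]
    subnets            = var.subnet_ids
    security_group_ids = var.security_group_ids
    instance_role      = aws_iam_instance_profile.batch.arn
  }
}

resource "aws_batch_job_queue" "main" {
  name                 = "main-queue"
  state                = "ENABLED"
  priority             = 1
  compute_environments = [aws_batch_compute_environment.main.arn]
}
```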
Problem
- When the tags or the AMI of the compute environment are updated, Terraform destroys and recreates the compute environment.
- Before Terraform can destroy the compute environment, it must be detached from its job queue. Thus, today, we destroy the entire job queue prior to `terraform apply` if the compute environment needs to be replaced.
- This causes Batch downtime, which is critical since there are scheduled jobs to be run in Batch.
Idea: Have two pairs of Batch compute environment and job queue (blue-green deployment)
- Description: During an upgrade of the infrastructure, have Terraform update only one pair while the other pair handles scheduled Batch jobs.
- Limitation: This requires the `lifecycle.ignore_changes` attribute of `aws_batch_compute_environment` to be set dynamically (i.e. when updating the blue pair, set the `lifecycle.ignore_changes` attribute of the green compute environment to `all`). However, `lifecycle.ignore_changes` only takes static expressions. Otherwise, Terraform fails with the error `A static list expression is required.`
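To illustrate the limitation (resource and variable names are hypothetical): Terraform rejects any non-literal expression inside `ignore_changes`, so a per-color toggle like the first resource below fails to validate, while only literal forms like the second are accepted:

```hcl
# Hypothetical attempt: toggle ignore_changes per deployment color.
# Terraform rejects this with "A static list expression is required."
resource "aws_batch_compute_environment" "green" {
  # ... compute environment arguments ...
  lifecycle {
    ignore_changes = var.active_color == "blue" ? all : []
  }
}

# Only static expressions are accepted:
resource "aws_batch_compute_environment" "green_static" {
  # ... compute environment arguments ...
  lifecycle {
    ignore_changes = all # or a static list such as [tags, compute_resources]
  }
}
```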
How can we prevent Batch downtime when the compute environment and job queue need to be updated with new tags and the AMI of the compute environment needs to be updated? Any suggestions/ideas would be appreciated. Thanks!