Terraform AWS ASG launches and terminates EC2 instances frequently

Issue

The ASG does not work as expected. Once “terraform apply” finishes, EC2 instances are repeatedly launched and then terminated, even though there is no network traffic.

The ASG activity history is shown below:

Expected

I expect the ASG to work and the EC2 instances to scale in and out normally.

Environment

  • terraform 1.14.3

  • terraform-aws-modules/alb/aws 9.4.0

  • AWS Provider: hashicorp/aws v6.27.0

  • All other modules and providers use their latest versions.

  • Terraform runs on CentOS 7.
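To make the environment above reproducible, the Terraform CLI and AWS provider versions could be pinned in the root module. This is a minimal sketch that assumes the root module is where the constraints belong. The version numbers are copied from the list above. The ALB module's own `version = "9.4.0"` is already set in the module block, so it is not repeated here.

```hcl
# Sketch: pin the CLI and provider versions listed in the Environment section
# so that later runs use exactly the versions that reproduced the issue.
terraform {
  required_version = "1.14.3"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "6.27.0"
    }
  }
}
```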

Analysis

  1. The ALB target group and ASG configurations are as follows. In theory, once the EC2 instances start successfully, the ALB checks the health of the path /app1/index.html every 30 seconds.
```hcl
# Terraform AWS Application Load Balancer (ALB)
module "alb" {
  source = "terraform-aws-modules/alb/aws"
  # Do NOT use v10.4.0: it introduces many more changes to rules and targets.
  # TODO
  version = "9.4.0"

  name               = "${local.name}-alb"
  load_balancer_type = "application"
  vpc_id             = module.vpc.vpc_id
  subnets            = module.vpc.public_subnets
  security_groups    = [module.loadbalancer_sg.security_group_id]

  # For example only
  enable_deletion_protection = false

  # Listeners
  listeners = {
    # Listener-1: my-http-https-redirect
    my-http-https-redirect = {
      port     = 80
      protocol = "HTTP"
      redirect = {
        port        = "443"
        protocol    = "HTTPS"
        status_code = "HTTP_301"
      }
    } # End my-http-https-redirect Listener

    # Listener-2: my-https-listener
    my-https-listener = {
      port            = 443
      protocol        = "HTTPS"
      ssl_policy      = "ELBSecurityPolicy-TLS13-1-2-Res-2021-06"
      certificate_arn = module.acm.acm_certificate_arn

      # Fixed Response for Root Context
      fixed_response = {
        content_type = "text/plain"
        message_body = "Fixed Static message - for Root Context"
        status_code  = "200"
      } # End of Fixed Response

      # Load Balancer Rules
      rules = {
        # Rule-1: myapp1-rule
        myapp1-rule = {
          actions = [{
            type = "weighted-forward"
            target_groups = [
              {
                target_group_key = "mytg1"
                weight           = 1
              }
            ]
            stickiness = {
              enabled  = true
              duration = 3600
            }
          }]
          conditions = [{
            path_pattern = {
              values = ["/*"]
            }
          }]
        } # End of myapp1-rule
      } # End Rules Block
    } # End my-https-listener Block
  } # End Listeners Block

  # Target Groups
  target_groups = {
    # Target Group-1: mytg1
    mytg1 = {
      # VERY IMPORTANT: with create_attachment = false, we create the
      # aws_lb_target_group_attachment resource separately; refer to the GitHub issue URL below.
      ## GitHub issue: <https://github.com/terraform-aws-modules/terraform-aws-alb/issues/316>
      ## Search for "create_attachment" to jump to that GitHub issue's solution
      create_attachment                 = false
      name_prefix                       = "mytg1-"
      protocol                          = "HTTP"
      port                              = 80
      target_type                       = "instance"
      deregistration_delay              = 10
      load_balancing_cross_zone_enabled = false
      protocol_version                  = "HTTP1"
      health_check = {
        enabled             = true
        interval            = 30
        path                = "/app1/index.html"
        port                = "traffic-port"
        healthy_threshold   = 3
        unhealthy_threshold = 3
        timeout             = 6
        protocol            = "HTTP"
        matcher             = "200-399"
      } # End of Health Check Block
      tags = local.common_tags # Target Group Tags
    } # END of Target Group-1: mytg1
  } # END OF target_groups

  tags = local.common_tags # ALB Tags
} # End of alb module


resource "aws_autoscaling_group" "my_asg" {
  name_prefix         = "myasg-"
  desired_capacity    = 2
  min_size            = 1
  max_size            = 4
  vpc_zone_identifier = module.vpc.private_subnets

  target_group_arns         = [module.alb.target_groups["mytg1"].arn]
  health_check_type         = "EC2"
  health_check_grace_period = 300

  launch_template {
    id      = aws_launch_template.my_launch_template.id
    version = aws_launch_template.my_launch_template.latest_version
  }

  instance_refresh {
    strategy = "Rolling"
    preferences {
      min_healthy_percentage = 50
    }
    triggers = ["desired_capacity"]
  }

  tag {
    key                 = "Owner"
    value               = "Web-Team"
    propagate_at_launch = true
  }
}
```

  2. The app1 application on the private EC2 instances takes about 30 seconds to start. I configured health_check_grace_period = 300 on the ASG, so I expect the instances to have enough time to start, and the ASG to wait 300 seconds before terminating any of them.

  3. The whole project is attached; please feel free to refer to it.
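A rough timing check supports the reasoning in item 2. The estimate assumes the app needs about 30 s to start. As an upper bound, it also assumes a new target must pass `healthy_threshold` consecutive checks at the configured `interval` before it is reported healthy. Under those assumptions, the target should turn healthy well inside the 300 s grace period:

$$
t_{\text{healthy}} \approx t_{\text{startup}} + \text{healthy\_threshold} \times \text{interval} = 30\,\text{s} + 3 \times 30\,\text{s} = 120\,\text{s} < 300\,\text{s}
$$

By the same arithmetic, a target that stops answering is marked unhealthy after about $\text{unhealthy\_threshold} \times \text{interval} = 3 \times 30\,\text{s} = 90\,\text{s}$.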

Ask for Help

I searched the AWS documentation but found little that helped with this issue.

I am passionate about cloud infrastructure, but I am still a junior engineer.

Could you please help me with this issue?

Thank you so much.

TBH this is not really a provider issue but a general AWS issue. You would probably get better support if you open an AWS support case, or try to troubleshoot yourself by manually reproducing and reviewing the issue in the AWS Management Console.

Based on the information you provided, it seems that the instances provisioned by the ASG keep failing EC2 health checks. Maybe you can isolate the problem by provisioning a one-off EC2 instance from your launch template and checking whether it even starts properly. This could be caused by many kinds of issues, such as insufficient disk space, insufficient KMS permissions for encrypted volumes, or bad user data.
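The isolation step could look roughly like the following. It launches a standalone instance from the same launch template in one of the private subnets. You can then inspect the instance's system log and status checks in the EC2 console without the ASG replacing it. This sketch reuses the `aws_launch_template.my_launch_template` and `module.vpc` names from the question. The resource name `debug_instance` and its tag are hypothetical.

```hcl
# Hypothetical one-off instance for debugging: it uses the same launch template
# as the ASG but is not managed by the ASG, so a failed check will not terminate it.
resource "aws_instance" "debug_instance" {
  # Assumption: the launch template does not pin its own subnet or network
  # interfaces; if it does, drop subnet_id to avoid a conflict.
  subnet_id = module.vpc.private_subnets[0]

  launch_template {
    id      = aws_launch_template.my_launch_template.id
    version = aws_launch_template.my_launch_template.latest_version
  }

  tags = {
    Name = "asg-debug-instance"
  }
}
```

Once the cause is found, delete the block and apply again to remove the instance.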

Aside from that, you probably should use ELB instead of EC2 as the health check type.
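Switching the health check type is a small change to the ASG resource. The sketch below reuses the values from the question. With `ELB`, the ASG treats an instance as unhealthy if either its EC2 status checks fail or the target group reports it unhealthy. The 300-second grace period still gives app1 time to start answering `/app1/index.html`. Confirm that the targets show as healthy in the target group before making this switch. Otherwise the ASG would start replacing instances for that reason instead.

```hcl
resource "aws_autoscaling_group" "my_asg" {
  name_prefix         = "myasg-"
  desired_capacity    = 2
  min_size            = 1
  max_size            = 4
  vpc_zone_identifier = module.vpc.private_subnets

  target_group_arns = [module.alb.target_groups["mytg1"].arn]

  # Use the ALB target group health check (plus EC2 status checks)
  # instead of EC2 status checks alone.
  health_check_type         = "ELB" # was "EC2"
  health_check_grace_period = 300

  launch_template {
    id      = aws_launch_template.my_launch_template.id
    version = aws_launch_template.my_launch_template.latest_version
  }

  # instance_refresh and tag blocks unchanged from the question
}
```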

Good luck.