Reuse the same resources in multiple environments (bastion host EC2, SG)

Hi All,

I have two environments (dev, QA), each with its own, largely similar set of resources. However, in my dev environment I have set up a single bastion host (EC2 in a public subnet), which I want to reuse for my QA resources as well. I don’t want to set up a separate bastion host for QA; rather, I want to share the one bastion host across both environments (dev, QA).

Both environments and all their resources are in the same AWS VPC.

I have a single Terraform stack (one set of .tf files) and maintain environment-specific "terraform.tfvars" files to provision the resources for each environment through GitLab CI/CD pipelines (dev pipeline, QA pipeline).

I have a bool variable "deploy_bastion", whose value I set to "true" in the dev environment's terraform.tfvars and to "false" in QA's.
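For context, the flag is declared roughly like this (a sketch of my setup):

variable "deploy_bastion" {
  description = "Whether this environment creates the bastion host and its SG"
  type        = bool
  default     = false
}

# dev terraform.tfvars: deploy_bastion = true
# QA  terraform.tfvars: deploy_bastion = false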

I run terraform apply in my dev pipeline, where I provision all the resources including the bastion host. Then I run terraform apply for my QA environment, but there I don't want the bastion host resources to be created again, and I also don't want a new bastion host SG.

I am gating the creation of the bastion host with the count parameter.

Here is my code for "bastion-host-autoscaling.tf":

########################################
resource "aws_launch_configuration" "bastion-host" {

  ##
  count           = var.deploy_bastion ? 1 : 0

  name_prefix     = var.bastion_host_launch_configuration_name
  image_id        = var.amis[var.aws_region]
  instance_type   = var.bastion_host_instance_type
  key_name        = aws_key_pair.public_key.key_name
  security_groups = [aws_security_group.bastion-host[count.index].id]
}

resource "aws_autoscaling_group" "bastion-host" {

  ##
  count      = var.deploy_bastion ? 1 : 0
  
  name                      = var.bastion_host_autoscaling_group_name
  vpc_zone_identifier       = [var.x_eks_public_subnet_1, var.x_eks_public_subnet_2]
  launch_configuration      = aws_launch_configuration.bastion-host[count.index].name
  min_size                  = var.deploy_bastion ? 1 : 0
  max_size                  = var.deploy_bastion ? 2 : 0
  health_check_grace_period = 300
  health_check_type         = "EC2"
  force_delete              = true

  tag {
    key                 = "Name"
    value               = var.bastion_host_autoscaling_group_tag_name
    propagate_at_launch = true
  }
}

########################################

And here is my code for "bastion-host-autoscalingpolicy.tf":

########################################

# scale up alarm

resource "aws_autoscaling_policy" "bastion-host-cpu-policy" {

  ##
  count      = var.deploy_bastion ? 1 : 0

  name                   = "bastion-host-xxx-cpu-policy"
  autoscaling_group_name = aws_autoscaling_group.bastion-host[count.index].name
  adjustment_type        = "ChangeInCapacity"
  scaling_adjustment     = "1"
  cooldown               = "300"
  policy_type            = "SimpleScaling"
}

resource "aws_cloudwatch_metric_alarm" "bastion-host-cpu-alarm" {

  ##
  count               = var.deploy_bastion ? 1 : 0

  alarm_name          = "bastion-host-x-cpu-alarm"
  alarm_description   = "bastion-host-x-cpu-alarm"
  comparison_operator = "GreaterThanOrEqualToThreshold"
  evaluation_periods  = "2"
  metric_name         = "CPUUtilization"
  namespace           = "AWS/EC2"
  period              = "120"
  statistic           = "Average"
  threshold           = "30"

  dimensions = {
    "AutoScalingGroupName" = aws_autoscaling_group.bastion-host[count.index].name
  }

  actions_enabled = true
  alarm_actions   = [aws_autoscaling_policy.bastion-host-cpu-policy[count.index].arn]
}

# scale down alarm
resource "aws_autoscaling_policy" "bastion-host-cpu-policy-scaledown" {

  ##
  count      = var.deploy_bastion ? 1 : 0

  name                   = "bastion-host-x-cpu-policy-scaledown"
  autoscaling_group_name = aws_autoscaling_group.bastion-host[count.index].name
  adjustment_type        = "ChangeInCapacity"
  scaling_adjustment     = "-1"
  cooldown               = "300"
  policy_type            = "SimpleScaling"
}

resource "aws_cloudwatch_metric_alarm" "bastion-host-cpu-alarm-scaledown" {
  ##
  count      = var.deploy_bastion ? 1 : 0

  alarm_name          = "bastion-host-x-cpu-alarm-scaledown"
  alarm_description   = "bastion-host-x-cpu-alarm-scaledown"
  comparison_operator = "LessThanOrEqualToThreshold"
  evaluation_periods  = "2"
  metric_name         = "CPUUtilization"
  namespace           = "AWS/EC2"
  period              = "120"
  statistic           = "Average"
  threshold           = "5"

  dimensions = {
    "AutoScalingGroupName" = aws_autoscaling_group.bastion-host[count.index].name
  }

  actions_enabled = true
  alarm_actions   = [aws_autoscaling_policy.bastion-host-cpu-policy-scaledown[count.index].arn]
}

########################################

I think the above code is fine. The problem comes, however, when I apply SecurityGroups.tf (see below):

########################################

resource "aws_security_group" "bastion-host" {
  ##
  count      = var.deploy_bastion ? 1 : 0

  vpc_id      = var.x_eks_dev_vpc
  name        = var.bastion_host_security_group_name
  description = var.bastion_host_security_group_description
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
    description = "all internet"
  }

  ingress {
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = [var.xxx, var.xxx]
    description = "xxx-proxy"
  }

  tags = {
    Name = var.bastion_host_security_group_tag_name
  }
}

resource "aws_security_group" "server-host" {
  
  ##
  #count = length(aws_security_group.bastion-host)

  vpc_id      = var.x_eks_dev_vpc
  name        = var.server_host_security_group_name
  description = var.server_host_security_group_description
  
  ingress {
    from_port       = 22
    to_port         = 22
    protocol        = "tcp"
    security_groups = [aws_security_group.bastion-host.id] # allowing access from our bastion-host-x-instance
    self            = true
  }

  ingress {
    from_port       = 80
    to_port         = 80
    protocol        = "tcp"
    #cidr_blocks    = [var.xxx, var.xxx] # load balance tbd
    cidr_blocks     = [var.k8s_workernode_subnet_range_1, var.k8s_workernode_subnet_range_2, var.k8s_workernode_subnet_range_3]
    security_groups = [aws_security_group.elb-server-host.id]
    self            = true
  }

  ingress {
    from_port       = 443
    to_port         = 443
    protocol        = "tcp"
    cidr_blocks     = [var.k8s_workernode_subnet_range_1, var.k8s_workernode_subnet_range_2, var.k8s_workernode_subnet_range_3]
    security_groups = [aws_security_group.elb-server-host.id]
    self            = true
  }

  ingress {
    from_port       = 9142
    to_port         = 9142
    protocol        = "tcp"
    cidr_blocks     = [var.k8s_workernode_subnet_range_1, var.k8s_workernode_subnet_range_2, var.k8s_workernode_subnet_range_3]
    security_groups = [aws_security_group.elb-server-host.id]
    self            = true
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
    self        = true
  }

  tags = {
    Name = var.server_host_security_group_tag_name
  }
}

########################################

I am trying to stop the SG from being created a second time for the QA environment; see the count inside the "aws_security_group" "bastion-host" resource above.

However, the problem then comes in the resource "aws_security_group" "server-host", which I do want to provision separately for the QA environment, but which should reuse the SG of the bastion host that was created during the dev run.

The commented-out "#count = length(aws_security_group.bastion-host)" in "server-host" above makes no sense, because I do want to create the server-host SG separately for the QA environment, while the bastion-host count is 0 in that case. For the same reason, putting "security_groups = [aws_security_group.bastion-host[count.index].id]" in the code above makes no sense to me either.

What I am trying to do, somehow, is to use the SG id of the dev bastion host here while setting up the SG of the QA server-host.

My problem is how to reference the dev bastion SG id here during the QA run:
security_groups = [aws_security_group.bastion-host.id] # DEV?

Or, looking at the entire code above, let me know whether I am approaching the problem the right way at all, and if so, how I can solve this last riddle.

Thanks.

Typically this sort of thing would be addressed by using Terraform’s support for ‘remote state’, which allows one Terraform configuration to refer to resources created by a separate configuration (it can also serve other purposes, but this usage is well documented).

With that in place, you’d have separate Terraform configurations (or workspaces, I guess, if you want to use those) for DEV and QA, and the QA configuration would have remote resource references to the parts of the DEV configuration that you want to reuse.
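Roughly, as a sketch (adapt the resource names and backend details to your setup): the DEV configuration exports the bastion SG id as an output, and the QA configuration reads it back with a terraform_remote_state data source:

# DEV configuration
output "bastion_sg_id" {
  value = aws_security_group.bastion-host[0].id
}

# QA configuration
data "terraform_remote_state" "dev" {
  backend = "s3"
  config = {
    bucket = "your-terraform-state-bucket" # placeholder
    key    = "path/to/dev/terraform.tfstate"
    region = "eu-central-1"
  }
}

# then, for example:
#   security_groups = [data.terraform_remote_state.dev.outputs.bastion_sg_id]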

Thanks for replying.

I tried this in my securitygroup.tf:

##
data "terraform_remote_state" "vpc" {
    backend = "s3"
    config = {
        bucket  = "gitlab-runner-terraform-state-eu-central-1-xxxx"
        key     = "xxx/development/terraform.state"
        region  = "eu-central-1"
    }
}
##

resource "aws_security_group" "server-host" {
     
  vpc_id      = var.xxx_eks_dev_vpc
  name        = var.server_host_security_group_name
  description = var.server_host_security_group_description
  
  ingress {
    from_port       = 22
    to_port         = 22
    protocol        = "tcp"
    security_groups = [data.terraform_remote_state.vpc.outputs.bastion_host_id] 
    self        = true
  }
  ingress {..

And in my output.tf:

output "bastion_host_id" {
    value = "${aws_security_group.bastion-host[0].id}"
}

It worked; however, I still have to test various scenarios to see whether it works all the time.
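One scenario I am not sure about yet (just my assumption): with deploy_bastion = false in the QA run, the [0] index in that output presumably fails, because the count is 0 there. So I might need a count-tolerant variant, something like:

output "bastion_host_id" {
  # empty string when this environment does not create the bastion SG (deploy_bastion = false)
  value = length(aws_security_group.bastion-host) > 0 ? aws_security_group.bastion-host[0].id : ""
}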

I destroyed both environments (dev and QA) and then started provisioning them again.

At the plan stage, for both environments, I get this error:

$ terraform plan -lock=false -input=false -var "tag_git_repo_url=$GIT_REPO_URL" -var "tag_git_branch=$GIT_BRANCH" -var "tag_build_id=$BUILD_ID" -var "tag_timestamp_utc=$UTC_NOW" -out=$TF_PLAN
Refreshing Terraform state in-memory prior to plan...
The refreshed state will be used to calculate this plan, but will not be
persisted to local or remote state storage.
data.template_file.init-script: Refreshing state...
data.template_file.shell-script: Refreshing state...
data.template_cloudinit_config.cloudinit: Refreshing state...
data.terraform_remote_state.vpc: Refreshing state...
------------------------------------------------------------------------
Error: Unsupported attribute
  on securitygroup.tf line 50, in resource "aws_security_group" "server-host":
  50:     security_groups = [data.terraform_remote_state.vpc.outputs.bastion_host_id] # allowing access from our bastion-host-xxx-instance
    |----------------
    | data.terraform_remote_state.vpc.outputs is object with no attributes
This object does not have an attribute named "bastion_host_id".
ERROR: Job failed: exit code 1

My understanding is that the error occurs because, at this point, there is no value for the SG in the dev outputs yet, so the reference can't be resolved.

If, inside the resource "aws_security_group" "server-host" in my securitygroup.tf, I instead set:

security_groups = [aws_security_group.bastion-host[0].id]

it works for the development stage.

Then afterwards, when I change the code to

security_groups = [data.terraform_remote_state.vpc.outputs.bastion_host_id]

and apply the QA stage, that works as well. Obviously, by then the value is available in the dev outputs, because development has already been applied/provisioned.

The point is that having to change the code after the dev apply and before the QA apply is impractical.

How can I keep the same code and still apply dev first and then QA?

That approach of switching the code between the two applies is a bit strange.
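One idea I am considering instead (just a sketch, not tested yet; try() needs Terraform >= 0.12.20, and the dev state object has to exist in S3, which it did in my runs above): select the SG id with try(), so the same code works in both runs. The dev run (deploy_bastion = true) uses the SG it creates itself, and the QA run falls back to the id exported by the dev state:

locals {
  # dev: the bastion-host SG exists locally (count = 1), so the first expression succeeds;
  # QA:  count = 0 makes the [0] index fail, so try() falls back to the dev remote state output
  bastion_sg_id = try(
    aws_security_group.bastion-host[0].id,
    data.terraform_remote_state.vpc.outputs.bastion_host_id
  )
}

and then in the server-host SG:

  security_groups = [local.bastion_sg_id]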

The other approach I could think of is to simply put cidr_blocks with IP ranges in the server-host resource instead of referencing the bastion host's SG.
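Something like the following, where var.bastion_public_subnet_cidr is a hypothetical variable holding the dev bastion's public subnet CIDR (or its Elastic IP as a /32):

  ingress {
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    # hypothetical variable: the dev bastion's public subnet CIDR, or its EIP as a /32
    cidr_blocks = [var.bastion_public_subnet_cidr]
    description = "ssh from the shared dev bastion host"
  }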