This post can serve as a point of discussion for #9032, "Add aws_security_group_rules resource",
on terraform-provider-aws.
I’ll begin by excerpting a portion of @bflad’s very in-depth response, which summarizes the issue.
Summary
To begin, here is a summary of this issue as a Terraform configuration, from my understanding. Please let me know if this is incorrect. While the example below only shows ingress for brevity, egress has the same issue.
resource "aws_security_group" "a" {
  name = "a"

  ingress {
    from_port       = 22
    to_port         = 22
    protocol        = "tcp"
    # References group "b", which in turn references this group (cycle)
    security_groups = [aws_security_group.b.id]
  }
}

resource "aws_security_group" "b" {
  name = "b"

  ingress {
    from_port       = 22
    to_port         = 22
    protocol        = "tcp"
    # References group "a", which in turn references this group (cycle)
    security_groups = [aws_security_group.a.id]
  }
}
Effectively, the desire is to allow each of the EC2 Security Groups to cross-communicate. However, when this configuration is applied, Terraform will return a cycle error since both resources reference each other.
The current recommended guidance on this situation is to switch from using ingress/egress configuration blocks in the aws_security_group resource as shown above, to the below usage of only defining ingress and/or egress rules via the aws_security_group_rule resource (no ingress/egress configuration blocks in the aws_security_group resource):
resource "aws_security_group" "a" {
  name = "a"
}

resource "aws_security_group_rule" "a_from_b" {
  security_group_id        = aws_security_group.a.id
  type                     = "ingress"
  from_port                = 22
  to_port                  = 22
  protocol                 = "tcp"
  source_security_group_id = aws_security_group.b.id
}

resource "aws_security_group" "b" {
  name = "b"
}

resource "aws_security_group_rule" "b_from_a" {
  security_group_id        = aws_security_group.b.id
  type                     = "ingress"
  from_port                = 22
  to_port                  = 22
  protocol                 = "tcp"
  source_security_group_id = aws_security_group.a.id
}
The Problem
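By splitting individual rules out into their own aws_security_group_rule resources, we lose the ability to remove any rules applied outside of the Terraform configuration. Each aws_security_group_rule only tracks the single rule it created, so rules added to the group by other means (for example, through the AWS Console or CLI) are neither reported as drift nor removed.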
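One possible workaround, which is not part of the original discussion and comes with its own trade-offs, is to break the cycle with a third security group so that both "a" and "b" can keep inline ingress blocks (and therefore exclusive management of their ingress rules). The sketch below assumes that the instances which need to talk to each other attach the hypothetical "shared" group in addition to "a" or "b".

```hcl
# Hypothetical workaround sketch: "shared" is an illustrative name, not from the original discussion.
# Instances that should cross-communicate attach "shared" in addition to "a" or "b".
resource "aws_security_group" "shared" {
  name = "shared"
}

resource "aws_security_group" "a" {
  name = "a"

  # Inline rules: Terraform manages the full ingress set for "a", so ingress
  # rules added outside this configuration show up as a diff and are removed.
  ingress {
    from_port       = 22
    to_port         = 22
    protocol        = "tcp"
    security_groups = [aws_security_group.shared.id]
  }
}

resource "aws_security_group" "b" {
  name = "b"

  # Both groups reference "shared" instead of each other, so there is no cycle.
  ingress {
    from_port       = 22
    to_port         = 22
    protocol        = "tcp"
    security_groups = [aws_security_group.shared.id]
  }
}
```

The trade-off is that this is no longer a strict a-to-b relationship: anything attached to "shared" can reach both groups, and every participating instance must remember to attach the extra group.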
Design Decisions
Again, snipped from @bflad’s response.
Given that background, we can hopefully lay out some of the design decisions we need to consider:
- Terraform and its provider ecosystem have generally been designed with usable and reliable infrastructure provisioning as the top priority. Drift detection, and the subset of that problem concerned with exclusive management of child resources, is secondary to that first priority. The pragmatic decision to introduce a separate aws_security_group_rule resource satisfies the “usable” goal in this situation.
- There is no real precedent for how to handle this particular situation in the Terraform ecosystem, given the equally frustrating combination of atypical cyclic references and the increased desire for drift detection of this particular configuration.
- The confusing behavior when attempting to manage child components across multiple of these parent-child resources is a constant source of bug reports and practitioner confusion. Even with documentation warnings, it is not a good user experience, and it is not one that the provider developers here have much control over.
- Adding a second parent resource to the mix, which does not exist anywhere else in the Terraform ecosystem (that we are familiar with), could further increase this practitioner confusion. In particular, this new resource would not provide warnings or errors when multiple instances of it are used to manage the same EC2 Security Group, even ignoring the original cycle problem:
# This example would introduce perpetual differences
# without Terraform providing any user interface warnings.
# Practitioners would be required to do one of the following to learn it's not supported:
# * (Re-)Read resource documentation
# * Ask colleagues or in a forum
# * Report a GitHub issue
resource "aws_security_group" "a" {
  name = "a"
}

resource "aws_security_group_rules" "a-ingress-ssh" {
  security_group_id = aws_security_group.a.id

  ingress {
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = ["10.0.0.0/8"]
  }

  # ... potentially others ...
}

# Potentially in another Terraform configuration, managed by some other team
resource "aws_security_group_rules" "a-ingress-https" {
  security_group_id = aws_security_group.a.id

  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  # ... potentially others ...
}
# This example would introduce perpetual differences
# without Terraform providing any user interface warnings.
# The ingress/egress attributes do not have Computed: true
resource "aws_security_group" "a" {
  name = "a"
}

resource "aws_security_group_rules" "a-ingress" {
  security_group_id = aws_security_group.a.id

  ingress {
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = ["10.0.0.0/8"]
  }

  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  # ... potentially others ...
}

# Potentially in another Terraform configuration, managed by some other team
# aws_security_group_rules.a-ingress will remove egress
# aws_security_group_rules.a-egress will try to re-add
resource "aws_security_group_rules" "a-egress" {
  security_group_id = aws_security_group.a.id

  egress {
    from_port   = 0
    to_port     = 65535 # highest valid port number
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  # ... potentially others ...
}
- Some additional questions may also arise: How do we tell the community about this new resource? Why is there a new resource? Which resource is correct or better? Do I have to migrate? Should I migrate? Can this be combined with existing resources? For example:
# This example would introduce perpetual differences
# without Terraform providing any user interface warnings.
# The ingress/egress attributes do not have Computed: true
# aws_security_group_rules.a-ingress will always try to remove this rule, while this tries to add it
resource "aws_security_group" "a" {
  name = "a"

  egress {
    from_port   = 0
    to_port     = 65535 # highest valid port number
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

resource "aws_security_group_rules" "a-ingress" {
  security_group_id = aws_security_group.a.id

  ingress {
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = ["10.0.0.0/8"]
  }

  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  # ... potentially others ...
}
- The existing aws_security_group resource has a large usage footprint. We would be very hesitant to make breaking changes to that resource, including the deprecation/removal of its ingress and egress attributes so that there remains one canonical parent resource, unless there is no other option, since it would be an equally large burden on the community to change configurations.
All of these put us in a rough position with the current proposal, since any option places additional burden somewhere. We would prefer not to have a single resource that operates differently from the majority of other resources. While the conflicts in the above configurations may seem obvious when the resources are declared next to each other, varying team structures lead to varying configuration layouts and ownership, where such conflicts are much harder to spot.
User Stories
I’ll break down the problems that I would love to see solved:
User Story #1
As a Terraform Practitioner
I want to holistically manage two security groups that reference each other on either their ingress or egress rules
So that any rules introduced outside of the Infrastructure as Code definition are removed upon execution
User Story #2
As a Governance, Risk Management, and Compliance Auditor
I want to know that Infrastructure as Code configurations contain and enforce the desired state of security group definitions
So that compliance is maintained
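For User Story #2, a configuration that uses the split aws_security_group_rule style could at least detect out-of-band rules, even though it cannot remove them. The sketch below assumes Terraform 1.5 or later (for check blocks with a scoped data source) and an AWS provider version that includes the aws_vpc_security_group_rules data source. The local expected_rule_count_a is a hypothetical, hand-maintained value, not something Terraform derives for you.

```hcl
# Hypothetical drift-detection sketch; assumes Terraform >= 1.5 and an AWS provider
# version that ships the aws_vpc_security_group_rules data source.
locals {
  # Hand-maintained assumption: the number of rules (ingress + egress) this
  # configuration declares for group "a" (one ingress rule, a_from_b, in the split example).
  expected_rule_count_a = 1
}

check "security_group_a_has_no_unmanaged_rules" {
  # Scoped data source: looks up every rule currently attached to group "a".
  data "aws_vpc_security_group_rules" "a" {
    filter {
      name   = "group-id"
      values = [aws_security_group.a.id]
    }
  }

  assert {
    condition     = length(data.aws_vpc_security_group_rules.a.ids) == local.expected_rule_count_a
    error_message = "Security group a has rules that are not declared in this Terraform configuration."
  }
}
```

A failed check only produces a warning during plan and apply; it does not block the run or remove anything, so this addresses visibility for the auditor but not the enforcement asked for in User Story #1.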