I have two simple modules, each essentially wrapping an aws_instance: a database and an application. The application requires the database address, and the database requires the application's security group. My root .tf file looks like this:
module "db" {
  source           = "./modules/db"
  db_count         = 1
  db_ami           = "ami-xxxxxxx"
  db_instance_type = "t2.micro"
  ..
  db_security_group = module.app.security_group_id
  ..
}
module "te-app" {
  source               = "./modules/te_app"
  te-ami               = "ami-xxxxxxx"
  te-app_instance_type = "t2.micro"
  ..
  app_mongo_address = module.db.private_ip
  ..
}
The application has a web server that queries the database as soon as it is instantiated, then crashes because it cannot connect.
I have tried using depends_on = [module.db.private_ip], but the application still comes up before the db server.
Every once in a while it works and the web server connects, but more often than not it does not.
The explicit dependency does not seem to help. Any ideas what I could be doing incorrectly?
You have a circular dependency, and you need to break it.
My suggestion would be to move the security group resource out of the app module. Either handle it directly in the main part of the configuration, or move it into the networking module - assuming you have one like that for handling vpc/subnet/etc. The latter option would be prettier, I think.
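As a rough sketch of that suggestion, the security group could be declared in the root configuration and its ID handed to both modules. Everything here beyond the names in the question is an assumption: the `vpc_id` variable, the `aws_security_group.app` resource name, and especially the `app_security_group_id` input, which the app module would need to gain.

```hcl
# Hypothetical sketch: the security group lives at the root, so neither
# module needs an output of the other to learn its ID.
variable "vpc_id" {
  type        = string
  description = "Assumed input: ID of the VPC both instances live in."
}

resource "aws_security_group" "app" {
  name_prefix = "te-app-"
  vpc_id      = var.vpc_id
}

module "db" {
  source            = "./modules/db"
  db_security_group = aws_security_group.app.id
  # ... remaining arguments from the question unchanged
}

module "te-app" {
  source = "./modules/te_app"

  # Hypothetical input: the app module would need a variable like this
  # instead of creating the security group itself.
  app_security_group_id = aws_security_group.app.id
  app_mongo_address     = module.db.private_ip
  # ... remaining arguments from the question unchanged
}
```

With this layout the only cross-module reference left is `module.db.private_ip`, so the ordering is plainly security group, then database, then application.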
This feels to me like an issue arising from there being a delay between the API signalling that an object exists and that object actually being ready to use. In many cases there isn’t a way to fix that completely at the infrastructure provisioning layer: these things are distributed systems that take some time to initialize fully and often don’t report detailed progress out to the world around them.
My usual advice in situations like this is to make the application code itself defensive about its dependencies being unavailable. For example, perhaps if the application is unable to reach the database then it would retry periodically but refuse to accept incoming requests until it’s successfully (re-)connected. That not only decouples your startup sequence and thus mitigates these ordering problems a little, but also buys you a little more resilience to runtime issues like network partitions that might make some of your application servers become unable to reach the database even after they have been running successfully for a while.
Modules themselves don’t participate in the dependency graph – only their individual variables and outputs do. So this sort of situation where two modules appear to depend on each other is actually okay in Terraform, as long as the other objects on either side of the module boundary don’t form a circular dependency situation themselves. However, dependencies can only control the ordering of the operations Terraform directly controls; if the vendor API that the provider is talking to says that the object is “ready” then Terraform will move on to the next object.
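To make that concrete, here is a guess at what the two modules might contain. The module internals are not shown in the question, so every resource below is hypothetical, and variable declarations are omitted for brevity. The point is only that each output hangs off a single resource, so the graph has no cycle.

```hcl
# --- modules/te_app (hypothetical contents) ---
resource "aws_security_group" "app" {
  name_prefix = "te-app-"
}

output "security_group_id" {
  # Depends only on the security group, not on the app instance.
  value = aws_security_group.app.id
}

resource "aws_instance" "app" {
  ami                    = var.te-ami
  instance_type          = var.te-app_instance_type
  vpc_security_group_ids = [aws_security_group.app.id]

  # The only edge into the db module: the instance needs the database address.
  user_data = "MONGO_ADDRESS=${var.app_mongo_address}"
}

# --- modules/db (hypothetical contents) ---
resource "aws_instance" "db" {
  count         = var.db_count
  ami           = var.db_ami
  instance_type = var.db_instance_type

  # Depends on the app security group, not on the app instance.
  vpc_security_group_ids = [var.db_security_group]
}

output "private_ip" {
  value = aws_instance.db[0].private_ip
}
```

Under these assumptions Terraform would create the app security group first, then the database instance, then the application instance. The ordering is already correct; what it cannot express is whether the database process inside the instance is accepting connections yet.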
Thanks for the response, normally I would have moved the security group out of the app module but the exercise that was given wanted a particular solution and touching the initial module API is a no no…
This is definitely a signalling issue. The KISS way for me was to introduce a simple delay when bringing up the app server. I was looking for some awesome engineering feat, but sixteen characters seemed enough for this particular challenge.
Thank you for your response…
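The post does not say what the sixteen characters were. For readers who want a fixed delay without touching either module's interface, one possible approach (not necessarily the one used above) is the `time_sleep` resource from the hashicorp/time provider, placed in the root configuration. The 60-second duration and the resource name are arbitrary assumptions, and the sketch assumes `module.db.private_ip` is a single string.

```hcl
terraform {
  required_providers {
    time = {
      source = "hashicorp/time"
    }
  }
}

# Waits after the database address becomes known, then passes it through.
resource "time_sleep" "wait_for_db" {
  create_duration = "60s" # arbitrary; tune to how long the database takes to start

  triggers = {
    # Referencing the output here makes the sleep start only once the db exists.
    db_private_ip = module.db.private_ip
  }
}

module "te-app" {
  source = "./modules/te_app"

  # Reading the address through the sleep resource delays the app instance
  # until the wait has finished.
  app_mongo_address = time_sleep.wait_for_db.triggers["db_private_ip"]
  # ... remaining arguments from the question unchanged
}
```

Routing the value through `triggers` rather than using a module-level `depends_on` matters here: the db module already reads an output of the app module, so making the whole app module depend on the sleep would likely produce a real cycle. A fixed delay remains a guess about startup time, so the retry-in-the-application advice earlier in the thread still applies.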