I am familiar with how to explicitly create dependencies between different modules and resources, but I have not been able to think of a way of doing so in this scenario, where it is the same module using the same resource.

I am using Terraform to manage and deploy resources to my Snowflake account. I am currently adding support for Tasks, and this is where I am hitting a wall. The snowflake_task resource in the Snowflake Terraform provider lets me specify the "after" parameter. When specified, it creates the Task in Snowflake and maps it to its "parent" task: basic DAG generation.

The problem is that with my current setup, every time I try creative ways of telling Terraform that TEST_TASK_TWO depends on TEST_TASK, I get a cycle error. Snowflake requires that Tasks be created in the order in which they occur in the DAG. So if my DAG is Step 1: Task 1 > Step 2: Task 2 > Step 3: Task 3 | Task 4, Terraform needs to ensure that Task 1 is created first, then Task 2, and then Tasks 3 and 4 can be created at the same time, once Task 2 exists.
I know I could solve this problem by not using modules: if I just had one resource per task, I could set the dependencies between the resources explicitly. But I am trying to keep the development experience consistent for the users of this codebase, and that would be a different pattern from what we have established everywhere else.
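For illustration, the unmodularized pattern I mean would look roughly like this (a sketch only, using the values from the YAML below; the attribute reference from one resource to the other is what gives Terraform the ordering without any explicit depends_on):

```hcl
resource "snowflake_task" "test_task" {
  database      = "DLH_CORE"
  schema        = "DBO"
  name          = "TEST_TASK"
  warehouse     = "DLH_PIPELINE"
  sql_statement = "SELECT 1;"
  schedule      = "USING CRON 0 12 * * 2 UTC"
}

resource "snowflake_task" "test_task_two" {
  database      = "DLH_CORE"
  schema        = "DBO"
  name          = "TEST_TASK_TWO"
  warehouse     = "DLH_PIPELINE"
  sql_statement = "SELECT 1;"

  # Referencing an attribute of the parent resource creates an implicit
  # dependency, so Terraform is forced to create TEST_TASK first.
  after = ["DLH_CORE.DBO.${snowflake_task.test_task.name}"]
}
```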
My hope is that I am overthinking this and not seeing the solution…any help or guidance would be appreciated. Thanks! Detailed setup below.
metadata .yaml file:
task:
  - name: TEST_TASK_TWO
    comment: ""
    sql_statement: SELECT 1;
    database: DLH_CORE
    schema: DBO
    warehouse: DLH_PIPELINE
    run_after_task_names:
      - DLH_CORE.DBO.TEST_TASK
    environment:
      qas:
        task_enabled: false
      prd:
        schedule: ""
        task_enabled: false
  - name: TEST_TASK
    comment: ""
    sql_statement: SELECT 1;
    database: DLH_CORE
    schema: DBO
    warehouse: DLH_PIPELINE
    environment:
      qas:
        task_enabled: false
        schedule: USING CRON 0 12 * * 2 UTC
      prd:
        schedule: ""
        task_enabled: false
task.tf file:
locals {
  # list all conf files, parse all master files
  master-task-files = [
    for files in fileset(path.cwd, "../projects/*/tasks.yaml") : "${path.module}/${files}"
  ]

  # read the yaml contents
  task-config = [
    for f in local.master-task-files : yamldecode(file("${f}"))
  ]

  # level 1 flatten, combine all task configs into one list
  flatten_task_config = flatten(
    [
      for task in local.task-config : task.task
    ]
  )

  # level 0 flatten, combine all task configs into one map
  task_map = { for k, v in local.flatten_task_config : upper("${v.database}.${v.schema}.${v.name}") =>
    {
      name          = v.name
      comment       = try(v.comment, null)
      warehouse     = try(v["environment"]["${var.ENV}"]["warehouse"], v.warehouse)
      database      = try(v["environment"]["${var.ENV}"]["database"], v.database)
      schema        = try(v["environment"]["${var.ENV}"]["schema"], v.schema)
      sql_statement = try(v["environment"]["${var.ENV}"]["sql_statement"], v.sql_statement)
      schedule      = try(v["environment"]["${var.ENV}"]["schedule"], null)
      enabled       = try(v["environment"]["${var.ENV}"]["task_enabled"], v.enabled)
      after         = try(v["run_after_task_names"], null)
    }
  }
}
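For clarity, given the YAML above and ENV = qas, local.task_map should evaluate to roughly the following (hand-expanded, not actual terraform console output):

```hcl
{
  "DLH_CORE.DBO.TEST_TASK" = {
    name          = "TEST_TASK"
    comment       = ""
    warehouse     = "DLH_PIPELINE"
    database      = "DLH_CORE"
    schema        = "DBO"
    sql_statement = "SELECT 1;"
    schedule      = "USING CRON 0 12 * * 2 UTC"
    enabled       = false
    after         = null
  }
  "DLH_CORE.DBO.TEST_TASK_TWO" = {
    name          = "TEST_TASK_TWO"
    comment       = ""
    warehouse     = "DLH_PIPELINE"
    database      = "DLH_CORE"
    schema        = "DBO"
    sql_statement = "SELECT 1;"
    schedule      = null
    enabled       = false
    after         = ["DLH_CORE.DBO.TEST_TASK"]
  }
}
```

Both entries go through the same for_each in the module, which is why Terraform has no edge between them even though the second one names the first in its "after" list.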
module "tasks" {
source = "C:\\REPO\\terraform.blueprint.snowflake\\terraform\\Task"
task_map = local.task_map
}
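One workaround I have considered (a sketch only, and it only covers a two-level DAG; a deeper DAG would need one tier per level): partition the map by whether a task has parents, and call the same module once per tier with a module-level depends_on. This keeps the module interface identical for users, at the cost of a fixed number of tiers.

```hcl
locals {
  # hypothetical split: DAG roots (no run_after_task_names) vs. dependent tasks
  root_task_map  = { for k, v in local.task_map : k => v if v.after == null }
  child_task_map = { for k, v in local.task_map : k => v if v.after != null }
}

module "root_tasks" {
  source   = "C:\\REPO\\terraform.blueprint.snowflake\\terraform\\Task"
  task_map = local.root_task_map
}

module "child_tasks" {
  source   = "C:\\REPO\\terraform.blueprint.snowflake\\terraform\\Task"
  task_map = local.child_task_map

  # module-level depends_on forces every root task to exist before any
  # child task is created; there is no cycle because the dependency only
  # points one way
  depends_on = [module.root_tasks]
}
```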
…which sources the basic resource:
resource "snowflake_task" "task" {
for_each = var.task_map
comment = each.value.comment
database = upper(each.value.database)
schema = upper(each.value.schema)
name = upper(each.value.name)
sql_statement = each.value.sql_statement
warehouse = try(upper(each.value.warehouse), null)
schedule = try(each.value.schedule, null)
enabled = try(each.value.enabled, false)
after = try([for i in each.value.after: upper(i)], null)
error_integration = try(each.value.error_integration, null)
user_task_timeout_ms = try(each.value.user_task_timeout_ms, 3600000)
when = try(each.value.when, null)
session_parameters = merge(try(each.value.session_parameters, {}), {})
}