I want to first run the crawler, which creates the table, then update the schema of just one column in the AWS Glue Data Catalog table, and then rerun the crawler to sync the schema change to the partitions.
I was able to do this manually in the AWS console, but I'm having trouble doing it in Terraform.
In Terraform I can create the crawlers. They currently run on demand, so the tables are created after I run them.
Below is the code that creates the crawler:
```hcl
resource "aws_glue_crawler" "my_crawler" {
  database_name = "test_db"
  name          = "test_crawler"
  role          = "test_role"
  table_prefix  = "test_"

  dynamic "s3_targets" {
    for_each = [
      {
        path       = "s3://some_bucket/path/to/table/"
        exclusions = ["**/temp/**"]
      },
      {
        path       = "s3://some_bucket/path/to/table/"
        exclusions = ["**/temp/**"]
      }
    ]
    # Each element of for_each becomes one s3_targets block
    content {
      path       = s3_targets.value.path
      exclusions = s3_targets.value.exclusions
    }
  }

  schema_change_policy {
    delete_behavior = "DELETE_FROM_DATABASE"
  }

  configuration = jsonencode({
    "Version": 1.0,
    "Grouping": {
      "TableLevelConfiguration": 5,
      "TableGroupingPolicy": "CombineCompatibleSchemas"
    },
    "CrawlerOutput": {
      "Partitions": {
        "AddOrUpdateBehavior": "InheritFromTable"
      }
    }
  })
}
```
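For the "rerun the crawler without losing the edited column" part, my understanding of the AWS Glue crawler options is that `"AddOrUpdateBehavior": "InheritFromTable"` (already set above) is usually paired with a schema change policy that only logs table schema changes instead of writing them back. A minimal sketch of the `schema_change_policy` block inside `aws_glue_crawler.my_crawler`, assuming the rest of the resource stays as above; it is worth confirming this combination against the AWS Glue documentation:

```hcl
  schema_change_policy {
    delete_behavior = "DELETE_FROM_DATABASE"
    # LOG: record schema changes instead of updating the table definition,
    # so a manually edited column type should survive later crawler runs
    update_behavior = "LOG"
  }
```

With this policy the crawler should still add new partitions, and `InheritFromTable` makes them take their schema from the table rather than from the crawled files.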
Then I tried updating the catalog table using `aws_glue_catalog_table`. Below is the code:
```hcl
resource "aws_glue_catalog_table" "my_table" {
  database_name = "test_db"
  name          = "test_table"

  storage_descriptor {
    columns {
      name    = "info"
      type    = "struct<somethingA:struct<infoA:string,infoB:string>>"
      comment = "infoA and infoB"
    }
  }
}
```
But when I ran `terraform apply`, it failed with `AlreadyExistsException: Table already exists`.
So I imported the existing table with this command:
```shell
terraform import aws_glue_catalog_table.my_table 123456789:test_db:test_demo1
```
where `123456789` is the catalog_id.
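Since Terraform 1.5, the same import can also be declared in configuration with an `import` block. A minimal sketch, assuming the table to import is the one named in the resource's `name` argument (`test_table`). The command above imports `test_demo1`; if the imported table's name differs from `name`, Terraform will likely plan to replace the table, because a Glue table cannot be renamed in place.

```hcl
# Requires Terraform 1.5 or later
import {
  to = aws_glue_catalog_table.my_table
  # ID format: catalog_id:database_name:table_name
  id = "123456789:test_db:test_table"
}
```

If the `resource "aws_glue_catalog_table" "my_table"` block is removed first, `terraform plan -generate-config-out=generated.tf` writes HCL for the imported table, including its partition keys and storage descriptor, which can serve as the starting point for editing just the one column.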
Then I ran `terraform apply` again. It updated the table, but it also deleted all the partitions and other information the table needs.
Is it possible to update the column schema of a catalog table in Terraform without destroying anything else?
If so, how?
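A likely explanation: `aws_glue_catalog_table` manages the whole table definition, so after the import, anything the configuration leaves out (partition keys, location, input/output formats, SerDe, the other columns) is removed on apply. A sketch of a fuller definition, assuming a Parquet table; the partition key `dt`, the column `id`, the `classification` parameter and the format classes are hypothetical stand-ins for whatever the crawler actually created:

```hcl
resource "aws_glue_catalog_table" "my_table" {
  database_name = "test_db"
  name          = "test_table"
  table_type    = "EXTERNAL_TABLE"

  # Hypothetical table parameter; copy the ones the crawler set
  parameters = {
    classification = "parquet"
  }

  # Hypothetical partition key; declare every key the crawler created
  partition_keys {
    name = "dt"
    type = "string"
  }

  storage_descriptor {
    # Hypothetical location and Parquet formats; copy the real values
    location      = "s3://some_bucket/path/to/table/"
    input_format  = "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat"
    output_format = "org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat"

    ser_de_info {
      serialization_library = "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe"
    }

    # The one column whose type is being changed
    columns {
      name    = "info"
      type    = "struct<somethingA:struct<infoA:string,infoB:string>>"
      comment = "infoA and infoB"
    }

    # Hypothetical: every other existing column must be listed too
    columns {
      name = "id"
      type = "string"
    }
  }
}
```

A useful check before applying: `terraform plan` should show only the `info` column type changing; any other planned removal means something is still missing from the configuration.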
Note 1: I also tried using a data block to read the existing table's information and pass it to the resource block, but I got errors when accessing items in storage_descriptor.
Assigning partition_keys from the data source also failed with this error:
an argument named "partition_keys" is not expected here. Did you mean to define a block of type "partition_keys"?
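The error indicates that `partition_keys` is a nested block in the resource, so it cannot be assigned with `=`; it has to be generated with a `dynamic` block. On the data source, `storage_descriptor` is a list, so its fields are reached through `[0]`. A sketch under those assumptions, also assuming the data source exposes the same nested attribute names as the resource (worth checking in the provider docs). It reads the table the crawler created, so it only works once that table exists:

```hcl
# Read the table the crawler created (assumed name: test_table)
data "aws_glue_catalog_table" "existing" {
  database_name = "test_db"
  name          = "test_table"
}

locals {
  # storage_descriptor is exposed as a list, so take its first element
  existing_sd = data.aws_glue_catalog_table.existing.storage_descriptor[0]
}

resource "aws_glue_catalog_table" "my_table" {
  database_name = "test_db"
  name          = "test_table"
  table_type    = data.aws_glue_catalog_table.existing.table_type
  parameters    = data.aws_glue_catalog_table.existing.parameters

  # partition_keys is a block, so generate one block per existing key
  dynamic "partition_keys" {
    for_each = data.aws_glue_catalog_table.existing.partition_keys
    content {
      name    = partition_keys.value.name
      type    = partition_keys.value.type
      comment = partition_keys.value.comment
    }
  }

  storage_descriptor {
    location      = local.existing_sd.location
    input_format  = local.existing_sd.input_format
    output_format = local.existing_sd.output_format

    dynamic "ser_de_info" {
      for_each = local.existing_sd.ser_de_info
      content {
        name                  = ser_de_info.value.name
        serialization_library = ser_de_info.value.serialization_library
        parameters            = ser_de_info.value.parameters
      }
    }

    # Copy every existing column, overriding only the type of "info"
    dynamic "columns" {
      for_each = local.existing_sd.columns
      content {
        name    = columns.value.name
        type    = columns.value.name == "info" ? "struct<somethingA:struct<infoA:string,infoB:string>>" : columns.value.type
        comment = columns.value.comment
      }
    }
  }
}
```

The trade-off is that the configuration no longer states the schema explicitly; it mirrors whatever the crawler last wrote, apart from the overridden column.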
Note 2: The plan and apply output is generated on a remote virtual machine, so I can't copy and paste it here.