Adding partition_index block to aws_glue_catalog_table resource destroys the table instead of updating in place

srperetz · March 11, 2022, 12:09pm

If you add a partition_index block to an existing aws_glue_catalog_table resource, the Terraform AWS provider destroys the table and recreates it instead of recognizing that the change can be made in place by calling the CreatePartitionIndex API. It does this even when the table is not empty, which results in data loss.

Can the provider be changed to avoid this scenario? The “correct” solution is to add an aws_glue_partition_index resource instead, but that is not obvious: the aws_glue_catalog_table documentation doesn’t say that you should do this rather than adding a partition_index block directly to the aws_glue_catalog_table. Instead of happily destroying a non-empty table, the provider could check whether the table is empty and fail the apply if it isn’t, suggesting the aws_glue_partition_index resource as the alternative for a non-empty table. That would be much better behavior than destroying data.

...
07:22:35  - Installed hashicorp/aws v4.4.0 (signed by HashiCorp)
...
07:23:01    # aws_glue_catalog_table.cust_date_small must be replaced
07:23:01  -/+ resource "aws_glue_catalog_table" "cust_date_small" {
...
07:23:01        + partition_index { # forces replacement
07:23:01            + index_name   = "customer_id_date_index"
07:23:01            + index_status = (known after apply)
07:23:01            + keys         = [
07:23:01                + "customer_id",
07:23:01                + "date",
07:23:01              ]
07:23:01          }
...

Full trace log is attached.
aws_glue_catalog_table-log.txt (85.8 KB)

anGie44 · March 29, 2022, 2:52pm

Hi @srperetz, thank you for raising this topic. The aws_glue_catalog_table resource’s destroy/create behavior is by design, because the Glue API’s table-related methods don’t support in-place changes to that parameter (see Table API - AWS Glue). So for the moment, in cases like this, users should adopt the independent aws_glue_partition_index resource, as you suggested. Nevertheless, we can document this better, since neither the consequences of this configuration change nor the alternative resource available to users is readily apparent. I’ll look into whether the DeleteTable operation has an option to prevent deletion of a non-empty table, but from my initial reading (Table API - AWS Glue), there doesn’t seem to be a parameter we could readily use to prevent destruction without making extra API calls.

srperetz · March 29, 2022, 11:10pm

Thanks @anGie44. You are probably right that the delete API doesn’t have this option, but I was imagining that the provider could implement its own check: call the GetPartitions API with MaxResults=1 before calling DeleteTable, and fail the apply if that returns a non-empty result set.
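A minimal sketch of that check, written in Python against a boto3-style Glue client that is passed in (so it can be exercised with a stub). The names `TableNotEmptyError`, `table_has_partitions`, `delete_table_if_empty` and `StubGlue` are hypothetical; `get_partitions` (with `MaxResults`/`NextToken`) and `delete_table` are the boto3 Glue client wrappers for GetPartitions and DeleteTable. Note the assumption built into it: an empty result only means no partitions are registered in the catalog, so it says nothing about an unpartitioned table or about the objects in S3.

```python
class TableNotEmptyError(RuntimeError):
    """Raised instead of deleting a table that still has registered partitions."""


def table_has_partitions(glue, database: str, table: str) -> bool:
    """Return True if at least one partition is registered for the table."""
    kwargs = {"DatabaseName": database, "TableName": table, "MaxResults": 1}
    while True:
        page = glue.get_partitions(**kwargs)
        if page.get("Partitions"):
            return True
        token = page.get("NextToken")
        if not token:
            return False
        # An empty page that still carries a NextToken is not proof of
        # emptiness, so keep paging until a partition or the end is reached.
        kwargs["NextToken"] = token


def delete_table_if_empty(glue, database: str, table: str) -> None:
    """Delete the table only when the catalog lists no partitions for it."""
    if table_has_partitions(glue, database, table):
        raise TableNotEmptyError(
            f"{database}.{table} has partitions; refusing to replace it. "
            "Consider an aws_glue_partition_index resource instead."
        )
    glue.delete_table(DatabaseName=database, Name=table)


class StubGlue:
    """Hypothetical in-memory stand-in for a boto3 Glue client."""

    def __init__(self, partitions):
        self.partitions = partitions
        self.deleted = []

    def get_partitions(self, **kwargs):
        return {"Partitions": self.partitions[: kwargs["MaxResults"]]}

    def delete_table(self, DatabaseName, Name):
        self.deleted.append((DatabaseName, Name))
```

With a real client you would pass `boto3.client("glue")` in place of `StubGlue`; in the provider itself the same logic would live in the Go delete path before DeleteTable is called.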

Similarly, while you are correct that Glue does not allow partition indexes to be updated in place, what I was suggesting in my post is this: once the provider has determined that the table exists and that the only change is the addition of a partition index, it could (in theory) treat this exactly as if an aws_glue_partition_index resource had been added, updating the Terraform state in the same way. If that ends up adding too much complexity, then I think “fail if not empty” would be a reasonable plan B.
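A rough sketch of that decision, again in Python with an injected boto3-style Glue client. The table “spec” dictionaries (keyed like the HCL attributes, with the computed index_status left out) and the functions `index_additions` and `apply_table_change` are hypothetical stand-ins for the provider’s diff logic, which is actually written in Go. `create_partition_index` with `DatabaseName`, `TableName` and `PartitionIndex={"IndexName": ..., "Keys": [...]}` is the boto3 wrapper for CreatePartitionIndex. The in-place path is taken only when partition indexes are added and nothing else changed; any other difference returns "replace", which is where the “fail if not empty” check would apply.

```python
from typing import Any, Dict, List, Optional

# Hypothetical representation of a table's configuration, keyed like the HCL.
Spec = Dict[str, Any]


def index_additions(old: Spec, new: Spec) -> Optional[List[Spec]]:
    """Return the partition indexes to create if adding them is the only change.

    Returns None when anything else differs, or when an existing index was
    removed or altered, meaning the table still needs destroy/create.
    """
    old_rest = {k: v for k, v in old.items() if k != "partition_index"}
    new_rest = {k: v for k, v in new.items() if k != "partition_index"}
    if old_rest != new_rest:
        return None
    old_idx = old.get("partition_index", [])
    new_idx = new.get("partition_index", [])
    if any(idx not in new_idx for idx in old_idx):
        return None
    return [idx for idx in new_idx if idx not in old_idx]


def apply_table_change(glue, database: str, table: str, old: Spec, new: Spec) -> str:
    """Create only the new partition indexes when possible, else ask for replacement."""
    added = index_additions(old, new)
    if added is None:
        return "replace"
    if not added:
        return "no-op"
    for idx in added:
        # Same call an aws_glue_partition_index resource would make on create.
        glue.create_partition_index(
            DatabaseName=database,
            TableName=table,
            PartitionIndex={"IndexName": idx["index_name"], "Keys": list(idx["keys"])},
        )
    return "update-in-place"
```

For the plan in the original post, old would lack partition_index and new would add the customer_id_date_index block, so this sketch would issue one CreatePartitionIndex call instead of replacing cust_date_small.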

Related topics
- How can I Update AWS Glue Catalog Table Column Schema (AWS)
- Configuring an AWS Glue Crawler Table Level (AWS)
- Problem with aws_glue_catalog_table when using delta tables (AWS)
- Partition columns to AWS Athena Iceberg table (Terraform Providers)
- Aws Lake Formation, how to disable IAM access on database level using terraform? (AWS)