How can I Update AWS Glue Catalog Table Column Schema

skj-skj · March 2, 2023, 2:26am

I want to run the crawler first so that it creates the table, then update the schema of a single column in the AWS Glue catalog table, and then rerun the crawler to sync the schema change to the partitions.
I was able to do this manually in the AWS console, but I am having trouble doing it in Terraform.

In Terraform I am able to create the crawlers. They are currently on demand, so the tables are created once I run them.

Below is the code that creates the crawler:

resource "aws_glue_crawler" "my_crawler" {

	database_name = "test_db"
	name = "test_crawler"
	role = "test_role"
	table_prefix = "test_"
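	dynamic "s3_target" {
		for_each = [
			{
				path = "s3://some_bucket/path/to/table/"
				exclusions = ["**/temp/**"]
			},
			{
				path = "s3://some_bucket/path/to/table/"
				exclusions = ["**/temp/**"]
			}
		]
		# a dynamic block needs a content block that maps each element to the nested arguments
		content {
			path = s3_target.value.path
			exclusions = s3_target.value.exclusions
		}
	}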
	
	schema_change_policy {
		delete_behavior = "DELETE_FROM_DATABASE"
	}
	
	configuration = jsonencode({
		"Version": 1.0,
		"Grouping": {
			"TableLevelConfiguration": 5,
			"TableGroupingPolicy": "CombineCompatibleSchemas"
		},
		"CrawlerOutput": {
			"Partitions": {
				"AddOrUpdateBehavior": "InheritFromTable"
			}
		}
	})
}

Then I tried updating the catalog table using the “aws_glue_catalog_table” resource.

Below is the code:

resource "aws_glue_catalog_table" "my_table" {

	database_name = "test_db"
	name = "test_table"
	
	storage_descriptor {
		columns {
			name = "info"
			type = "struct<somethingA:struct<infoA:string,infoB:string>>"
			comment = "infoA and infoB"
		}
	}
	
}

But after running terraform apply, it failed with the error AlreadyExistsException: Table already exists

Then I ran the terraform import command:

terraform import aws_glue_catalog_table.my_table 123456789:test_db:test_demo1

Here 123456789 is the catalog_id.
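
On Terraform 1.5 or later, the same import can also be written declaratively as an import block, so that it shows up in terraform plan instead of being run as a one-off command. A minimal sketch, reusing the resource address and ID from the command above; that the installed Terraform version supports import blocks is an assumption:

```hcl
# Declarative equivalent of the terraform import command (assumes Terraform >= 1.5).
import {
  to = aws_glue_catalog_table.my_table
  # ID format: catalog_id:database_name:table_name
  id = "123456789:test_db:test_demo1"
}
```

Note that the table name in the ID (test_demo1) differs from the name in the resource block (test_table). If both are meant to be the same table, the two would normally need to match, otherwise the next plan is likely to propose replacing the imported table.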

Then I ran terraform apply again. It ran and updated the table, but it deleted all the partitions and other information that the table needs.

Is it possible to update the column schema of a catalog table in Terraform without destroying anything else?
If so, how?
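
One likely reason the apply strips the table is that the resource above declares only the info column. Terraform makes the real table match the configuration, so anything that is not declared (partition keys, location, formats, SerDe, other columns) is dropped on update. A minimal sketch of a fuller declaration follows. Everything except database_name, name, and the info column is an assumption for illustration (Parquet data, a hypothetical dt partition key, a guessed S3 location) and would need to be replaced with the values the crawler actually produced:

```hcl
resource "aws_glue_catalog_table" "my_table" {
  database_name = "test_db"
  name          = "test_table"
  table_type    = "EXTERNAL_TABLE"

  # Assumption: table parameters set by the crawler; copy the real ones.
  parameters = {
    classification = "parquet"
  }

  # Assumption: a hypothetical partition column named dt.
  partition_keys {
    name = "dt"
    type = "string"
  }

  storage_descriptor {
    # Assumption: location and formats for Parquet data.
    location      = "s3://some_bucket/path/to/table/"
    input_format  = "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat"
    output_format = "org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat"

    ser_de_info {
      serialization_library = "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe"
    }

    # The column whose schema is being overridden.
    columns {
      name    = "info"
      type    = "struct<somethingA:struct<infoA:string,infoB:string>>"
      comment = "infoA and infoB"
    }

    # Every other column of the table would also have to be listed here.
  }
}
```

With the table managed this way, the crawler's schema_change_policy may also need update_behavior = "LOG" so that a later crawl does not overwrite the edited column. That combination is untested here.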

Note 1: I also tried using a data block to read the table's details and pass them to the resource block, but I got errors when accessing items in storage_descriptor, and I also got the error below when assigning the partition_keys data.

error:
an argument named "partition_keys" is not expected here. Did you mean to define a block of type "partition_keys"?
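
The message suggests that partition_keys is a nested block type in aws_glue_catalog_table, so it cannot be assigned like an argument with `partition_keys = ...`. A dynamic block is the usual way to generate such blocks from a list. A minimal sketch, assuming the data source exposes partition_keys as a list of objects with name, type, and comment, and storage_descriptor as a single-element list (hence the [0]); the data source label `existing` is hypothetical:

```hcl
# Reads the table that the crawler created.
data "aws_glue_catalog_table" "existing" {
  database_name = "test_db"
  name          = "test_table"
}

resource "aws_glue_catalog_table" "my_table" {
  database_name = "test_db"
  name          = "test_table"

  # Generates one partition_keys block per element of the list.
  dynamic "partition_keys" {
    for_each = data.aws_glue_catalog_table.existing.partition_keys
    content {
      name    = partition_keys.value.name
      type    = partition_keys.value.type
      comment = partition_keys.value.comment
    }
  }

  storage_descriptor {
    # Assumption: nested attributes are reached through index 0.
    location = data.aws_glue_catalog_table.existing.storage_descriptor[0].location

    columns {
      name    = "info"
      type    = "struct<somethingA:struct<infoA:string,infoB:string>>"
      comment = "infoA and infoB"
    }
  }
}
```

One caveat with this design: the data source reads the same table that the resource manages, so the table has to exist (and be imported) before the first plan.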

Note 2: The plan and apply output is generated on a remote virtual machine, so I am not able to copy and paste it here.
