Updating node pools forces cluster/node pool destruction because of VPC subnet

I’m trying to figure out whether this is intended. I have a fairly simple deployment in which I create a VPC with an associated subnet and secondary ranges, and then spin up a GKE cluster with node pools tied to the previously created VPC.

It would seem that any change I try to apply to the cluster or node pools results in the entire cluster being destroyed and recreated, because for some reason the subnet has to be destroyed and recreated. I can’t find any explanation for this, so it’s quite possible I’ve done something wrong. I just don’t think changing the number of machines in a node pool merits destroying the cluster and subnet.

This is a small portion of the terraform plan output:

# module.vpc.google_compute_subnetwork.subnet must be replaced
...
~ network = "https://www.googleapis.com/compute/v1/projects/<project-id>/global/networks/<network-name>" -> "projects/<id>/global/networks/<network-name>" # forces replacement

Every other resource shows the same network line as the reason it must be replaced.
Here is sample code for the VPC and GKE (the modules use variables to pass outputs between them):

resource "google_compute_network" "vpc" {
  name = format("%s-%s", var.instance_name, "vpc")
  auto_create_subnetworks = false
  delete_default_routes_on_create = false
  routing_mode = "REGIONAL"
  mtu = 1460
}

resource "google_compute_subnetwork" "subnet" {
  name = format("%s-%s", var.instance_name, "subnet")
  ip_cidr_range = var.primary_cidr
  region = var.region
  private_ip_google_access = true
  network = google_compute_network.vpc.id
  stack_type = "IPV4_ONLY"

  secondary_ip_range {
    ip_cidr_range = var.pod_cidr
    range_name    = local.pod_subnet
  }
  secondary_ip_range {
    ip_cidr_range = var.service_cidr
    range_name    = local.service_subnet
  }
}

resource "google_container_cluster" "gke" {
  name     = format("%s-%s", var.instance_name, "gke-cluster")
  location = var.region

  min_master_version = "1.24"
  release_channel {
    channel = "RAPID"
  }

  initial_node_count       = 1
  remove_default_node_pool = true

  networking_mode = "VPC_NATIVE"
  network         = var.vpc_self_link
  subnetwork      = var.subnet_self_link
  ip_allocation_policy {
    cluster_secondary_range_name  = var.pod_subnet
    services_secondary_range_name = var.service_subnet
  }

  private_cluster_config {
    enable_private_nodes    = true
    enable_private_endpoint = false
    master_ipv4_cidr_block  = "172.16.0.0/28"
  }
}

resource "google_container_node_pool" "test_node_pool" {
  name    = format("%s-%s", var.instance_name, "test-node-pool")
  cluster = google_container_cluster.gke.id
  location = var.region
  max_pods_per_node = 110

  node_count = 1
  node_locations = ["us-west1-c","us-west1-a"]
  node_config {
    disk_type = "pd-ssd"
    disk_size_gb = 100
    machine_type = "n2d-standard-2"
  }
}

I don’t work with Google Cloud, so I’m not sure, but something stands out to me in that plan line:

~ network = "https://www.googleapis.com/compute/v1/projects/<project-id>/global/networks/<network-name>"
         -> "projects/<id>/global/networks/<network-name>" # forces replacement

(line wrapping / whitespace manipulated to make a point)

The change seems to be the removal of the https://www.googleapis.com/compute/v1/ prefix.

This makes me wonder if there’s a bug in terraform-provider-google, such that it represents the network as a full URL in some places but as a bare URL path in others, and then erroneously decides there’s a difference between the two representations.
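If it helps to see where the two forms come from, the provider seems to expose both on the network resource. A minimal sketch, assuming hypothetical outputs added to your vpc module (the example formats in the comments are my guess, not verified):

output "network_self_link" {
  # Full URL form, e.g. https://www.googleapis.com/compute/v1/projects/<project>/global/networks/<name>
  value = google_compute_network.vpc.self_link
}

output "network_id" {
  # Relative path form, e.g. projects/<project>/global/networks/<name>
  value = google_compute_network.vpc.id
}

My understanding is that the provider normally treats the two forms as equivalent, but only when the rest of the path, including the project segment, matches exactly.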

Good call. I had redacted the actual information, but in the first half of that line it uses the actual project name, while in the second half it uses the numeric id associated with that name. It might be the missing prefix, but it is also using two different strings to represent the same thing, which might be part of what causes the replacement. It could be a bug; I will read up more on that and follow up.
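To double-check which strings refer to the same project, something like this should print both identifiers side by side (this uses the provider’s google_project data source; the output names are just placeholders):

data "google_project" "current" {
}

output "project_id" {
  # Human-readable id, e.g. "my-project-id"
  value = data.google_project.current.project_id
}

output "project_number" {
  # Numeric id, e.g. "123456789012"
  value = data.google_project.current.number
}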

So it turns out I’m an idiot. In the google provider block I had used the project number instead of the project ID. Either I didn’t realize I had done it, or I figured they were interchangeable. Either way, it didn’t throw an error, which has me believing that my gcloud config was using a default project and so was possibly ignoring the misconfigured value.
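For anyone who lands here later, the mistake and the fix look roughly like this (the values are placeholders, not my real project):

# What I had: the numeric project number from the console
provider "google" {
  project = "123456789012"
  region  = var.region
}

# What it should be: the project ID
provider "google" {
  project = "my-project-id"
  region  = var.region
}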

In any case, after I put in the correct project ID it no longer behaves in a way that seems buggy and does what I would expect. Grateful for the response that directed me back to the network change, which is where I noticed the wrong ID.