Attempting to create Vertex AI Managed Notebook fails with error: Failed to insert a GCE VM

I’m trying to create a number of Vertex AI managed notebooks in GCP (note: these are managed notebooks, not user-managed notebooks). Each one fails with the same error:

2023-07-10T02:36:05.3813609Z │ Error: Error waiting to create Runtime: Error waiting for Creating Runtime: Error code 3, message: operation "projects/362764600745/locations/australia-southeast1/operations/create-d73db12f-8b7e-48eb-af84-7aadae580bd1" completed with error: %!w(*status.Status=&{{{} } 3 Http(400) Bad Request; [ExecuteComputeApi] failed.; [VM][:vm-d73db12f-8b7e-48eb-af84-7aadae580bd1] ; Failed to insert a GCE VM 131})
2023-07-10T02:36:05.3814103Z │
2023-07-10T02:36:05.3814380Z │   with module.user2notebook.google_notebooks_runtime.runtime,
2023-07-10T02:36:05.3814746Z │   on notebook_module/notebook_module.tf line 16, in resource "google_notebooks_runtime" "runtime":
2023-07-10T02:36:05.3815095Z │   16: resource "google_notebooks_runtime" "runtime" {
2023-07-10T02:36:05.3815319Z │

  • I have ruled out permissions, as we have elevated the service account used to manage the deployment all the way up to Owner of the project.

  • All the required APIs are enabled in the GCP project, as we can manually create a managed notebook via the console UI in Vertex AI without issue (same settings).

  • I have ruled out issues with the network and subnet being shared with this project from the shared VPC project (again, because the manual creation works fine).

Google Support have said it’s a Terraform issue, so here I am unfortunately.

Here is the module I created to stand up each notebook:

variable "user_name" {
  description = "Name of the user"
  type        = string
}

variable "user_email" {
  description = "Email of the user"
  type        = string
}

variable "machine_type" {
  description = "Machine type for the notebook instance"
  type        = string
}

resource "google_notebooks_runtime" "runtime" {
  name     = "${lower(replace(replace(var.user_email, "@", "-"), ".", "-"))}-notebook-instance"
  location = "australia-southeast1"

  access_config {
    access_type   = "SINGLE_USER"
    runtime_owner = var.user_email
  }

  virtual_machine {
    virtual_machine_config {
      machine_type     = var.machine_type
      internal_ip_only = true
      network          = var.NETWORK
      subnet           = var.SUBNET

      data_disk {
        initialize_params {
          disk_size_gb = 100
          disk_type    = "PD_STANDARD"
        }
      }

      metadata = {
        app        = "inventory-mlops"
        bu         = var.BU
        owner      = var.OWNER
        costcentre = var.COSTCENTRE
        email      = var.user_email
      }

      labels = {
        app        = "inventory-mlops"
        bu         = var.BU
        owner      = var.OWNER
        costcentre = var.COSTCENTRE
        email      = var.user_email
      }
    }
  }

  timeouts {
    create = "30m"
    delete = "30m"
  }
}
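
For completeness: the resource also reads var.NETWORK, var.SUBNET, var.BU, var.OWNER and var.COSTCENTRE, which are not declared in the excerpt above. Below is a rough sketch of what those declarations and a calling block might look like, assuming all five are plain strings. The module name user2notebook and the notebook_module directory come from the error output; the file name, the machine type and every value passed in are placeholders, not the real configuration.

```hcl
# notebook_module/variables.tf (hypothetical file name)
# Inputs referenced by the resource but not shown in the excerpt.
# The string types are an assumption.
variable "NETWORK" {
  description = "Network for the runtime VM"
  type        = string
}

variable "SUBNET" {
  description = "Subnet for the runtime VM"
  type        = string
}

variable "BU" {
  description = "Business unit, used in metadata and labels"
  type        = string
}

variable "OWNER" {
  description = "Owner, used in metadata and labels"
  type        = string
}

variable "COSTCENTRE" {
  description = "Cost centre, used in metadata and labels"
  type        = string
}

# Root module: one call per user. All values below are placeholders.
module "user2notebook" {
  source       = "./notebook_module"
  user_name    = "user2"
  user_email   = "user2@example.com"
  machine_type = "n1-standard-4"
  NETWORK      = "projects/HOST_PROJECT_ID/global/networks/NETWORK_NAME"
  SUBNET       = "projects/HOST_PROJECT_ID/regions/australia-southeast1/subnetworks/SUBNET_NAME"
  BU           = "example-bu"
  OWNER        = "example-owner"
  COSTCENTRE   = "example-costcentre"
}
```

Each additional notebook would be another module block with a different name and user_email.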

It seems to me to be set up correctly according to this reference (the only reliable one I can find online for Vertex AI managed notebooks).

Welcome to the forum - please reformat your message

I don’t work with GCP or notebooks, so I’m just responding with some suggestions based on general Terraform knowledge…

It seems terraform-provider-google has sent the request to create the thing to Google, had it accepted, and then polled and waited for the operation to complete, at which point it completed with an error.

It seems likely that one of two things is happening:

  • EITHER the Google API is returning a vague unhelpful error
  • OR the Google API is returning a good error, and Terraform isn’t surfacing the entire response

It might be worth trying the environment variable TF_LOG_PROVIDER=debug (or trace for even more verbosity) to see whether it logs raw responses from Google. (I do not know whether this provider does that or not.)

Also, since you’ve already got Google Support speaking to you, you might ask them to clarify: “What parameters in my request are invalid, that make you think it is a Terraform issue?”

@kknd4eva Were you able to get a resolution from GCP or HashiCorp for this?

Not an immediate one, but we have enterprise support from Google, and their engineers advised this was a bug/issue on their end. They said they’d be looking to patch it in December sometime. I’m still waiting on confirmation, but hopefully that means we’ll see a fix soon.