Databricks Unity Catalog account vs workspace level understanding

I’m trying to create a Unity Catalog metastore and am getting an error when calling the databricks_metastore resource.

The documentation states that a metastore is a top-level container that can be shared across workspaces, which means it lives at the account level. This is further reinforced by the fact that creating a new metastore in the Databricks UI takes place in the account console, not in the workspace UI. So the following error is quite confusing, because it tells me that the provider used for databricks_metastore should have a `host` parameter, which is only the case for a workspace-level provider. How does that make any sense?

Error: cannot create metastore: Databricks API (/api/2.1/unity-catalog/metastores) requires you to set `host` property (or DATABRICKS_HOST env variable) to result of `databricks_mws_workspaces.this.workspace_url`. This error may happen if you're using provider in both normal and multiworkspace mode. Please refactor your code into different modules. Runnable example that we use for integration testing can be found in this repository at https://registry.terraform.io/providers/databricks/databricks/latest/docs/guides/aws-workspace

on unity-catalog.tf line 93, in resource "databricks_metastore" "unity_catalog_metastore":

93: resource "databricks_metastore" "unity_catalog_metastore" {

What is even stranger is that the Databricks Terraform documentation seems to confirm that the databricks_metastore resource should use a workspace-level provider, as seen here. But again, how does that make any sense when a metastore is created at the account level? Which host am I supposed to use if I have multiple workspaces? At this point I was pretty confused, so I went into the Databricks account UI, created a metastore, and inspected the network tab, only to find a POST to /api/2.0/accounts/{account_id}/metastores creating the metastore. That API call makes perfect sense: since a metastore lives at the account level, it would naturally go through the accounts endpoint. There might be something I’m missing in the provider code found here, but I don’t see how the databricks_metastore resource could be operating at the account level.

There are probably some things I don’t understand about the 2.0 and 2.1 APIs for Unity Catalog, so forgive me if it’s really just as simple as that. I will say, my team has followed the E2 workspace provisioning documentation here and created a Terraform module that calls all the databricks_mws_* resources. This module is already used in two separate Terraform projects, which represent our workspaces/environments: prod and stage. Given that, if I were to put the shared Unity Catalog resources, such as the S3 bucket and IAM role, in the module that creates a workspace, I would end up with duplicates or module clashes, right?

Anyway, I would greatly appreciate some guidance on this matter because to me it feels like the resource creation doesn’t follow the logical hierarchy stated in the documentation nor what is reflected in the Databricks account and workspace user interfaces. Thanks!


So there are a few things to expand here:

  • Unity Catalog API - these APIs are currently exposed via a workspace endpoint per the documentation, not the account endpoint. This is why the Terraform provider requires a workspace endpoint to create a metastore. Any workspace under the same E2 account will suffice as the host.
  • Unity Catalog UI - this uses account-level APIs under the hood, but those are not yet publicly available. Once the engineering team makes them available (ETA unknown), the Terraform provider will support that option as well.
  • Unity Catalog API 2.0/2.1 - this change was announced a few months back in a support email from Databricks. It prepares for some breaking changes to be introduced later.
  • Module structure with Unity Catalog - you can still create a separate module for Unity Catalog resources; the provider just has to be configured with a workspace URL, which has to come from your databricks_mws module. The metastore can then be assigned to multiple workspaces by creating a databricks_metastore_assignment per workspace_id (for example, with for_each over a list of workspace IDs).
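The points above can be sketched in Terraform roughly like this. This is a minimal sketch, not a drop-in config: the module name `databricks_mws`, its `workspace_url` output, the bucket path, the owner group, and the `var.workspace_ids` variable are all assumptions about your setup.

```hcl
# Workspace-level provider: any workspace URL under the same E2 account works.
# The workspace_url output name is an assumption about your databricks_mws module.
provider "databricks" {
  alias = "workspace"
  host  = module.databricks_mws.workspace_url
}

# The metastore is created through the workspace endpoint, even though it is
# logically an account-level object.
resource "databricks_metastore" "this" {
  provider      = databricks.workspace
  name          = "primary"
  storage_root  = "s3://my-unity-catalog-bucket/metastore" # assumed bucket
  owner         = "uc-admins"                              # assumed group
  force_destroy = true
}

# One assignment resource per workspace; for_each iterates over the IDs of all
# workspaces (e.g. prod and stage) that should share this metastore.
resource "databricks_metastore_assignment" "this" {
  provider             = databricks.workspace
  for_each             = toset(var.workspace_ids) # assumed variable
  workspace_id         = each.value
  metastore_id         = databricks_metastore.this.id
  default_catalog_name = "hive_metastore"
}
```

Keeping these resources in their own module, with the provider configuration passed in from the workspace-provisioning module, avoids the duplicate-resource clash the question describes.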

> We are reaching out to notify you that the Unity Catalog API will be switching from v2.0 to v2.1 as of Aug 11, 2022, after which v2.0 will no longer be supported.
