Terraform State Refresh Error with Azurerm Storage account with private endpoint

Hello everyone,

I am facing an issue with a Terraform deployment on Azure that uses a remote backend for the Terraform state. The deployment consists of only a few resources: an app service plan, a function app with a private endpoint, and a storage account. After the initial deployment the storage account has public access disabled and a private endpoint configured, but with no private DNS zone attached (that is added manually later). The same applies to the function app, which has public network access disabled and a private endpoint. When I make a minor change, let's say on the function app, terraform plan fails while refreshing the state of the storage account: after 5 minutes it throws a timeout. Here is the exact log:

Storage Account Name: "stfunccctestdev001"): executing request: Get "https://stfunccctestdev001.blob.core.windows.net/?comp=properties&restype=service": context deadline exceeded

I assume it is a networking issue, because the agent that runs this terraform plan tries to read properties of a storage account that resides in a private Azure network with no public access enabled. I confirmed that by deploying a storage account with public access enabled, and the Terraform refresh did not fail. But I am really not sure why Terraform should need to access the storage account itself to refresh the state, or why the agent running the plan should have access to it. Its job is only to deploy this storage account and the other resources, and to update or delete them. Let's say this agent is a Microsoft-hosted one (Azure Pipelines): why should I have to allow it to reach this specific storage account? Isn't there any other workaround?

Last thing to mention: I also tried attaching the private DNS zone for the storage account, and it still did not work. I also tried enabling public access after the initial deployment, but then the plan fails again because it detects a change that was made manually.

Any help?

Hi @ceciivanov,

This is a common issue with the way the azurerm provider is written and how it handles storage accounts.

The storage account itself is provisioned and configured by the provider using the Azure Resource Manager APIs (the Azure 'control plane'). However, sub-resources such as containers are created by the azurerm provider accessing the storage account directly (not via the ARM API), so the provider requires access via the 'data plane' as soon as you want to manipulate these sub-resources.

This means that (if you are provisioning these sub-resources) the device on which you run your terraform plan/apply requires access to the storage account itself. Of course, this can be a problem when the security policy applied to the storage account mandates that it must not be publicly accessible (e.g. use of private endpoints or service endpoints only), or that any public access to the storage account must be controlled by the storage account firewall (IP allow-listing).
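To illustrate the data-plane dependency (the names below are placeholders, not taken from this thread), a sub-resource like this is created by the azurerm provider talking to the account's blob endpoint directly, so the machine running Terraform must be able to resolve and reach it:

resource "azurerm_storage_container" "example" {
  # Created via the storage account's data plane (blob endpoint), not the ARM API,
  # so the Terraform runner needs network access to the account itself.
  name                  = "example-container"
  storage_account_name  = azurerm_storage_account.example.name
  container_access_type = "private"
}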

Further issues arise if you are using pipeline agents (e.g. Azure DevOps pipelines) to provision your infrastructure:

  • If you are using 'Microsoft Hosted' pipeline agents then they will need access to the storage account via the public network. Furthermore, the pipeline agents can run in any of the regions in the geography (e.g. if your DevOps org geo is set to Europe, the agents can be instantiated in either the North Europe or West Europe region). That is a lot of Azure IP addresses to allow-list on a storage account, and it makes 'securing' the storage account a bit of a joke.

  • If you are using self-hosted pipeline agents then you need to ensure that the agent has access to the data plane. This could be as straightforward as allow-listing an IP address that all of your org's internet traffic originates from. More often than not, though, it means ensuring the agent can route through your private network to hit the storage account's private endpoint, or that it is in a subnet with storage service endpoints provisioned (and you have the relevant subnet allowed in the storage firewall); see the sketch just after this list.
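As a sketch only (the IP address, subnet reference and account name below are assumed placeholders, not recommendations), the storage firewall exceptions for a self-hosted agent might look like this:

resource "azurerm_storage_account" "example" {
  name                     = "stexampledev001"
  resource_group_name      = azurerm_resource_group.example.name
  location                 = azurerm_resource_group.example.location
  account_tier             = "Standard"
  account_replication_type = "LRS"

  network_rules {
    default_action             = "Deny"
    bypass                     = ["AzureServices"]
    ip_rules                   = ["203.0.113.10"]           # corporate egress IP (placeholder)
    virtual_network_subnet_ids = [azurerm_subnet.agents.id] # agent subnet with the Microsoft.Storage service endpoint enabled
  }
}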

If you are not able to go down the self-hosted agent route, or for some reason have to run Terraform on a device that is outside your private network / cannot privately route to it, there is an approach you can use to mitigate this and move the 'sub-resource' provisioning over to the control plane (removing the requirement for the agent to access the storage account directly):

Provision the storage account and all of its configuration using the AzureRM provider, but use the AzAPI provider (which ONLY uses the ARM API / control plane) to provision and configure the sub-resources, similar to the below:

resource "azapi_resource" "containers" {

  type = "Microsoft.Storage/storageAccounts/blobServices/containers@2022-09-01"
  body = {
    properties = {
      publicAccess = false
    }
  }
  name                      = "my_container_name"
  parent_id                 = "${azurerm_storage_account.id}/blobServices/default"  # <-- Reference to the storage account deployed via AzureRM Providcer
  schema_validation_enabled = false 
    # https://github.com/Azure/terraform-provider-azapi/issues/497
}

For a more complete example, see this from the Azure Verified Modules project's storage account module.

Also you may need to refer to: Microsoft.Storage/storageAccounts/blobServices/containers - Bicep, ARM template & Terraform AzAPI reference | Microsoft Learn

Hope that helps

Happy Terraforming!

Note: in my code above I am using the 'dynamic schema' capability that is available from azapi provider release v1.13.0 onwards, whereas the AVM module uses the older approach requiring a jsonencode() function for the body attribute.
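For comparison, a sketch of that older jsonencode() style (the container name and storage account reference are placeholders):

resource "azapi_resource" "containers_jsonencode_style" {
  type = "Microsoft.Storage/storageAccounts/blobServices/containers@2022-09-01"

  # Pre-v1.13.0 azapi expects the body as a JSON string rather than an HCL object.
  body = jsonencode({
    properties = {
      publicAccess = "None"
    }
  })

  name                      = "my_container_name"
  parent_id                 = "${azurerm_storage_account.example.id}/blobServices/default"
  schema_validation_enabled = false
}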

Hello, now I understand the issue. Of course, one possible solution is to run with a self-hosted agent that is allowed through, but if we don't want to follow that approach, I don't understand how exactly I must change my code, because I am not trying to create any new sub-resource like a container; I just deployed an empty storage account. So which azapi_resource exactly would I need to define each time?

Would you be able to share (sanitised if relevant) the azurerm_storage_account block from your module? (Use ``` before and after to format it as a code block in the post.) I will try to reproduce what you are seeing.

It does seem strange (although I guess not impossible) that the provider is trying to access the account via the data plane, which I usually only see related to creating elements such as containers. I do have a vague memory of seeing something similar related to the static website elements of storage accounts and private endpoints in the AzureRM provider; hopefully trying to reproduce your issue will jog my memory (or prove my mind is playing tricks on me :slight_smile: )

OK, here is the code. I have defined the storage account using a custom module I've created, so I will provide both the module and the code in the main.tf where I call it.

main.tf:

module "storage" {
  count = var.create_storage_account ? 1 : 0

  source = "git::ssh://git@ssh.dev.azure.com/v3/XXXXXXX/XXXXXXX/terraform-azurerm-storage-account"

  location            = var.location
  resource_group_name = data.azurerm_resource_group.rg.name

  name = var.storage_account_name

  account_kind             = var.storage_account_kind
  account_tier             = var.storage_account_tier
  account_replication_type = var.storage_account_replication_type

  public_network_access_enabled   = false
  allow_nested_items_to_be_public = false

  private_endpoints = [
    {
      name             = "pe-${var.storage_account_name}-blob"
      subnet_id        = data.azurerm_subnet.snet_storage.id
      subresource_name = "blob"
      # private_dns_zone_id = data.azurerm_private_dns_zone.blob.id
    },
    {
      name             = "pe-${var.storage_account_name}-queue"
      subnet_id        = data.azurerm_subnet.snet_storage.id
      subresource_name = "queue"
      # private_dns_zone_id = data.azurerm_private_dns_zone.queue.id
    },
    {
      name             = "pe-${var.storage_account_name}-table"
      subnet_id        = data.azurerm_subnet.snet_storage.id
      subresource_name = "table"
      # private_dns_zone_id = data.azurerm_private_dns_zone.table.id
    },
    {
      name             = "pe-${var.storage_account_name}-file"
      subnet_id        = data.azurerm_subnet.snet_storage.id
      subresource_name = "file"
      # private_dns_zone_id = data.azurerm_private_dns_zone.file.id
    }
  ]
}

module:

resource "azurerm_storage_account" "this" {
  name                = var.name
  location            = coalesce(var.location, local.resource_group_location)
  resource_group_name = var.resource_group_name

  account_tier             = var.account_tier
  account_kind             = var.account_kind
  access_tier              = var.access_tier
  account_replication_type = var.account_replication_type
  tags                     = var.tags

  min_tls_version                  = var.min_tls_version
  enable_https_traffic_only        = var.enable_https_traffic_only
  public_network_access_enabled    = var.public_network_access_enabled
  allow_nested_items_to_be_public  = var.allow_nested_items_to_be_public
  edge_zone                        = var.edge_zone
  cross_tenant_replication_enabled = var.cross_tenant_replication_enabled
  default_to_oauth_authentication  = var.default_to_oauth_authentication
  is_hns_enabled                   = var.hns_enabled
  sftp_enabled                     = var.sftp_enabled
  nfsv3_enabled                    = var.nfsv3_enabled
  large_file_share_enabled         = var.large_file_share_enabled

  dynamic "network_rules" {
    for_each = var.network_rules != null ? [1] : []
    content {
      default_action             = var.network_rules.default_action
      bypass                     = toset(var.network_rules.bypass)
      ip_rules                   = toset(var.network_rules.ip_rules)
      virtual_network_subnet_ids = toset(var.network_rules.virtual_network_subnet_ids)
    }
  }
}

resource "azurerm_private_endpoint" "this" {
  count = length(var.private_endpoints)

  location            = var.location
  resource_group_name = var.resource_group_name

  name                          = var.private_endpoints[count.index].name
  subnet_id                     = var.private_endpoints[count.index].subnet_id
  tags                          = var.private_endpoints[count.index].tags
  custom_network_interface_name = var.private_endpoints[count.index].network_interface_name

  private_service_connection {
    is_manual_connection           = false
    name                           = "pse-${var.name}"
    subresource_names              = [var.private_endpoints[count.index].subresource_name]
    private_connection_resource_id = azurerm_storage_account.this.id
  }

  dynamic "private_dns_zone_group" {
    for_each = var.private_endpoints[count.index].private_dns_zone_id != null ? [1] : []
    content {
      name                 = "dns-zone-group-${var.name}"
      private_dns_zone_ids = [var.private_endpoints[count.index].private_dns_zone_id]
    }
  }

  lifecycle {
    ignore_changes = [tags]
  }
}

The idea is that in the module I've created I expose many attributes of the storage account, all of which are passed in as variables from the main.tf. Some are dynamic, meaning they are optional.

The code in the main.tf defines the storage account with exactly the attributes I want. This can of course change in the future and I can add more attributes to it. Currently this storage account does not define any sub-resources (no containers etc.); it only has public network access disabled, and I am creating four private endpoints.
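For context, the module's private_endpoints input is presumably declared along these lines (a sketch only; the real variable definition may differ), with the DNS zone, tags and NIC name optional:

variable "private_endpoints" {
  description = "Private endpoints to create for the storage account."
  type = list(object({
    name                   = string
    subnet_id              = string
    subresource_name       = string
    private_dns_zone_id    = optional(string) # zone group is attached only when provided
    tags                   = optional(map(string))
    network_interface_name = optional(string)
  }))
  default = []
}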

Hi @ceciivanov ,

So, to prove out what I was saying with regard to the azurerm-related elements above (e.g. that you should not see issues if all you are configuring is the storage account itself), I used the following code:

resource "random_pet" "name" {
    separator = "" # no separator to account for storage account name restrictions
}

resource "azurerm_resource_group" "rg" {
  name     = "rg-${random_pet.name.id}"
  location = "north europe"
}

resource "azurerm_virtual_network" "vnet" {
  name                = "vnet-${random_pet.name.id}"
  resource_group_name = azurerm_resource_group.rg.name
  location            = azurerm_resource_group.rg.location
  address_space       = ["10.0.0.0/16"]
    subnet {
        name           = "private_endpoints"
        address_prefix = "10.0.1.0/24"
    }
}

resource "azurerm_storage_account" "sa" {
  name                     = "sa${random_pet.name.id}"
  resource_group_name      = azurerm_resource_group.rg.name
  location                 = azurerm_resource_group.rg.location
  account_tier             = "Standard"
  account_replication_type = "LRS"
  public_network_access_enabled = false
  allow_nested_items_to_be_public = false
  # restrict access to only from private endpoints
    network_rules {
        default_action             = "Deny"
        bypass                     = ["None"]
    }
}

resource "azurerm_private_endpoint" "pe" {
    name                = "pe-${random_pet.name.id}"
    location            = azurerm_resource_group.rg.location
    resource_group_name = azurerm_resource_group.rg.name
    subnet_id           = tolist(azurerm_virtual_network.vnet.subnet)[0].id

    private_service_connection {
        name                           = "psc-${random_pet.name.id}"
        is_manual_connection           = false
        private_connection_resource_id = azurerm_storage_account.sa.id
        subresource_names               = ["blob"]
    }
}

As you can see, this blocks all access except via the private endpoint. Running a plan after the initial deployment is successful. If I add an azurerm_storage_container resource then the plan fails with the following error, rather than timing out:

403 This request is not authorized to perform this operation.

Now, I am not discounting that your storage module may be doing something in addition to my basic code above which is triggering the issue, so it would be good to see it if you can share it (feel free to DM me if you don't want to share publicly).

However, I think what you may be seeing is related to this little “gotcha”: Storage access constraints for clients in VNets with private endpoints

Where are you running your Terraform plan/apply (MS Hosted pipeline agent / Self-Hosted / PC on corporate network, etc.), when you see the timeout?

If you do a DNS lookup from the machine where you are getting the timeout:
nslookup <name>.blob.core.windows.net
do you get a public IP address directly, or a privatelink CNAME response?

  1. Where can I DM you?
  2. I am running the pipeline from a self-hosted agent on a virtual network in my tenant, but it is not the corporate network. However, in the future this must work in a corporate network where the private DNS zones are set manually and are custom. So even if that is the issue, I don't know how I would get around it, because the deployment will include only the storage account and not the private DNS zone, as that is set afterwards.
  3. Even so, when I tried the deployment in my tenant and included a private DNS zone exactly as you did, I still got the timeout error.

Hi @ceciivanov,

Within this thread you should be able to click on my name on this reply and message me directly using the 'message' button.

Logging in to the self-hosted agent, are you able to run the nslookup test I mentioned previously, to see what the name resolution path is?

Another test to consider carrying out is to see if you can access the storage account from the self-hosted agent.

Both of these will help determine whether this is more of a Terraform/provider issue or a more fundamental DNS/networking issue.

I still think the issue here is likely related to DNS resolution and the way Azure manipulates name resolution once a storage account has a private endpoint associated with it (regardless of which subscription it is in), when you already have private endpoints or private DNS associated with the virtual network your client is in.

  • Has the VNET in which you have your self-hosted pipeline agent already got any private endpoints and private DNS configured for (any) storage accounts?

  • Is the private endpoint you are deploying going into the same VNET as the self-hosted pipeline agent, or a separate VNET?

So the setup is the following:

I have a self-hosted agent in a separate network in Azure. Other storage accounts and private resources with private DNS zones reside there and they are working fine, for example the storage account for the Terraform state files.

But the actual deployment of the application, which contains the storage account, is supposed to live in a totally different network. I do not understand why the Terraform state refresh would need this access. For example, I am also deploying a function app with a private endpoint (and still no DNS configured), yet its state is refreshed successfully. Why should the agent need to resolve, and presumably have network access to, the storage account specifically?

Further investigation, and also mocking up your scenario such that I get errors from Terraform in line with your experience (just using a basic azurerm_storage_account resource):

azurerm_storage_account.sa: Refreshing state... [id=/subscriptions/redacted/resourceGroups/rg-storagetest/providers/Microsoft.Storage/storageAccounts/saprimaryterrapin]

Planning failed. Terraform encountered an error while generating this plan.

╷
│ Error: retrieving static website properties for Storage Account (Subscription: "redacted"
│ Resource Group Name: "rg-storagetest"
│ Storage Account Name: "saprimaryterrapin"): executing request: Get "https://saprimaryterrapin.blob.core.windows.net/?comp=properties&restype=service": context deadline exceeded
│
│   with azurerm_storage_account.sa,
│   on main.tf line 5, in resource "azurerm_storage_account" "sa":
│    5: resource "azurerm_storage_account" "sa" {
│

Take a look at the following logged issue and the many linked to/from it:
Terraform wants to reach storage account static website endpoint even when none will be created · Issue #20257 · hashicorp/terraform-provider-azurerm (github.com)

Also, your current networking scenario sounds exactly like the scenario I highlighted in an earlier reply (Storage access constraints for clients in VNets with private endpoints): even if you had public networking enabled, the moment you deploy a private endpoint for a given storage account that is not in, or routable from, the VNET your client is in, you will be impacted.

It seems that, in order to be able to manage storage accounts deployed via the AzureRM provider, when private endpoints are involved, you must ensure your terraform runner is able to both resolve the private endpoint and route to it after the first apply.
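In Terraform terms, making the private endpoint resolvable from the runner typically means linking the privatelink.blob.core.windows.net zone to the VNet the agent sits in and attaching the zone to the endpoint. A minimal sketch (the resource names and the agent VNet reference are assumptions; in this thread's setup the DNS zones are deliberately managed outside the deployment):

resource "azurerm_private_dns_zone" "blob" {
  name                = "privatelink.blob.core.windows.net"
  resource_group_name = azurerm_resource_group.example.name
}

# Link the zone to the VNet that the self-hosted agent resolves DNS from.
resource "azurerm_private_dns_zone_virtual_network_link" "agent_vnet" {
  name                  = "link-agent-vnet"
  resource_group_name   = azurerm_resource_group.example.name
  private_dns_zone_name = azurerm_private_dns_zone.blob.name
  virtual_network_id    = azurerm_virtual_network.agents.id # VNet hosting the agent (assumed)
}

# Attach the zone to the private endpoint so the A record is maintained automatically.
resource "azurerm_private_endpoint" "blob" {
  name                = "pe-example-blob"
  location            = azurerm_resource_group.example.location
  resource_group_name = azurerm_resource_group.example.name
  subnet_id           = azurerm_subnet.private_endpoints.id

  private_service_connection {
    name                           = "psc-example-blob"
    is_manual_connection           = false
    private_connection_resource_id = azurerm_storage_account.example.id
    subresource_names              = ["blob"]
  }

  private_dns_zone_group {
    name                 = "default"
    private_dns_zone_ids = [azurerm_private_dns_zone.blob.id]
  }
}

Routing is the other half: the agent's VNet still needs connectivity (for example VNet peering) to the subnet hosting the private endpoint.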

The way the AzureRM provider interacts with most resources (such as function apps) is entirely via the ARM API, whereas for storage accounts it seems to require access via the data plane for a lot of its functionality (as explored in the issue above and the issues linked from it).

Sorry I can't provide any sort of workaround, but the above is at least the answer as to why it is occurring for you and how it can be resolved (by giving the client the ability to resolve and route to the private endpoint).