Role assignments for AKS node resource group get re-created

I need to assign different roles on the Node resource group created automatically by AKS.
I currently do it this way:

data "azurerm_resource_group" "rg_workers" {
  name = azurerm_kubernetes_cluster.aks.node_resource_group
}

resource "azurerm_role_assignment" "cilium_operator_role" {
  # Scope is the ID of the node resource group, read through the data source above
  scope                            = data.azurerm_resource_group.rg_workers.id
  role_definition_name             = "cilium-operator-role"
  principal_id                     = azurerm_kubernetes_cluster.aks.kubelet_identity[0].object_id
  skip_service_principal_aad_check = true
}

The main issue with this approach is that the ID of the node resource group is only known after apply when it is fetched through the data source. As a side effect, the role assignment gets re-created on every Terraform run in which the AKS cluster is modified. That is a big problem for AKS upgrades: the role is required by the network plugin I'm using (Cilium), but it gets re-assigned AFTER the AKS cluster has been updated. So during an AKS upgrade, new nodes come up in NotReady state until the role is added back.

Anybody got a better approach for role assignments on Node resource group?

Bring your own (BYO) kubelet identity, i.e. a user-assigned Managed Identity that won't change. You can assign the role to that Managed Identity and then set the Managed Identity as the cluster's kubelet_identity.

Azure docs explain it in more detail.
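As a rough sketch of what BYO kubelet identity could look like in Terraform: the identity names, the variables, the node pool values and the resource names below are all assumptions made for illustration, not taken from this thread. As far as I know, the cluster's control plane identity also has to be user-assigned and needs the "Managed Identity Operator" role on the kubelet identity.

```hcl
# Hypothetical inputs, declared only so the sketch is self-contained
variable "resource_group_name" { type = string }
variable "location" { type = string }

# Identity used by the AKS control plane (assumed name)
resource "azurerm_user_assigned_identity" "control_plane" {
  name                = "aks-control-plane"
  resource_group_name = var.resource_group_name
  location            = var.location
}

# Identity that the kubelets will run with (assumed name)
resource "azurerm_user_assigned_identity" "kubelet" {
  name                = "aks-kubelet"
  resource_group_name = var.resource_group_name
  location            = var.location
}

# The control plane identity must be allowed to assign the kubelet identity
resource "azurerm_role_assignment" "control_plane_mi_operator" {
  scope                = azurerm_user_assigned_identity.kubelet.id
  role_definition_name = "Managed Identity Operator"
  principal_id         = azurerm_user_assigned_identity.control_plane.principal_id
}

resource "azurerm_kubernetes_cluster" "aks" {
  name                = "myAKS"
  resource_group_name = var.resource_group_name
  location            = var.location
  dns_prefix          = "myaks"

  default_node_pool {
    name       = "default"
    node_count = 1
    vm_size    = "Standard_DS2_v2" # placeholder size
  }

  identity {
    type         = "UserAssigned"
    identity_ids = [azurerm_user_assigned_identity.control_plane.id]
  }

  # Pre-created identity, so its object ID is known before the cluster exists
  kubelet_identity {
    client_id                 = azurerm_user_assigned_identity.kubelet.client_id
    object_id                 = azurerm_user_assigned_identity.kubelet.principal_id
    user_assigned_identity_id = azurerm_user_assigned_identity.kubelet.id
  }

  depends_on = [azurerm_role_assignment.control_plane_mi_operator]
}
```

Role assignments for the kubelet can then use azurerm_user_assigned_identity.kubelet.principal_id as principal_id, which does not depend on the cluster resource.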

That’s an excellent suggestion. I wasn’t aware that BYO kubelet identity had been released in the AzureRM Terraform provider.
Thanks a lot for the hint!


In general AKS features are released early, as it is a resource used by a lot of users. Feature requests are picked up quickly in the azurerm repo and sometimes already implemented in preview, so if there is anything you’d need, feel free to ask!

After trying to implement BYO kubelet identity, I found it doesn’t entirely solve the issue:
Terraform replaces the role assignment because the resource group (its scope) may have changed, not because of the kubelet identity.

  # module.aks.azurerm_role_assignment.cilium_operator_role[0] must be replaced
-/+ resource "azurerm_role_assignment" "cilium_operator_role" {
      ~ id                               = "/subscriptions/<subId>/resourceGroups/myRGP/providers/Microsoft.Authorization/roleAssignments/a08dcf34-b2ae-5381-1f11-4c48e8203f2f" -> (known after apply)
      ~ name                             = "a08dcf34-b2ae-5381-1f11-4c48e8203f2f" -> (known after apply)
      ~ principal_type                   = "ServicePrincipal" -> (known after apply)
      ~ role_definition_id               = "/subscriptions/<subId>/providers/Microsoft.Authorization/roleDefinitions/f4f11236-ffd3-13fa-4734-23214717fdf2" -> (known after apply)
      ~ scope                            = "/subscriptions/<subId>/resourceGroups/myRGP-AKS-INFRA" -> (known after apply) # forces replacement
        # (3 unchanged attributes hidden)
    }

So any change to the AKS resource means the data source for the AKS infra resource group is only read during apply. Its attributes become "known after apply", and the role assignments scoped to that resource group unfortunately get re-created.

<= data "azurerm_resource_group" "rg_workers"  {
      ~ id       = "/subscriptions/<subId>/resourceGroups/myRGP" -> (known after apply)
      ~ location = "eastus" -> (known after apply)
        name     = "myRGP-AKS-INFRA"
      ~ tags     = {
          - "aks-managed-cluster-name" = "myAKS"
          - "aks-managed-cluster-rg"   = "myRGP"
        } -> (known after apply)
      + timeouts {
          + read = (known after apply)
        }
    }

The only workaround so far is to define the AKS infra resource group name statically instead of relying on the data source.
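A minimal sketch of that workaround, assuming the name is kept in a local and the scope is built as a plain string, so it is known at plan time. The local name, the variables, the azurerm_subscription data source and the node pool values are my own additions for illustration; the resource group name is the one from the plan output above.

```hcl
# Hypothetical inputs, declared only so the sketch is self-contained
variable "resource_group_name" { type = string }
variable "location" { type = string }

locals {
  # Static name for the node (infra) resource group
  node_rg_name = "myRGP-AKS-INFRA"
}

# Does not depend on the cluster, so it is read at plan time
data "azurerm_subscription" "current" {}

resource "azurerm_kubernetes_cluster" "aks" {
  name                = "myAKS"
  resource_group_name = var.resource_group_name
  location            = var.location
  dns_prefix          = "myaks"

  # Tell AKS which name to use instead of letting it generate one
  node_resource_group = local.node_rg_name

  default_node_pool {
    name       = "default"
    node_count = 1
    vm_size    = "Standard_DS2_v2" # placeholder size
  }

  identity {
    type = "SystemAssigned"
  }
}

resource "azurerm_role_assignment" "cilium_operator_role" {
  # Built from values known at plan time, with no data source on the node resource group
  scope                            = "${data.azurerm_subscription.current.id}/resourceGroups/${local.node_rg_name}"
  role_definition_name             = "cilium-operator-role"
  principal_id                     = azurerm_kubernetes_cluster.aks.kubelet_identity[0].object_id
  skip_service_principal_aad_check = true

  # The resource group only exists once the cluster has been created
  depends_on = [azurerm_kubernetes_cluster.aks]
}
```

Note that node_resource_group can only be set when the cluster is created; changing it on an existing cluster forces a new cluster.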