Azure UserAssigned MSI and cloud auto-join

So, I decided to “improve upon” something that was otherwise functional. I had Service Principals all set up but decided MSIs were a much better route and set about converting everything.

I am trying to get UserAssigned to work but am not having any luck. I figured out a few configs it required that aren’t that obvious in the docs, but I think I should have everything in place.

My retry_join is the same as it was with the service principal:

"provider=azure resource_group=rg-example vm_scale_set=example-ss"

In the service environment I have this (with real values in place of the zeros):

ARM_SUBSCRIPTION_ID=00000000-00000000-00000000-00000000
ARM_CLIENT_ID=00000000-00000000-00000000-00000000

Those are set programmatically, but I have verified the values that end up on the server against what the Portal shows for the identity.

I have also given the MSI the appropriate role actions.

With that, in the logs I get this:

[DEBUG] agent: discover-azure: using vm scale set method. resource_group: rg-example, vm_scale_set: example-ss: cluster=LAN
[ERROR] agent: Cannot discover address: cluster=LAN address="provider=azure resource_group=rg-example vm_scale_set=example-ss arm_subscription_id=00000000-00000000-00000000-00000000 arm_client_id=00000000-00000000-00000000-00000000" error="discover-azure: azure.BearerAuthorizer#WithAuthorization: Failed to refresh the Token for request to https://management.azure.com/subscriptions/00000000-00000000-00000000-00000000/resourceGroups/rg-example/providers/microsoft.Compute/virtualMachineScaleSets/example-ss/networkInterfaces?api-version=2015-06-15: StatusCode=400 -- Original Error: adal: Refresh request failed. Status Code = '400'. Response body: {"error":"invalid_request","error_description":"Identity not found"}"
[WARN]  agent: Join cluster failed, will retry: cluster=LAN retry_interval=30s error="No servers to join"

Which essentially comes down to {"error":"invalid_request","error_description":"Identity not found"}.
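
For anyone grepping their own logs for the same thing, here is a minimal sketch of pulling the two fields out of that response body. The type and function names (tokenError, parseTokenError) are made up for illustration and are not part of adal or go-discover; the only assumption is that the body is the JSON shown in the log line above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// tokenError mirrors the JSON body seen in the log above.
// Hypothetical type for illustration; not an adal or go-discover type.
type tokenError struct {
	Error       string `json:"error"`
	Description string `json:"error_description"`
}

// parseTokenError decodes the response body of a failed token request.
func parseTokenError(body []byte) (tokenError, error) {
	var te tokenError
	err := json.Unmarshal(body, &te)
	return te, err
}

func main() {
	body := []byte(`{"error":"invalid_request","error_description":"Identity not found"}`)
	te, err := parseTokenError(body)
	if err != nil {
		panic(err)
	}
	fmt.Printf("error=%s description=%s\n", te.Error, te.Description)
}
```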

The api-version=2015-06-15 bit seems odd but I don’t know much about the Azure API.

I have also tried go-discover directly with no luck.

I have been able to get SystemAssigned working with the same setup (minus ARM_CLIENT_ID) but I would rather be using UserAssigned.

Any pointers would be greatly appreciated.

FWIW, I can use az to login on the host, e.g.:

az login --identity -u /subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/rg-example/providers/Microsoft.ManagedIdentity/userAssignedIdentities/example-identity

Stepping through the code, go-discover uses Azure/go-autorest at v10.x, and that version does not handle UserAssigned MSI:

func (mc MSIConfig) Authorizer() (autorest.Authorizer, error) {
	msiEndpoint, err := adal.GetMSIVMEndpoint()
	if err != nil {
		return nil, err
	}

	spToken, err := adal.NewServicePrincipalTokenFromMSI(msiEndpoint, mc.Resource)
	if err != nil {
		return nil, fmt.Errorf("failed to get oauth token from MSI: %v", err)
	}

	return autorest.NewBearerAuthorizer(spToken), nil
}

User-assigned identity support was not added until v12.1.0 (corrected from v14.0.0):

func (mc MSIConfig) Authorizer() (autorest.Authorizer, error) {
	msiEndpoint, err := adal.GetMSIVMEndpoint()
	if err != nil {
		return nil, err
	}

	var spToken *adal.ServicePrincipalToken
	if mc.ClientID == "" {
		spToken, err = adal.NewServicePrincipalTokenFromMSI(msiEndpoint, mc.Resource)
		if err != nil {
			return nil, fmt.Errorf("failed to get oauth token from MSI: %v", err)
		}
	} else {
		spToken, err = adal.NewServicePrincipalTokenFromMSIWithUserAssignedID(msiEndpoint, mc.Resource, mc.ClientID)
		if err != nil {
			return nil, fmt.Errorf("failed to get oauth token from MSI for user assigned identity: %v", err)
		}
	}

	return autorest.NewBearerAuthorizer(spToken), nil
}

I’ll take this to the go-discover project.

What parameters need to be passed in retry_join to use Azure MSI?