Cloud_init example for Azure provider please

I’ve seen simple script provisioning using metadata_startup_script for GCP, and cloud_init for AWS.

However, I have yet to make either example work for Azure.

Can someone provide a simple example for Azure provider to use cloud_init to do VM initialization please?

Hi @ats,

I have shared some examples of those, so the ones you’ve seen may be the ones I shared. I don’t personally have any significant Azure experience, so I don’t know what the Azure equivalent of user_data in EC2 is, but I assume there is a similar “user data” or “metadata” feature there too, which could follow the same pattern.

If you can share some examples you tried already I could perhaps try to guess what made them not work as you hoped, and then maybe we can find a working example together.

Ah, thanks @apparentlymart. I found that for Azure it is not user_data but custom_data. However, I only got halfway.

Here is the cloud-init script that I have:

#cloud-config
# Add groups to the system
# Adds the ubuntu group with members 'root' and 'sys'
# and the empty group hashicorp.
groups:
  - ubuntu: [root,sys]
  - hashicorp

# Add users to the system. Users are added after groups are added.
users:
  - default
  - name: terraform
    gecos: terraform
    shell: /bin/bash
    primary_group: hashicorp
    sudo: ALL=(ALL) NOPASSWD:ALL
    groups: users, admin
    lock_passwd: false
    ssh_authorized_keys:
      - ssh-rsa AAAAHHHHHH

# Downloads the golang package
packages:
  - golang-go

# Sets the GOPATH & downloads the demo payload
runcmd:
  - sudo su terraform
  - sudo mkdir /home/terraform/go
  - sudo chown terraform:hashicorp /home/terraform/go
  - export GOPATH=/home/terraform/go
  - go get github.com/hashicorp/learn-go-webapp-demo

I have verified that it half works: the terraform user was created successfully, but the golang-go package was not installed.

What could be the problem?
Is there any way I can troubleshoot it? During terraform apply I didn’t see anything mentioned about cloud-init, even after enabling TF_LOG=debug.

UPDATE 1:

I remember reading somewhere that Azure does not wait for cloud-init to finish. Maybe that’s why the quick steps completed but the golang-go package was not installed.

UPDATE 2:

I’m almost certain that Azure does not wait for cloud-init to finish and that only the quick steps completed, as I have confirmed that the following tasks under runcmd were done:

  - sudo su terraform
  - sudo mkdir /home/terraform/go
  - sudo chown terraform:hashicorp /home/terraform/go
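One way to check whether cloud-init has actually finished, rather than inferring it from which steps ran, is to ask cloud-init itself on the VM. Below is a minimal sketch; `wait_for_cloud_init` is a hypothetical helper name, and it assumes the image ships a cloud-init version that supports `cloud-init status --wait`.

```shell
#!/bin/sh
# Hypothetical helper: block until cloud-init reports that it has finished.
# Assumes the installed cloud-init supports the "status --wait" subcommand.
wait_for_cloud_init() {
    if ! command -v cloud-init >/dev/null 2>&1; then
        echo "cloud-init not found on PATH" >&2
        return 127
    fi
    # Waits for cloud-init to complete, then prints its final status.
    cloud-init status --wait
}
```

Running `wait_for_cloud_init` over SSH right after the VM comes up should show whether the run is still in progress or ended with an error.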

Hi @ats!

The good news is that if you see cloud-init taking any of these actions at all then that confirms that you’ve already solved the Terraform part of this problem: Terraform will just send the data you specified verbatim, and so it’s up to cloud-init to interpret it.

The less good news is that this is now more of a cloud-init question than it is a Terraform question, and so I’m not so well equipped to help you debug as I’d hoped. :confounded:

Cloud-init is usually set up to write logs about its activities somewhere on the filesystem of the virtual machine, but the exact location varies depending on which operating system or distribution you are using, and so I don’t have an exact path memorized. If you look under /var/log and its subdirectories you can hopefully find something with cloud-init in the name.
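As a rough sketch of that search, the helpers below list anything under a log directory with cloud-init in its name and pull out lines that look like failures. The function names are hypothetical, and the error pattern is only an assumption about what apt and cloud-init tend to print.

```shell
#!/bin/sh
# Hypothetical helper: list files whose names contain "cloud-init"
# under a log directory (defaults to /var/log).
find_cloud_init_logs() {
    log_dir="${1:-/var/log}"
    find "$log_dir" -type f -name '*cloud-init*' 2>/dev/null | sort
}

# Hypothetical helper: print numbered lines from a log file that look like
# failures. The pattern is an assumption, not an exhaustive list.
show_cloud_init_errors() {
    grep -n -E '^E: |WARNING|ERROR' "$1"
}
```

For example, `find_cloud_init_logs` followed by `show_cloud_init_errors` on one of the files it prints gives a quick first look before reading the whole log.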

One initial idea I have is that cloud-init might be running these steps in a different order than you expected. I don’t recall the priority order for the modules but I believe there is a particular processing order for different module types and so cloud-init might be handling runcmd before it handles packages, in which case your package would not be installed yet when the script runs.

The relevant documentation here is Boot Stages, which describes the different phases of work for cloud-init during startup. However, I do remember a “gotcha” with the information on that page: the runcmd module runs during “Config”, but all it actually does is register a script to be executed later. The actual script execution happens during “final”, which is also the phase for package installation. I’m not sure how cloud-init decides on a priority order for two modules that execute during the same phase.
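To see the order for a given image, one option is to read the module lists from the cloud-init configuration on the VM. The sketch below assumes the configuration lives at /etc/cloud/cloud.cfg and uses the usual layout of a stage key followed by a YAML list. `list_stage_modules` is a hypothetical helper name. As far as I know, modules within a stage run in the order they are listed, but treat that as an assumption to verify against the Boot Stages documentation.

```shell
#!/bin/sh
# Hypothetical helper: print the modules listed under one stage key
# (for example cloud_final_modules) in a cloud.cfg-style file.
# Assumes each stage is a top-level key followed by "- item" lines.
list_stage_modules() {
    stage="$1"
    cfg="${2:-/etc/cloud/cloud.cfg}"
    awk -v key="${stage}:" '
        $1 == key { in_list = 1; next }           # start of the requested list
        in_list && /^[ \t]*-/ {                   # a list item: strip the dash
            sub(/^[ \t]*-[ \t]*/, ""); print; next
        }
        in_list && /^[^ \t#]/ { in_list = 0 }     # next top-level key ends it
    ' "$cfg"
}
```

Running `list_stage_modules cloud_final_modules` on the VM should show where the package installation module sits relative to the module that runs user scripts. The exact module names vary between cloud-init versions.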

Thanks a lot for all your help @apparentlymart.

I’m going to try cloud-init directly with az cli. If that works as expected, would the ball be back in Terraform’s court? :smile:

Hmm… I’m afraid the ball is back in Terraform’s court, as I tried cloud-init directly with az cli using the same YAML file and it works perfectly. Here is its log (from /var/log/cloud-init-output.log):

When I use cloud-init with terraform, by putting this cloud-init script file and its supporting mechanism into this example, while substituting user_data with custom_data, this is what I’m getting:

Reading package lists...
E: Could not get lock /var/lib/apt/lists/lock. It is held by process 803 (apt-get)
E: Unable to lock directory /var/lib/apt/lists/
2022-06-17 22:18:41,339 - util.py[WARNING]: Package update failed
Reading package lists...
Building dependency tree...
Reading state information...
E: Unable to locate package golang-go
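The log above shows the package list update failing because another apt-get process held the lock, after which golang-go could not be located. One possible workaround, which is an assumption on my part and not something confirmed in this thread, is to retry the apt steps from a script instead of relying on a single attempt. A minimal sketch, with `retry` as a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS have been made.
retry() {
    max_attempts="$1"
    delay="$2"
    shift 2
    attempt=1
    while ! "$@"; do
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "giving up after $attempt attempts: $*" >&2
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}
```

Used from a provisioning script, that might look like `retry 30 10 apt-get update && retry 5 10 apt-get install -y golang-go`. The attempt counts and delays are arbitrary illustrative values.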

az cli used to have the same problem before. I reported it to MS several years ago, and it seems to have been fixed. But now terraform is suffering from it.

Somebody needs to look into it please. Full log posted at

Hi @ats,

I feel you are being rather premature with your claim that

Fundamentally, the design of cloud-init is that a data file is passed to the VM provider during the creation request, and this is retrieved by cloud-init during the VM’s first boot.

As soon as the data file has been passed to the VM provider, Terraform has no more involvement in the process.

Your successful and failing logs feature different major versions of cloud-init. This seems far more likely to be relevant, and you’d have to run the Terraform and non-Terraform experiments using the exact same base image, for this to be a reasonable comparison.

And it will work.

What I did extra was to replace UbuntuServer 18.04-LTS with debian-11, and that causes the problem reported above. I’ve reverted the OS change and confirmed that it works again.

So what remains is: why does the same cloud-init script file work with UbuntuServer 18.04-LTS but not with debian-11? This might be a totally different issue, but let me just report it here.

Update:

OK then @maxb, if you think it is not Terraform’s problem, then it is not.

Case closed.

A bit of Googling turned up:

https://github.com/MicrosoftDocs/azure-docs/issues/82500
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1001477

i.e. it’s a known bug in Azure’s Debian images.