Packer provisioner returning before sysprep has completed, resulting in bad image

We’ve been using a PowerShell provisioner like the one documented here for years. We’re now trying to build an image from the Marketplace Windows Server 2022 Datacenter image, and we’re seeing Packer move on from this provisioner to capturing the image before the build instance has reached the reseal-to-OOBE state.

We’re using Packer v1.12.0, with Azure plugin ~> 2.3.

Our builder image filter is set as

      "image_publisher": "MicrosoftWindowsServer",
      "image_offer": "WindowsServer",
      "image_sku": "2022-Datacenter",

Our provisioner is defined as

    {
      "type": "powershell",
      "inline": [
        "write-output 'Preparing for sysprep'",
        "foreach ($service in Get-Service -Name RdAgent, WindowsAzureTelemetryService, WindowsAzureGuestAgent -ErrorAction SilentlyContinue) { while ((Get-Service $service.Name).Status -ne 'Running') { Start-Sleep -s 5 } }",
        "Remove-ItemProperty -Path 'HKLM:\\SOFTWARE\\Microsoft\\Windows\\CurrentVersion\\Setup\\SysPrepExternal\\Generalize' -Name '*'",
        "write-output 'Running sysprep'",
        "if( Test-Path $Env:SystemRoot\\windows\\system32\\Sysprep\\unattend.xml ){ rm $Env:SystemRoot\\windows\\system32\\Sysprep\\unattend.xml -Force}",
        "& $env:SystemRoot\\System32\\Sysprep\\Sysprep.exe /oobe /generalize /quiet /quit",
        "while($true) { $imageState = Get-ItemProperty HKLM:\\SOFTWARE\\Microsoft\\Windows\\CurrentVersion\\Setup\\State | Select ImageState; Write-Output $imageState.ImageState; if($imageState.ImageState -ne 'IMAGE_STATE_GENERALIZE_RESEAL_TO_OOBE') { Start-Sleep -s 10  } else { break } }"
      ],
      "timeout": "20m",
      "only": [
        "azure-arm"
      ]
    },

In the build output I see this

    ==> azure-arm: Provisioning with Powershell...
    ==> azure-arm: Provisioning with powershell script: /tmp/powershell-provisioner1922629977
        azure-arm: Preparing for sysprep
        azure-arm: Running sysprep
        azure-arm: IMAGE_STATE_COMPLETE
        azure-arm: IMAGE_STATE_UNDEPLOYABLE
    ==> azure-arm: Querying the machine's properties ...

Then in the test stage of our build when we deploy an instance from the image we get this

Error creating or updating virtual machine test-win22-245 - (OSProvisioningClientError) OS Provisioning for VM 'test-win22-245' did not finish in the allotted time. However, the VM guest agent was detected running. This suggests the guest OS has not been properly prepared to be used as a VM image (with CreateOption=FromImage). To resolve this issue, either use the VHD as is with CreateOption=Attach or prepare it properly for use as an image

It shouldn’t be getting to the “Querying the machine’s properties” step before outputting IMAGE_STATE_GENERALIZE_RESEAL_TO_OOBE unless it hits the timeout, and in that situation it should be reporting an error rather than capturing an image.

You see no completion message because the script never prints one. Change the block that contains the “break” to this and you will see what you want:

    { Write-Host $imageState.ImageState '- SysPrep Completed!'; break }

But of course, the underlying bug will still be there.

Did you find out anything new about this?

We did not figure out the exact cause, but it is related to CIS hardening. We use the ansible-lockdown role; disabling it entirely solves the issue, but we have to harden. I asked on their GitHub page which control(s) might be related, and they basically said “out of scope” and “you can buy support”. But upgrading to the latest version of that role (from 2.x to 3.x) seems to have resolved it.

PS: The script already writes out the state before the if that checks the state and either sleeps or breaks. So I should be seeing it print IMAGE_STATE_GENERALIZE_RESEAL_TO_OOBE before the script breaks, as long as sysprep actually reaches that state.
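
For anyone who would rather have the build fail than capture a half-generalized image, here is a minimal sketch of the sysprep-and-wait part of the provisioner with an explicit deadline and a non-zero exit code. The 15-minute deadline and the `$target`, `$deadline`, and `$state` variable names are assumptions chosen for illustration, and the preparation steps from the original provisioner are left out for brevity. It relies on the provisioner treating any exit code other than 0 as a failure, which is the default.

    {
      "type": "powershell",
      "inline": [
        "$target = 'IMAGE_STATE_GENERALIZE_RESEAL_TO_OOBE'",
        "$deadline = (Get-Date).AddMinutes(15)",
        "& $env:SystemRoot\\System32\\Sysprep\\Sysprep.exe /oobe /generalize /quiet /quit",
        "do { $state = (Get-ItemProperty 'HKLM:\\SOFTWARE\\Microsoft\\Windows\\CurrentVersion\\Setup\\State').ImageState; Write-Output $state; if ($state -ne $target) { Start-Sleep -Seconds 10 } } while ($state -ne $target -and (Get-Date) -lt $deadline)",
        "if ($state -ne $target) { Write-Output \"Sysprep did not reach $target (last state: $state)\"; exit 1 }",
        "Write-Output 'Sysprep completed'"
      ],
      "timeout": "20m",
      "only": [
        "azure-arm"
      ]
    }

This does not address the root cause discussed above. It only turns a loop that ends in the wrong state into a failed build, and it may not help if the script is being terminated from outside before the final check runs.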