Running Packer inside a Docker container

I’m largely learning about Packer/VMs on the job, so please bear with me if you see obvious blunders or bad assumptions!

I’ve been using Packer and VMware Fusion Pro to build images, but a recent switch to the new ARM-based Macs means I need to find a new way to build x86 images, so I’m attempting to move the build off local machines and into CI (with GitHub Actions).

The default runners for GitHub Actions don’t allow nested virtualization, so I’m using a bare metal Linux EC2 instance as a custom runner.

We handle the various build steps before Packer using a Docker container, so I’ve installed (and licensed) VMware Workstation and Packer in the Docker image, but I’m having trouble getting through the first Packer build.

Here’s the template:

source "vmware-iso" "boot" {
  guest_os_type       = "ubuntu-64"
  headless            = true
  http_directory      = "http"
  iso_checksum        = "7d8e0055d663bffa27c1718685085626cb59346e7626ba3d3f476322271f573e"
  iso_url             = "http://cdimage.ubuntu.com/releases/18.04.3/release/ubuntu-18.04.3-server-amd64.iso"
  output_directory    = "../../images/boot"
  shutdown_command    = "echo 'vagrant' | sudo -S shutdown -P now"
  skip_compaction     = true
  ssh_password        = "vagrant"
  ssh_username        = "vagrant"
  ssh_wait_timeout    = "20m"
  tools_upload_flavor = "linux"
  vm_name             = "foo-enterprise"
  vmdk_name           = "foo-enterprise"
  cpus                = 1
  memory              = 1024
  cores               = 1
  boot_wait           = "30s"
  boot_command        = [
    "<enter><wait>",
    "<f6><esc>",
    "<bs><bs><bs><bs><bs><bs><bs><bs><bs><bs>",
    "<bs><bs><bs><bs><bs><bs><bs><bs><bs><bs>",
    "<bs><bs><bs><bs><bs><bs><bs><bs><bs><bs>",
    "<bs><bs><bs><bs><bs><bs><bs><bs><bs><bs>",
    "<bs><bs><bs><bs><bs><bs><bs><bs><bs><bs>",
    "<bs><bs><bs><bs><bs><bs><bs><bs><bs><bs>",
    "<bs><bs><bs><bs><bs><bs><bs><bs><bs><bs>",
    "<bs><bs><bs><bs><bs><bs><bs><bs><bs><bs>",
    "<bs><bs><bs>",
    "/install/vmlinuz noapic preseed/url=http://{{ .HTTPIP }}:{{ .HTTPPort }}/preseed.cfg ",
    "initrd=/install/initrd.gz ",
    "auto-install/enable=true ",
    "debconf/priority=critical ",
    "<enter>"
  ]
}

build {
  sources = ["source.vmware-iso.boot"]

  provisioner "file" {
    destination = "/tmp"
    source      = "scripts/"
  }

  provisioner "shell" {
    inline = ["cd /tmp && echo 'vagrant' | sudo -E -S bash -e setup.sh || exit 1"]
  }
}

The setup.sh script installs the VMware tools (among other things).

I’ve managed to get packer build to work in isolation when running outside of the Docker container in a Linux environment with the same distro/version as the container, but when it runs inside the container, it always hits the SSH timeout, showing the following error when logging is enabled.

2021/11/01 18:45:00 packer-builder-vmware-iso plugin: [INFO] Waiting for SSH, up to timeout: 20m0s
2021/11/01 18:45:00 packer-builder-vmware-iso plugin: Located networking configuration file using Workstation: /etc/vmware/networking
==> vmware-iso.boot: Waiting for SSH to become available...
2021/11/01 18:45:00 packer-builder-vmware-iso plugin: GuestIP discovered device matching hostonly: vmnet1
2021/11/01 18:45:00 packer-builder-vmware-iso plugin: Lookup up IP information...
2021/11/01 18:45:00 packer-builder-vmware-iso plugin: GuestAddress found MAC address in VMX: 00:0c:29:4e:6b:73
2021/11/01 18:45:00 packer-builder-vmware-iso plugin: Trying DHCP leases path: /etc/vmware/vmnet1/dhcpd/dhcpd.leases
2021/11/01 18:45:00 packer-builder-vmware-iso plugin: Unable to find an exact match for DHCP lease. Falling back to a loose match for hw address 00:0c:29:4e:6b:73
2021/11/01 18:45:00 packer-builder-vmware-iso plugin: IP lookup failed: None of the found device(s) [vmnet1] has a DHCP lease for MAC 00:0c:29:4e:6b:73
2021/11/01 18:45:00 packer-builder-vmware-iso plugin: [DEBUG] Error getting SSH address: IP lookup failed: None of the found device(s) [vmnet1] has a DHCP lease for MAC 00:0c:29:4e:6b:73

/etc/vmware/vmnet1/dhcpd/dhcpd.leases is just an empty file. I’ve also tried running it with network = "hostonly" and network = "bridged" but no joy with those either.
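
To double-check the lease lookup by hand, something like the sketch below mimics what the log suggests Packer is doing: scanning the leases file for the VM’s MAC address. It assumes the usual ISC dhcpd lease format (a "lease <ip> {" line followed by a "hardware ethernet <mac>;" line). The function name lease_ip_for_mac is made up for illustration, and this is not Packer’s actual implementation.

```shell
# Hypothetical helper: print the IP of the last lease in an ISC-style
# dhcpd leases file whose "hardware ethernet" entry matches the given MAC.
# Prints nothing and returns non-zero if no lease matches (e.g. an empty file).
# Usage: lease_ip_for_mac <leases-file> <mac>
lease_ip_for_mac() {
  awk -v mac="$2" '
    BEGIN { mac = tolower(mac) }
    # Remember the IP of the lease block we are currently inside.
    $1 == "lease" { ip = $2 }
    # Compare the MAC case-insensitively, ignoring the trailing semicolon.
    $1 == "hardware" && $2 == "ethernet" {
      hw = tolower($3)
      sub(/;$/, "", hw)
      if (hw == mac) found = ip
    }
    END { if (found != "") print found; else exit 1 }
  ' "$1"
}
```

In my case I would expect lease_ip_for_mac /etc/vmware/vmnet1/dhcpd/dhcpd.leases 00:0c:29:4e:6b:73 to print nothing, since the file is empty, which would be consistent with the VM never having received a lease at all rather than Packer misreading the file.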

Here are the logs from running PACKER_LOG=1 packer build template.pkr.hcl up to the point where the SSH retry loop starts.

Before this stage, Packer appears to connect to the VM and run the boot command over VNC, but even with all ports 5900-6000 exposed through the container, I’m not able to connect to the VNC session from outside Docker. It just appears that there’s nothing actually running on localhost:59XX when I create an SSH tunnel there.
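
One check that might narrow this down is to look, from inside the container, at which addresses anything in the 5900-6000 range is actually listening on. As far as I understand, a listener bound only to 127.0.0.1 inside the container would not be reachable through Docker’s published ports, so the bind address matters. The sketch below is a small filter over the output of ss -ltn (assuming ss from iproute2 is available in the image); the function name vnc_listeners is made up for illustration.

```shell
# Hypothetical helper: read `ss -ltn` output on stdin and print the local
# address:port of every listener whose port is in the range 5900-6000.
vnc_listeners() {
  awk 'NR > 1 {
    # Column 4 is "Local Address:Port"; the port follows the last colon,
    # which also covers IPv6 forms such as [::]:5900.
    n = split($4, parts, ":")
    port = parts[n] + 0
    if (port >= 5900 && port <= 6000) print $4
  }'
}
```

Running ss -ltn | vnc_listeners inside the container while Packer is typing the boot command should show whether the VNC listener exists at all, and whether it is bound to loopback or to all interfaces.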

Running vmrun list shows that there is 1 VM running, but the other vmrun commands don’t seem to work.

  • vmrun checkToolsState responds with unknown
  • vmrun listProcessesInGuest hangs

Initially I had the same problem when running outside Docker, but starting VMware Workstation with a GUI showed that it was hanging, waiting for a license. Those problems all went away when I added the license to the Dockerfile as part of building the image.

The Docker container is also running with the --privileged flag, which, as I understand it, will allow the container to access the necessary host capabilities for virtualization.
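
Since a container shares the host’s kernel, my assumption is that VMware’s kernel modules (vmmon and vmnet) have to be loaded on the EC2 host itself, and that the matching device nodes need to be visible inside the container. A minimal sketch for checking the second part is below; missing_paths is a made-up helper name, and the device paths in the usage line are the ones I would expect from a standard Workstation install on Linux rather than something I have confirmed in this setup.

```shell
# Hypothetical helper: print every path given as an argument that does not
# exist, one per line. Prints nothing if all of them are present.
missing_paths() {
  for p in "$@"; do
    [ -e "$p" ] || echo "$p"
  done
}
```

For example, missing_paths /dev/vmmon /dev/vmnet1 /dev/vmnet8 run inside the container would list any of those device nodes that are absent, which would point at the host modules or the container’s /dev rather than at the Packer template.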

I can provide an approximation of the Dockerfile if necessary, but here’s the reference I based it on, in case that’s enough: Re: Is it possible to run VMware Workstation insid... - VMware Technology Network VMTN. The only major difference being that I removed Vagrant from the build process for now to simplify things.

Here’s the corresponding vmware.log file, but I can’t see any obvious failures/errors there, which makes me think it’s more to do with the specifics of running Packer/VMware inside Docker than with the Packer configuration.

The combination of Packer not being able to connect via SSH and VNC not being accessible over the port that Packer is suggesting feels like a networking problem, but once inside a Docker container, there are too many unknown unknowns for me to know where to look and what to tweak.

I know that running this stuff inside a container is arguably just asking for trouble, but it seems a lot simpler than having the first half of the build run inside a container and then switching over to a separate Packer-provisioned AMI for the next set of steps. However, that seems to be the fallback if I still can’t make progress. The other alternative would be to try something like VirtualBox instead, just in case there’s some fundamental incompatibility between VMware and Docker.

Any advice or pointers on things to explore next is appreciated!