Packer hangs / times out when customizing a Keycloak image, but only when starting from KC images >= 20.0.5?

Hello everyone,

Has anyone ever come across something like this? I have a packer recipe that builds a custom Keycloak image in order to inject the correct theme and run build optimization.

This worked fine building from keycloak:19.x and keycloak:20.x images, but starting from keycloak:20.0.5 and continuing into keycloak:21.x, the build seems to “hang” after running the last provisioner and never commits the image, instead timing out after 5 or so minutes.

Example recipe, paraphrased:

source "docker" "keycloak" {
  image  = "quay.io/keycloak/keycloak:20.0.3"
  commit = true
  changes = [
   "ENTRYPOINT [\"/opt/keycloak/bin/kc.sh\"]",
  ]
}

build {
  name = "Include theme and Optimize"
  sources = [
    "source.docker.keycloak"
  ]
  provisioner "shell-local" {
    inline = [
        "rm -fr _kc_themes",
        "git clone <redacted> _kc_themes"
    ]
  }
  provisioner "file" {
    source = "_kc_themes/"   # This must exist first...
    destination = "/opt/keycloak/themes/"
  }
  provisioner "shell" {
    environment_vars = [
        "KC_HEALTH_ENABLED=true",
        "KC_METRICS_ENABLED=true",
        "KC_DB=postgres",
        "KC_PROXY=edge",
    ]
    inline = [""]
    execute_command = "{{.Vars}} /opt/keycloak/bin/kc.sh build && /opt/keycloak/bin/kc.sh show-config && echo BUILD DONE"
  }

  post-processor "docker-tag" {
    repository = "<redacted>"
  tags       = ["<redacted-tag>"]
    only       = ["docker.keycloak"]
  }
}

I’ve used execute_command rather than inline in order to work around permission constraints inside the image.
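
For context, the inline equivalent of that last provisioner would look something like the sketch below (inside the build block; the commands are illustrative rather than copied from a real attempt). Packer writes inline commands to a temporary script, uploads it into the container, and runs it, and it was that upload-and-run path that hit the permission constraints for me:

  provisioner "shell" {
    environment_vars = [
      "KC_HEALTH_ENABLED=true",
      "KC_METRICS_ENABLED=true",
      "KC_DB=postgres",
      "KC_PROXY=edge",
    ]
    # These get uploaded as a temp script (e.g. /tmp/script_NNN.sh) and
    # executed inside the container:
    inline = [
      "/opt/keycloak/bin/kc.sh build",
      "/opt/keycloak/bin/kc.sh show-config",
    ]
  }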

This recipe works for every version of 19.x I tested and for 20.0.x (x=0…3): after the “BUILD DONE” echo, the image commits and tags as expected.

However, bump the image to quay.io/keycloak/keycloak:20.0.5 or later, and after the BUILD DONE, packer just seems to hang: there is no further log output whatsoever (including when running with the -debug flag). Eventually it times out there.

Following are things I’ve tried (not necessarily in order), none of which has changed this behaviour:

  • Replaced commit = true with discard = true
  • Confirmed the docker plugin is at the latest 1.0.8
  • Updated packer 1.8.4 → 1.8.6
  • Updated docker 20.10.8 → 20.10.24
  • Overrode the default run_command variously, to inject an init process (tini, via --init) or to use bash instead of sh (see the sketch after this list)
  • Tried to force an explicit exit of the sh in execute_command, both with and without --init injected
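
For the run_command experiments above, the override looked roughly like this; the flag list is my recollection of the plugin’s default plus --init, so treat it as a sketch rather than a verbatim copy:

source "docker" "keycloak" {
  image  = "quay.io/keycloak/keycloak:20.0.5"
  commit = true
  # What I believe to be the default run_command, with --init added so that
  # docker starts its bundled tini as PID 1 and the provisioning sh runs
  # underneath it:
  run_command = ["-d", "-i", "-t", "--init", "--entrypoint=/bin/sh", "--", "{{.Image}}"]
  changes = [
    "ENTRYPOINT [\"/opt/keycloak/bin/kc.sh\"]",
  ]
}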

I have also tried execing into the running build container at the BUILD DONE point, and the behaviour here is quite interesting: the sh still appears to be running. Unfortunately the image contains no ps, no package manager, and no curl, so I resorted to poking around /proc/<pid>. When not injecting --init, killing the sh on PID 1 did not reliably cause the container to exit; when injecting --init and killing the sh (which then lands on PID 7), the container does exit. Either way it makes no difference to packer, which just remains stuck at that point.

The release notes for KC21.0.0 mention a number of changes to the images: primarily moving from UBI8-minimal to UBI9-micro as a base, and removing curl and a variety of other CLI commands.

Looking at the preceding versions, building from KC20.0.5 has the exact same issue, but its release notes mention no image-related changes, only code bug fixes. 20.0.4 doesn’t exist and 20.0.3 builds fine.

Wondering if this behaviour happens to be familiar to anyone? Any suggestions gratefully received.

I’d forgotten I can run with PACKER_LOG=1, and this does reveal some useful info: the “BUILD DONE” log line is followed immediately by:

2023/04/05 16:09:08 packer-plugin-docker_v1.0.8_x5.0_darwin_amd64 plugin: 2023/04/05 16:09:08 [INFO] RPC endpoint: Communicator ended with: 0
2023/04/05 16:09:08 [INFO] 4232 bytes written for 'stdout'
2023/04/05 16:09:08 [INFO] 0 bytes written for 'stderr'
2023/04/05 16:09:08 [INFO] RPC client: Communicator ended with: 0
2023/04/05 16:09:08 [INFO] RPC endpoint: Communicator ended with: 0
2023/04/05 16:09:08 packer-provisioner-shell plugin: [INFO] 4232 bytes written for 'stdout'
2023/04/05 16:09:08 packer-provisioner-shell plugin: [INFO] 0 bytes written for 'stderr'
2023/04/05 16:09:08 packer-provisioner-shell plugin: [INFO] RPC client: Communicator ended with: 0
2023/04/05 16:09:08 packer-plugin-docker_v1.0.8_x5.0_darwin_amd64 plugin: 2023/04/05 16:09:08 Executing docker exec -i c57236bb9e4e7dd7a4300120813bc5d959a00d765107010a46ba7a846d9a732e /bin/sh -c (rm -f /tmp/script_429.sh):
2023/04/05 16:09:09 packer-plugin-docker_v1.0.8_x5.0_darwin_amd64 plugin: 2023/04/05 16:09:09 [INFO] RPC endpoint: Communicator ended with: 1
2023/04/05 16:09:09 [INFO] RPC client: Communicator ended with: 1
2023/04/05 16:09:09 [INFO] RPC endpoint: Communicator ended with: 1
2023/04/05 16:09:09 packer-provisioner-shell plugin: [INFO] RPC client: Communicator ended with: 1
2023/04/05 16:09:09 packer-provisioner-shell plugin: Retryable error: Error removing temporary script at /tmp/script_429.sh!
2023/04/05 16:09:11 packer-plugin-docker_v1.0.8_x5.0_darwin_amd64 plugin: 2023/04/05 16:09:11 Executing docker exec -i c57236bb9e4e7dd7a4300120813bc5d959a00d765107010a46ba7a846d9a732e /bin/sh -c (rm -f /tmp/script_429.sh):
2023/04/05 16:09:11 packer-plugin-docker_v1.0.8_x5.0_darwin_amd64 plugin: 2023/04/05 16:09:11 [INFO] RPC endpoint: Communicator ended with: 1
2023/04/05 16:09:11 [INFO] RPC client: Communicator ended with: 1
2023/04/05 16:09:11 [INFO] RPC endpoint: Communicator ended with: 1
2023/04/05 16:09:11 packer-provisioner-shell plugin: [INFO] RPC client: Communicator ended with: 1
2023/04/05 16:09:11 packer-provisioner-shell plugin: Retryable error: Error removing temporary script at /tmp/script_429.sh!

(after which it just keeps retrying that script removal until it times out after about 5 minutes)

So the hang is actually caused by an apparent permission issue in removing a temporary script (a variation of the issue that led me to avoid inline in the first place).

The solution appears to be simply adding fix_upload_owner = true in the source "docker" stanza. The docs for the docker plugin seem to indicate this is the default, but that does not appear to be the case in practice.
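
For anyone landing here with the same problem, the working source stanza ends up looking something like this (the image tag is just one of the affected versions; fix_upload_owner is the only change relative to the original recipe):

source "docker" "keycloak" {
  image            = "quay.io/keycloak/keycloak:21.0.0"
  commit           = true
  # Explicitly chown files uploaded by provisioners to the container user.
  # The plugin docs list true as the default, but setting it here is what
  # made the temp-script cleanup (and hence the commit) succeed for me.
  fix_upload_owner = true
  changes = [
    "ENTRYPOINT [\"/opt/keycloak/bin/kc.sh\"]",
  ]
}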