Restarting a job in nomad with consul connect sidecar causes the proxy to break

We’re seeing a very odd, very specific issue.
Consul v1.14.3
Nomad v1.4.3
Example jobspec connect configuration:

    network {
      mode = "bridge"
      port "http" { to = "8080" }
      port "metrics" {}
    }

    service {
      name = "service"
      port = "http"
      tags = ["http", "addr:${NOMAD_HOST_ADDR_metrics}", "prometheus"]

      meta {
        metrics_port      = "${NOMAD_HOST_PORT_metrics}"
        nomad_alloc_index = "${NOMAD_ALLOC_INDEX}"
        nomad_job_name    = "${NOMAD_JOB_NAME}"
      }

      check {
        type     = "http"
        path     = "/ping"
        interval = "10s"
        timeout  = "2s"
      }

      connect {
        sidecar_service {
          tags = ["service"]
          proxy {
            expose {
              path {
                path            = "/metrics"
                protocol        = "http"
                local_path_port = 8080
                listener_port   = "metrics"
              }
            }
          }
        }
      }
    }

When deploying for the first time, or with a new job spec, this works exactly as expected: the /metrics endpoint is exposed.

However, when the job gets restarted (through OOM, reboot, or a manual stop/start), the /metrics endpoint is no longer exposed.
We get Connection Refused on the /metrics endpoint and Connection Reset on the sidecar proxy.
I cannot find any errors relating to this in Nomad, Consul, or even the Envoy proxy tasks.

To “fix” the issue, simply redeploying with an updated spec works. Is there some difference in a job that could break on a restart but not on a redeployment?


We have now found that when the job is restarted/rebooted etc., the jobspec loses the expose stanza.
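One way to confirm this from the outside (a sketch, not something from the original report) is to ask the local Consul agent for the sidecar registrations and check whether the expose paths are still present. The jq filter below assumes the conventional Connect sidecar service ID suffix `-sidecar-proxy`:

```shell
# Sketch: list sidecar registrations on the local Consul agent and print
# their exposed paths. Connect sidecar service IDs conventionally end in
# "-sidecar-proxy"; adjust the filter if your IDs differ.
curl -s http://127.0.0.1:8500/v1/agent/services |
  jq 'to_entries[]
      | select(.key | endswith("-sidecar-proxy"))
      | {id: .key, expose: .value.Proxy.Expose}'
```

On a healthy first deployment this should show the /metrics path block; if the stanza really is being dropped, the expose list should come back empty after a restart.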

Filed a bug with our findings.

Hi @dpewsey,

Thanks for raising the issue. It looks like this fix has been merged into the release branches, and will therefore be available in the next release.

jrasell and the Nomad team