No internet connection within Nomad exec task using bridge network mode (CNI v1.1.1)

Just switched a job from host networking to bridge mode for sidecar usage with Consul. The CNI bridge plugin v1.1.1 is installed. Unfortunately, tasks now have no network connectivity.

I’m on Ubuntu in GCP and not running firewalld, so I didn’t think the firewalld policy specified here needed to be applied:

sudo firewall-cmd --zone=trusted --add-interface=nomad

UFW is also not running.

What should I try?
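For reference, these are the host-side checks I know of, assuming the default `nomad` bridge name and plain iptables (Docker installs in particular are known to set the FORWARD chain policy to DROP, which also drops CNI bridge traffic):

```shell
# Is the kernel forwarding between the bridge and the uplink?
sysctl net.ipv4.ip_forward   # expect: net.ipv4.ip_forward = 1

# Inspect the FORWARD chain; a DROP policy with no rule covering the
# nomad bridge would explain "no internet" inside allocations
sudo iptables -S FORWARD | head

# If so, explicitly allow traffic to and from the bridge
sudo iptables -I FORWARD -i nomad -j ACCEPT
sudo iptables -I FORWARD -o nomad -j ACCEPT
```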

Hi @l-monninger, are you able to share the job specification you’re having this issue with, along with any relevant Nomad and Consul configuration?

Thanks,
jrasell and the Nomad team

Here’s a redacted version of the job spec (template file). The configure-aws task is where I’ve been trying to debug.

job "tl" {

  datacenters = [
%{ for dc in datacenters ~}
"${dc}",
%{ endfor ~}
  ]

  type = "service"

  update {
    stagger      = "30s"
    max_parallel = 2
  }

  group "rpc" {

    network {
      mode = "bridge"
      port "http" {
        to = "${port}"
      }
    }

    task "configure-aws" {
      lifecycle {
        hook = "prestart"
        sidecar = false
      }

      driver = "exec"
      config {
        command = "/bin/bash"
        args = [
          "-c",
<<EOF
aws configure set aws_access_key_id ${AWS_SECRET_ID}
aws configure set aws_secret_access_key ${AWS_SECRET}
aws configure set region ${AWS_REGION}
aws configure set output json 
# ... have done network tests here and determined no internet connection available;
# not an AWS CLI specific issue
(aws ecr get-login-password --region ${AWS_REGION} | docker login --username AWS --password-stdin "${AWS_ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com")
EOF
        ]
      }
    }

    task "server" {

      driver = "docker"

      config {

        image = "${AWS_ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com/noneya-beeswax:0.0.1"
        args  = ["-bind", "${port}"]

      }

      env {

        // etc.
        
      }

      resources {

        # cpu    = 500 # MHz
        # memory = 256 # MB

      }

    }

    service {

      provider = "consul"
      port = "http"

      connect {
        sidecar_service {}
      }

      check {

        type = "http"
        port = "http"
        interval = "10s"
        timeout = "2s"
        path = "/health"

      }

    }

  }
}

Note: I was also trying to experiment with doing my AWS and Docker CLI auth in a slightly different way, but without an internet connection I can’t debug that part.
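One shape I was considering for that (untested sketch; it assumes the amazon-ecr-credential-helper binary, `docker-credential-ecr-login`, is installed on each client host): let the Docker driver authenticate to ECR itself via the plugin’s auth helper option, instead of a prestart `docker login`:

```hcl
# Nomad client configuration (sketch, not what I'm running)
plugin "docker" {
  config {
    auth {
      # Nomad shells out to docker-credential-ecr-login for registry creds
      helper = "ecr-login"
    }
  }
}
```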

Client and server configurations are produced by the run-consul and run-nomad scripts recommended in the terraform-google-nomad and terraform-google-consul repos.

Consul config

# client sc.json
{
  "ports": {
    "grpc": 8502
  }
}

# client default.json
{
  "advertise_addr": "10.128.0.5",
  "bind_addr": "10.128.0.5",
  "client_addr": "0.0.0.0",
  "datacenter": "<redacted>",
  "node_name": "",
  "retry_join": [
    "provider=gce project_name=dev tag_value=gcp-rpc-cluster"
  ],
  "server": false,
  "autopilot": {
    "cleanup_dead_servers": true,
    "last_contact_threshold": "200ms",
    "max_trailing_logs": 250,
    "server_stabilization_time": "10s",
    "redundancy_zone_tag": "az",
    "disable_upgrade_migration": false,
    "upgrade_version_tag": ""
  },
  "ui": false
}

I notice there isn’t a client stanza in the Consul client config above.

# server sc.json
{
  "ports": {
    "grpc": 8502
  },
  "connect": {
    "enabled": true
  }
}

# server default.json
{
  "advertise_addr": "10.128.0.3",
  "bind_addr": "10.128.0.3",
  "bootstrap_expect": 3,
  "client_addr": "0.0.0.0",
  "datacenter": "<redacted>",
  "node_name": "",
  "retry_join": [
    "provider=gce project_name=dev tag_value=gcp-rpc-cluster"
  ],
  "server": true,
  "autopilot": {
    "cleanup_dead_servers": true,
    "last_contact_threshold": "200ms",
    "max_trailing_logs": 250,
    "server_stabilization_time": "10s",
    "redundancy_zone_tag": "az",
    "disable_upgrade_migration": false,
    "upgrade_version_tag": ""
  },
  "ui": true
}
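For completeness, the bridge-related parts of the Nomad client config are just the defaults; sketching them here in case the CNI paths differ on your hosts (these are the documented defaults, not my literal config):

```hcl
client {
  enabled = true

  # Defaults; only need overriding if the CNI plugins live elsewhere
  cni_path       = "/opt/cni/bin"
  cni_config_dir = "/opt/cni/config"
}
```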

Best,
Liam

I get the same behavior with CNI plugins v1.0.0, by the way.

@jrasell @primeos-work It seems like maybe I shouldn’t use the exec driver at all in bridge mode? See: Support network bridge mode · Issue #36 · hashicorp/nomad-driver-podman · GitHub

For closure: dropping the exec driver was the only thing that got this working.
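For anyone hitting the same wall, the working shape was roughly the following: the prestart task moved onto the docker driver, which does get egress in bridge mode here. (Sketch only; the `amazon/aws-cli` image and the token-file handoff are illustrative, not exactly what I run.)

```hcl
task "configure-aws" {
  lifecycle {
    hook    = "prestart"
    sidecar = false
  }

  driver = "docker"
  config {
    image      = "amazon/aws-cli:latest" # illustrative image choice
    entrypoint = ["/bin/sh", "-c"]
    # $$ escapes templatefile interpolation so Nomad sees ${NOMAD_ALLOC_DIR}
    args = [
      "aws ecr get-login-password --region ${AWS_REGION} > $${NOMAD_ALLOC_DIR}/ecr-token"
    ]
  }
}
```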