No internet connection within Nomad exec task using bridge network mode (CNI v1.1.1)

Just switched a job from host networking to bridge mode so it can run a Consul Connect sidecar. The CNI bridge v1.1.1 plugin is installed. Unfortunately, I now have no network connectivity inside exec tasks.

I’m on Ubuntu in GCP and not running firewalld, so I didn’t think the firewalld policy specified here needed to be applied:

sudo firewall-cmd --zone=trusted --add-interface=nomad

UFW is also not running.

What should I try?
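
For reference, these are the host-side checks I’m aware of — am I missing anything? (The sysctl values below are the ones the Nomad bridge-networking docs call for; /opt/cni/bin is Nomad’s default cni_path, so adjust if yours differs.)

# are the CNI plugins where Nomad looks for them?
ls /opt/cni/bin | grep bridge

# bridge traffic has to be subject to iptables (br_netfilter module)
sudo sysctl net.bridge.bridge-nf-call-arptables \
            net.bridge.bridge-nf-call-ip6tables \
            net.bridge.bridge-nf-call-iptables   # all three should be 1

# Docker sets the FORWARD chain policy to DROP, which can silently
# drop traffic leaving the CNI bridge
sudo iptables -S FORWARD | head -n 5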

Hi @l-monninger, are you able to share the job specification you’re having this issue with, along with any relevant Nomad and Consul configuration?

Thanks,
jrasell and the Nomad team

Here’s a redacted version of the job spec (template file). The configure-aws task is where I’ve been trying to debug.

job "tl" {

  datacenters = [
%{ for dc in datacenters ~}
"${dc}",
%{ endfor ~}
  ]

  type = "service"

  update {
    stagger      = "30s"
    max_parallel = 2
  }

  group "rpc" {

    network {
      mode = "bridge"
      port "http" {
        to = "${port}"
      }
    }

    task "configure-aws" {
      lifecycle {
        hook = "prestart"
        sidecar = false
      }

      driver = "exec"
      config {
        command = "/bin/bash"
        args = [
          "-c",
<<EOF
aws configure set aws_access_key_id ${AWS_SECRET_ID}
aws configure set aws_secret_access_key ${AWS_SECRET}
aws configure set region ${AWS_REGION}
aws configure set output json
# ... ran network tests here and determined no internet connection is available;
# not an AWS-CLI-specific issue
(aws ecr get-login-password --region ${AWS_REGION} | docker login --username AWS --password-stdin "${AWS_ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com")
EOF
        ]
      }
    }

   task "server" {

      driver = "docker"

      config {

        image = "https://${AWS_ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com/noneya-beeswax:0.0.1",
        args = ["-bind", "${port}"]

      }

      env {

        // etc.
        
      }

      resources {

        # cpu    = 500 # MHz
        # memory = 256 # MB

      }

    }

    service {

      provider = "consul"
      port = "http"

      connect {
        sidecar_service {}
      }

      check {

        type = "http"
        port = "http"
        interval = "10s"
        timeout = "2s"
        path = "/health"

      }

    }

  }
}
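
(Re: the network-test comment in the heredoc above — nothing exotic, just enough to separate a DNS failure from a routing one. Roughly the following, with <alloc-id> taken from nomad job status tl:)

# from the host, shell into the prestart task's allocation
nomad alloc exec -task configure-aws <alloc-id> /bin/bash

# inside the task:
ip addr && ip route                    # CNI-assigned address and a default route?
getent hosts amazonaws.com             # does DNS resolve at all?
curl -sS -m 5 https://checkip.amazonaws.com   # does anything actually leave the host?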

Note: I was also experimenting with doing my AWS and Docker CLI auth in a slightly different way, but without an internet connection I can’t debug that part.

Client and server configurations are produced by the run-consul and run-nomad scripts recommended in the terraform-google-nomad and terraform-google-consul repos.

Consul config

# client sc.json
{
  "ports": {
    "grpc": 8502
  }
}

# client default.json
{
  "advertise_addr": "10.128.0.5",
  "bind_addr": "10.128.0.5",
  "client_addr": "0.0.0.0",
  "datacenter": "<redacted>",
  "node_name": "",
  "retry_join": [
    "provider=gce project_name=dev tag_value=gcp-rpc-cluster"
  ],
  "server": false,
  "autopilot": {
    "cleanup_dead_servers": true,
    "last_contact_threshold": "200ms",
    "max_trailing_logs": 250,
    "server_stabilization_time": "10s",
    "redundancy_zone_tag": "az",
    "disable_upgrade_migration": false,
    "upgrade_version_tag": ""
  },
  "ui": false
}

I notice there isn’t a client stanza. ^

# server sc.json
{
  "ports": {
    "grpc": 8502
  },
  "connect": {
    "enabled": true
  }
}

# server default.json
{
  "advertise_addr": "10.128.0.3",
  "bind_addr": "10.128.0.3",
  "bootstrap_expect": 3,
  "client_addr": "0.0.0.0",
  "datacenter": "<redacted>",
  "node_name": "",
  "retry_join": [
    "provider=gce project_name=dev tag_value=gcp-rpc-cluster"
  ],
  "server": true,
  "autopilot": {
    "cleanup_dead_servers": true,
    "last_contact_threshold": "200ms",
    "max_trailing_logs": 250,
    "server_stabilization_time": "10s",
    "redundancy_zone_tag": "az",
    "disable_upgrade_migration": false,
    "upgrade_version_tag": ""
  },
  "ui": true
}
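
(While I’m at it, one check I can run, since Connect sidecars reach Consul over gRPC — 8502 above — on each client node:)

ss -ltn | grep 8502   # Consul's gRPC (xDS) listener should be up

(ss ships with iproute2 on Ubuntu; netstat -ltn works just as well.)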

Best,
Liam

I get the same behavior with CNI plugins v1.0.0, btw.
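
(In case it matters: swapping plugin versions is just extracting the release tarball over the plugin dir, per the standard Nomad install steps — amd64 and the default /opt/cni/bin assumed:)

curl -L -o cni-plugins.tgz \
  https://github.com/containernetworking/plugins/releases/download/v1.1.1/cni-plugins-linux-amd64-v1.1.1.tgz
sudo mkdir -p /opt/cni/bin
sudo tar -C /opt/cni/bin -xzf cni-plugins.tgz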

@jrasell @primeos-work It seems like maybe I shouldn’t be using the exec driver at all in bridge mode? See: Support network bridge mode · Issue #36 · hashicorp/nomad-driver-podman · GitHub

For closure, dropping the exec driver was the only thing that got this to work.
