Hi,
I’ve been trying to get the ingress gateway working. I had it working on a simple Vagrant cluster of machines with no ACLs, but after moving to DigitalOcean with ACLs enabled on Consul, it no longer works.
nomad = 1.1.2
consul = 1.10.0
> nomad operator raft list-peers
Node            ID                Address           State     Voter  RaftProtocol
nomad-3.global  10.106.0.10:4647  10.106.0.10:4647  follower  true   2
nomad-2.global  10.106.0.4:4647   10.106.0.4:4647   leader    true   2
nomad-1.global  10.106.0.7:4647   10.106.0.7:4647   follower  true   2
> consul operator raft list-peers
Node      ID                                    Address           State     Voter  RaftProtocol
consul-3  123869d6-ee08-be85-0a37-e07c084ea952  10.106.0.9:8300   follower  true   3
consul-1  475c3fbc-ff47-a4d5-561a-57fa0014da21  10.106.0.2:8300   leader    true   3
consul-2  b0ff6bfe-5b8d-b20a-37c3-9bc21f9bafd6  10.106.0.12:8300  follower  true   3
Consul has ACLs and mTLS enabled, with a unique token provided for each Nomad server/client node.
> cat /etc/nomad.d/consul.hcl
consul {
  token   = "some token"
  ca_file = "/opt/consul/tls/ca.crt"
}
The permissions on that token's policy are:
acl = "write"
agent_prefix "" {
  policy = "read"
}
agent "the-host" {
  policy = "write"
}
node_prefix "" {
  policy = "read"
}
node "the-host" {
  policy = "write"
}
service_prefix "" {
  policy = "write"
}
key_prefix "" {
  policy = "read"
}
session "the-host" {
  policy = "write"
}
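One thing I have not been able to rule out: as far as I understand the Consul documentation for this version, creating or updating an ingress-gateway configuration entry (which Nomad does when the job is registered) requires operator write permission, and the policy above does not grant it. A minimal sketch of the extra rule, assuming it belongs in the policy attached to the Nomad servers' tokens. I have not verified this on the cluster:

```hcl
# Assumption: appended to the existing policy used by the Nomad server tokens.
# Intended to allow managing Consul configuration entries such as ingress-gateway.
operator = "write"
```

If that is the cause, it would also explain why the countdash job works, since plain sidecar services do not need a gateway configuration entry.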
Here’s the job file I used:
> cat uuid.hcl
job "ig-bridge-demo" {
  datacenters = ["dc1"]

  group "ingress-group" {
    network {
      mode = "bridge"
      port "api" {
        static = 8080
        to     = 8080
      }
    }

    service {
      name = "my-ingress-service"
      port = "8080"

      connect {
        gateway {
          proxy {
          }

          ingress {
            listener {
              port     = 8080
              protocol = "tcp"
              service {
                name = "uuid-api"
              }
            }
          }
        }
      }
    }
  }

  group "generator" {
    network {
      mode = "host"
      port "api" {}
    }

    service {
      name = "uuid-api"
      port = "${NOMAD_PORT_api}"

      connect {
        native = true
      }
    }

    task "generate" {
      driver = "docker"

      config {
        image        = "hashicorpnomad/uuid-api:v3"
        network_mode = "host"
      }

      env {
        BIND = "0.0.0.0"
        PORT = "${NOMAD_PORT_api}"
      }
    }
  }
}
I added the intention (consul intention create my-ingress-service uuid-api) to link the two services, but I get the following error when submitting the job:
> nomad job run -verbose uuid.hcl
Error submitting job: Unexpected response code: 500 (rpc error: Unexpected response code: 403 (rpc error making call: rpc error making call: Permission denied))
I even tried passing in the bootstrap ACL token via -consul-token and setting full wildcard intentions, but I get the same error.
When I monitor the Nomad logs, the leader shows nothing interesting, and the machine the command is run on shows only a simple error message:
LEADER > nomad monitor -log-level TRACE
2021-07-09T08:42:56.875Z [TRACE] nomad.job: job mutate results: mutator=canonicalize warnings=[] error=<nil>
2021-07-09T08:42:56.875Z [TRACE] nomad.job: job mutate results: mutator=connect warnings=[] error=<nil>
2021-07-09T08:42:56.875Z [TRACE] nomad.job: job mutate results: mutator=expose-check warnings=[] error=<nil>
2021-07-09T08:42:56.875Z [TRACE] nomad.job: job mutate results: mutator=constraints warnings=[] error=<nil>
2021-07-09T08:42:56.875Z [TRACE] nomad.job: job validate results: validator=connect warnings=[] error=<nil>
2021-07-09T08:42:56.875Z [TRACE] nomad.job: job validate results: validator=expose-check warnings=[] error=<nil>
2021-07-09T08:42:56.875Z [TRACE] nomad.job: job validate results: validator=validate warnings=[] error=<nil>
2021-07-09T08:42:56.875Z [TRACE] nomad.job: job validate results: validator=memory_oversubscription warnings=[] error=<nil>
FOLLOWER > nomad monitor -log-level TRACE
2021-07-09T08:42:56.882Z [ERROR] http: request failed: method=POST path=/v1/jobs error="rpc error: Unexpected response code: 403 (rpc error making call: rpc error making call: Permission denied)" code=500
2021-07-09T08:42:56.882Z [DEBUG] http: request complete: method=POST path=/v1/jobs duration=8.718807ms
I get the same error via the HTTP API:
> curl -v --request POST --data @uuid.json \
--cacert /opt/nomad/tls/ca.crt \
--cert /opt/nomad/tls/agent.crt \
--key /opt/nomad/tls/agent.key \
"https://10.106.0.7:4646/v1/jobs"
Note: Unnecessary use of -X or --request, POST is already inferred.
* Trying 10.106.0.7:4646...
* Connected to 10.106.0.7 (10.106.0.7) port 4646 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
* CAfile: /opt/nomad/tls/ca.crt
* CApath: none
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, Request CERT (13):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, Certificate (11):
* TLSv1.3 (OUT), TLS handshake, CERT verify (15):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN, server did not agree to a protocol
* Server certificate:
* subject: CN=server.global.nomad
* start date: Jul 8 10:42:08 2021 GMT
* expire date: Aug 7 10:42:38 2021 GMT
* subjectAltName: host "10.106.0.7" matched cert's IP address!
* issuer: CN=global.nomad Intermediate Authority
* SSL certificate verify ok.
> POST /v1/jobs HTTP/1.1
> Host: 10.106.0.7:4646
> User-Agent: curl/7.76.1
> Accept: */*
> Content-Length: 8019
> Content-Type: application/x-www-form-urlencoded
>
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* Mark bundle as not supporting multiuse
< HTTP/1.1 500 Internal Server Error
< Vary: Accept-Encoding
< Date: Fri, 09 Jul 2021 08:42:56 GMT
< Content-Length: 106
< Content-Type: text/plain; charset=utf-8
<
* Connection #0 to host 10.106.0.7 left intact
rpc error: Unexpected response code: 403 (rpc error making call: rpc error making call: Permission denied)
The certs/keys are generated from the PKI endpoint of a Vault cluster.
I also deployed the countdash example Connect job, and that works fine, so Connect itself seems to be working OK (Envoy 1.18.3).
Any ideas on what config I’m missing to get ingress working?