Hi all,
I’ve been pulling my hair out trying to get this to work for the past week and would hugely appreciate any insights into what I’m doing wrong.
I’ve got a Consul + Nomad cluster that looks like the following:
Some notes about the network:
All Consul and Nomad servers and agents are on the same Tailscale network (not sure if that makes a difference).
Nomad does not have CA or encryption keys configured, but the CNI plugin is installed on all Nomad clients.
Consul does not have ACLs set up, but CA and encryption keys are configured.
Envoy is not installed on any node.
I wanted to create an ingress gateway per-datacenter, so I used this example Nomad service configuration file: nomad-connect-examples/ig-bridge-demo.nomad at master · hashicorp/nomad-connect-examples · GitHub with the following addition to ensure the service ran on Linux nodes:
# Only run on Linux nodes
constraint {
  attribute = "${attr.kernel.name}"
  value = "linux"
}
It seemed to work the first time, correctly acting as a reverse proxy on `uuid-api.ingress.dc1.consul`.
However, I wanted the ingress gateway to act as a reverse proxy for all services hosted in the datacenter, and reading the documentation led me to believe that I could proxy all services by using `name = "*"` and `protocol = "http"` instead of `tcp`.
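For reference, a wildcard listener along those lines might look like the sketch below. It assumes the same port as the example job and that every service it matches is already known to Consul with the `http` protocol. Per the Nomad gateway documentation quoted further down, `hosts` cannot be combined with a wildcard name, so requests would be matched on the default `<service-name>.ingress.*` host.

```hcl
listener {
  port     = 8080
  protocol = "http" # a wildcard name is not supported on tcp listeners
  service {
    # "*" asks the gateway to expose all services through this listener.
    name = "*"
    # No hosts attribute here: it cannot be specified alongside a wildcard name.
  }
}
```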
I updated my ingress configuration to look as follows:
# Consul Ingress Gateway Configuration Entry.
ingress {
  # Nomad will automatically manage the Configuration Entry in Consul
  # given the parameters in the ingress block.
  #
  # Additional options are documented at
  # https://www.nomadproject.io/docs/job-specification/gateway#ingress-parameters
  listener {
    port = 8080
    protocol = "http"
    service {
      name = "uuid-api"
    }
  }
}
I then tried to redeploy, but got this error:
Error writing config entry ingress-gateway/ingress-ngproxy: Unexpected response code: 500 (rpc error making call: service "count-dashboard" has protocol "tcp", which does not match defined listener protocol "http")
This led me to write a new config entry for `proxy-defaults` and apply it with `consul config write`:
{
  "Kind": "proxy-defaults",
  "Name": "global",
  "Config": {
    "Protocol": "http"
  }
}
After doing this and re-deploying services, the ingress gateway ceased to work. Reverting the configuration didn’t help either, nor did restarting Consul/Nomad or the server/clients. When adding the job, the following line sticks out:
Feb 13 23:24:45 gamma-compute consul[48760]: 2023-02-13T23:24:45.208Z [ERROR] agent.proxycfg: Failed to handle update from watch: kind=ingress-gateway proxy=_nomad-task-08d70a6e-3a53-6b54-c267-bcf828a32ad8-group-ingress-group-my-ingress-service-8080 service_id=_nomad-task-08d70a6e-3a53-6b54-c267-bcf828a32ad8-group-ingress-group-my-ingress-service-8080 id=gateway-config error="invalid type for config entry: <nil>"
I’ve since also tried writing `service-defaults`:
{
  "Kind": "service-defaults",
  "Name": "uuid-api",
  "Protocol": "http"
}
But Nomad just seems to straight up ignore the config, with the following error:
Error submitting job: Unexpected response code: 500 (rpc error: rpc error: Unexpected response code: 500 (service "uuid-api" has protocol "tcp", which does not match defined listener protocol "http"))
If I deploy the `uuid-api` service separately, it doesn’t seem to work at all, even if I try to access it on the host-allocated port instead of the reverse proxy.
I’m feeling a bit lost and not sure where to go or what else to try. Any help would be appreciated.
Possibly related GitHub issues and PRs:
opened 04:20AM - 12 Aug 20 UTC
type/enhancement
theme/networking
theme/consul/connect
stage/accepted
Nomad 0.11.1
Consul 1.8.2
Consul Ingress-Gateways support `tcp` and `http` listeners. HTTP listeners are preferred because they allow multiple services to listen on a single port and use Host header identification.
## Problem
Nomad jobs default to service type of `tcp`. There does not appear to be a documented way to change a nomad job to use `http` as the service type. As a result the user will get the following error when they attempt to create a listener for it.
https://www.nomadproject.io/docs/job-specification/service
```
Error writing config entry ingress-gateway/ingress-ngproxy: Unexpected response code: 500 (rpc error making call: service "count-dashboard" has protocol "tcp", which does not match defined listener protocol "http")
```
## Steps to reproduce
1. Submit the standard count-dash example
<details><summary>count-dash.job</summary>
<p>
```
job "countdash" {
  datacenters = ["dc1"]
  group "api" {
    network {
      mode = "bridge"
    }
    service {
      name = "count-api"
      port = "9001"
      connect {
        sidecar_service {}
      }
    }
    task "web" {
      driver = "docker"
      config {
        image = "hashicorpnomad/counter-api:v1"
      }
    }
  }
  group "dashboard" {
    network {
      mode = "bridge"
      port "http" {
        static = 9002
        to = 9002
      }
    }
    service {
      name = "count-dashboard"
      port = "9002"
      # This is slightly modified from the stock count-dash examples
      # By adding an 'http' health check, the hope was to force nomad to use 'http' over 'tcp'
      check {
        name = "count-dashboard-health"
        type = "http"
        protocol = "http"
        path = "/health"
        port = 9002
        interval = "10s"
        timeout = "5s"
      }
      connect {
        sidecar_service {
          proxy {
            upstreams {
              destination_name = "count-api"
              local_bind_port = 8080
            }
          }
        }
      }
    }
    task "dashboard" {
      driver = "docker"
      env {
        COUNTING_SERVICE_URL = "http://${NOMAD_UPSTREAM_ADDR_count_api}"
      }
      config {
        image = "hashicorpnomad/counter-dashboard:v1"
      }
    }
  }
}
```
</p>
</details>
2. Create an ingress controller and register it with consul config
```
consul config write ingress-service.hcl
```
```
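# Kind and Name are not in the original snippet; they are inferred from the
# "ingress-gateway/ingress-service" entry named in the error message below.
Kind = "ingress-gateway"
Name = "ingress-service"
Listeners = [
  {
    Port = 8080
    Protocol = "http"
    Services = [
      {
        Name = "count-dashboard"
        Hosts = ["count.example.com"]
      }
    ]
  }
]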
```
### Expected result
The service should be added to the ingress controller
### Actual result
Consul throws this warning
```
Error writing config entry ingress-gateway/ingress-service: Unexpected response code: 500 (rpc error making call: service "count-dashboard" has protocol "tcp", which does not match defined listener protocol "http")
```
opened 03:00PM - 27 May 21 UTC
closed 03:09PM - 27 May 21 UTC
When filing a bug, please include the following headings if possible. Any example text in this template can be deleted.
#### Overview of the Issue
Consul can not re-define a config entry for a service once it's been written.
#### Reproduction Steps
I was using nomad to spin up the service, so:
1. Run a job which includes 1 task and includes a connect {} stanza with a consul sidecar.
2. Wait for the service to appear, write a config entry as follows:
```
Kind = "service-defaults"
Name = "redis"
Protocol = "http"
```
3. Realize that it's actually tcp and you should double check things before running your template jobs
4. Re-write the config entry
```
Kind = "service-defaults"
Name = "redis"
Protocol = "tcp"
```
5. The write fails:
```
$ consul config write /etc/consul.d/files/redis.hcl
Error writing config entry service-defaults/redis: Unexpected response code: 500 (service "redis" has protocol "tcp", which does not match defined listener protocol "http")
```
6. Try to delete the config entry via
```
$ consul config delete -kind service-defaults -name redis
Error deleting config entry service-defaults/redis: Unexpected response code: 500 (service "redis" has protocol "tcp", which does not match defined listener protocol "http")
```
### Consul info for both Client and Server
<details>
<summary>Client info</summary>
```
agent:
check_monitors = 0
check_ttls = 1
checks = 16
services = 12
build:
prerelease =
revision = 10bb6cb3
version = 1.9.4
consul:
acl = disabled
known_servers = 3
server = false
runtime:
arch = amd64
cpu_count = 2
goroutines = 163
max_procs = 2
os = linux
version = go1.15.8
serf_lan:
coordinate_resets = 0
encrypted = true
event_queue = 0
event_time = 15
failed = 0
health_score = 0
intent_queue = 0
left = 0
member_time = 10943
members = 15
query_queue = 0
query_time = 1
```
</details>
<details>
<summary>Server info</summary>
```
agent:
check_monitors = 0
check_ttls = 0
checks = 3
services = 4
build:
prerelease =
revision = 10bb6cb3
version = 1.9.4
consul:
acl = disabled
bootstrap = false
known_datacenters = 1
leader = true
leader_addr = 10.0.1.7:8300
server = true
raft:
applied_index = 1398004
commit_index = 1398004
fsm_pending = 0
last_contact = 0
last_log_index = 1398004
last_log_term = 15
last_snapshot_index = 1392997
last_snapshot_term = 15
latest_configuration = [{Suffrage:Voter ID:c63cfb18-2fc6-fccc-ee9e-9d41c8a23d08 Address:10.0.0.236:8300} {Suffrage:Voter ID:87bce418-8ac6-402f-935b-58ac84b6422c Address:10.0.1.7:8300} {Suffrage:Voter ID:d46f54e9-f2bc-6915-6e13-bc5c9edfa0fe Address:10.0.2.52:8300}]
latest_configuration_index = 0
num_peers = 2
protocol_version = 3
protocol_version_max = 3
protocol_version_min = 0
snapshot_version_max = 1
snapshot_version_min = 0
state = Leader
term = 15
runtime:
arch = amd64
cpu_count = 2
goroutines = 373
max_procs = 2
os = linux
version = go1.15.8
serf_lan:
coordinate_resets = 0
encrypted = true
event_queue = 0
event_time = 15
failed = 0
health_score = 0
intent_queue = 0
left = 0
member_time = 10943
members = 15
query_queue = 0
query_time = 1
serf_wan:
coordinate_resets = 0
encrypted = true
event_queue = 0
event_time = 1
failed = 0
health_score = 0
intent_queue = 0
left = 0
member_time = 31
members = 3
query_queue = 0
query_time = 1
```
</details>
### Operating system and Environment details
```
$ cat /etc/os-release
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"
CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"
```
### Log Fragments
```
May 27 14:57:22 ip-10-0-1-7 consul: 2021-05-27T14:57:22.938Z [ERROR] agent.http: Request error: method=DELETE url=/v1/config/service-defaults/redis from=127.0.0.1:45328 error="service "redis" has protocol "tcp", which does not match defined listener protocol "http""
```
opened 10:06PM - 04 Oct 22 UTC
type/bug
theme/networking
theme/consul/connect
stage/accepted
### Nomad version
Nomad v1.3.5 (1359c2580fed080295840fb888e28f0855e42d50)
### Operating system and Environment details
Ubuntu 22.04 on AWS (on a fresh EC2 instance), amd64
Consul v1.13.2
Revision 0e046bbb
Build Date 2022-09-20T20:30:07Z
Protocol 2 spoken by default, understands 2 to 3 (agent will automatically use protocol >2 when speaking to compatible agents)
Docker version 20.10.18, build b40c2f6
### Issue
If I run an ingress container with the `http` protocol, I'm unable to edit it to use `tcp` even after I stop the job. Even if I run `nomad system gc` and `nomad system reconcile summaries`, it still doesn't work. I'm also unable to edit the consul config to use
If I swap all instances of `http` and `tcp` I get the same errors.
### Reproduction steps
1. Start nomad/consul in dev mode:
```
consul agent -dev
sudo nomad agent -dev-connect
```
2. Set up consul to use http as default protocol (using proxy-defaults.hcl file below)
```
consul config write proxy-defaults.hcl
```
3. Run the first job file
```
nomad job run job1.nomad
```
4. After job has started, stop the job
```
nomad job stop job1
```
5. When job stops successfully, run the second job file
```
nomad job run job2.nomad
```
#### Expected Result
I should be able to run job2 as normal.
#### Actual Result
```
$ nomad job run job2.nomad
Error submitting job: Unexpected response code: 500 (Unexpected response code: 500 (service "test-upstream" has protocol "http", which does not match defined listener protocol "tcp"))
$ consul config write service-defaults.hcl
Error writing config entry service-defaults/test-upstream: Unexpected response code: 500 (service "test-upstream" has protocol "tcp", which does not match defined listener protocol "http")
```
### Job file (if appropriate)
proxy-defaults.hcl
```
Kind = "proxy-defaults"
Name = "global"
Config {
  protocol = "http"
}
```
service-defaults.hcl
```
Kind = "service-defaults"
Name = "test-upstream"
Protocol = "tcp"
```
job1.nomad:
```
job "job1" {
  region = "global"
  datacenters = ["dc1"]
  type = "system"
  group "group1" {
    network {
      mode = "bridge"
      port "default" {
        static = 12345
        to = 12345
      }
    }
    service {
      name = "test-ingress"
      port = "12345"
      connect {
        gateway {
          proxy {
            connect_timeout = "5s"
          }
          ingress {
            listener {
              port = 12345
              protocol = "http"
              service {
                name = "test-upstream"
                hosts = ["*"]
              }
            }
          }
        }
      }
    }
  }
}
```
job2.nomad:
```
job "job2" {
  region = "global"
  datacenters = ["dc1"]
  type = "system"
  group "group2" {
    network {
      mode = "bridge"
      port "default" {
        static = 12345
        to = 12345
      }
    }
    service {
      name = "test-ingress"
      port = "12345"
      connect {
        gateway {
          proxy {
            connect_timeout = "5s"
          }
          ingress {
            listener {
              port = 12345
              protocol = "tcp"
              service {
                name = "test-upstream"
              }
            }
          }
        }
      }
    }
  }
}
```
hashicorp:main
← hashicorp:gulducat/consul-ingress-http-no-hosts
opened 08:14PM - 10 Jan 23 UTC
This makes code match the documentation, and reality 😋
Applies to all non-"tcp" protocols: `http`, `http2`, and `grpc`, which support "hosts" and tests now cover all of them as well. I could maybe be convinced to remove the extra test coverage if it seems superfluous, but it's intended to guard against potential future regressions.
per https://developer.hashicorp.com/nomad/docs/job-specification/gateway#service-parameters,
> `service` Parameters
> * [`hosts`](https://developer.hashicorp.com/nomad/docs/job-specification/gateway#hosts) `(array<string>: nil)` - A list of hosts that specify what requests will match this service. This cannot be used with a `tcp` listener, and cannot be specified alongside a wildcard (`*`) service name. If not specified, the default domain `<service-name>.ingress.*` will be used to match services.
<details><summary>e.g. this will now work:</summary>
```hcl
listener {
  port = 8080
  protocol = "http"
  service {
    name = "uuid-api"
    # hosts = no longer required
  }
}
```
and this will no longer work (by "work" I mean pass to consul, which errors less-specifically):
```hcl
listener {
  port = 8080
  protocol = "http"
  service {
    name = "*"
    hosts = ["anything"]
  }
}
```
error before:
> Error submitting job: Unexpected response code: 500 (Unexpected response code: 500 (Associating hosts to a wildcard service is not supported (listener on port 8080)))
error after:
> Error submitting job: Unexpected response code: 500 (1 error occurred:
* Task group ingress-group validation failed: 1 error occurred:
* Task group service validation failed: 1 error occurred:
* Service[0] my-ingress-service validation failed: 1 error occurred:
* Consul Ingress Service with a wildcard "*" service name can not also specify hosts)
</details>
Closes #10955
Note: to use these non-"tcp" protocols, users will still need to manually write a service-defaults Consul config entry as described in https://github.com/hashicorp/nomad/issues/8647#issuecomment-691279667
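A minimal sketch of the kind of config entry that note refers to, assuming the upstream is the `uuid-api` service from the post above and that the entry is written with `consul config write` before the ingress job is submitted. The file name `uuid-api-defaults.hcl` is hypothetical.

```hcl
# uuid-api-defaults.hcl (hypothetical file name)
Kind     = "service-defaults"
Name     = "uuid-api"
Protocol = "http" # should match the protocol of the ingress listener
```

As the related issues above suggest, an existing ingress-gateway entry that already references the service with a different listener protocol can cause this write to be rejected, so the order in which the entries are created appears to matter.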