HTTP echo job works when using HTTP port but fails with HTTPS port

Hi,
I created an echo job using “mendhak/http-https-echo”.
Using HTTP port 8080 the service works fine, but when I change it to use HTTPS port 8443 it stops working. It shows:

upstream connect error or disconnect/reset before headers. reset reason: connection termination
Job spec:

job "echo" {
  datacenters = ["dc1"]

  group "echo" {
    network {
      mode = "bridge"
    }

    service {
      name = "echo"
      port = "8443"
      # port = "8080" # works

      connect {
        sidecar_service {}
      }
    }

    task "echo" {
      driver = "docker"

      config {
        image = "mendhak/http-https-echo"
      }
    }
  }
}

nomad agent-info output:

client
  heartbeat_ttl = 18.351539247s
  known_servers = 10.10.0.10:4647
  last_heartbeat = 15.351549697s
  node_id = 87407e84-1b0d-a573-1598-7591908c3e71
  num_allocations = 13
nomad
  bootstrap = true
  known_regions = 1
  leader = true
  leader_addr = 10.10.0.10:4647
  server = true
raft
  applied_index = 7563
  commit_index = 7563
  fsm_pending = 0
  last_contact = 0
  last_log_index = 7563
  last_log_term = 29
  last_snapshot_index = 0
  last_snapshot_term = 0
  latest_configuration = [{Suffrage:Voter ID:cac2ea92-2e5b-7dbe-3cb1-bec010a067d2 Address:10.10.0.10:4647}]
  latest_configuration_index = 0
  num_peers = 0
  protocol_version = 3
  protocol_version_max = 3
  protocol_version_min = 0
  snapshot_version_max = 1
  snapshot_version_min = 0
  state = Leader
  term = 29
runtime
  arch = amd64
  cpu_count = 16
  goroutines = 663
  kernel.name = linux
  max_procs = 16
  version = go1.20.7
serf
  coordinate_resets = 0
  encrypted = true
  event_queue = 0
  event_time = 1
  failed = 0
  health_score = 0
  intent_queue = 0
  left = 0
  member_time = 1
  members = 1
  query_queue = 0
  query_time = 1
vault
  token_expire_time = 
  token_last_renewal_time = 
  token_next_renewal_time = 
  token_ttl = 0s
  tracked_for_revoked = 0

consul info output:

agent:
	check_monitors = 0
	check_ttls = 1
	checks = 15
	services = 16
build:
	prerelease = 
	revision = 
	version = 1.16.1
	version_metadata = 
consul:
	acl = disabled
	bootstrap = true
	known_datacenters = 1
	leader = true
	leader_addr = 10.10.0.10:8300
	server = true
raft:
	applied_index = 14897
	commit_index = 14897
	fsm_pending = 0
	last_contact = 0
	last_log_index = 14897
	last_log_term = 21
	last_snapshot_index = 0
	last_snapshot_term = 0
	latest_configuration = [{Suffrage:Voter ID:eb300335-b6c8-b1a5-13b4-df5d92237c17 Address:10.10.0.10:8300}]
	latest_configuration_index = 0
	num_peers = 0
	protocol_version = 3
	protocol_version_max = 3
	protocol_version_min = 0
	snapshot_version_max = 1
	snapshot_version_min = 0
	state = Leader
	term = 21
runtime:
	arch = amd64
	cpu_count = 16
	goroutines = 335
	max_procs = 16
	os = linux
	version = go1.20.7
serf_lan:
	coordinate_resets = 0
	encrypted = true
	event_queue = 1
	event_time = 21
	failed = 0
	health_score = 0
	intent_queue = 0
	left = 0
	member_time = 1
	members = 1
	query_queue = 0
	query_time = 1
serf_wan:
	coordinate_resets = 0
	encrypted = true
	event_queue = 0
	event_time = 1
	failed = 0
	health_score = 0
	intent_queue = 0
	left = 0
	member_time = 1
	members = 1
	query_queue = 0
	query_time = 1

Any help is greatly appreciated as I have been trying to solve this issue for a few days now.

Hi @lounanealerts,

I believe you will need to tell the application running inside the Docker container that you are not exposing it via the default port. The mendhak/docker-http-https-echo GitHub page has more information on this; the Nomad job specification includes an env block which can be used to perform the desired configuration change.

The modified job spec would look something like:

job "echo" {
  datacenters = ["dc1"]

  group "echo" {
    network {
      mode = "bridge"
    }

    service {
      name = "echo"
      port = "8443"

      connect {
        sidecar_service {}
      }
    }

    task "echo" {
      driver = "docker"

      env {
        HTTP_PORT = "8443"
      }

      config {
        image = "mendhak/http-https-echo"
      }
    }
  }
}

Thanks,
jrasell and the Nomad team

Hi @jrasell

The mendhak/http-https-echo image exposes both 8080 (“http”) and 8443 (“https”) by default.
It even says so in the service stdout:

Listening on ports 8080 for http, and 8443 for https.

The same problem happened when I tried another image, “kasm”, which only exposes an HTTPS port.
There must be something wrong in my config, since this is basic functionality.
Do I need to enable TLS for Nomad and Consul, maybe?

Thanks for the fast response.

I still have the same problem.
Should I open an issue on GitHub?

Hi @lounanealerts,

Do I need to enable TLS for Nomad and Consul?

No, this should not be required.

Should I open an issue on GitHub?

No, as this seems to be an issue with the job spec you have written rather than a bug in Nomad.

I ran a quick test locally, modifying the job spec you wrote to demonstrate the example working with Nomad only (note the provider = "nomad" service block, which takes Consul and the Connect sidecar out of the picture). It differs from yours in that it defines a network port and instructs Docker to expose it. When I navigate in my browser to https://localhost:8443 I receive a response from the webserver as expected; there is also a command-line check after the spec below. Hopefully this helps you modify your job spec and resolve your problem.

The Nomad network block documentation has some useful reading on networking. The CNI page can also help you understand Nomad’s use of CNI, which comes into play when running in bridge mode.

job "echo" {
  datacenters = ["dc1"]

  group "echo" {
    network {
      mode = "bridge"
      port "https" {
        static = 8443
      }
    }

    service {
      provider = "nomad"
      name     = "echo"
      port     = "8443"
    }

    task "echo" {
      driver = "docker"

      config {
        image = "mendhak/http-https-echo"
        ports = ["https"]
      }
    }
  }
}
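
For a quick command-line check (a sketch, assuming the job runs on the local machine; -k is needed because the image serves a self-signed certificate):

$ curl -k https://localhost:8443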

Thanks,
jrasell and the Nomad team

Hi @jrasell,

Indeed, you’re right: accessing it using the static port “8443” does work.
But the problem is that when I try to access it through the service sidecar port, I get:

This site can’t provide a secure connection
<server_ip_address> didn’t accept your login certificate, or one may not have been provided

  • Try contacting the system admin.
    ERR_BAD_SSL_CLIENT_AUTH_CERT

It looks like the service sidecar is trying to verify a client TLS certificate.
Is there a way to configure the service sidecar to skip TLS certificate verification?

Thank you so much.

Hi @lounanealerts,

Consul by default supports HTTPS services inside the mesh only when the service protocol type is tcp.

The error you are getting is due to the service protocol being set to http (either by a proxy-defaults or a service-defaults config entry).

While I don’t know your API Gateway setup, one option to try is setting the protocol type of the echo service to tcp with a service-defaults config entry, as shown below:

# echo-service-defaults.hcl
# apply this using 
# $ consul config write echo-service-defaults.hcl
kind     = "service-defaults"
name     = "echo"
protocol = "tcp"
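
You can confirm the entry was applied with (assuming the default local Consul address):

$ consul config read -kind service-defaults -name echo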

Here is a recording of this working, using the Ingress Gateway example configuration from the Nomad documentation:

https://asciinema.org/a/5frWzCm7eqjEzUlYxI9Lq9kYc

If you want to keep the service protocol set to http (with the application itself still serving HTTPS), you will have to use the Consul escape hatch override to replace the envoy_local_cluster_json. There is an example in the job spec below. Could you please try it and let us know if it works for you?

# file: echo-envoy-local_cluster.hcl
# to use when the echo service type is of type `http`

job "echo" {

  group "ingress-group" {

    network {
      mode = "bridge"

      # This example enables plain TCP traffic on port 8080 to reach the
      # echo service through the ingress gateway.
      port "inbound" {
        static = 8080
        to     = 8080
      }
    }

    service {
      name = "my-ingress-service"
      port = "8080"

      connect {
        gateway {

          # Consul gateway [envoy] proxy options.
          proxy {
            # The following options are automatically set by Nomad if not
            # explicitly configured when using bridge networking.
            #
            # envoy_gateway_no_default_bind = true
            # envoy_gateway_bind_addresses "echo" {
            #   address = "0.0.0.0"
            #   port    = <associated listener.port>
            # }
            #
            # Additional options are documented at
            # https://www.nomadproject.io/docs/job-specification/gateway#proxy-parameters
          }

          # Consul Ingress Gateway Configuration Entry.
          ingress {
            # Nomad will automatically manage the Configuration Entry in Consul
            # given the parameters in the ingress block.
            #
            # Additional options are documented at
            # https://www.nomadproject.io/docs/job-specification/gateway#ingress-parameters
            listener {
              port     = 8080
              protocol = "tcp"
              service {
                name = "echo"
              }
            }
          }
        }
      }
    }
  }

  group "echo" {
    network {
      mode = "bridge"
    }

    service {
      name = "echo"
      port = "8443"

      connect {
        sidecar_service {
          proxy {
            config {

              envoy_local_cluster_json = <<EOL
{
  "@type": "type.googleapis.com/envoy.config.cluster.v3.Cluster",
  "name": "local_app",
  "type": "STATIC",
  "connect_timeout": "5s",
  "load_assignment": {
    "cluster_name": "local_app",
    "endpoints": [
      {
        "lb_endpoints": [
          {
            "endpoint": {
              "address": {
                "socket_address": {
                  "address": "127.0.0.1",
                  "port_value": 8443
                }
              }
            }
          }
        ]
      }
    ]
  },
  "transport_socket": {
    "name": "tls",
    "typed_config": {
      "@type": "type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext",
      "common_tls_context": {}
    }
  }
}
EOL
            }
          }
        }
      }
    }

    task "echo" {
      driver = "docker"

      config {
        image = "mendhak/http-https-echo"
      }
    }
  }
}
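
To try it, assuming the spec is saved under the file name from the comment at the top:

$ nomad job run echo-envoy-local_cluster.hcl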

Hi @Ranjandas,

Could you please elaborate a bit more? Do you mean that:

  1. To access a service using HTTPS, the service-defaults protocol needs to be tcp?
  2. I need to configure an ingress gateway service alongside it? But why? Is this needed when using an API gateway?
  3. Should the API gateway be listening on a TCP port?
  4. What does the escape hatch in the job above actually do?

I would love an overview of how things work. Are there any resources you recommend I read?

Thanks a lot

Hi @lounanealerts,

Here are the answers to your questions:

To access a service using HTTPS, the service-defaults protocol needs to be tcp?

What I meant is that in Consul, services are by default supposed to run without TLS, as it is the sidecars that take care of TLS (mTLS) and expose the service outside the host boundary. So in your case, since the service speaks HTTPS, the following are the options for getting the sidecar to talk to it (a proxy-defaults sketch follows the list):

  • Set the service protocol type to tcp (the default, unless overridden by proxy-defaults): In this scenario, the
    public_listener of the sidecar is of type TcpProxy, and it will proxy straight to the local_app cluster.

  • If the service protocol type is http, replace the local_app cluster: In this scenario, the listener will be of type HttpConnectionManager and will talk to the local_app cluster, which then needs to be configured with HTTPS-related information. This is what we are doing by replacing the envoy_local_cluster_json.

    To understand this in detail, you should have a good understanding of Envoy Proxy and its various filters. In short (oversimplifying), in the first scenario the request flows from the downstream sidecar to the upstream app in the form of a tunnel.
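
For reference, here is a minimal sketch of the kind of global proxy-defaults entry that overrides the default tcp protocol. The file name and the entry itself are hypothetical; it is shown only so you can recognize such an override if one exists in your setup:

# proxy-defaults.hcl (hypothetical global override)
# if an entry like this exists, it changes the default protocol for
# every service in the mesh; inspect it with:
# $ consul config read -kind proxy-defaults -name global
kind = "proxy-defaults"
name = "global"
config {
  protocol = "http"
}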

I need to configure an ingress gateway service alongside it? But why? Is this needed when using an API gateway?

Sorry for the confusion. You don’t have to use an ingress gateway. I used it only because I didn’t know anything about your API Gateway setup, and I picked up an example from the Nomad docs to show the error and the solution.

Should the API gateway be listening on a TCP port?

The short answer is no. Unfortunately, I can’t give more information on this since I don’t know your setup.

What does the escape hatch in the job above actually do?

The escape hatch overrides the Envoy local_app cluster to add HTTPS-related config to it. Without that override, the local_app cluster would look like this (you can see that the endpoint is treated as a non-TLS network service):

   "dynamic_active_clusters": [
    {
     "version_info": "db4157a7610616833d88d23d11e926fcdc91f523aa2fe39c1c21e69fd2298927",
     "cluster": {
      "@type": "type.googleapis.com/envoy.config.cluster.v3.Cluster",
      "name": "local_app",
      "type": "STATIC",
      "connect_timeout": "5s",
      "load_assignment": {
       "cluster_name": "local_app",
       "endpoints": [
        {
         "lb_endpoints": [
          {
           "endpoint": {
            "address": {
             "socket_address": {
              "address": "127.0.0.1",
              "port_value": 8443
             }
            }
           }
          }
         ]
        }
       ]
      }
     },
     "last_updated": "2023-09-19T11:32:27.148Z"
    }
   ]

If you are familiar with Envoy, or you want to explore more, I recommend exec'ing into the sidecar allocation and running curl 127.0.0.2:19001/config_dump so you can inspect the Envoy config and work through it.
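
For example, to list just the cluster definitions from the dump (a sketch, assuming jq is available wherever you run it):

# list the dynamic clusters (local_app among them) from the Envoy admin API
$ curl -s 127.0.0.2:19001/config_dump | jq '.configs[] | select(."@type" | endswith("ClustersConfigDump")) | .dynamic_active_clusters[].cluster.name'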

I have tried to keep it simple. I hope this helps.
