Nomad + Consul integration with TLS and verify_ssl=true not working

I’ve been setting up our first Nomad + Consul cluster. I’m now at the point where I have set up everything with TLS and want to make use of auto_encrypt.

My setup has been based on these tutorials:

I have a working setup with TLS when using verify_ssl=false in the Nomad client configuration. I can see the nomad-client services are registered in the Consul UI and the client instances are visible in the Nomad UI.

When I change the Nomad client configuration to verify_ssl=true, the nomad-client services are NOT registered in the Consul UI and the client instances are NOT visible in the Nomad UI.

I also see this in the Nomad client logs:

13:04:57.250Z [WARN]  client.server_mgr: no servers available
13:04:57.251Z [ERROR] client.rpc: error performing RPC to server, deadline exceeded, cannot retry: error="no servers" rpc=Node.Register
13:04:57.253Z [ERROR] client: error discovering nomad servers: error="client.consul: unable to query Consul datacenters: Get \"https://127.0.0.1:8501/v1/catalog/datacenters\": tls: failed to verify certificate: x509: certificate signed by unknown authority"
13:04:59.133Z [WARN]  client.server_mgr: no servers available
13:04:59.183Z [WARN]  client.server_mgr: no servers available

Initially I found this issue:

which pointed me to these links from @schmichael:

where it’s mentioned:

The CA serves two purposes:

  • The key must be used to sign certificates for both Nomad and Consul.
  • The certificate must be used by both Nomad and Consul for verifying one another’s certificates.

I then changed my setup to use a single Cluster Agent CA certificate for both Consul and Nomad, instead of two separate Agent CA certificates (one each for Consul and Nomad), but I’m still getting the same behaviour and the same WARN and ERROR logs.

To inspect the certificate served on port 8501, I ran the following from the client node:

openssl s_client -showcerts -connect 127.0.0.1:8501

which returned:

CONNECTED(00000003)
Can't use SSL_get_servername
depth=0 
verify error:num=20:unable to get local issuer certificate
verify return:1
depth=0 
verify error:num=21:unable to verify the first certificate
verify return:1
depth=0 
verify return:1
---
Certificate chain
 0 s:
   i:CN = pri-fv50uc81.consul.ca.45b2e996.consul
   a:PKEY: id-ecPublicKey, 256 (bit); sigalg: ecdsa-with-SHA256
   v:NotBefore: Sep 15 12:55:19 2023 GMT; NotAfter: Sep 18 12:55:19 2023 GMT
-----BEGIN CERTIFICATE-----
DEADBEEF
-----END CERTIFICATE-----
---
Server certificate
subject=
issuer=CN = pri-fv50uc81.consul.ca.45b2e996.consul
---
Acceptable client certificate CA names
C = XX, ST = Xxxxxxx, L = Xxxxxxx, street = Xxxxxxx, postalCode = 0123, O = Xxxxxxx, OU = Xxxxxxx, CN = Xxxxxxx CA <redacted>
CN = pri-fv50uc81.consul.ca.45b2e996.consul
Requested Signature Algorithms: RSA-PSS+SHA256:ECDSA+SHA256:Ed25519:RSA-PSS+SHA384:RSA-PSS+SHA512:RSA+SHA256:RSA+SHA384:RSA+SHA512:ECDSA+SHA384:ECDSA+SHA512:RSA+SHA1:ECDSA+SHA1
Shared Requested Signature Algorithms: RSA-PSS+SHA256:ECDSA+SHA256:Ed25519:RSA-PSS+SHA384:RSA-PSS+SHA512:RSA+SHA256:RSA+SHA384:RSA+SHA512:ECDSA+SHA384:ECDSA+SHA512
Peer signing digest: SHA256
Peer signature type: ECDSA
Server Temp Key: X25519, 253 bits
---
SSL handshake has read 1286 bytes and written 387 bytes
Verification error: unable to verify the first certificate
---
New, TLSv1.3, Cipher is TLS_AES_128_GCM_SHA256
Server public key is 256 bit
Secure Renegotiation IS NOT supported
Compression: NONE
Expansion: NONE
No ALPN negotiated
Early data was not sent
Verify return code: 21 (unable to verify the first certificate)
---
802BB89D9B7F0000:error:0A000412:SSL routines:ssl3_read_bytes:sslv3 alert bad certificate:../ssl/record/rec_layer_s3.c:1584:SSL alert number 42

Exporting it to a file and verifying whether it was signed by the Cluster Agent CA:

openssl s_client -connect 127.0.0.1:8501 -showcerts </dev/null 2>/dev/null | openssl x509 -outform PEM >mycertfile.pem
openssl verify -verbose -CAfile cluster-agent-ca.pem mycertfile.pem

returned:

error 20 at 0 depth lookup: unable to get local issuer certificate
error mycertfile.pem: verification failed
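
The same check can be sketched in Go, which is the language Nomad and Consul are written in; Go's crypto/x509 package is what produces the "x509: certificate signed by unknown authority" message in the Nomad client log. This is a minimal sketch, assuming the CA bundle and the exported certificate are PEM files passed on the command line; the program and the verifyLeaf function are hypothetical helpers, not part of Nomad or Consul. Like plain openssl verify, it checks only the chain (no hostname, no key purpose) and prints the issuer so a CA mismatch is easy to spot.

```go
package main

import (
	"crypto/x509"
	"encoding/pem"
	"errors"
	"fmt"
	"os"
)

// verifyLeaf checks whether the first PEM certificate in leafPEM chains to one
// of the CA certificates in caPEM, similar to `openssl verify -CAfile`.
// It also returns the leaf's issuer so a CA mismatch is easy to spot.
func verifyLeaf(caPEM, leafPEM []byte) (string, error) {
	roots := x509.NewCertPool()
	if !roots.AppendCertsFromPEM(caPEM) {
		return "", errors.New("no CA certificates found in CA file")
	}
	block, _ := pem.Decode(leafPEM)
	if block == nil || block.Type != "CERTIFICATE" {
		return "", errors.New("no PEM certificate found in leaf file")
	}
	leaf, err := x509.ParseCertificate(block.Bytes)
	if err != nil {
		return "", err
	}
	// Like plain `openssl verify`, check only the chain: no hostname and
	// no key-purpose requirement.
	_, err = leaf.Verify(x509.VerifyOptions{
		Roots:     roots,
		KeyUsages: []x509.ExtKeyUsage{x509.ExtKeyUsageAny},
	})
	return leaf.Issuer.String(), err
}

func main() {
	if len(os.Args) != 3 {
		fmt.Fprintln(os.Stderr, "usage: verifyleaf CA_FILE CERT_FILE")
		os.Exit(2)
	}
	caPEM, err := os.ReadFile(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	leafPEM, err := os.ReadFile(os.Args[2])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	issuer, err := verifyLeaf(caPEM, leafPEM)
	fmt.Println("issuer:", issuer)
	if err != nil {
		fmt.Println("verification failed:", err)
		os.Exit(1)
	}
	fmt.Println("OK")
}
```

Usage (hypothetical file name): go run verifyleaf.go cluster-agent-ca.pem mycertfile.pem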

I’m no expert when it comes to security and certificates, but if I understand it correctly, Consul auto encrypt is generating a certificate whose issuer is CN = pri-fv50uc81.consul.ca.45b2e996.consul, i.e. it’s not signed by the Cluster Agent CA?

  1. Should it even be signed by the Cluster Agent CA?
  2. What else am I missing?

Here are my configs…

consul_server.hcl

bootstrap_expect = 3
datacenter       = "dc1"
data_dir         = "/opt/consul/data"
encrypt          = "redacted"
log_level        = "INFO"
node_name        = "srv01"
retry_join       = ["10.0.2.4", "10.0.2.3", "10.0.2.5"]
server           = true

bind_addr = "10.0.2.4"
client_addr = "0.0.0.0"
advertise_addr = "10.0.2.4"

acl {
  enabled                  = true
  default_policy           = "deny"
  down_policy              = "extend-cache"
  enable_token_persistence = true
}

service {
  name = "consul"
}

connect {
  enabled = true
}

performance {
  raft_multiplier = 1
}

ports {
  http     = -1
  https    = 8501
  grpc     = -1
  grpc_tls = 8503
}

ui_config {
  enabled = true
}

tls {
  defaults {
    ca_file   = "/ops/hashicorp/tls/cluster-agent-ca.pem"
    cert_file = "/ops/hashicorp/tls/dc1-server-consul-srv01.pem"
    key_file  = "/ops/hashicorp/tls/dc1-server-consul-srv01-key.pem"

    verify_incoming = true
    verify_outgoing = true
  }

  internal_rpc {
    verify_server_hostname = true
  }
}

auto_encrypt {
  allow_tls = true
}

nomad_server.hcl

datacenter = "dc1"
data_dir   = "/opt/nomad/data"
region     = "global"

addresses {
  http = "0.0.0.0"
  rpc  = "10.0.2.4"
  serf = "10.0.2.4"
}

advertise {
  http = "10.0.2.4"
  rpc  = "10.0.2.4"
  serf = "10.0.2.4"
}

acl {
  enabled = true
}

consul {
  address      = "127.0.0.1:8501"
  grpc_address = "127.0.0.1:8503"

  allow_unauthenticated = false # There is a corresponding Consul policy, role and token named nomad-job-submitter
  token                 = "redacted"

  ssl          = true
  ca_file      = "/ops/hashicorp/tls/cluster-agent-ca.pem"
  cert_file    = "/ops/hashicorp/tls/dc1-server-consul-srv01.pem"
  key_file     = "/ops/hashicorp/tls/dc1-server-consul-srv01-key.pem"
  grpc_ca_file = "/ops/hashicorp/tls/cluster-agent-ca.pem"
}

server {
  enabled          = true
  bootstrap_expect = 3
  encrypt          = "redacted"
}

tls {
  http      = true
  rpc       = true

  ca_file   = "/ops/hashicorp/tls/cluster-agent-ca.pem"
  cert_file = "/ops/hashicorp/tls/global-server-nomad.pem"
  key_file  = "/ops/hashicorp/tls/global-server-nomad-key.pem"

  verify_server_hostname = true
  verify_https_client    = true
}

consul_client.hcl

datacenter = "dc1"
data_dir   = "/opt/consul/data"
encrypt    = "redacted"
log_level  = "INFO"
node_name  = "client01"
retry_join = ["10.0.2.4", "10.0.2.3", "10.0.2.5"]
server     = false

bind_addr = "10.0.2.7"
client_addr = "0.0.0.0"
advertise_addr = "10.0.2.7"

acl {
  enabled                  = true
  default_policy           = "deny"
  down_policy              = "extend-cache"
  enable_token_persistence = false

  tokens {
    agent = "redacted"
  }
}

connect {
  enabled = true
}

ports {
  http     = -1
  https    = 8501
  grpc     = -1
  grpc_tls = 8503
}

ui_config {
  enabled = true
}

tls {
  defaults {
    ca_file = "/ops/hashicorp/tls/cluster-agent-ca.pem"

    verify_incoming = true
    verify_outgoing = true
  }

  internal_rpc {
    verify_server_hostname = true
  }
}

auto_encrypt {
  tls = true
}

nomad_client.hcl

datacenter = "dc1"
data_dir   = "/opt/nomad/data"
bind_addr  = "10.0.2.7"
region     = "global"

advertise {
  http = "10.0.2.7"
  rpc  = "10.0.2.7"
  serf = "10.0.2.7"
}

acl {
  enabled = true
}

consul {
  address      = "127.0.0.1:8501"
  grpc_address = "127.0.0.1:8503"

  allow_unauthenticated = false # There is a corresponding Consul policy, role and token named nomad-job-submitter
  token                 = "redacted"

  ssl          = true
  ca_file      = "/ops/hashicorp/tls/cluster-agent-ca.pem"
  cert_file    = "/ops/hashicorp/tls/global-client-nomad.pem"
  key_file     = "/ops/hashicorp/tls/global-client-nomad-key.pem"
  grpc_ca_file = "/ops/hashicorp/tls/cluster-agent-ca.pem"
  verify_ssl   = true # Setting this to false makes it work
}

client {
  enabled = true

  host_volume "blahblah" {
    path      = "/path/to/host/volume"
    read_only = false
  }

  node_pool = "client01"
}

plugin "docker" {
  config {
    allow_privileged = true

    volumes {
      enabled = true
    }
  }
}

plugin "raw_exec" {
  config {
    enabled = true
  }
}

tls {
  http      = true
  rpc       = true

  ca_file   = "/ops/hashicorp/tls/cluster-agent-ca.pem"
  cert_file = "/ops/hashicorp/tls/global-client-nomad.pem"
  key_file  = "/ops/hashicorp/tls/global-client-nomad-key.pem"

  verify_server_hostname = true
  verify_https_client    = true
}

If you’re wondering why I’m using hard-coded IP addresses in the retry_join config, it’s because I’m using a cloud provider that isn’t supported by go-discover yet.

Hi @heatzync,

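When you enable auto_encrypt, the Consul client agent certificates are signed by the Connect CA (Consul’s built-in CA), not by the CA you configured in ca_file. That is why the issuer in your openssl output is CN = pri-fv50uc81.consul.ca.45b2e996.consul.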

So, when your Nomad agents talk to Consul agents that use auto-encrypt certificates, you should set ca_file in the consul {} block of the Nomad configuration to the Connect CA certificate.

You can fetch the Connect CA root certificate from the following endpoint:

$ curl 127.0.0.1:8500/v1/connect/ca/roots | jq -r '.Roots[].RootCert' > ca.crt

ref: Certificate Authority - Connect - HTTP API | Consul | HashiCorp Developer

I hope this helps.


Thank you @Ranjandas! Your suggestion pointed me in the right direction.

Because the Consul cluster was already running with TLS only (the plain HTTP port is disabled), I had to tweak the curl command a bit to:

  • use https
  • use port 8501
  • specify the -k (--insecure) option to skip verification of the server certificate, since the Connect CA certificate was exactly what I didn’t have yet
  • supply the client cert and key (otherwise I got bad certificate, because verify_incoming = true requires a client certificate)

curl -k https://127.0.0.1:8501/v1/connect/ca/roots --cert /ops/hashicorp/tls/global-client-nomad.pem --key /ops/hashicorp/tls/global-client-nomad-key.pem

I also noticed in the Connect HTTP API docs that the roots endpoint accepts a pem query parameter, which returns the CA certificate in PEM-encoded format instead of JSON. This was the final command:

curl -k "https://127.0.0.1:8501/v1/connect/ca/roots?pem=true" --cert /ops/hashicorp/tls/global-client-nomad.pem --key /ops/hashicorp/tls/global-client-nomad-key.pem > /ops/hashicorp/tls/connect-ca.pem

and then I updated the Nomad client configuration as you suggested:

consul {
...
  ssl          = true
  ca_file      = "/ops/hashicorp/tls/connect-ca.pem"
...
  verify_ssl   = true
}
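
Roughly speaking, ssl = true, ca_file, cert_file, key_file and verify_ssl = true describe an HTTPS client that presents the Nomad client certificate and trusts only the CAs in ca_file, while verify_ssl = false roughly amounts to skipping server verification, which is why it hid the CA mismatch. The Go sketch below builds such a crypto/tls configuration and performs a handshake with the local Consul agent. It is an approximation for illustration, not Nomad's actual implementation; consulTLSConfig is a hypothetical helper, and the paths and address are the ones used in this thread.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"fmt"
	"os"
)

// consulTLSConfig builds a client-side TLS config resembling the Nomad consul
// block: trust only the CAs in caPEM and present the given client certificate.
func consulTLSConfig(caPEM, certPEM, keyPEM []byte, verify bool) (*tls.Config, error) {
	roots := x509.NewCertPool()
	if !roots.AppendCertsFromPEM(caPEM) {
		return nil, errors.New("ca_file contains no usable certificates")
	}
	clientCert, err := tls.X509KeyPair(certPEM, keyPEM)
	if err != nil {
		return nil, fmt.Errorf("loading cert_file/key_file: %w", err)
	}
	return &tls.Config{
		RootCAs:      roots,                         // ca_file
		Certificates: []tls.Certificate{clientCert}, // cert_file + key_file
		// verify_ssl = false roughly corresponds to skipping server verification.
		InsecureSkipVerify: !verify,
	}, nil
}

func main() {
	read := func(path string) []byte {
		b, err := os.ReadFile(path)
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
		return b
	}
	cfg, err := consulTLSConfig(
		read("/ops/hashicorp/tls/connect-ca.pem"),
		read("/ops/hashicorp/tls/global-client-nomad.pem"),
		read("/ops/hashicorp/tls/global-client-nomad-key.pem"),
		true, // verify_ssl = true
	)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// tls.Dial performs the handshake, so a server CA mismatch surfaces here.
	conn, err := tls.Dial("tcp", "127.0.0.1:8501", cfg)
	if err != nil {
		fmt.Println("handshake failed:", err)
		os.Exit(1)
	}
	defer conn.Close()
	fmt.Println("Consul agent certificate verified; issuer:",
		conn.ConnectionState().PeerCertificates[0].Issuer)
}
```

Pointing the first path at cluster-agent-ca.pem instead should reproduce the unknown-authority failure from the original log.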