I’ve been setting up our first Nomad + Consul cluster. I’m now at the point where I have set up everything with TLS and want to make use of auto-encrypt.
My setup has been based on these tutorials:
I have a working setup with TLS when using verify_ssl=false in the Nomad client configuration. I can see the nomad-client services are registered in the Consul UI and the client instances are visible in the Nomad UI.
When I change the Nomad client configuration to verify_ssl=true, the nomad-client services are NOT registered in the Consul UI and the client instances are NOT visible in the Nomad UI.
I also see this in the Nomad client logs:
13:04:57.250Z [WARN] client.server_mgr: no servers available
13:04:57.251Z [ERROR] client.rpc: error performing RPC to server, deadline exceeded, cannot retry: error="no servers" rpc=Node.Register
13:04:57.253Z [ERROR] client: error discovering nomad servers: error="client.consul: unable to query Consul datacenters: Get \"https://127.0.0.1:8501/v1/catalog/datacenters\": tls: failed to verify certificate: x509: certificate signed by unknown authority"
13:04:59.133Z [WARN] client.server_mgr: no servers available
13:04:59.183Z [WARN] client.server_mgr: no servers available
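For what it’s worth, I think the lookup Nomad is doing in that error can be approximated from the client node with curl, using the same CA and client certificate/key the Nomad client is configured with (see nomad_client.hcl below). This is only a sketch of what I expect Nomad to be verifying; the interesting part is whether the TLS handshake verifies, not the response body:
# sketch: the same /v1/catalog/datacenters lookup from the error above, with the CA and client cert from nomad_client.hcl
curl --cacert /ops/hashicorp/tls/cluster-agent-ca.pem \
  --cert /ops/hashicorp/tls/global-client-nomad.pem \
  --key /ops/hashicorp/tls/global-client-nomad-key.pem \
  https://127.0.0.1:8501/v1/catalog/datacenters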
Initially I found this issue:
which pointed me to these links from @schmichael:
where it’s mentioned:
The CA serves two purposes:
- The key must be used to sign certificates for both Nomad and Consul.
- The certificate must be used by both Nomad and Consul for verifying one another’s certificates.
I then changed my setup to use a single Cluster Agent CA certificate (for both Consul and Nomad) instead of two separate Agent CA certificates (one each for Consul and Nomad), but I’m still getting the same behaviour and the same WARN and ERROR logs.
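For reference, this is roughly how the certificates are now generated from that single CA (just a sketch; my actual file names differ slightly from the CLI defaults, e.g. cluster-agent-ca.pem instead of consul-agent-ca.pem):
# sketch: one CA signs both the Consul and the Nomad agent certificates
consul tls ca create
consul tls cert create -server -dc dc1 -ca consul-agent-ca.pem -key consul-agent-ca-key.pem
nomad tls cert create -server -region global -ca consul-agent-ca.pem -key consul-agent-ca-key.pem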
To inspect the certificate, I ran this from the client node:
openssl s_client -showcerts -connect 127.0.0.1:8501
which returned:
CONNECTED(00000003)
Can't use SSL_get_servername
depth=0
verify error:num=20:unable to get local issuer certificate
verify return:1
depth=0
verify error:num=21:unable to verify the first certificate
verify return:1
depth=0
verify return:1
---
Certificate chain
0 s:
i:CN = pri-fv50uc81.consul.ca.45b2e996.consul
a:PKEY: id-ecPublicKey, 256 (bit); sigalg: ecdsa-with-SHA256
v:NotBefore: Sep 15 12:55:19 2023 GMT; NotAfter: Sep 18 12:55:19 2023 GMT
-----BEGIN CERTIFICATE-----
DEADBEEF
-----END CERTIFICATE-----
---
Server certificate
subject=
issuer=CN = pri-fv50uc81.consul.ca.45b2e996.consul
---
Acceptable client certificate CA names
C = XX, ST = Xxxxxxx, L = Xxxxxxx, street = Xxxxxxx, postalCode = 0123, O = Xxxxxxx, OU = Xxxxxxx, CN = Xxxxxxx CA <redacted>
CN = pri-fv50uc81.consul.ca.45b2e996.consul
Requested Signature Algorithms: RSA-PSS+SHA256:ECDSA+SHA256:Ed25519:RSA-PSS+SHA384:RSA-PSS+SHA512:RSA+SHA256:RSA+SHA384:RSA+SHA512:ECDSA+SHA384:ECDSA+SHA512:RSA+SHA1:ECDSA+SHA1
Shared Requested Signature Algorithms: RSA-PSS+SHA256:ECDSA+SHA256:Ed25519:RSA-PSS+SHA384:RSA-PSS+SHA512:RSA+SHA256:RSA+SHA384:RSA+SHA512:ECDSA+SHA384:ECDSA+SHA512
Peer signing digest: SHA256
Peer signature type: ECDSA
Server Temp Key: X25519, 253 bits
---
SSL handshake has read 1286 bytes and written 387 bytes
Verification error: unable to verify the first certificate
---
New, TLSv1.3, Cipher is TLS_AES_128_GCM_SHA256
Server public key is 256 bit
Secure Renegotiation IS NOT supported
Compression: NONE
Expansion: NONE
No ALPN negotiated
Early data was not sent
Verify return code: 21 (unable to verify the first certificate)
---
802BB89D9B7F0000:error:0A000412:SSL routines:ssl3_read_bytes:sslv3 alert bad certificate:../ssl/record/rec_layer_s3.c:1584:SSL alert number 42
Exporting it to a file and verifying whether it was signed by the Cluster Agent CA:
openssl s_client -connect 127.0.0.1:8501 -showcerts </dev/null 2>/dev/null | openssl x509 -outform PEM >mycertfile.pem
openssl verify -verbose -CAfile cluster-agent-ca.pem mycertfile.pem
returned:
error 20 at 0 depth lookup: unable to get local issuer certificate
error mycertfile.pem: verification failed
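The issuer can also be read directly from the exported certificate, which makes the comparison with the Cluster Agent CA explicit:
openssl x509 -in mycertfile.pem -noout -subject -issuer -dates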
I’m no expert when it comes to security and certificates, but if I understand it correctly, Consul auto-encrypt is generating the certificate with CN = pri-fv50uc81.consul.ca.45b2e996.consul, but it’s not signed by the Cluster Agent CA?
- Should it even be signed by the Cluster Agent CA?
- What else am I missing?
Here are my configs…
consul_server.hcl
bootstrap_expect = 3
datacenter = "dc1"
data_dir = "/opt/consul/data"
encrypt = "redacted"
log_level = "INFO"
node_name = "srv01"
retry_join = ["10.0.2.4", "10.0.2.3", "10.0.2.5"]
server = true
bind_addr = "10.0.2.4"
client_addr = "0.0.0.0"
advertise_addr = "10.0.2.4"
acl {
  enabled = true
  default_policy = "deny"
  down_policy = "extend-cache"
  enable_token_persistence = true
}

service {
  name = "consul"
}

connect {
  enabled = true
}

performance {
  raft_multiplier = 1
}

ports {
  http = -1
  https = 8501
  grpc = -1
  grpc_tls = 8503
}

ui_config {
  enabled = true
}

tls {
  defaults {
    ca_file = "/ops/hashicorp/tls/cluster-agent-ca.pem"
    cert_file = "/ops/hashicorp/tls/dc1-server-consul-srv01.pem"
    key_file = "/ops/hashicorp/tls/dc1-server-consul-srv01-key.pem"
    verify_incoming = true
    verify_outgoing = true
  }
  internal_rpc {
    verify_server_hostname = true
  }
}

auto_encrypt {
  allow_tls = true
}
nomad_server.hcl
datacenter = "dc1"
data_dir = "/opt/nomad/data"
region = "global"
addresses {
  http = "0.0.0.0"
  rpc = "10.0.2.4"
  serf = "10.0.2.4"
}

advertise {
  http = "10.0.2.4"
  rpc = "10.0.2.4"
  serf = "10.0.2.4"
}

acl {
  enabled = true
}

consul {
  address = "127.0.0.1:8501"
  grpc_address = "127.0.0.1:8503"
  allow_unauthenticated = false # There is a corresponding Consul policy, role and token named nomad-job-submitter
  token = "redacted"
  ssl = true
  ca_file = "/ops/hashicorp/tls/cluster-agent-ca.pem"
  cert_file = "/ops/hashicorp/tls/dc1-server-consul-srv01.pem"
  key_file = "/ops/hashicorp/tls/dc1-server-consul-srv01-key.pem"
  grpc_ca_file = "/ops/hashicorp/tls/cluster-agent-ca.pem"
}

server {
  enabled = true
  bootstrap_expect = 3
  encrypt = "redacted"
}

tls {
  http = true
  rpc = true
  ca_file = "/ops/hashicorp/tls/cluster-agent-ca.pem"
  cert_file = "/ops/hashicorp/tls/global-server-nomad.pem"
  key_file = "/ops/hashicorp/tls/global-server-nomad-key.pem"
  verify_server_hostname = true
  verify_https_client = true
}
consul_client.hcl
datacenter = "dc1"
data_dir = "/opt/consul/data"
encrypt = "redacted"
log_level = "INFO"
node_name = "client01"
retry_join = ["10.0.2.4", "10.0.2.3", "10.0.2.5"]
server = false
bind_addr = "10.0.2.7"
client_addr = "0.0.0.0"
advertise_addr = "10.0.2.7"
acl {
  enabled = true
  default_policy = "deny"
  down_policy = "extend-cache"
  enable_token_persistence = false
  tokens {
    agent = "redacted"
  }
}

connect {
  enabled = true
}

ports {
  http = -1
  https = 8501
  grpc = -1
  grpc_tls = 8503
}

ui_config {
  enabled = true
}

tls {
  defaults {
    ca_file = "/ops/hashicorp/tls/cluster-agent-ca.pem"
    verify_incoming = true
    verify_outgoing = true
  }
  internal_rpc {
    verify_server_hostname = true
  }
}

auto_encrypt {
  tls = true
}
nomad_client.hcl
datacenter = "dc1"
data_dir = "/opt/nomad/data"
bind_addr = "10.0.2.7"
region = "global"
advertise {
  http = "10.0.2.7"
  rpc = "10.0.2.7"
  serf = "10.0.2.7"
}

acl {
  enabled = true
}

consul {
  address = "127.0.0.1:8501"
  grpc_address = "127.0.0.1:8503"
  allow_unauthenticated = false # There is a corresponding Consul policy, role and token named nomad-job-submitter
  token = "redacted"
  ssl = true
  ca_file = "/ops/hashicorp/tls/cluster-agent-ca.pem"
  cert_file = "/ops/hashicorp/tls/global-client-nomad.pem"
  key_file = "/ops/hashicorp/tls/global-client-nomad-key.pem"
  grpc_ca_file = "/ops/hashicorp/tls/cluster-agent-ca.pem"
  verify_ssl = true # Setting this to false makes it work
}

client {
  enabled = true
  host_volume "blahblah" {
    path = "/path/to/host/volume"
    read_only = false
  }
  node_pool = "client01"
}

plugin "docker" {
  config {
    allow_privileged = true
    volumes {
      enabled = true
    }
  }
}

plugin "raw_exec" {
  config {
    enabled = true
  }
}

tls {
  http = true
  rpc = true
  ca_file = "/ops/hashicorp/tls/cluster-agent-ca.pem"
  cert_file = "/ops/hashicorp/tls/global-client-nomad.pem"
  key_file = "/ops/hashicorp/tls/global-client-nomad-key.pem"
  verify_server_hostname = true
  verify_https_client = true
}
If you’re wondering why I’m using manual IP addresses for the retry_join config, it’s because I’m using a cloud provider that’s not yet supported by go-discover.