TLS handshake error

I am using the config below on each of the 3 servers in a Vault Raft cluster:

  storage "raft" {
    path    = "C:\\raft"
    node_id = "vault_1"
  }
  listener "tcp" {
    address       = "0.0.0.0:8200"
    tls_cert_file = "C:\\Vault\\cert.pem"
    tls_key_file  = "C:\\Vault\\key.pem"
  }
  api_addr = "https://172.24.32.184:8200"
  disable_mlock = true
  cluster_addr = "https://172.24.32.184:8201"
  ui = true

I am able to connect successfully via SSL (curl), but after the server starts up, I get a stream of logs (every second or so) that look like this:

2020-10-17T14:44:00.826-0500 [INFO]  http: TLS handshake error from 172.24.32.3:26080: EOF
2020-10-17T14:44:04.563-0500 [INFO]  http: TLS handshake error from 172.24.32.2:30899: EOF
2020-10-17T14:44:05.822-0500 [INFO]  http: TLS handshake error from 172.24.32.3:26091: EOF
2020-10-17T14:44:09.519-0500 [INFO]  http: TLS handshake error from 172.24.32.2:30919: EOF
2020-10-17T14:44:10.783-0500 [INFO]  http: TLS handshake error from 172.24.32.3:26101: EOF
2020-10-17T14:44:14.494-0500 [INFO]  http: TLS handshake error from 172.24.32.2:30950: EOF
2020-10-17T14:44:15.858-0500 [INFO]  http: TLS handshake error from 172.24.32.3:26110: EOF
2020-10-17T14:44:19.567-0500 [INFO]  http: TLS handshake error from 172.24.32.2:30980: EOF
2020-10-17T14:44:20.815-0500 [INFO]  http: TLS handshake error from 172.24.32.3:26122: EOF
2020-10-17T14:44:24.529-0500 [INFO]  http: TLS handshake error from 172.24.32.2:31024: EOF
2020-10-17T14:44:25.890-0500 [INFO]  http: TLS handshake error from 172.24.32.3:26134: EOF
2020-10-17T14:44:29.500-0500 [INFO]  http: TLS handshake error from 172.24.32.2:31046: EOF
2020-10-17T14:44:30.842-0500 [INFO]  http: TLS handshake error from 172.24.32.3:26144: EOF
2020-10-17T14:44:34.562-0500 [INFO]  http: TLS handshake error from 172.24.32.2:31111: EOF
2020-10-17T14:44:35.830-0500 [INFO]  http: TLS handshake error from 172.24.32.3:26155: EOF

Can anyone help me with what these errors mean, and how I can correct my configuration to eliminate them?

I would suggest setting the address in your listener configuration to the hostname (or IP) the certificate is issued for. Listening on all interfaces could lead to a TLS error.

Thanks for the suggestion. When I set the listener address to the hostname, Vault fails to start and reports:

c:\Vault>vault.exe server -config vault.hcl -log-level=trace
Error initializing listener of type tcp: listen tcp xxx.xxx.xxx.xxx:8200: bind: The requested address is not valid in its context.

If it matters, the hostname I provided is a domain name that is associated with the SSL certificate I am providing to Vault. That endpoint is also a load-balanced endpoint.

Do I need to set another/different parameter in the listener "tcp" stanza to make this work?

Thanks!

I think it’s a different issue actually.

Your config seems correct, but I get the impression something is trying to connect to Vault using a non-TLS connection (regular http, or something totally different even).

Hello,

What device does this IP 172.24.32.3 belong to?

That’s a good question, and one that eventually led me to the solution. That IP address belongs to the load balancer. The environment I’m working in is managed by another team, so I didn’t have visibility into it. The load balancer was configured to check the /v1/sys/health endpoint over plain HTTP, and that was what produced the TLS handshake errors.

Changing the load balancer to use HTTPS corrected the issue in the log.
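
For anyone who wants to confirm what the load balancer should see once it checks over HTTPS, here is a minimal Python sketch of a single health probe. It is only an illustration under a few assumptions: the helper names (`describe_health`, `probe`) are made up for this post, the `ca_file` argument is a placeholder for whatever CA signed your certificate, and the status codes are the defaults documented for /v1/sys/health.

```python
import ssl
import urllib.error
import urllib.request

# Default status codes documented for Vault's /v1/sys/health endpoint.
HEALTH_STATES = {
    200: "initialized, unsealed, active",
    429: "unsealed, standby",
    472: "disaster recovery secondary",
    473: "performance standby",
    501: "not initialized",
    503: "sealed",
}


def describe_health(status_code):
    """Translate a /v1/sys/health status code into a readable state."""
    return HEALTH_STATES.get(status_code, f"unexpected status {status_code}")


def probe(base_url, ca_file=None, timeout=5.0):
    """Send one HTTPS GET to /v1/sys/health and return (code, description).

    ca_file is a hypothetical path to the CA bundle that signed the
    certificate configured in the listener; None uses the system store.
    """
    context = ssl.create_default_context(cafile=ca_file)
    url = base_url.rstrip("/") + "/v1/sys/health"
    try:
        with urllib.request.urlopen(url, context=context, timeout=timeout) as resp:
            code = resp.status
    except urllib.error.HTTPError as err:
        # urllib raises for non-2xx answers, but the code still carries the state.
        code = err.code
    return code, describe_health(code)
```

Calling `probe("https://172.24.32.184:8200")` against the api_addr from the question should complete a normal TLS handshake, provided the certificate is valid for the name or IP you connect to. A check that speaks plain HTTP, or that just opens and closes the TCP connection, never finishes the handshake.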


Hi all

My 5 cents: I see this pattern whenever Vault runs with TLS behind a load balancer (ALB/NLB on AWS, or a K8s LB). The behaviour may be similar on Azure and GCP.

  • If you configure Vault with TLS (desired for security) and you have an LB in front, Vault expects both your traffic and the LB health check to use TLS. The TLS handshake error message makes sense when your LB tries to run its health check without TLS.
  • Vault has a good API endpoint for health checks (/v1/sys/health), and it can be served over plain HTTP. For a single node that is fine, but if your Vault server goes into standby (because of a cluster configuration with HA), /v1/sys/health stops returning a success status (a standby answers 429 by default) and the LB health check then fails.
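
On the standby point: Vault documents two query parameters for this endpoint, `standbyok` and `perfstandbyok`, which make a standby return the active status code instead of its standby code. The sketch below only builds the URL you would give to the LB health check; the function name and the example hostname are hypothetical, so treat it as an illustration rather than a recommended setup.

```python
from urllib.parse import urlencode


def health_check_url(base_url, standby_ok=True, perf_standby_ok=False):
    """Build a /v1/sys/health URL for a load balancer health check.

    standby_ok / perf_standby_ok map to Vault's documented `standbyok`
    and `perfstandbyok` query parameters.
    """
    params = {}
    if standby_ok:
        params["standbyok"] = "true"
    if perf_standby_ok:
        params["perfstandbyok"] = "true"
    url = base_url.rstrip("/") + "/v1/sys/health"
    # Append the query string only when at least one flag is set.
    return url + ("?" + urlencode(params) if params else "")
```

For example, `health_check_url("https://vault.example.com:8200")` gives `https://vault.example.com:8200/v1/sys/health?standbyok=true`. Whether standbys should count as healthy depends on whether you want the LB to send traffic to them.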

I managed to fix this by configuring my Vault cluster with two listeners, like below:

# TCP Listener main traffic
listener "tcp" {
  address       = "0.0.0.0:8200"
  tls_cert_file = "/MYPATH/server.crt"
  tls_key_file  = "/MYPATH/server.key"
  tls_disable   = "false"
}

# TCP Listener for NLB healthcheck
listener "tcp" {
  address     = "0.0.0.0:8202"
  tls_disable = "true"
}

As you can see above, the first listener uses TLS and answers application traffic coming through the LB on port 8200; the second listener is non-TLS and is used only by my LB health check, which pings the Vault node on port 8202.

Of course, make sure that ports 8202/8203 are exposed only to the LB.

Feel free to use your own ports as you wish

Hope that helps

Hi
Could you please be so kind as to share the NLB configuration that goes with the Vault configuration mentioned in this topic? I’m facing the same issue and I’m a little confused about how to configure the NLB.