TLS errors in k8s

We are running Vault 1.13.2 in Kubernetes (EKS), and periodically an Ansible playbook fails to get AWS credentials from it. The Ansible error indicates a 504 when making a request to Vault through a load balancer.

In the Vault log, we periodically see errors that look like this:

2024-05-15T17:04:22.196Z [INFO]  http: TLS handshake error from 10.0.2.163:24948: local error: tls: bad record MAC
2024-05-15T17:04:22.231Z [INFO]  http: TLS handshake error from 10.0.3.72:30634: local error: tls: bad record MAC
2024-05-15T17:04:40.541Z [INFO]  http: TLS handshake error from 10.0.2.163:6224: local error: tls: bad record MAC
2024-05-15T17:04:42.490Z [INFO]  http: TLS handshake error from 10.0.3.72:30692: local error: tls: bad record MAC
2024-05-15T17:05:07.845Z [INFO]  http: TLS handshake error from 10.0.3.72:36104: local error: tls: bad record MAC
2024-05-15T17:05:21.850Z [INFO]  http: TLS handshake error from 10.0.3.72:40434: local error: tls: bad record MAC
2024-05-15T17:05:27.405Z [INFO]  http: TLS handshake error from 10.0.2.163:24910: local error: tls: bad record MAC
2024-05-15T17:05:40.090Z [INFO]  http: TLS handshake error from 10.0.3.72:6970: local error: tls: bad record MAC
2024-05-15T17:05:41.401Z [INFO]  http: TLS handshake error from 10.0.2.163:5040: local error: tls: bad record MAC
2024-05-15T17:06:11.659Z [INFO]  http: TLS handshake error from 10.0.3.72:32928: write tcp4 10.0.4.235:8200->10.0.3.72:32928: i/o timeout
2024-05-15T17:06:33.815Z [INFO]  http: TLS handshake error from 10.0.2.163:37822: write tcp4 10.0.4.235:8200->10.0.2.163:37822: i/o timeout
2024-05-15T17:06:38.454Z [INFO]  http: TLS handshake error from 10.0.3.72:51210: write tcp4 10.0.4.235:8200->10.0.3.72:51210: i/o timeout
2024-05-15T17:06:41.241Z [INFO]  http: TLS handshake error from 10.0.3.72:14112: write tcp4 10.0.4.235:8200->10.0.3.72:14112: i/o timeout
2024-05-15T17:06:44.731Z [INFO]  http: TLS handshake error from 10.0.2.163:37246: write tcp4 10.0.4.235:8200->10.0.2.163:37246: i/o timeout
2024-05-15T17:06:45.101Z [INFO]  http: TLS handshake error from 10.0.3.72:14106: write tcp4 10.0.4.235:8200->10.0.3.72:14106: i/o timeout
2024-05-15T17:06:54.879Z [INFO]  http: TLS handshake error from 10.0.2.163:11580: write tcp4 10.0.4.235:8200->10.0.2.163:11580: i/o timeout

Is there something we should be looking at to solve this problem?
Our Vault is backed by S3, by the way.
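
To see which peers and which failure modes dominate, log lines like the ones above can be tallied by source IP and error kind. This is only a rough sketch that assumes the exact log format shown above; the function name summarize_handshake_errors is made up for illustration and is not part of Vault or any tool.

```python
import re
from collections import Counter

# Matches lines such as:
#   ... http: TLS handshake error from 10.0.2.163:24948: local error: tls: bad record MAC
# Assumption: IPv4 peers and the log layout shown in this thread.
HANDSHAKE_RE = re.compile(
    r"http: TLS handshake error from (?P<ip>\d+\.\d+\.\d+\.\d+):\d+: (?P<detail>.*)$"
)


def summarize_handshake_errors(lines):
    """Count TLS handshake errors per (source IP, error kind)."""
    counts = Counter()
    for line in lines:
        match = HANDSHAKE_RE.search(line.strip())
        if match is None:
            continue  # not a handshake error line
        detail = match.group("detail")
        # Timeout details embed per-connection addresses, so collapse them
        # into a single bucket; other details are kept verbatim.
        kind = "i/o timeout" if detail.endswith("i/o timeout") else detail
        counts[(match.group("ip"), kind)] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python tally.py < vault.log
    for (ip, kind), n in summarize_handshake_errors(sys.stdin).most_common():
        print(f"{n:5d}  {ip:15s}  {kind}")
```

If nearly all errors come from the same couple of addresses, those are worth identifying first (for example, whether they belong to the load balancer).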

When Ansible fails to connect to Vault, is it a one-off failure, e.g.:

  • attempt 1: works
  • attempt 2: works
  • attempt 3: fails
  • attempt 4: works

Or does the error happen consistently over a period of time?

It seems to happen randomly. We’ve updated everything we could find in Ansible and added a retry to the requests, and I think the problem has stopped happening.
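
For anyone landing here later, the retry idea looks roughly like the sketch below. It is not our actual playbook change, just a generic retry-with-backoff wrapper; call_with_retry and all of its parameters are hypothetical names, and the attempt count and delays are placeholder values.

```python
import random
import time


def call_with_retry(fn, attempts=4, base_delay=0.5, max_delay=8.0,
                    retry_on=(Exception,), sleep=time.sleep):
    """Call fn() until it succeeds or `attempts` calls have failed.

    Waits base_delay * 2**(attempt - 1) seconds (capped at max_delay),
    plus some random jitter, between failed attempts. The last failure
    is re-raised to the caller.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the original error
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Jitter keeps many clients from retrying in lockstep.
            sleep(delay + random.uniform(0, delay / 2))
```

Here fn would be whatever makes the request to Vault. Inside a playbook, the built-in task keywords retries, delay, and until cover the same ground without custom code.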

Glad I fixed it :)

If it seemed random, I wonder if there was some timeout in the request to AWS to get the credentials.