Hi,
We are using Vault 1.18.0 on RHEL 8.
Our audit configuration has two devices:
- a local file, with plenty of free space on the filesystem
- a TCP socket endpoint to our SIEM
[root@xxxxxxx ~]# vault audit list -detailed
Path     Type      Description    Replication    Options
----     ----      -----------    -----------    -------
file/    file      n/a            replicated     file_path=/var/log/vault/audit.log mode=0644 elide_list_responses=true
siem/    socket    n/a            replicated     description=This audit device will send logs to SIEM. elide_list_responses=true socket_type=tcp address=X.X.X.X:6562
Recently something happened to our SIEM that caused timeouts, with a burst (about 8K messages in 20 minutes) of the following error message:
{\"@level\":\"error\",\"@message\":\"socket sink error\",\"@module\":\"audit\",\"@timestamp\":\"2025-02-11T17:14:15.747020Z\",\"context\":\"context deadline exceeded\",\"error\":\"error writing to socket \\\"X.X.X.X:6562\\\": 2 errors occurred:\\n\\t* error connecting to \\\"tcp\\\" address \\\"X.X.X.X:6562\\\": dial tcp X.X.X.X:6562: i/o timeout\\n\\t* context deadline exceeded\\n\\n\"}
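To quantify a burst like this, the error lines can be bucketed per minute. This is a minimal sketch, assuming the Vault server log is available as JSON lines on stdin. The function name count_sink_errors_per_minute is hypothetical, and the systemd unit name in the usage note is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count "socket sink error" lines per minute.
# Reads Vault server log lines (JSON, one per line) on stdin.
count_sink_errors_per_minute() {
  grep -F 'socket sink error' |
    # Keep only the minute part of @timestamp, e.g. 2025-02-11T17:14
    sed -nE 's/.*@timestamp[^0-9]*([0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}).*/\1/p' |
    sort | uniq -c
}
```

For example, if Vault runs as a systemd unit named vault (an assumption): journalctl -u vault -o cat | count_sink_errors_per_minute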
As we have another audit device available, Vault continues to operate, but it becomes extremely slow.
We detected it because our monitoring (Prometheus) was no longer able to scrape the metrics.
We believe that Vault is spending all its resources trying to send the audit logs, and that this impacts its normal behaviour.
Are we missing something in our configuration that would avoid this? Should we open a feature request for something like the "discard" parameter of the file audit device, so that Vault is allowed to drop audit messages?
We are also thinking of moving to UDP in the hope that it works better (since we keep a local copy of the logs, that sounds acceptable to us).
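For reference, switching the SIEM device to UDP could look like the sketch below. It assumes the device is recreated under the same siem/ path with the options shown in the audit listing; the function name switch_siem_to_udp is hypothetical. The socket audit device also documents a write_timeout option (default 2s); whether lowering it would bound the dial timeout seen in the error above is not something this sketch verifies.

```shell
#!/usr/bin/env bash
# Hypothetical helper: recreate the siem/ audit device over UDP.
# $1 is the SIEM address as host:port (X.X.X.X:6562 in the listing).
switch_siem_to_udp() {
  local address="$1"
  # Between disable and enable, audit entries go only to the file/ device.
  vault audit disable siem &&
    vault audit enable -path=siem \
      -description="This audit device will send logs to SIEM." \
      socket socket_type=udp address="$address" elide_list_responses=true
}
```

Usage: switch_siem_to_udp X.X.X.X:6562. UDP is fire-and-forget, so messages can be lost silently while the SIEM is down; the local file device remains the complete copy.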
How to reproduce:
Use tc as follows to add 3 seconds of latency on the connection to the local audit endpoint:
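#!/usr/bin/env bash
# Must be run as root.
INTERFACE="lo"   # Loopback interface
PORT="8080"      # Destination port to delay
DELAY="3000ms"   # 3 second delay
# Remove any existing root qdisc (ignore the error if there is none)
tc qdisc del dev "$INTERFACE" root 2>/dev/null || true
# Create a prio qdisc and attach a netem delay to its third band
tc qdisc add dev "$INTERFACE" root handle 1: prio
tc qdisc add dev "$INTERFACE" parent 1:3 handle 30: netem delay "$DELAY"
# Send packets with destination port $PORT to the delayed band
tc filter add dev "$INTERFACE" protocol ip parent 1: prio 3 u32 match ip dport "$PORT" 0xffff flowid 1:3
# Show the resulting configuration
tc qdisc show dev "$INTERFACE"
tc filter show dev "$INTERFACE"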
Then start the server that will receive the logs:
nc -l 8080
Start a Vault server with the same audit configuration as described above, with the socket device targeting 127.0.0.1:8080.
After a few minutes, you will have a hard time connecting to Vault or, for example, fetching the metrics.
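To put a number on the slowdown, the metrics endpoint can be timed in a loop while the delay is active. This is a sketch, assuming VAULT_ADDR and VAULT_TOKEN are set and that the token is allowed to read sys/metrics; the function name time_metrics_scrape and the 10-second cap are arbitrary choices.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print how long one Prometheus-format metrics scrape takes.
time_metrics_scrape() {
  # -w '%{time_total}' prints the total request time in seconds;
  # --max-time 10 gives up after 10 seconds instead of hanging.
  curl -s -o /dev/null --max-time 10 \
    -w '%{time_total}\n' \
    -H "X-Vault-Token: ${VAULT_TOKEN}" \
    "${VAULT_ADDR}/v1/sys/metrics?format=prometheus"
}
```

Usage: while sleep 5; do time_metrics_scrape; done. Running tc qdisc del dev lo root afterwards removes the artificial delay.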
Thanks in advance for your time,
Mike