Setting up a file audit device on an HA Vault cluster

I have a three-node HA Vault setup backed by Consul, with two nodes on standby in case the active node goes down. My plan was to use the file audit device and a log shipper to collect Vault’s audit logs in Loki or Elasticsearch.

So I enabled the file device on one node:

vault audit enable file file_path=/var/log/vault/audit.log

What irritates me is that it says “replicated”, although it has only been enabled on one node:

vault audit list -detailed
Path     Type    Description    Replication    Options
----     ----    -----------    -----------    -------
file/    file    n/a            replicated     file_path=/var/log/vault/audit.log

I also couldn’t enable the same device on the other nodes, as it was already enabled for the cluster.

What is the best way to set up the audit backend on an HA cluster? My initial understanding was that the active node would write the logs to its local disk, and if that node went down, one of the standby nodes would take over, become the active node, and start writing the log files to its own local file system. Am I missing something?

But that would not be HA anymore (for the service yes, for the data no).
What if your former primary can no longer be saved? A defective hard disk, all data lost. Then your former audit log files would also be gone and you would have a big gap in the history. OK, you could send the logs to an external system, but that can’t be assumed as the default setup; it should work without one.

You are describing a kind of cold standby; Vault acts more like a warm standby, I guess.

Maybe it’s worth opening a feature request for local audit files. :man_shrugging:t3:

The data is stored in a multi-node Consul cluster as well. I don’t expect a high volume of traffic on Vault in this case, but yes, there could be a minimal gap in reading/writing secrets and audit logs while switching nodes.

But overall, I think this is a pretty standard scenario, so there must be a way to handle the audit log reliably.

Maybe it’s better to use the syslog device instead: send the data to a local syslog-ng or rsyslog client on each node, configured to forward the logs over TCP to a central syslog server, and from there to Elasticsearch. :thinking:
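For reference, enabling the syslog audit device would look something like this (tag and facility are optional; the values shown are the documented defaults):

vault audit enable syslog tag="vault" facility="AUTH"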

OK, I see, with data I meant the audit logs. :slight_smile:

Hey @fhemberger,

So when you issue the command:

vault audit enable file file_path=/var/log/vault/audit.log

This is an API call to Vault, and its “settings” are stored in the backend datastore. See here: https://www.vaultproject.io/api-docs/system/audit
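Under the hood the CLI sends a PUT to sys/audit/<path>; per the docs above, the equivalent curl call would be something like this (assuming VAULT_ADDR and VAULT_TOKEN are set in your environment):

curl \
    --header "X-Vault-Token: $VAULT_TOKEN" \
    --request PUT \
    --data '{"type": "file", "options": {"file_path": "/var/log/vault/audit.log"}}' \
    "$VAULT_ADDR/v1/sys/audit/file"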

So when it says it’s replicated, it means this setting in the datastore is replicated. When a standby node takes over, it reads the “setting” from the datastore and applies it on its own node.
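One practical consequence: since any node can become active and will apply the setting locally, the file_path has to exist and be writable by the Vault process on every node. For example (assuming Vault runs as a dedicated vault user):

sudo mkdir -p /var/log/vault
sudo chown vault:vault /var/log/vault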

That’s how our setup is.

Hi @adeelahmad84,

Thanks for your explanation. In the meantime I’ve “solved” my issue by using the syslog audit device instead and shipping the audit logs with syslog-ng to a centralized log collector.
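In case anyone wants to do the same, a minimal syslog-ng forwarding snippet looks roughly like this (host name and port are placeholders, and the filter assumes the audit device’s default "vault" tag):

# pick up local syslog messages, including Vault's audit entries
source s_local { system(); };
# only forward messages tagged by the Vault syslog audit device
filter f_vault { program("vault"); };
# ship them to the central collector over TCP
destination d_remote { tcp("logs.example.com" port(514)); };
log { source(s_local); filter(f_vault); destination(d_remote); };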


It should be noted that it’s recommended to have multiple audit devices enabled: if there is only one and it fails (disk full, unwritable, etc.), Vault will stop servicing requests and return HTTP 500, since it can no longer log them.
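For example, you could keep a local file device enabled in addition to syslog, mounted at a distinct path (the path and file name here are just examples):

vault audit enable -path=file_fallback file file_path=/var/log/vault/audit_fallback.log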