Transit engine: multiple standalone nodes sharing the same RDS backend

Hi,

This is about the transit secrets engine (encryption/decryption).

Can I run multiple standalone Vault nodes (not an HA-mode cluster) sharing the same storage backend? These standalone Vaults would use the same unseal key set and would be deployed in a Kubernetes environment with the same config.

Since the transit engine is largely stateless, it seems like deploying Vault instances this way should work. Is that right?

Thanks

Sample config:

storage "postgresql" {
    connection_url = "postgres://xxx.xxx.com:5432/postgres"
    ha_enabled = "false"
 }

listener "tcp" {
    address       = "0.0.0.0:8200"
    tls_disable   = 0
    tls_cert_file = "/root/server.cert"
    tls_key_file  = "/root/server.key"
    tls_client_ca_file = "/root/ca.cert"
    tls_disable_client_cert = "true"
    tls_min_version = "tls13"
}

disable_mlock = "true

I highly doubt you can, but even if you could, what could that possibly buy you? What is your actual requirement that this is your answer to?

Thanks Aram,

As we know, HA is not for scaling, and I have read docs mentioning that Vault itself is not I/O bound but is limited by the backend DB. In a zero-trust environment with TLS on top, using an x-large AWS box, an HA cluster (no matter how many nodes; in fact more nodes lowers TPS) can only handle ~200 TPS. If multiple standalone Vaults can share the same backend, that essentially gives us horizontal scale.

I roughly tried the standalone scenario; in my experiment things worked well, but I don't know whether this deployment mode is really supported. I hope someone from Vault can give me some insight.

FYI, the Enterprise version with HA was almost our choice, but in our particular deployment we run into some other issues with HA (another long story if expanded), which is why I am looking for alternatives.

That is really low; there is something wrong with your configuration or hardware.
A single AWS xlarge EC2 box should do >1000 transit operations per second.
In your load tests, what do CPU/disk/memory/etc. look like? Where is the bottleneck?

A correction:
AWS r5.large: 2 CPUs with 16 GB memory.
I tried a 2-node cluster, using the AWS internal IPs for testing.

In the real Kubernetes case, the pod/container will have relatively small resource limits. We don't want to allocate huge resources to it (to avoid waste); we want to dynamically scale pods up on demand.

FYI, for those TLS connections I am not using persistent connections, i.e., I create a new connection for each call.
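(For reference, a persistent connection per client would look roughly like the sketch below. This is not my actual client code, just a minimal Python example assuming the requests library, a transit key named my-key, and VAULT_ADDR/VAULT_TOKEN in the environment. A requests.Session keeps the TLS connection alive, so the handshake is paid once rather than on every call.)

import base64
import os

import requests

# Reuse one Session (and therefore one TLS connection) per client process
# instead of opening a new connection for every encrypt call.
# If Vault uses a private CA, point session.verify at the CA bundle.
session = requests.Session()
session.headers["X-Vault-Token"] = os.environ["VAULT_TOKEN"]
vault_addr = os.environ["VAULT_ADDR"]

def encrypt(plaintext: bytes) -> str:
    resp = session.post(
        f"{vault_addr}/v1/transit/encrypt/my-key",
        json={"plaintext": base64.b64encode(plaintext).decode()},
    )
    resp.raise_for_status()
    return resp.json()["data"]["ciphertext"]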

Roughly:
From the top output: the master's CPU is 69% idle, and memory is 9268/15580 free.
I tried adjusting the number of client threads on the front end; TPS flattens out at about 200.
Regarding TLS, I am using a 2048-bit RSA key.
(FYI, without TLS in front, a single node can handle 5000 TPS.)

Meanwhile, on the standby node the CPU is only 0.2% idle. I can't explain that well; the standby just forwards requests to the master, so it costing that much CPU is a little surprising.

I have a screenshot; I hope I can paste it here.

That will slow it down, but not excessively, IMO…
Are you at least reusing authentication? I.e., you're not getting a new token from Vault for each call, right?
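(For clarity, "reusing authentication" just means logging in once and caching the token for all subsequent calls. A rough sketch, using Python with requests and AppRole auth purely as an example; the ROLE_ID/SECRET_ID environment variables are placeholders.)

import os

import requests

VAULT_ADDR = os.environ["VAULT_ADDR"]
session = requests.Session()

# Log in once and cache the client token; every transit call after this
# reuses the same token (and the same TLS session).
login = session.post(
    f"{VAULT_ADDR}/v1/auth/approle/login",
    json={"role_id": os.environ["ROLE_ID"], "secret_id": os.environ["SECRET_ID"]},
)
login.raise_for_status()
session.headers["X-Vault-Token"] = login.json()["auth"]["client_token"]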

I would recommend running the transit workload benchmark here: GitHub - mikegreen/vault-benchmarking (some Lua scripts for benchmarking Vault with the wrk tool).
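(If wrk isn't convenient, a quick-and-dirty throughput check can also be scripted directly. The sketch below is not the linked Lua scripts, just a rough threaded Python equivalent, assuming a transit key named my-key and VAULT_ADDR/VAULT_TOKEN in the environment.)

import base64
import os
import threading
import time

import requests

VAULT_ADDR = os.environ["VAULT_ADDR"]
VAULT_TOKEN = os.environ["VAULT_TOKEN"]
DURATION_S = 30
THREADS = 8

plaintext = base64.b64encode(b"benchmark payload").decode()
counts = [0] * THREADS

def worker(i):
    # One persistent session (and TLS connection) per worker thread.
    session = requests.Session()
    session.headers["X-Vault-Token"] = VAULT_TOKEN
    deadline = time.time() + DURATION_S
    while time.time() < deadline:
        resp = session.post(
            f"{VAULT_ADDR}/v1/transit/encrypt/my-key",
            json={"plaintext": plaintext},
        )
        resp.raise_for_status()
        counts[i] += 1

threads = [threading.Thread(target=worker, args=(i,)) for i in range(THREADS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(f"~{sum(counts) / DURATION_S:.0f} transit encrypts/sec")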

Correct, that is expected in an OSS active/standby configuration, and you are slowing things down by doing it. I would recommend having your load balancer send traffic only to the active node in the cluster. Round-robining across a cluster without performance standbys/read replicas is not recommended.
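Most load balancers can do this with an HTTP health check rather than anything low-level: by default, Vault's sys/health endpoint returns 200 on the active, unsealed node and 429 on a standby, so a check that only accepts 200 follows the active node automatically when leadership changes. A minimal illustration of the status codes (Python here just for demonstration; in practice this would live in the LB's own health-check config):

import os

import requests

# sys/health default status codes: 200 = initialized/unsealed/active,
# 429 = unsealed standby, 503 = sealed, 501 = not initialized.
resp = requests.get(f"{os.environ['VAULT_ADDR']}/v1/sys/health")
if resp.status_code == 200:
    print("active node - send traffic here")
else:
    print(f"not active (status {resp.status_code}) - skip this node")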

Thanks Mike,

1: We are actually using the root token for testing, so there is no new token for each call.

2: There is no special LB configured for this component so far; we just configure a normal TCP LB. I am not sure whether there are LBs that can route traffic reliably based on such low-level information, considering that the master can change, etc.

So, back to the original question: are multiple nodes in standalone mode a feasible (or at least acceptable) approach from a technical point of view? I understand there are locking and other mechanisms involved in a cluster; I am not sure whether that breaks anything in this case (I didn't see it break in my last experiment).

No. It is not an intended deployment/use of Vault and would have unintended consequences. It could possibly work for a while and then blow up.

As @mikegreen said, 200 TPS is nothing; I'm pretty sure I have seen more than 200 TPS on transit on a medium EC2 instance. The difference is that we're using 5 Consul agents and nodes, so it's possible that Postgres is your bottleneck.

We're actually moving to 1.7.x and Integrated Storage to add more capacity.

Also, regarding horizontal scale, Performance Replicas are your answer for this. Each PR cluster can handle its own leases, so that will expand your scale locally.

Thanks. It seems that if I go with Vault, Performance Replicas should be the way to go.
With regard to Postgres, it is almost certainly NOT the bottleneck:
1: As I pasted, without TLS I can run 5000 TPS on the same setup.
2: As you can see from my multiple-standalone-node experiment, each Vault needs its own Postgres connections, and there was no problem there.

Thanks also to Mike for his earlier response.

I think your logic is flawed and makes assumptions about how the code is written and how it interacts with transit, but that's really not the point of the discussion. I was the same way when I started and would not believe that a database built entirely on the premise of storage and retrieval was not the best storage medium. When I finally admitted it and switched to Consul, I was blown away by how easy it was, and how much faster.

I had the same arguments: it's easier to back up, it's easier to restore, I can see my data and report on it. Those are all good points, but at the end of the day it is not the best choice. Yes, you'll need to build a new set of operational practices around Consul, but it is worth it.

I’ll leave that with you to do with as you think best for your situation.