Vault 1.15.6 DynamoDB excessive writes - max_parallel setting?

We recently upgraded from our antique 1.7.0 version of Vault to 1.15.6. The upgrade itself went without issues and our tests came back fine afterwards. However, once we really started to use it, we started seeing our DynamoDB backend getting overwhelmed.

For 5+ years we have had 5 provisioned write units set on the table, with autoscaling up to 100. We’ve never seen it throttle and I’ve never needed to look at it. On the first mass check of Vault secrets for an app deploy, though, DynamoDB showed 100 write units spun up, and before it even got there, a whole slew of throttled requests.

When throttled requests happen, Vault’s active node believes there’s a DynamoDB failure, so it stops and restarts. The standby then takes over, which causes a brief blip on our load balancer, and for that period we lose Vault, breaking our deploy. It bounces back and forth like this, pinned at max write units.

I’ve kicked up our write units to 50 as a baseline, but this of course has a cost impact that seems silly, since at idle it’s using 1 write unit. DynamoDB doesn’t scale up fast enough to handle what Vault is sending it now, and even since moving the baseline to 50 we still see throttled requests and failures at times. Our deploy service is merely pulling Vault secrets using a single token. It’s many secrets, but still not so many that I’d expect to overwhelm anything, especially when it worked fine before.

So I saw this max_parallel setting as an option in the DynamoDB storage stanza, but there’s no guidance on how it correlates to write/read units. Does anyone use this? Is there any way to mesh that with what we see for write or read unit usage during busy periods? Will this setting actually limit how much Vault is sending to the storage backend?

This page shows the default for max_parallel as 128, and the default for write_capacity (and read_capacity), which I presume to be capacity units, as 5. I realize the capacity settings only apply at table creation, but if the defaults would create a table with 5 units, and the default for max_parallel is presumably tuned for that, why does the default of 128 cause Vault to die?
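
For what it’s worth, here is a rough back-of-envelope sketch of how I think the two numbers might relate. It assumes max_parallel is a cap on concurrent requests to DynamoDB rather than a rate limit, that each write is 1 KB or smaller (so it costs one write unit), and a made-up 10 ms write latency. The function names and the latency figure are mine, not anything from Vault or AWS.

```python
# Back-of-envelope only: the latency figure is an assumption, not a measurement.

ASSUMED_WRITE_LATENCY_S = 0.010  # hypothetical 10 ms per DynamoDB write


def peak_writes_per_second(max_parallel: int, write_latency_s: float) -> float:
    """Upper bound on request rate with at most max_parallel requests in flight.

    Uses Little's law: throughput = concurrency / latency.
    """
    if write_latency_s <= 0:
        raise ValueError("write_latency_s must be positive")
    return max_parallel / write_latency_s


def max_parallel_for_capacity(write_units: int, write_latency_s: float) -> int:
    """Largest concurrency whose peak rate stays within write_units per second.

    Assumes every write is <= 1 KB, so one write consumes one write unit.
    Never returns less than 1, since a cap of 0 would block all requests.
    """
    if write_latency_s <= 0:
        raise ValueError("write_latency_s must be positive")
    return max(1, int(write_units * write_latency_s))


if __name__ == "__main__":
    # Default max_parallel of 128 against the assumed latency.
    print(peak_writes_per_second(128, ASSUMED_WRITE_LATENCY_S))
    # Concurrency that would fit a 5-unit and a 100-unit table.
    print(max_parallel_for_capacity(5, ASSUMED_WRITE_LATENCY_S))
    print(max_parallel_for_capacity(100, ASSUMED_WRITE_LATENCY_S))
```

Under those assumptions, 128 concurrent requests could in principle push on the order of 12,800 writes per second, which is far beyond 5 or even 100 write units. If that reasoning holds, max_parallel would only bound the burst, and it would have to be set very low to line up with the provisioned capacity. I don’t know whether Vault actually behaves this way, so treat it as a guess.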

This GitHub issue shows others having the same problem on other recent releases. Does anyone have ideas?