High volume of deadlocks with Vault in Sql server

We are seeing a high volume of deadlocks since we introduced Vault process that gives permissions on schema.
Here the process1 runs the following GRANT statements
GRANT SELECT, INSERT, UPDATE, DELETE, EXECUTE ON SCHEMA :: shared_proxy TO [v-tk-ca7baef9-f8e0—xHwo6QQTzfhZKx7y7zqM-1638452722]
Process 1 holds a SCH_M lock on the resource ‘SECURITY CACHE’ under the transaction name ‘SEC Cache Coherency’

process 2 runs a stored proc mar_dev.dbo.spDD_MarriageView_Search which has dynamic sql that uses sp_executesql to insert into a #temp table.
This has never been a problem before. Now from the deadlock graph we can see that sp_executesql holds a SCH_M lock on METADATA: database_id = 39 PERMISSIONS(class = 0, major_id = 0) under the transactionname ‘read permissions’
This is interfering with SECURITY CACHE from process 1.

There is hardly any documentation around internal of sql server (security_cache).

Vault seems to be blocking pretty much every sql not just dynamic sql.
Although deadlocks are high with dynamic sql.
Any ideas on how to effectively implement vault in sql server?

Finally some closure on the deadlock issue with SQL server.

After a ton of research on the internals of sql server (Security cache), it is understood that sql server uses this cache to update, recently retrieved permissions for faster usage. Vault giving permissions is being written to security cache while the other service also trying to access the cache to read permissions. There is hardly any control that a DBA can do with the internal workings of SQL server.
Finally, the thing that worked for is using LOCK_TIMEOUT=0 in the Vault connection. We tried to put this setting in connection string but that didn’t work. Finally we tried to run this as the very 1st statement when Vault connection was made. This setting ensures that Vault does not wait indefinitely, when the resources(security cache/permissions) are held up by other session. The default setting is ‘-1’ which means a connection is blocked by sql server if the resources are held up elsewhere.
This setting has shown a significant improvement where the deadlocks have gone down by 96%.
SQL server in this case will through a 1222 error. It is better to handle the error with a retry logic.
Hope this helps someone.

1 Like

We are having the same issue. Can you provide your solution? Did you literally just have LOCK_TIMEOUT=0 at the top of the “Creation Statements”?
Thanks!

Yes, Set LOCK_TIMEOUT=0 before every create login/user statement.
Deadlock were gone but we ran into issues with timeout messages later.
We then changed all the Alter role db_fixed_role ADD member statements to
GRANT perm TO LOGIN
For example, GRANT ALTER,CONTROL,VIEW DEFENITION TO LOGIN

The timeouts reduced after the above change but still caused problems

Then we separated the REVOCATION PART and changed the REVOCATION logic to just disable the login instead of DROPPING THE LOGIN.
We then have an offline process to drop these orphaned logins.
This helped significantly in reducing the contention on SECURITY_CACHE.

Thanks for that! We just came up with this for adding the login to a database:
We currently don’t have REVOCATION statements - I’m not 100% clear on how that works.

DECLARE @SqlCommand nvarchar(max)

SELECT @SqlCommand = COALESCE(@SqlCommand+'; ','') + 'GRANT ALTER, SELECT, INSERT, UPDATE, DELETE, EXECUTE ON SCHEMA::'+QUOTENAME([SCHEMA_NAME])+' TO '+ ''+[{{name}}] + ''
FROM [INFORMATION_SCHEMA].[SCHEMATA] s
WHERE NOT EXISTS (SELECT NULL FROM dbo.sysusers su
WHERE su.issqlrole = 1 
AND s.SCHEMA_NAME = su.name)
AND s.SCHEMA_NAME <> 'sys'
AND s.SCHEMA_OWNER <> 'INFORMATION_SCHEMA'