I don’t think you can.
There will always be some time needed for parts of the system to react to changes, so I think the goal of zero failed requests is unrealistic. For example, if you had a load balancer in front of a set of backends, it would only detect a failure when a health check failed or, for in-line load balancers, when a request failed. And even then there is usually a deliberate delay (e.g. requiring several consecutive failed checks) to prevent a single random failure from removing an entire backend from the pool.
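To make that detection window concrete, here's a back-of-the-envelope sketch with hypothetical numbers (the interval and failure threshold are assumptions, not Vault or load balancer defaults):

```python
# Hypothetical numbers: a health check every 5s, with 3 consecutive
# failures required before a backend is removed from the pool.
check_interval = 5      # seconds between health checks (assumed)
failures_required = 3   # consecutive failed checks before removal (assumed)

# Worst case: the backend dies just after passing a check, so each
# required failure costs roughly one full interval.
worst_case_detection = check_interval * failures_required
print(worst_case_detection)  # 15 -- seconds during which requests can still fail
```

However you tune those two knobs, the product is a window in which traffic keeps flowing to a dead backend.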
The same is true with Vault. Standby servers will only detect that the active node has failed either because a request to it has failed or because some form of heartbeat has timed out. Again, there will likely be some failed requests during this window (unless your usage of Vault is so low that several seconds of outage would go unnoticed).
Performance Standbys (an Enterprise feature) could help to some small degree, depending on your usage patterns. If a lot of your requests are read-only, spreading the load between multiple servers reduces the chance of a request landing on the failed active node. With 10 servers (9 performance standbys plus the active node), the chance of a read request failing during an active node failure drops from 100% (without performance standbys, all requests have to go to the active node, even if via a standby) to 10% (it fails only if it happens to hit the failed active node, and succeeds on any of the other 9). However, you gain an extra failure mode: previously only the active node served traffic, so a standby failing went unnoticed; now every node serves traffic, so a failed standby is visible.
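The 100%-to-10% arithmetic above can be sketched as follows, assuming the load balancer spreads reads uniformly across all nodes (an assumption; real distribution depends on your balancing strategy):

```python
# Assuming reads are spread uniformly across all nodes.
total_nodes = 10   # 1 active node + 9 performance standbys
failed_nodes = 1   # the active node has just died, not yet detected

# Chance that a given read request lands on the failed node during
# the window before failover completes:
p_read_failure = failed_nodes / total_nodes
print(p_read_failure)  # 0.1, i.e. 10% -- versus 100% with no standbys
```

The same arithmetic also shows the new failure mode: with 10 nodes taking traffic, any single node dying now affects roughly 10% of reads, where before a dead standby affected none.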
Performance standbys wouldn’t help at all for write traffic: while the active node is down, all write requests fail until one of the standbys takes over.
You might be able to tweak settings around heartbeat frequency or timeout duration to make failure detection more sensitive, but that can be really dangerous if it causes failures to be detected where there are none (e.g. due to a single lost or delayed packet). Switching the active node takes time and causes request failures until a new leader is chosen, so you can end up in a total outage if the system declares a failure before the new leader has even finished starting up, triggering another leadership election, which triggers another failure detection, and so on.
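The trade-off can be modelled roughly like this. Assume each heartbeat is independently lost with some probability, and the cluster declares the leader dead after k consecutive missed heartbeats (all numbers here are hypothetical, not Vault defaults):

```python
# Hypothetical model: heartbeats sent every `interval` seconds, each
# independently lost with probability p_loss; leader declared dead
# after k consecutive misses.
p_loss = 0.01  # 1% chance any single heartbeat is lost (assumed)

def detection_time(k, interval=1.0):
    # Roughly how long a real failure goes undetected.
    return k * interval

def false_positive(k, p=p_loss):
    # Probability that k consecutive heartbeats are all lost by
    # chance, triggering a needless (and disruptive) election.
    return p ** k

for k in (1, 2, 3):
    print(k, detection_time(k), false_positive(k))
# k=1 detects real failures in ~1s, but a single lost packet (1% of
# heartbeats!) triggers a spurious election; k=3 takes ~3s to detect,
# but a spurious election needs three losses in a row (~one in a million).
```

Lowering k (or the interval) buys faster detection at the cost of exponentially more frequent false elections, each of which is itself an outage.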
So in summary:
You’ll always have some short period without a valid active node (until the failure is detected and a new active node is promoted) when failures happen. Performance standbys could reduce the impact during that period for read-only requests. However, all write requests during that time (and possibly some read requests) will fail.
Getting below a few seconds of outage/instability during failures is genuinely hard. Past that point, even small improvements quickly become very difficult and expensive.