Transitioning existing clusters to use Consul Connect

I have ~90 services in an environment that I want to deploy with Connect sidecars, preferably without incurring any downtime.

In the current setup we rely on the DNS interface for service discovery, the standard approach: resolve the SRV record for servicename.service.consul and connect to a random IP:port pair. Prepared queries are not involved here, and let’s also assume that we aren’t using tags and that everything happens within the same datacenter.
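To make the current behaviour concrete, here is a minimal sketch of the "pick a random IP:port pair" step. It assumes the SRV answer has already been resolved into address/port pairs by whatever resolver the client uses; `SrvRecord` and `pick_endpoint` are hypothetical names for illustration, not part of any Consul client library.

```python
import random
from typing import NamedTuple, Optional, Sequence, Tuple


class SrvRecord(NamedTuple):
    """One instance from an SRV answer (hypothetical shape, already resolved to an IP)."""
    address: str
    port: int


def pick_endpoint(
    records: Sequence[SrvRecord],
    rng: Optional[random.Random] = None,
) -> Tuple[str, int]:
    """Return a random (address, port) pair from the resolved SRV records."""
    if not records:
        # An empty answer means no instances were returned for the service.
        raise LookupError("no instances returned for service")
    rng = rng or random.Random()
    chosen = rng.choice(list(records))
    return chosen.address, chosen.port


if __name__ == "__main__":
    # Addresses and ports below are made up for the example.
    answer = [SrvRecord("10.0.0.11", 8080), SrvRecord("10.0.0.12", 8080)]
    print(pick_endpoint(answer))
```

The point is that clients today dial the service's own address and port directly, which is exactly the path a sidecar-only listener would stop serving.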

I want to gradually introduce Connect sidecars but there are some problems I can’t figure out how to solve.

Enabling Connect on an upstream service will break all its dependants, because the dependants will not be able to perform mTLS until they have sidecars of their own. DNS resolution will also stop working.

Starting with the services that have no downstreams and introducing Connect there will work, but as soon as I convert the upstream services I’ll also have to update the downstream jobspecs to include the Connect proxy upstream (we deploy with Nomad). So this will also break the downstreams until I update their jobspecs and redeploy.
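The ordering described here (callers first, the services they call afterwards) can be sketched as a topological sort over the dependency graph. This is only an illustration under assumptions: the `calls` mapping and the service names are hypothetical, and the ordering by itself does not remove the gap between converting an upstream and redeploying its downstreams.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def conversion_order(calls: dict[str, list[str]]) -> list[str]:
    """Order services so every caller is converted before the services it calls.

    `calls` maps a service name to the upstream services it connects to.
    """
    sorter = TopologicalSorter()
    for caller, upstreams in calls.items():
        sorter.add(caller)  # make sure services with no upstreams still appear
        for upstream in upstreams:
            # The upstream must come after its caller in the rollout.
            sorter.add(upstream, caller)
    # static_order() raises graphlib.CycleError if services depend on each other.
    return list(sorter.static_order())


if __name__ == "__main__":
    # Hypothetical dependency map: web -> api -> db, worker -> db.
    example = {"web": ["api"], "api": ["db"], "worker": ["db"]}
    print(conversion_order(example))
```

Services with no downstreams come out first, and a service like `db` that everything depends on comes out last.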

Another option I started looking at was the expose stanza in the jobspec, to expose all endpoints of the service during the transition period. However, I read through the code in Consul and it turns out the expose configuration is generated to match on the exact path rather than on a prefix or regex. This makes it impossible to expose something like /*. I have not actually tested this yet because reading the code was a lot less effort, but I can test it.
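The difference that matters here can be shown with a small sketch. This is not Consul's code, just an illustration of exact-path matching versus the prefix matching a /* style rule would need, based on my untested reading above; the function names and paths are hypothetical.

```python
from typing import Iterable


def exposed_by_exact_path(request_path: str, exposed_paths: Iterable[str]) -> bool:
    """Exact matching: a request passes only if its path equals a listed path."""
    return any(request_path == path for path in exposed_paths)


def exposed_by_prefix(request_path: str, exposed_prefixes: Iterable[str]) -> bool:
    """Prefix matching: a request passes if its path starts with a listed prefix."""
    return any(request_path.startswith(prefix) for prefix in exposed_prefixes)


if __name__ == "__main__":
    paths = ["/health", "/metrics"]
    # With exact matching every path has to be listed explicitly.
    print(exposed_by_exact_path("/health", paths))       # listed path
    print(exposed_by_exact_path("/health/live", paths))  # sub-path, not listed
    # A single "/" prefix would cover everything, which is what "/*" would need.
    print(exposed_by_prefix("/api/v1/users", ["/"]))
```

With exact matching, exposing a whole service means listing every path it serves, which is not practical for arbitrary APIs.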

Is there any way to introduce Connect in an already running cluster? How are other people doing it? Can anyone share their experience with a similar exercise?