Transitioning between a cert-enabled and a cert-disabled Consul environment

We have been running Consul with certs enabled for a few years. We have now decided that we’re sufficiently locked down that we can stop using Consul certs (if we were to continue, we’d have to replace the root certs anyway, so we’re forced to make some changes either way).

My question is:
Is it possible to transition between cert-enabled and cert-disabled state without bringing down the entire cluster?
I’ve tried setting the verify_* fields to false, hoping that would make a cert-enabled node accept traffic from a cert-disabled node, but that doesn’t seem to work. (I’m getting a bunch of “consul.rpc: failed to read byte: tls: no certificates configured from=” errors logged on the cert-disabled node.)

I guess I was hoping that even if the client has the key_file, cert_file, ca_file fields configured, it would still be able to accept traffic from other nodes without those fields configured, but that does not seem to be the case.

So to sum up, I’m looking for pointers on how to make this transition without bringing down the entire Consul infrastructure.

Thank you for reaching out. Which Consul version are you using?

I believe we are on roughly 1.4.0, but we can upgrade to any version if there’s a reason for it.

Setting all verify_* flags to false on every server and then doing a rolling restart should allow cert-disabled clients to connect.

Is this not the case?
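
To make that concrete, here is a rough sketch of the server-side TLS settings I have in mind, written as a small Python script that prints the JSON fragment. The verify_* keys and ca_file / cert_file / key_file are the agent options from the 1.4 to 1.6 era configuration; the file paths and the helper name transitional_server_tls are placeholders of mine, not anything taken from your setup.

```python
import json

# Consul agent TLS verification options (flat config keys used in 1.4-1.6).
VERIFY_FLAGS = (
    "verify_incoming",
    "verify_incoming_rpc",
    "verify_incoming_https",
    "verify_outgoing",
    "verify_server_hostname",
)


def transitional_server_tls(ca_file, cert_file, key_file):
    """Return TLS settings for a server that keeps its certs but stops enforcing TLS."""
    settings = {flag: False for flag in VERIFY_FLAGS}
    # Assumption: the certs stay configured during the transition, so only
    # the enforcement flags change at this step.
    settings.update({"ca_file": ca_file, "cert_file": cert_file, "key_file": key_file})
    return settings


if __name__ == "__main__":
    # Hypothetical paths; substitute whatever your servers already use.
    cfg = transitional_server_tls(
        "/etc/consul.d/ca.pem",
        "/etc/consul.d/server.pem",
        "/etc/consul.d/server-key.pem",
    )
    print(json.dumps(cfg, indent=2))
```

The printed fragment would be merged into each server's existing configuration before its restart; everything else in the config stays as it is.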

As far as we can see, no. But let me try again and report back. Thanks for the input.

So I retested, and as far as I can see, it does not work, at least not on 1.4.0. Even if all other nodes in the cluster have verify_* set to false (and have been restarted), when I join a “cert-disabled” agent, it gets [ERR] consul.rpc: failed to read byte: tls: no certificates configured from=

I also tested on version 1.6.2, with the same results. The cluster seems to work, though (at least I can see all nodes in the UI, including the cert-disabled one), so it’s difficult to understand what is actually failing.

Update on this:
I don’t know what I did wrong last time, but after setting the verify_* flags to false on all servers (not clients) and restarting all servers, I was able to provision clients without certificates.
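
In case it helps anyone else, this is roughly what “without certificates” means on the client side: the agent config simply carries none of the TLS keys. Below is a small Python sketch that strips them from an existing config. The helper name without_tls and the sample values (datacenter, paths, join address) are made up for illustration and are not my real setup.

```python
import json

# Flat TLS-related keys from the Consul agent configuration of that era.
TLS_KEYS = {
    "ca_file", "cert_file", "key_file",
    "verify_incoming", "verify_incoming_rpc", "verify_incoming_https",
    "verify_outgoing", "verify_server_hostname",
}


def without_tls(agent_config):
    """Return a copy of an agent config dict with every TLS-related key dropped."""
    return {k: v for k, v in agent_config.items() if k not in TLS_KEYS}


if __name__ == "__main__":
    # Hypothetical client config; names and addresses are placeholders.
    client = {
        "datacenter": "dc1",
        "data_dir": "/opt/consul",
        "retry_join": ["10.0.0.10"],
        "ca_file": "/etc/consul.d/ca.pem",
        "cert_file": "/etc/consul.d/client.pem",
        "key_file": "/etc/consul.d/client-key.pem",
        "verify_outgoing": True,
    }
    print(json.dumps(without_tls(client), indent=2, sort_keys=True))
```

Going by what I saw, this should only be rolled out to clients after every server has had its verify_* flags set to false and been restarted.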

However, it looks like I still need to reconfigure all servers at the same time. I tried disabling certs on a single server, and it was unable to regain contact with the cluster. According to the logs, the issue was TLS-related. So I’m getting the feeling that certs are used differently on servers than on clients. That’s not a problem in itself; I just wish all of this were documented somewhere.