Hi,
I am using Vault as a CA provider as well as Consul 1.9.3. My mesh is composed of 4 datacenters : primary is on VMs and the 3 others are on k8s, either EKS or bare metal.
Scenario is the following :
I setup my datacenter and they federate well, I can run any service.
After a random time (4 to 6 days) leaving the mesh untouched, I would get the x509: certificate has expired or is not yet valid
error.
Looking at the /v1/connect/ca/roots
endpoint I can see that the root certificate is indeed expired but I cannot tell why.
I read in the code that it has some kind of watcher on the validity and it should start the rotate process on its own after around 60% of the TTL is expired.
The following can be found in agents/cache-types/connect_leaf_ca.go
// 0 60% 90%
// Issued [------------------------------|===============|!!!!!] Expires
// 72h TTL: 0 ~43h ~65h
// 1h TTL: 0 36m 54m
I have very few logs on what is happening even though I run DEBUG
log level everywhere.
This log randomly happends on the primary datacenter without any previous error :
Mar 2 16:02:54 ip-10-0-4-156 consul[1297]: 2021-03-02T16:02:54.154Z [ERROR] agent.server.rpc: failed to read byte: conn=from=10.0.3.81:51263 error="tls: failed to verify client certificate: x509: certificate has expired or is not yet valid: current time 2021-03-02T16:02:54Z is after 2021-02-11T11:52:08Z"
I’ve been struggling with this for a while now… Would be happy to provide more detail.
There is a related issue including more detail :
Thanks for your help.
Marius