Hi, I’m trying to fully understand when a pod will appear in Consul. We are trying to estimate the time it takes from a deployment being created to it showing up in Consul.
I have a pod that is being added with 1 replica, and takes about 30 seconds to pass readiness checks. It also has an endpoint that gets created immediately. I initially assumed it would show up in consul immediately due to the endpoint being created so quickly, but it actually takes around 40 seconds to show up. At first I thought this was due to the sync interval so I set that to 8s (rather than the default 30s) but that didn’t change anything… Now I’m thinking it waits for the health checks to pass, but that delta still doesn’t line up.
Here’s some analysis I’ve been doing…
I wrote a script to scale to 0, wait until the service is no longer in consul catalog, then scale up and see how long it takes to show up in catalog. Here is the output, side by side with the logs (of which I’ve omitted specific details).
Script output
2021/02/23 18:27:14 Scaled down to 0
2021/02/23 18:28:08 Service removed from consul
2021/02/23 18:28:08 Scaled up to 1
2021/02/23 18:28:10 Found endpoint in K8s Endpoints
2021/02/23 18:28:34 Pod is ready in K8s
2021/02/23 18:28:51 Found service in Consul Catalog
2021/02/23 18:28:51
NAME | TIME
---------------------------------------+----------
Time from replica change to catalog | 42398ms
Time from endpoint to catalog | 40343ms
Time from container ready to catalog | 16153ms
Average Request Time (error) | 185ms
As you can see, it takes about 40 seconds for it to appear, and 16 seconds to appear even after passing readiness checks. Can anyone help me realize where are these delays coming from?
I’m also getting very inconsistent values. I’ve removed the flag that sets the Sync Period to 8s for reasons mentioned below. Here are two runs of the test where you can see time from ready is more than double the one before. We are really trying to find some consistent metric that we can rely on.
NAME | TIME
---------------------------------------+----------
Time from replica change to catalog | 45551ms
Time from endpoint to catalog | 44304ms
Time from container ready to catalog | 22709ms
Average Request Time (error) | 183ms
NAME | TIME
---------------------------------------+----------
Time from replica change to catalog | 35994ms
Time from endpoint to catalog | 33752ms
Time from container ready to catalog | 7752ms
Average Request Time (error) | 186ms
I also just noticed this line in the source code, alluding to some way to trigger syncs. Does anyone know where I can find more info on that? It sounds like a sync would be triggered by a deployment or scaling change, but as you can see in the first test added in this post, it takes 22 seconds for the new replica to show up! It doesn’t seem like anything was triggered at all.
This is making me wonder if dropping the SyncPeriod so drastically is incorrect…
Hi, so unfortunately when endpoints are updated (via new deployments), the syncer doesn’t trigger immediately. It only triggers (right now) on new/deleted services. So then it relies on the “full sync” to pick up new endpoints.
Regarding the write interval, I’m not seeing that behaviour: