Consul agent CPU Utilization jumping to its limits in 5 mins with bare minimum traffic and not getting recovered

shashankgarg22 · May 3, 2023, 11:43am

Question:

Problem Description:

We are using Consul for service discovery.
Our consul servers with 3 Replica set, each pod is having consul agent as a separate container.
Service discovery is done through those consul agent containers.
Sporadically, we are observing consul agent containers reaching their CPU Limits which are defined as 200m in 5 mins with bare minimum traffic as :

Monitor Logs
2023-04-27T13:44:47.264Z [DEBUG] agent.client.memberlist.lan: memberlist: Stream connection from=xxx.xxx.xxx.xxx:59022
2023-04-27T13:44:36.329Z [DEBUG] agent: Check status updated: check=service:mtb123-LocalD-mtb123-scscf-8488d6b876-xkdjl status=passing
2023-04-27T13:44:39.099Z [DEBUG] agent: Check status updated: check=service:mtb123-RtpRestServ-mtb123-scscf-8488d6b876-xkdjl status=passing
2023-04-27T13:44:39.624Z [DEBUG] agent.http: Request finished: method=GET url=/v1/status/leader from=127.0.0.1:50022 latency=365.303µs
2023-04-27T13:42:33.121Z [DEBUG] agent: Service in sync: service=mtb123-LocalD-mtb123-scscf-8488d6b876-xkdjl
2023-04-27T13:42:33.121Z [DEBUG] agent: Service in sync: service=mtb123-RtpRestServ-mtb123-scscf-8488d6b876-xkdjl
2023-04-27T13:42:33.121Z [DEBUG] agent: Check in sync: check=service:mtb123-RtpRestServ-mtb123-scscf-8488d6b876-xkdjl

Most of the time the CPU remains less than 30m - 40m.

Once it reaches 200m then it is not coming back to its original state.

After having the top command
wnv4a0cscf0001c-scscf-66785f456f-vqgwj consul-agent 201m 69Mi

Consul version : 1.8.0

Consul Helm chart template:

consul-agent:

**Limits:**
  **cpu:     200m**
  memory:  200Mi
Requests:
  cpu:      100m
  memory:   200Mi

**Readiness:**  exec [/bin/sh -ec curl --connect-timeout 5 --max-time 5 http://localhost:8500/v1/status/leader \

2>/dev/null grep -E ‘“.+”’
] delay=0s timeout=5s period=10s #success=1 #failure=3

Events:

Type Reason Age From Message

Warning Unhealthy 23s (x19473 over 34d) kubelet, wnv4a-c00-perfworkerbm-22 Readiness probe failed:

LOGS FOR THE CONSUL_AGEN CONTAINER

[wnv4a0cscf0001c-user@wnv4b0depl0001vm001 ~] [wnv4a0cscf0001c-user@wnv4b0depl0001vm001 ~]
[wnv4a0cscf0001c-user@wnv4b0depl0001vm001 ~]$ kc logs wnv4a0cscf0001c-scscf-66785f456f-vqgwj -c consul-agent
/opt/startups/functions_source.sh: line 7: /logstore/wnv4a0cscf0001c-scscf-66785f456f-vqgwj/consul-agent/startlog.txt: No such file or directory
Consul in Client Mode
Cannot create /tmp/consul-test. Reverting to /tmp
==> Starting Consul agent…
Version: ‘v1.0.6.xxxx-23-g6e7b310’
Node ID: ‘ffb41e1d-6a01-xxxxx-32d3-35864ca780d6’
Node name: ‘wnv4a0cscf0001c-scscf-66785f456f-vqgwj’
Datacenter: ‘wnv4a0cscf0001c’ (Segment: ‘’)
Server: false (Bootstrap: false)
Client Addr: [127.0.0.1 xxx.xx.xx.229] (HTTP: 8500, HTTPS: -1, gRPC: -1, DNS: -1)
Cluster Addr: xx.xx.xx.229 (LAN: 8301, WAN: 8302)
Encrypt: Gossip: true, TLS-Outgoing: false, TLS-Incoming: false, Auto-Encrypt-TLS: false

==> Log data will now stream in as it occurs:

==> Consul agent running!
2023-03-21T18:11:41.829Z [ERROR] agent.anti_entropy: failed to sync remote state: error=“No known Consul servers”
2023-03-21T18:11:48.429Z [ERROR] agent: Failed to check for updates: error=“Get “https://checkpoint-api.hashicorp.com/v1/check/consul?arch=amd64&os=linux&signature=446d8fb7-xxxc&version=1.8.0”: dial tcp 18.160.60.92:443: i/o timeout (Client.Timeout exceeded while awaiting headers)”
2023-03-21T18:38:24.791Z [ERROR] agent.client: RPC failed to server: method=Status.Leader server=xxx.xx.xx.123:8300 error=“rpc error making call: stream closed”
2023-03-21T18:38:33.012Z [ERROR] agent.client.memberlist.lan: memberlist: Conflicting address for wnv4a0cscf0001c-sd-0. Mine: xx.xx.xxxxx.123:8301 Theirs: xx.xxx.xxx.12:8301 Old state: 2
2023-03-22T18:39:00.607Z [ERROR] agent: Failed to check for updates: error=“Get “https://checkpoint-api.hashicorp.com/v1/check/consul?arch=amd64&os=linux&signature=4xxx8fb7-8d99-a8d6-xxxxec3d0c&version=1.8.0”: context deadline exceeded (Client.Timeout exceeded while awaiting headers)”

[wnv4a0cscf0001c-user@wnv4b0depl0001vm001 ~] [wnv4a0cscf0001c-user@wnv4b0depl0001vm001 ~]

Victoria matrix graph :

Debug Error:

Please Help us to Solve this Problem and please describe the cause for it.

**If Any other details are needed, please let us know **

Topic		Replies	Views
Mixed consul agent versions Consul	1	169	March 13, 2024
Consul Hight CPU Usage Consul k8s , vault	0	101	July 6, 2024
Consul getting restarted when large requests are triggered between services Consul consul-nomad , consul	0	44	January 23, 2025
Performance troubles. Too many RPC request? Consul	0	618	October 29, 2020
Consul keeps high CPU utilization Consul k8s	4	2299	April 12, 2020

Consul agent CPU Utilization jumping to its limits in 5 mins with bare minimum traffic and not getting recovered

Question:

Problem Description:

Consul Helm chart template:

Events:

LOGS FOR THE CONSUL_AGEN CONTAINER

Victoria matrix graph :

Debug Error:

Please Help us to Solve this Problem and please describe the cause for it.

Related topics