Nomad 1.6.1 reporting negative host cpu usage

lhossack · August 15, 2023, 3:03pm

After upgrading from v0.10.5 to 1.6.1, I’ve recently seen messages like the following on several hosts (Nomad clients):

2023-08-15T14:17:12.275Z [ERROR] agent: Attempting to increment Prometheus counter nomad_client_host_cpu_total_ticks_count with value negative value -2242.5742

I also sometimes get output similar to

Host Resource Utilization
CPU              Memory           Disk
-472/120000 MHz  6.8 GiB/376 GiB  (/dev/mapper/encryptedvol)

when running

nomad node status -self

Has anyone experienced similar behaviour?

It doesn’t seem to matter whether there are workloads on the host: our monitoring shows these error messages across several hosts.

All hosts are running

# uname -a
Linux <hostname> 4.19.0-20-amd64 #1 SMP Debian 4.19.235-1 (2022-03-17) x86_64 GNU/Linux

I traced through Nomad’s code at tag 1.6.1 and found that these values come from reading /proc/stat on Linux, with some code that calculates percentages from the change in jiffies (https://github.com/hashicorp/nomad/blob/515895c7690cdc72278018dc5dc58aca41204ccc/client/stats/cpu.go#L133). This code has been moved in a recent commit, but I believe it is functionally the same.
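
To illustrate how a delta-based calculation like this can go negative, here is a minimal sketch in Go. It is not Nomad’s actual code: the type cpuSample and the functions parseCPULine and busyPercent are hypothetical names made up for this example, and the grouping of fields into "busy" and "idle" is an assumption. The point is only that if any cumulative counter on the aggregate "cpu" line goes backwards between two samples, the naive busy delta can be negative unless it is clamped.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// cpuSample holds cumulative jiffies taken from the aggregate "cpu" line
// of /proc/stat. Hypothetical type, for illustration only.
type cpuSample struct {
	busy  float64 // user + nice + system + irq + softirq + steal
	total float64 // busy + idle + iowait
}

// parseCPULine parses a line such as "cpu  100 0 50 800 50 0 0 0 0 0".
// Field order after the label: user nice system idle iowait irq softirq steal.
func parseCPULine(line string) (cpuSample, error) {
	fields := strings.Fields(line)
	if len(fields) < 9 || fields[0] != "cpu" {
		return cpuSample{}, fmt.Errorf("unexpected /proc/stat line: %q", line)
	}
	vals := make([]float64, 8)
	for i := range vals {
		v, err := strconv.ParseFloat(fields[i+1], 64)
		if err != nil {
			return cpuSample{}, err
		}
		vals[i] = v
	}
	idle := vals[3] + vals[4]
	busy := vals[0] + vals[1] + vals[2] + vals[5] + vals[6] + vals[7]
	return cpuSample{busy: busy, total: busy + idle}, nil
}

// busyPercent returns the busy percentage between two samples, both as the
// raw (naive) value and clamped to the range [0, 100].
func busyPercent(prev, cur cpuSample) (naive, clamped float64) {
	dTotal := cur.total - prev.total
	if dTotal <= 0 {
		// No usable interval: report zero rather than divide by it.
		return 0, 0
	}
	naive = (cur.busy - prev.busy) / dTotal * 100
	clamped = naive
	if clamped < 0 {
		clamped = 0
	} else if clamped > 100 {
		clamped = 100
	}
	return naive, clamped
}
```

For example, feeding it the made-up samples "cpu  1000 0 500 8000 100 0 0 50 0 0" and then "cpu  1000 0 500 8100 100 0 0 10 0 0" (the steal field drops from 50 to 10) gives a negative naive percentage and a clamped value of 0. Whether something like this is what happens on my hosts is still only a guess.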

Any thoughts or suggestions are welcome!
