Removing the (huge) logging overhead for Docker

When using the Docker driver, the way Nomad captures task logs brings a surprisingly large memory overhead: for each container, Nomad spawns one docker_logger and one logmon process, each of them consuming somewhere between 50 and 60 MB of RAM. When running a lot of small containers as sidecars, this overhead adds up fast (on my nodes, it represented between 20 and 30% of total memory!)

I removed this overhead by running a local Vector agent on my nodes. This agent receives the log streams directly from the Docker daemon (using the fluent protocol) and then recreates the log files exactly as Nomad would have done. This way, you can still read logs from the Nomad interface or API. Here's how I did it: Suppress Nomad's loggi... | BookStack
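To illustrate the idea (this is a sketch, not my exact config; the address, path template, and names here are assumptions): Docker's fluentd logging driver ships logs to a local Vector `fluent` source, and a `file` sink writes them back out where a log consumer expects them:

```toml
# vector.toml — illustrative sketch, not a drop-in config
[sources.docker_logs]
type = "fluent"                  # listens for Docker's fluentd log driver
address = "127.0.0.1:24224"

[sinks.log_files]
type = "file"
inputs = ["docker_logs"]
# Hypothetical path template; the real layout must match where
# Nomad expects alloc logs on your nodes.
path = "/var/nomad/alloc/{{ tag }}/alloc/logs/{{ tag }}.stdout.0"
encoding.codec = "text"
```

On the Docker side, each container (or the daemon default) would then use `--log-driver=fluentd --log-opt fluentd-address=127.0.0.1:24224`, so one Vector process replaces a docker_logger/logmon pair per container.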

Maybe it can be useful to others :wink:


For anyone coming here from search: the "huge" overhead is a bit exaggerated. RSS can be a misleading metric because it doesn't take memory sharing into account. The RSS reported by top or ps can be split into RssAnon, RssFile and RssShmem. Here is an example for an average Nomad docker_logger:

# grep Rss /proc/<PID>/status
RssAnon:           12288 kB
RssFile:           35072 kB
RssShmem:              0 kB

This shows that the process itself allocated only 12 MB of anonymous memory; everything else is file-backed page cache.

If we check /proc/<PID>/smaps_rollup:

Rss:               50588 kB
Pss:               14786 kB
Pss_Dirty:         13948 kB
Pss_Anon:          13948 kB
Pss_File:            838 kB
Pss_Shmem:             0 kB
Shared_Clean:      36592 kB
Shared_Dirty:          0 kB
Private_Clean:        48 kB
Private_Dirty:     13948 kB
Referenced:        50588 kB
Anonymous:         13948 kB

Shared_Clean is memory shared with other processes. If we look into /proc/&lt;PID&gt;/smaps for details, we find that the shared file is the nomad executable itself.

Pss (proportional set size) is an alternative to RSS: it counts memory private to the process in full, plus each shared region divided by the number of processes sharing it.

TL;DR: it is roughly 14 MB (Pss) per process, not 50 MB (Rss).
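To check this on your own nodes, a small script (a sketch; the field names follow the Linux /proc/&lt;PID&gt;/smaps_rollup format shown above) can compare Rss and Pss for any process:

```python
#!/usr/bin/env python3
"""Compare Rss and Pss for a process by parsing /proc/<pid>/smaps_rollup."""
import sys


def parse_smaps_rollup(text):
    """Parse 'Key:   1234 kB' lines into a dict of integer kB values."""
    values = {}
    for line in text.splitlines():
        if ":" in line and line.rstrip().endswith("kB"):
            key, rest = line.split(":", 1)
            values[key.strip()] = int(rest.strip().split()[0])
    return values


if __name__ == "__main__":
    pid = sys.argv[1] if len(sys.argv) > 1 else "self"
    with open(f"/proc/{pid}/smaps_rollup") as f:
        v = parse_smaps_rollup(f.read())
    # Rss double-counts shared pages across processes; Pss splits them up.
    shared = v["Rss"] - v["Pss"]
    print(f"Rss: {v['Rss']} kB, Pss: {v['Pss']} kB "
          f"(~{shared} kB of Rss is shared with other processes)")
```

Running it as `python3 rss_vs_pss.py <PID>` against a docker_logger process reproduces the numbers above: the gap between Rss and Pss is mostly the shared, file-backed nomad binary.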

This seems like a clever hack! I wouldn't do it myself, but still clever! :vulcan_salute: