Hi @cohult, confd is taking a lot of memory at startup, which causes the container to be killed due to OOM. We run confd inside a container with a memory limit of 4 GB, which is comfortably above our average usage of 2.5 GB.
We started seeing this issue after we updated the OS on the worker nodes to RHEL 9.4; before this, the worker nodes were running a non-RHEL OS. It appears to be this combination of worker node OS and confd that causes the high memory usage, since the same container image runs fine on older worker nodes with a non-RHEL OS.
On checking the top output in the container, it appears that confd is taking 36 GB of virtual memory (33.6 GB resident), which is much more than the average usage of 2.5 GB:
PID USER PR NI VIRT RES %CPU %MEM TIME+ S COMMAND
313 root 20 0 36.0g 33.6g 0.0 71.7 0:19.55 S `- /confd/lib/confd/erts/bin/confd.smp -S 4 -K true -MHe true -- -root /confd/lib/confd -progname confd -- -home /root -- -boot confd -noshell -noinput -foreground -+
In /proc/<pid>/smaps I could see a single memory region of huge size reserved by confd:
7f3b7bfff000-7f437c000000 rw-p 00000000 00:00 0
Size: 33554436 kB
KernelPageSize: 4 kB
MMUPageSize: 4 kB
Rss: 33554436 kB
Pss: 33554436 kB
Shared_Clean: 0 kB
Shared_Dirty: 0 kB
Private_Clean: 0 kB
Private_Dirty: 33554436 kB
Referenced: 33554436 kB
Anonymous: 33554436 kB
LazyFree: 0 kB
AnonHugePages: 0 kB
ShmemPmdMapped: 0 kB
FilePmdMapped: 0 kB
Shared_Hugetlb: 0 kB
Private_Hugetlb: 0 kB
Swap: 0 kB
SwapPss: 0 kB
Locked: 0 kB
THPeligible: 0
ProtectionKey: 0
VmFlags: rd wr mr mw me ac sd
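For anyone who wants to repeat this check on their own node, here is a minimal Python sketch of how the largest mappings can be pulled out of smaps output. The helper names `parse_smaps` and `largest_by_rss` are hypothetical, written for this post; they are not part of confd or any tool.

```python
import re

# Header line of a mapping: "start-end perms offset dev inode [path]"
_HEADER = re.compile(
    r"^([0-9a-f]+)-([0-9a-f]+)\s+(\S+)\s+\S+\s+\S+\s+\S+\s*(.*)$"
)


def parse_smaps(text):
    """Return one dict per mapping with its address range and kB fields."""
    regions = []
    current = None
    for line in text.splitlines():
        m = _HEADER.match(line.strip())
        if m:
            start, end = int(m.group(1), 16), int(m.group(2), 16)
            current = {
                "range": f"{m.group(1)}-{m.group(2)}",
                "perms": m.group(3),
                "path": m.group(4).strip(),
                # Size derived from the address range, in kB
                "span_kb": (end - start) // 1024,
            }
            regions.append(current)
        elif current is not None:
            key, sep, value = line.partition(":")
            parts = value.split()
            # Keep only "<Name>: <number> kB" lines (skips VmFlags etc.)
            if sep and len(parts) == 2 and parts[1] == "kB" and parts[0].isdigit():
                current[key.strip()] = int(parts[0])
    return regions


def largest_by_rss(regions, n=5):
    """Return the n mappings with the highest resident size."""
    return sorted(regions, key=lambda r: r.get("Rss", 0), reverse=True)[:n]
```

It can be fed the text of /proc/<pid>/smaps, for example `largest_by_rss(parse_smaps(open("/proc/313/smaps").read()))`. For the region above, the address span and the Size field agree at 33554436 kB, which is exactly 32 GiB plus one 4 kB page, and all of it is resident, private, and dirty.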
The output of cat /proc/<pid>/status also shows that the memory reserved is too high:
Name: confd.smp
Umask: 0022
State: S (sleeping)
Tgid: 310
Ngid: 0
Pid: 310
PPid: 186
TracerPid: 0
Uid: 0 0 0 0
Gid: 0 0 0 0
FDSize: 256
Groups: 0 4000
NStgid: 310
NSpid: 310
NSpgid: 1
NSsid: 1
VmPeak: 37724308 kB
VmSize: 37715316 kB
VmLck: 0 kB
VmPin: 0 kB
VmHWM: 35198624 kB
VmRSS: 35196608 kB
RssAnon: 35189568 kB
RssFile: 7040 kB
RssShmem: 0 kB
VmData: 35231196 kB
VmStk: 156 kB
VmExe: 3520 kB
VmLib: 4692 kB
VmPTE: 69132 kB
VmSwap: 0 kB
HugetlbPages: 0 kB
CoreDumping: 0
THP_enabled: 1
untag_mask: 0xffffffffffffffff
Threads: 24
SigQ: 4/191517
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: 0000000000001080
SigCgt: 0000000100000206
CapInh: 0000000000000000
CapPrm: 000001ffffffffff
CapEff: 000001ffffffffff
CapBnd: 000001ffffffffff
CapAmb: 0000000000000000
NoNewPrivs: 0
Seccomp: 0
Seccomp_filters: 0
Speculation_Store_Bypass: thread vulnerable
SpeculationIndirectBranch: conditional enabled
Cpus_allowed: ffff
Cpus_allowed_list: 0-15
Mems_allowed: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000001
Mems_allowed_list: 0
voluntary_ctxt_switches: 53
nonvoluntary_ctxt_switches: 35
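To compare these numbers against the container limit, here is a rough Python sketch that extracts the kB fields from the status output and converts them to GiB. The function names are hypothetical, and treating the 4 GB container limit as 4 GiB is an assumption.

```python
def parse_status_kb(text):
    """Collect the '<Name>: <number> kB' fields of /proc/<pid>/status as ints (kB)."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        parts = value.split()
        if sep and len(parts) == 2 and parts[1] == "kB" and parts[0].isdigit():
            fields[key.strip()] = int(parts[0])
    return fields


def kb_to_gib(kb):
    """Convert a kB value as reported by the kernel (1 kB = 1024 bytes) to GiB."""
    return kb / (1024 * 1024)


def exceeds_limit(fields, limit_gib=4.0):
    """True if resident memory (VmRSS) is above the limit.

    limit_gib=4.0 is an assumption: the container limit read as 4 GiB.
    """
    return kb_to_gib(fields["VmRSS"]) > limit_gib
```

With the values above, VmRSS of 35196608 kB comes to about 33.6 GiB, matching the RES column in top, and RssAnon (35189568 kB) accounts for almost all of it, so the usage is anonymous memory rather than file-backed pages.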
Is there a way to figure out why confd is taking more memory than usual after the change in worker node OS? Is there a way I can configure confd to be agnostic of worker OS changes?
Regards
Abhijeet