eaf n00b
Joined: 27 Apr 2018 Posts: 13
Posted: Thu Dec 19, 2024 6:48 pm Post subject: GBs of memory wasted in Percpu thanks to stale cgroups |
Hi,
I've been hunting a significant memory leak on my system: every day the amount of used memory would go up by a few GB. I'm not talking about caches, buffers, ARC, etc.; I'm talking about the Percpu line in /proc/meminfo, which climbed all the way up to 50GB at one point.
I think I've traced it to cgroups (I also noticed an explosion in their number) and from there to elogind and OpenRC.
Apparently, elogind creates a new cgroup for every new login. With cgroups v1 it also set a per-cgroup release_agent to /lib64/elogind/elogind-cgroups-agent, which was supposed to be called when the corresponding cgroup became empty; that agent would then clean up the empty cgroup. On systemd installations the cleanup would be done by systemd itself.
With cgroups v2 the cleanup mechanism has changed: something is now supposed to monitor the corresponding cgroup.events file and remove the cgroup once that file reports "populated 0". I guess elogind does not support this cleanup mechanism, because tens of thousands of empty cgroups were left lying around on my system.
I think, and I may be totally wrong here, that the issue is that OpenRC by default mounts cgroups v2 under /sys/fs/cgroup, and elogind doesn't know how to do cgroup cleanup for v2.
Has anybody observed this pileup of unused cgroups and Percpu memory on their setups? Am I perhaps missing some /etc/init.d service that I neglected to activate and that would do this cleanup automatically, avoiding the pileup?
Thanks! |
pingtoo Veteran
Joined: 10 Sep 2021 Posts: 1345 Location: Richmond Hill, Canada
Posted: Thu Dec 19, 2024 7:01 pm Post subject: |
How do I check for this "cgroup pile up" symptom? And if I don't see it in an obvious way, does that mean I don't have the problem? |
eaf n00b
Joined: 27 Apr 2018 Posts: 13
Posted: Thu Dec 19, 2024 7:24 pm Post subject: |
"grep Percpu /proc/meminfo" was showing tens of GB allocated by "per cpu" allocators.
"cat /proc/cgroups" was showing tens of thousands of groups on my setup. Once I noticed that, I looked for cgroups in /sys/fs/cgroup that had empty cgroup.procs file or "populated 0" in cgroup.events file. Most of those groups counted by /proc/cgroups were found empty. Upon destroying them, the Percpu in /proc/meminfo dropped from 50GB to 2GB.
This box sees a ton of ssh and sftp traffic, which I guess accounts for the rapid growth of abandoned per-session cgroups. |
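For anyone who wants to check their own box, here is a minimal sketch of the check described above, wrapped in a shell function. It assumes cgroup2 is mounted at the standard /sys/fs/cgroup; the function name is made up for this example, and you can pass a different root for testing.

```shell
# Sketch: list cgroup-v2 directories whose cgroup.events reports
# "populated 0", i.e. groups with no member processes left.
# Assumes cgroup2 is mounted at /sys/fs/cgroup unless a root is given.
list_stale_cgroups() {
    root="${1:-/sys/fs/cgroup}"
    find "$root" -name cgroup.events -exec grep -l 'populated 0' {} + 2>/dev/null \
        | sed 's|/cgroup\.events$||'
}
```

Comparing the count of its output (`list_stale_cgroups | wc -l`) against the num_cgroups column in /proc/cgroups gives a quick estimate of how many groups are stale.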
Hu Administrator
Joined: 06 Mar 2007 Posts: 22885
Posted: Thu Dec 19, 2024 7:37 pm Post subject: |
With what version(s) of elogind did you observe this? The output of emerge --pretend --verbose sys-apps/openrc sys-auth/elogind might be useful. |
eaf n00b
Joined: 27 Apr 2018 Posts: 13
Posted: Thu Dec 19, 2024 7:42 pm Post subject: |
Code: | [ebuild R ] sys-apps/openrc-0.54.2::gentoo USE="netifrc pam sysvinit unicode -audit -bash -caps -debug -newnet -s6 (-selinux) -sysv-utils" 245 KiB
[ebuild R ] sys-auth/elogind-252.9-r2::gentoo USE="acl pam policykit -audit -cgroup-hybrid -debug -doc (-selinux) -test" 1,878 KiB |
pingtoo Veteran
Joined: 10 Sep 2021 Posts: 1345 Location: Richmond Hill, Canada
Posted: Thu Dec 19, 2024 8:40 pm Post subject: |
eaf wrote: | "grep Percpu /proc/meminfo" was showing tens of GB allocated by "per cpu" allocators.
"cat /proc/cgroups" was showing tens of thousands of groups on my setup. Once I noticed that, I looked for cgroups in /sys/fs/cgroup that had empty cgroup.procs file or "populated 0" in cgroup.events file. Most of those groups counted by /proc/cgroups were found empty. Upon destroying them, the Percpu in /proc/meminfo dropped from 50GB to 2GB.
This box sees a ton of ssh and sftp traffic, which I guess accounts for the rapid growth of abandoned per-session cgroups. |
Thanks for the information.
Code: | me@rpi5 ~ $ cat /proc/meminfo |grep Per
Percpu: 1664 kB |
Code: | me@rpi5 ~ $ cat /proc/cgroups
#subsys_name hierarchy num_cgroups enabled
cpuset 0 93 1
cpu 0 93 1
cpuacct 0 93 1
blkio 0 93 1
memory 0 93 0
devices 0 93 1
freezer 0 93 1
net_cls 0 93 1
perf_event 0 93 1
net_prio 0 93 1
pids 0 93 1 |
Linux rpi5 6.6.31+rpt-rpi-2712 #1 SMP PREEMPT Debian 1:6.6.31-1+rpt1 (2024-05-29) aarch64 GNU/Linux
This is on a Raspberry Pi 5 with the RPi 16k-page kernel. |
eaf n00b
Joined: 27 Apr 2018 Posts: 13
Posted: Thu Dec 19, 2024 8:48 pm Post subject: |
That's cool, and that's what I would expect to see too. But aren't you running Debian, and likely systemd too? I'm wondering, if perhaps I'm seeing some conflicting configuration on Gentoo where OpenRC mounts cgroups v.2 and elogind can't cope with it. But I didn't specially configure any of that, it's all default. |
pingtoo Veteran
Joined: 10 Sep 2021 Posts: 1345 Location: Richmond Hill, Canada
Posted: Thu Dec 19, 2024 9:03 pm Post subject: |
eaf wrote: | That's cool, and that's what I would expect to see too. But aren't you running Debian, and likely systemd too? I'm wondering, if perhaps I'm seeing some conflicting configuration on Gentoo where OpenRC mounts cgroups v.2 and elogind can't cope with it. But I didn't specially configure any of that, it's all default. |
No, I am just using the RPI's kernel, my rootfs is Gentoo based.
my make.profile is Code: | make.profile -> ../../var/db/repos/gentoo/profiles/default/linux/arm64/23.0/desktop/gnome/systemd | So yes, I am using systemd.
Code: | me@rpi5 ~ $ mount|grep cgroup
cgroup2 on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime,nsdelegate,memory_recursiveprot) |
sublogic Apprentice
Joined: 21 Mar 2022 Posts: 284 Location: Pennsylvania, USA
Posted: Thu Dec 19, 2024 11:38 pm Post subject: |
I see it too! But not on the same scale as eaf. Code: | $ mount | grep cgroup
none on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime,nsdelegate)
$ ls /sys/fs/cgroup
10 31 51 70 92 memory.stat
11 32 52 71 c1 openrc.apt-cacher-ng
12 33 53 72 c2 openrc.avahi-daemon
13 34 54 73 c3 openrc.bluetooth
14 35 55 76 c4 openrc.cronie
15 36 56 78 cgroup.controllers openrc.cupsd
16 37 57 79 cgroup.max.depth openrc.dbus
17 38 58 8 cgroup.max.descendants openrc.display-manager
18 39 59 80 cgroup.procs openrc.distccd
19 4 6 81 cgroup.stat openrc.net.wlp6s0
20 40 60 82 cgroup.subtree_control openrc.ntpd
21 41 61 83 cgroup.threads openrc.rasdaemon
22 42 62 84 cpu.stat openrc.rpc.idmapd
23 43 63 85 cpu.stat.local openrc.rpc.statd
24 45 64 86 cpuset.cpus.effective openrc.rpcbind
25 46 65 87 cpuset.mems.effective openrc.rsyncd
26 47 66 88 elogind openrc.sshd
27 48 67 89 io.cost.model openrc.sysklogd
28 49 68 9 io.cost.qos openrc.udev
29 5 69 90 io.stat
30 50 7 91 memory.reclaim |
Among the two-digit cgroups, 80 and c2 are my xfce4 session and a tigervnc session. The others are stale. Code: | $ grep -l populated\ 1 /sys/fs/cgroup/??/cgroup.events
/sys/fs/cgroup/80/cgroup.events
/sys/fs/cgroup/c2/cgroup.events
$ grep -l populated\ 0 /sys/fs/cgroup/??/cgroup.events
/sys/fs/cgroup/10/cgroup.events
/sys/fs/cgroup/11/cgroup.events
...
/sys/fs/cgroup/91/cgroup.events
/sys/fs/cgroup/92/cgroup.events
/sys/fs/cgroup/c1/cgroup.events
/sys/fs/cgroup/c3/cgroup.events
/sys/fs/cgroup/c4/cgroup.events |
eaf n00b
Joined: 27 Apr 2018 Posts: 13
Posted: Fri Dec 20, 2024 1:50 am Post subject: |
It's definitely elogind that's creating these groups:
Code: | mkdir("/sys/fs/cgroup/4041", 0755) = 0 |
Interestingly, its source code does have some inotify handlers; it should be able to recognize changes to cgroup.events and do the cleanup. Yet it doesn't.
Also, if I change /etc/rc.conf to mount /sys/fs/cgroup in "legacy" mode, elogind starts creating cgroups in a different place, and the OpenRC controller then takes care of the cleanup by running /lib/rc/sh/cgroup-release-agent.sh for each released group:
Code: | mkdir("/sys/fs/cgroup/openrc/5", 0755) = 0 |
I figure I'll open an issue with the elogind devs, and perhaps they'll tell me right off the bat what's missing here. |
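For reference, the rc.conf change mentioned above is a single setting; rc_cgroup_mode accepts "legacy", "hybrid" and "unified" (unified, the v2-only layout, is the current default). A sketch of the relevant fragment:

```shell
# /etc/rc.conf
# Mount cgroups in v1 ("legacy") mode so that OpenRC's release agent,
# /lib/rc/sh/cgroup-release-agent.sh, gets invoked for emptied groups.
rc_cgroup_mode="legacy"
```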
eaf n00b
Joined: 27 Apr 2018 Posts: 13
Posted: Sun Dec 22, 2024 6:14 pm Post subject: |
I poked around elogind source code, and I kinda wish I didn't. There's a lot of #if 0 sprinkled all over the place, hundreds of lines of commented code at a time, and the functions that are supposed to setup monitoring of cgroup.events files are never even called. It might be intentional, they do say right before a big chunk of disabled inotify code that "elogind is not init, and does not install the agent here." And I get it that elogind was extracted from systemd, so some scars are supposed to be present, but boy was that an invasive surgery, and things were just left patched and bandaged throughout the code. No reaction from elogind folks about the issue. I start thinking that we're just lucky that whatever works works.
So, the options to avoid the leak appear to be:
- Switch to the original systemd;
- Change /etc/rc.conf to mount cgroups v.1, and then openrc will take care of the cleanup;
- Set up a cronjob to scan empty cgroups and delete them manually.
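For the third option, here is roughly what I have in mind — a hypothetical reaper function (the name is made up, and the flat /sys/fs/cgroup/<id> layout matches my box; adjust the glob for nested layouts):

```shell
# Sketch of a cron-driven cleanup: remove per-session cgroups that
# report "populated 0". rmdir is the supported way to delete a cgroup
# and fails harmlessly on groups that still have members or children.
reap_empty_cgroups() {
    root="${1:-/sys/fs/cgroup}"
    for ev in "$root"/*/cgroup.events; do
        [ -f "$ev" ] || continue
        if grep -q 'populated 0' "$ev"; then
            echo "removing ${ev%/cgroup.events}"
            rmdir "${ev%/cgroup.events}" 2>/dev/null || :
        fi
    done
}
```

A crontab entry running such a script every few minutes would at least keep the pileup bounded.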
sam_ Developer
Joined: 14 Aug 2020 Posts: 2039
Posted: Mon Dec 23, 2024 4:23 am Post subject: |
eaf wrote: | I poked around elogind source code, and I kinda wish I didn't. There's a lot of #if 0 sprinkled all over the place, hundreds of lines of commented code at a time, and the functions that are supposed to setup monitoring of cgroup.events files are never even called. It might be intentional, they do say right before a big chunk of disabled inotify code that "elogind is not init, and does not install the agent here." And I get it that elogind was extracted from systemd, so some scars are supposed to be present, but boy was that an invasive surgery, and things were just left patched and bandaged throughout the code. No reaction from elogind folks about the issue. I start thinking that we're just lucky that whatever works works.
[...]
|
I'm afraid that I've held this opinion too for quite some time. |
Yamakuzure Advocate
Joined: 21 Jun 2006 Posts: 2303 Location: Adendorf, Germany
Posted: Fri Dec 27, 2024 8:53 am Post subject: |
Are you sure it has to be elogind's fault?
Code: | ~ # grep -i cgroup_mode /etc/rc.conf
#rc_cgroup_mode="unified"
~ # mount | grep cgroup
none on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime,nsdelegate)
~ # loginctl --version
elogind 255 (255)
+PAM +AUDIT -SELINUX -APPARMOR +SMACK -SECCOMP +ACL +UTMP default-hierarchy=unified
~ # grep Percpu /proc/meminfo
Percpu: 10496 kB
| That doesn't look like "tens of gigabytes" to me.
Quote: | ... There's a lot of #if 0 sprinkled all over the place ... | Yes. This was done so that the original systemd code that is not needed can be kept in place; that is required so upstream commits can be migrated automatically instead of manually, line by line. At the end of the day, elogind is just systemd-logind cut out of the swamp and then enhanced with a rather small list of extras.
If you think elogind does something wrong here or is missing out on something, I'll look into it. But probably not before next year, sorry. |
eaf n00b
Joined: 27 Apr 2018 Posts: 13
Posted: Fri Dec 27, 2024 2:33 pm Post subject: |
Yes, I'm afraid all the evidence points to elogind:
- Every time I ssh in and out of the box, a new cgroup is leaked under /sys/fs/cgroup;
- strace of elogind and a peek into its source code show that it's elogind creating these cgroups;
- With every newly created cgroup, Percpu in /proc/meminfo goes up by 1536KB;
- Once the accumulated cgroups are manually destroyed with cgdelete, the Percpu memory is reclaimed.
The impact of the leak depends on how many SSH sessions get opened and closed per second. I suspect it also depends on the number of CPUs in the box, or perhaps even NUMA nodes; this one reports 128 CPUs in /proc/cpuinfo. On another box with 56 CPUs I see only a 256KB Percpu increase for each leaked cgroup. |
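To put rough numbers on it: a back-of-envelope calculation using the 1536KB-per-cgroup figure measured above, with a purely hypothetical session count, lines up with the "few GB a day" growth from the first post.

```shell
# Rough estimate of daily Percpu growth from leaked per-session cgroups.
# 1536 kB/cgroup was measured on the 128-CPU box; 2000 sessions/day is
# an assumed workload, not a measurement.
awk 'BEGIN {
    per_cgroup_kb = 1536
    sessions_per_day = 2000
    printf "%.2f GB/day\n", per_cgroup_kb * sessions_per_day / 1024^2
}'
```

That works out to about 2.93 GB/day under those assumptions.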
gentoo_ram Guru
Joined: 25 Oct 2007 Posts: 504 Location: San Diego, California USA
Posted: Sat Dec 28, 2024 4:31 pm Post subject: |
Interesting that I am not seeing this leak issue. Running OpenRC and elogind on a Raspberry Pi 5 with the RPI kernel. I generally SSH into the device a couple of times a day.
Code: | host ~ # mount | grep cgr
none on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime,nsdelegate)
host ~ # grep cpu /proc/meminfo
Percpu: 880 kB
host ~ # loginctl --version
elogind 252.9 (252.9)
+PAM -AUDIT -SELINUX -APPARMOR +SMACK -SECCOMP -GCRYPT +ACL -BLKID -KMOD +UTMP default-hierarchy=unified
host ~ # uname -a
Linux host 6.12.3-v8+ #22 SMP PREEMPT Mon Dec 9 17:10:10 PST 2024 aarch64 GNU/Linux
host ~ # ls -1F /sys/fs/cgroup/
cgroup.controllers
cgroup.max.depth
cgroup.max.descendants
cgroup.procs
cgroup.stat
cgroup.subtree_control
cgroup.threads
cpuset.cpus.effective
cpuset.cpus.isolated
cpuset.mems.effective
cpu.stat
cpu.stat.local
elogind/
io.stat
openrc.apache2/
openrc.avahi-daemon/
openrc.bluetooth/
openrc.cronie/
openrc.dbus/
openrc.distccd/
openrc.dovecot/
openrc.net.end0/
openrc.net.wlan0/
openrc.nfs/
openrc.ntpd/
openrc.postfix/
openrc.rpcbind/
openrc.rpc.statd/
openrc.rsyncd/
openrc.samba/
openrc.scanmsgs/
openrc.sshd/
openrc.syslog-ng/
openrc.udev/
openrc.upsd/
openrc.upsdrv/
openrc.upsmon/
|