Gentoo Forums
openrc-init - zombie reaping
Zucca
Moderator


Joined: 14 Jun 2007
Posts: 3870
Location: Rasi, Finland

Posted: Mon Jan 13, 2025 11:40 pm    Post subject: openrc-init - zombie reaping

Firstly:
grknight wrote:
Zucca wrote:
pingtoo wrote:
One of init's duties on Linux is to reap zombie processes.
OK. I thought this was the kernel's job. Is it really so? That would explain one oddity I have...

Yes, it is init's job. See a simple example from openrc-init that catches SIGCHLD and reaps zombies.

That link contains the following function:
Code:
static void signal_handler(int sig)
{
   switch (sig) {
      case SIGINT:
         handle_shutdown("reboot", RB_AUTOBOOT);
         break;
      case SIGTERM:
#ifdef SIGPWR
      case SIGPWR:
#endif
         handle_shutdown("shutdown", RB_HALT_SYSTEM);
         break;
      case SIGCHLD:
         reap_zombies();
         break;
      default:
         printf("Unknown signal received, %d\n", sig);
         break;
   }
}
I don't know much C, but it looks like if openrc-init receives a SIGCHLD signal, it will start cleaning up zombie processes.
I tried, but nothing happened. I have one box where dhcpcd processes spawned by NetworkManager are eventually left as zombies, so I'm trying to solve that here too.

I also don't quite understand the following:
reap_zombies():
static void reap_zombies(void)
{
   pid_t pid;

   for (;;) {
      pid = waitpid(-1, NULL, WNOHANG);
      if (pid == 0)
         break;
      else if (pid == -1) {
         if (errno == ECHILD)
            break;
         perror("waitpid");
         continue;
      }
   }
}

for (;;)? Is this like while (true)? Does it loop forever? No condition is given.
So waitpid() should allow the zombified process to exit?
_________________
..: Zucca :..

My gentoo installs:
init=/sbin/openrc-init
-systemd -logind -elogind seatd

Quote:
I am NaN! I am a man!
GDH-gentoo
Veteran


Joined: 20 Jul 2019
Posts: 1787
Location: South America

Posted: Tue Jan 14, 2025 12:27 am    Post subject: Re: openrc-init - zombie reaping

Zucca wrote:
I don't know much C, but it looks like if openrc-init receives a SIGCHLD signal, it will start cleaning up zombie processes.

It is a POSIX thing more than a C thing. Yes, it will reap zombie processes when it receives a SIGCHLD signal. The signal itself is just an 'alarm', though ("hey, you have a new zombie"); the actual reaping happens when the appropriate functions, such as waitpid(), are called.

The kernel delivers SIGCHLD automatically when a child process terminates. openrc-init also has the peculiarity that it runs as process 1, so, as a special property, it 'inherits' as children all processes whose parents terminate.
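To make the 'alarm' versus 'reaping' distinction concrete, here is a minimal sketch (not openrc-init's code; the function name sigchld_then_reap() is hypothetical) in which the signal handler only sets a flag and the zombie is actually reaped by waitpid() afterwards. It assumes a POSIX system and a calling program with no other children.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t got_sigchld = 0;

/* The handler only records that the "alarm" went off; it reaps nothing. */
static void on_sigchld(int sig)
{
    (void)sig;
    got_sigchld = 1;
}

/* Hypothetical demo: forks one child that exits at once, sleeps until
 * SIGCHLD arrives, then reaps with waitpid(). Returns the number of
 * children reaped, or -1 on error. */
int sigchld_then_reap(void)
{
    struct sigaction sa = {0}, old_sa;
    sigset_t block, old_mask, wait_mask;
    int reaped = 0;

    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGCHLD, &sa, &old_sa) == -1)
        return -1;

    /* Block SIGCHLD so it cannot slip in before sigsuspend() below. */
    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    sigprocmask(SIG_BLOCK, &block, &old_mask);
    wait_mask = old_mask;
    sigdelset(&wait_mask, SIGCHLD);

    got_sigchld = 0;
    pid_t pid = fork();
    if (pid == 0)
        _exit(3);                      /* child terminates: kernel sends SIGCHLD */

    if (pid > 0) {
        while (!got_sigchld)
            sigsuspend(&wait_mask);    /* sleep until the signal is delivered */

        /* The child is a zombie at this point; waitpid() is what reaps it. */
        for (;;) {
            pid_t r = waitpid(-1, NULL, WNOHANG);
            if (r > 0)
                reaped++;
            else if (r == -1 && errno == EINTR)
                continue;
            else
                break;                 /* 0: none exited yet; -1/ECHILD: none left */
        }
    } else {
        reaped = -1;                   /* fork() failed */
    }

    sigprocmask(SIG_SETMASK, &old_mask, NULL);
    sigaction(SIGCHLD, &old_sa, NULL);
    return reaped;
}
```

Calling sigchld_then_reap() should return 1: one SIGCHLD, one zombie reaped. Without the waitpid() loop the signal would still arrive, but the child would stay in ps as <defunct> until the calling process itself exited.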

Zucca wrote:
I tried, but nothing happened.

Tried what?

Zucca wrote:
I have one box where dhcpcd processes spawned by NetworkManager are eventually left as zombies, so I'm trying to solve that here too.

What does the process tree (ps axf) look like when it happens?

Zucca wrote:
I also don't quite understand the following:
[...]
for (;;)? Is this like while (true)? Does it loop forever?

Yes. But there are break statements in the loop body that are executed on certain conditions. When they are executed, execution flow continues after the loop.
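The exits can be made observable. Below is a hypothetical variant of openrc-init's reap_zombies() that counts what it reaps, with each break annotated. make_zombie() is an illustrative helper (not part of OpenRC) that uses waitid() with WNOWAIT to wait until a child has exited while leaving it unreaped.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical variant of reap_zombies(): same loop, but returns a count. */
int reap_zombies_counted(void)
{
    int reaped = 0;
    pid_t pid;

    for (;;) {                        /* no condition: runs until a break */
        pid = waitpid(-1, NULL, WNOHANG);
        if (pid == 0)                 /* children exist, none has exited yet */
            break;
        else if (pid == -1) {
            if (errno == ECHILD)      /* no children at all: nothing to reap */
                break;
            perror("waitpid");        /* other error: report it and retry */
            continue;
        }
        reaped++;                     /* pid > 0: one zombie reaped, try again */
    }
    return reaped;
}

/* Hypothetical helper: forks a child that exits at once, then waits until it
 * is a zombie *without* reaping it (WNOWAIT). Returns its PID, or -1. */
pid_t make_zombie(void)
{
    siginfo_t info;
    pid_t pid = fork();
    if (pid == 0)
        _exit(0);
    if (pid > 0)
        waitid(P_PID, (id_t)pid, &info, WEXITED | WNOWAIT);
    return pid;
}
```

After two make_zombie() calls, reap_zombies_counted() should report 2; calling it again reports 0, because waitpid() then fails with ECHILD and the second break is taken.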

Zucca wrote:
So waitpid() should allow the zombified process to exit?

It is one of the functions that results in reaping a zombie when called, yes.
_________________
NeddySeagoon wrote:
I'm not a witch, I'm a retired electronics engineer :)
Ionen wrote:
As a packager I just don't want things to get messier with weird build systems and multiple toolchains requirements though :)


Last edited by GDH-gentoo on Tue Jan 14, 2025 1:42 am; edited 1 time in total
Hu
Administrator


Joined: 06 Mar 2007
Posts: 22998

Posted: Tue Jan 14, 2025 12:48 am

Building on the hint from GDH-gentoo, my guess is that the zombies Zucca saw are the children of some not-yet-exited process that is not reaping them in a timely manner. When the parent of a zombie exits, the zombie is re-parented to the nearest subreaper (by default, PID 1). The new parent then reaps the zombie. However, if the original parent has not exited yet, the zombie remains under that parent, because the parent might still want to reap it, and each zombie can only be reaped once.
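A Linux-specific sketch of the re-parenting described above. It assumes the calling process may use prctl(PR_SET_CHILD_SUBREAPER, ...) (available since Linux 3.4); marking itself as a subreaper lets the demo observe the adoption without running as PID 1. The function name orphan_adopted_by() is hypothetical.

```c
#define _GNU_SOURCE
#include <sys/prctl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical demo: marks the caller as a child subreaper, orphans a
 * grandchild and returns the PID the orphan was re-parented to
 * (expected: the caller's own PID), or -1 on error. */
pid_t orphan_adopted_by(void)
{
    int fds[2];
    pid_t msg[2] = { -1, -1 };        /* { orphan's PID, orphan's new parent } */

    if (prctl(PR_SET_CHILD_SUBREAPER, 1UL) == -1)
        return -1;
    if (pipe(fds) == -1) {
        prctl(PR_SET_CHILD_SUBREAPER, 0UL);
        return -1;
    }

    pid_t middle = fork();
    if (middle == 0) {
        pid_t middle_pid = getpid();
        close(fds[0]);
        if (fork() == 0) {
            /* Grandchild: poll until its original parent has exited. */
            struct timespec ts = { 0, 1000000 };       /* 1 ms */
            while (getppid() == middle_pid)
                nanosleep(&ts, NULL);
            pid_t out[2] = { getpid(), getppid() };
            _exit(write(fds[1], out, sizeof out) == (ssize_t)sizeof out ? 0 : 1);
        }
        _exit(0);                     /* exits without waiting: orphan created */
    }

    close(fds[1]);
    if (middle > 0) {
        waitpid(middle, NULL, 0);     /* reap the middle process itself */
        if (read(fds[0], msg, sizeof msg) == (ssize_t)sizeof msg)
            waitpid(msg[0], NULL, 0); /* the orphan is now our child: reap it */
        else
            msg[1] = -1;
    }
    close(fds[0]);
    prctl(PR_SET_CHILD_SUBREAPER, 0UL);
    return msg[1];
}
```

Without the prctl() calls, the reported parent would normally be 1 (or some other subreaper higher up the tree), and reaping the orphan would then be that process's job.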
sublogic
Apprentice


Joined: 21 Mar 2022
Posts: 297
Location: Pennsylvania, USA

Posted: Tue Jan 14, 2025 3:09 am

To elaborate a bit on the above:
  1. In C's "for" statement, for(init ; test ; postbody), all three of init, test and postbody can be omitted, and a missing test means "true". So for( ; ; ) is an infinite loop.
  2. Zombies exist because the parent of a terminated process has the right to collect its exit status and resource usage statistics with the wait4() system call.
  3. Zombies are just tiny stubs; most of the terminated process's resources have already been reclaimed. The parent's wait4() reclaims the rest.
  4. If the parent exits without wait()ing, init inherits the zombie and promptly reaps it without bothering to collect statistics.
  5. If the parent hangs around and never wait()s, zombies accumulate. Kill the parent!
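Points 2 and 3 can be illustrated with wait4(), which on Linux returns the child's exit status together with a struct rusage. A minimal sketch, assuming glibc or musl on Linux; reap_with_stats() is a hypothetical name and the exit code 42 is arbitrary.

```c
#define _GNU_SOURCE
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: forks a child that burns a little CPU and exits with
 * status 42, then reaps it with wait4(), which also fills in the child's
 * resource usage. Returns the exit code (or -1) and stores user CPU time. */
int reap_with_stats(long *user_usec)
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        volatile unsigned long sum = 0;
        for (unsigned long i = 0; i < 10000000UL; i++)
            sum += i;                          /* some user-mode work */
        _exit(42);
    }

    int status;
    struct rusage ru;
    /* Until this call, the zombie is kept around precisely so that this
     * exit status and rusage can still be collected. */
    if (wait4(pid, &status, 0, &ru) != pid)
        return -1;

    *user_usec = (long)ru.ru_utime.tv_sec * 1000000L + (long)ru.ru_utime.tv_usec;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```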
Zucca
Moderator



Posted: Tue Jan 14, 2025 8:49 am    Post subject: Re: openrc-init - zombie reaping

OK. Thanks, guys. I learnt a lot about zombie processes and how they are normally handled.

GDH-gentoo wrote:
Zucca wrote:
I tried, but nothing happened.

Tried what?
I sent the signal to openrc-init manually, but obviously that's not how it works (as the replies in this topic explain).
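A sketch of why sending SIGCHLD to init by hand cannot clear these zombies: waitpid() only ever returns the caller's own children. Here the calling process stands in for openrc-init, a forked "middle" process for NetworkManager, and a grandchild for one dhcpcd. The roles and the name reap_while_parent_alive() are purely illustrative, and it assumes the caller has no other children.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical roles: caller = init, "middle" = NetworkManager,
 * grandchild = one dhcpcd. Returns how many zombies the caller could
 * reap while "middle" was still alive, or -1 on error. */
int reap_while_parent_alive(void)
{
    int ready[2], hold[2];
    char c = 'x';
    int reaped = -1;

    if (pipe(ready) == -1)
        return -1;
    if (pipe(hold) == -1) {
        close(ready[0]);
        close(ready[1]);
        return -1;
    }

    pid_t middle = fork();
    if (middle == 0) {
        close(ready[0]);
        close(hold[1]);
        pid_t kid = fork();
        if (kid == 0)
            _exit(0);                          /* dies at once: middle's zombie */
        siginfo_t info;
        if (kid > 0)                           /* wait for death, but do not reap */
            waitid(P_PID, (id_t)kid, &info, WEXITED | WNOWAIT);
        if (write(ready[1], &c, 1) != 1)
            _exit(1);
        while (read(hold[0], &c, 1) > 0)       /* idle, never reaping, until EOF */
            ;
        _exit(0);
    }

    close(ready[1]);
    close(hold[0]);
    if (middle > 0 && read(ready[0], &c, 1) == 1) {
        kill(getpid(), SIGCHLD);               /* "sending the signal by hand" */
        reaped = 0;
        while (waitpid(-1, NULL, WNOHANG) > 0) /* only our own children qualify */
            reaped++;
    }
    close(hold[1]);                            /* middle sees EOF and exits */
    close(ready[0]);
    if (middle > 0)
        waitpid(middle, NULL, 0);
    return reaped;
}
```

It should return 0: the zombie belongs to the still-running middle process, so the caller finds nothing to reap. Only once middle exits is the zombie re-parented (to PID 1 or the nearest subreaper) and reaped there.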

GDH-gentoo wrote:
What does the process tree (ps axf) look like when it happens?
It happens all the time. I think it occurs when the box loses its Wi-Fi connection. I haven't (yet) counted whether the "zombies reproduce" when the connection is lost.
Code:
M710q ~ # pstree -A 1519
NetworkManager-+-107*[dhcpcd]
               `-3*[{NetworkManager}]
... where 1519 is the PID of the main NetworkManager process.
*snip* of ps output:
  PID TTY      STAT   TIME COMMAND
 1519 ?        Ssl    5:21 /usr/sbin/NetworkManager --pid-file /run/NetworkManager/NetworkManager.pid
 1905 ?        Z      0:00  \_ [dhcpcd] <defunct>
 2375 ?        Z      0:00  \_ [dhcpcd] <defunct>
 4412 ?        Z      0:00  \_ [dhcpcd] <defunct>
 4591 ?        Z      0:00  \_ [dhcpcd] <defunct>
 4875 ?        Z      0:00  \_ [dhcpcd] <defunct>
 5040 ?        Z      0:00  \_ [dhcpcd] <defunct>
 5188 ?        Z      0:00  \_ [dhcpcd] <defunct>
 5462 ?        Z      0:00  \_ [dhcpcd] <defunct>
 5616 ?        Z      0:00  \_ [dhcpcd] <defunct>
 5787 ?        Z      0:00  \_ [dhcpcd] <defunct>
 6072 ?        Z      0:00  \_ [dhcpcd] <defunct>
 8101 ?        Z      0:00  \_ [dhcpcd] <defunct>
 8585 ?        Z      0:00  \_ [dhcpcd] <defunct>
11034 ?        Z      0:00  \_ [dhcpcd] <defunct>
11085 ?        Z      0:00  \_ [dhcpcd] <defunct>
11117 ?        Z      0:00  \_ [dhcpcd] <defunct>
12902 ?        Z      0:00  \_ [dhcpcd] <defunct>
13179 ?        Z      0:00  \_ [dhcpcd] <defunct>
13559 ?        Z      0:00  \_ [dhcpcd] <defunct>
13898 ?        Z      0:00  \_ [dhcpcd] <defunct>
15548 ?        Z      0:00  \_ [dhcpcd] <defunct>
15835 ?        Z      0:00  \_ [dhcpcd] <defunct>
16108 ?        Z      0:00  \_ [dhcpcd] <defunct>
16268 ?        Z      0:00  \_ [dhcpcd] <defunct>
16287 ?        Z      0:00  \_ [dhcpcd] <defunct>
16494 ?        Z      0:00  \_ [dhcpcd] <defunct>
17157 ?        Z      0:00  \_ [dhcpcd] <defunct>
17181 ?        Z      0:00  \_ [dhcpcd] <defunct>
17201 ?        Z      0:00  \_ [dhcpcd] <defunct>
17230 ?        Z      0:00  \_ [dhcpcd] <defunct>
17259 ?        Z      0:00  \_ [dhcpcd] <defunct>
17299 ?        Z      0:00  \_ [dhcpcd] <defunct>
17579 ?        Z      0:00  \_ [dhcpcd] <defunct>
19249 ?        Z      0:00  \_ [dhcpcd] <defunct>
19288 ?        Z      0:00  \_ [dhcpcd] <defunct>
19746 ?        Z      0:00  \_ [dhcpcd] <defunct>
20047 ?        Z      0:00  \_ [dhcpcd] <defunct>
20080 ?        Z      0:00  \_ [dhcpcd] <defunct>
21784 ?        Z      0:00  \_ [dhcpcd] <defunct>
22523 ?        Z      0:00  \_ [dhcpcd] <defunct>
22802 ?        Z      0:00  \_ [dhcpcd] <defunct>
23082 ?        Z      0:00  \_ [dhcpcd] <defunct>
23116 ?        Z      0:00  \_ [dhcpcd] <defunct>
23399 ?        Z      0:00  \_ [dhcpcd] <defunct>
23422 ?        Z      0:00  \_ [dhcpcd] <defunct>
23459 ?        Z      0:00  \_ [dhcpcd] <defunct>
25189 ?        Z      0:00  \_ [dhcpcd] <defunct>
25214 ?        Z      0:00  \_ [dhcpcd] <defunct>
25497 ?        Z      0:00  \_ [dhcpcd] <defunct>
25652 ?        Z      0:00  \_ [dhcpcd] <defunct>
25833 ?        Z      0:00  \_ [dhcpcd] <defunct>
26164 ?        Z      0:00  \_ [dhcpcd] <defunct>
28591 ?        Z      0:00  \_ [dhcpcd] <defunct>
28910 ?        Z      0:00  \_ [dhcpcd] <defunct>
30632 ?        Z      0:00  \_ [dhcpcd] <defunct>
 1034 ?        Z      0:00  \_ [dhcpcd] <defunct>
 1055 ?        Z      0:00  \_ [dhcpcd] <defunct>
 1368 ?        Z      0:00  \_ [dhcpcd] <defunct>
 1705 ?        Z      0:00  \_ [dhcpcd] <defunct>
 3757 ?        Z      0:00  \_ [dhcpcd] <defunct>
 3920 ?        Z      0:00  \_ [dhcpcd] <defunct>
 4209 ?        Z      0:00  \_ [dhcpcd] <defunct>
 4236 ?        Z      0:00  \_ [dhcpcd] <defunct>
 4254 ?        Z      0:00  \_ [dhcpcd] <defunct>
 4283 ?        Z      0:00  \_ [dhcpcd] <defunct>
 4557 ?        Z      0:00  \_ [dhcpcd] <defunct>
 4870 ?        Z      0:00  \_ [dhcpcd] <defunct>
 7186 ?        Z      0:00  \_ [dhcpcd] <defunct>
13633 ?        Z      0:00  \_ [dhcpcd] <defunct>
13906 ?        Z      0:00  \_ [dhcpcd] <defunct>
14181 ?        Z      0:00  \_ [dhcpcd] <defunct>
14464 ?        Z      0:00  \_ [dhcpcd] <defunct>
14755 ?        Z      0:00  \_ [dhcpcd] <defunct>
15046 ?        Z      0:00  \_ [dhcpcd] <defunct>
15330 ?        Z      0:00  \_ [dhcpcd] <defunct>
15615 ?        Z      0:00  \_ [dhcpcd] <defunct>
15897 ?        Z      0:00  \_ [dhcpcd] <defunct>
16181 ?        Z      0:00  \_ [dhcpcd] <defunct>
16465 ?        Z      0:00  \_ [dhcpcd] <defunct>
16745 ?        Z      0:00  \_ [dhcpcd] <defunct>
17036 ?        Z      0:00  \_ [dhcpcd] <defunct>
17328 ?        Z      0:00  \_ [dhcpcd] <defunct>
17605 ?        Z      0:00  \_ [dhcpcd] <defunct>
21517 ?        Z      0:00  \_ [dhcpcd] <defunct>
21552 ?        Z      0:00  \_ [dhcpcd] <defunct>
21837 ?        Z      0:00  \_ [dhcpcd] <defunct>
21861 ?        Z      0:00  \_ [dhcpcd] <defunct>
21890 ?        Z      0:00  \_ [dhcpcd] <defunct>
21909 ?        Z      0:00  \_ [dhcpcd] <defunct>
21937 ?        Z      0:00  \_ [dhcpcd] <defunct>
22003 ?        Z      0:00  \_ [dhcpcd] <defunct>
24205 ?        Z      0:00  \_ [dhcpcd] <defunct>
24486 ?        Z      0:00  \_ [dhcpcd] <defunct>
24523 ?        Z      0:00  \_ [dhcpcd] <defunct>
24807 ?        Z      0:00  \_ [dhcpcd] <defunct>
24828 ?        Z      0:00  \_ [dhcpcd] <defunct>
25206 ?        Z      0:00  \_ [dhcpcd] <defunct>
25493 ?        Z      0:00  \_ [dhcpcd] <defunct>
25786 ?        Z      0:00  \_ [dhcpcd] <defunct>
26074 ?        Z      0:00  \_ [dhcpcd] <defunct>
26360 ?        Z      0:00  \_ [dhcpcd] <defunct>
26646 ?        Z      0:00  \_ [dhcpcd] <defunct>
26930 ?        Z      0:00  \_ [dhcpcd] <defunct>
27215 ?        Z      0:00  \_ [dhcpcd] <defunct>
27784 ?        Z      0:00  \_ [dhcpcd] <defunct>
28067 ?        Z      0:00  \_ [dhcpcd] <defunct>
28370 ?        S      0:00  \_ dhcpcd: wlp0s20f0u8 [ip4]
 1567 ?        S      7:58 /usr/sbin/wpa_supplicant -u
 1569 ?        Sl     0:00 /usr/sbin/ModemManager


GDH-gentoo wrote:
Zucca wrote:
I also don't quite understand the following:
[...]
for (;;)? Is this like while (true)? Does it loop forever?

Yes. But there are break statements in the loop body that are executed on certain conditions.
OK, thanks. I'm familiar with break, continue and so on. The (;;) part was what I was unsure about. Would while () do the same?

GDH-gentoo wrote:
Zucca wrote:
So waitpid() should allow the zombified process to exit?

It is one of the functions that results in reaping a zombie when called, yes.
... and I assume waitpid() is some standard libc function? Or even a syscall?

Thanks, Hu, for the insight into the mechanism.

sublogic wrote:
Zombies are just tiny stubs; most of the terminated process's resources have already been reclaimed. The parent's wait4() reclaims the rest.
sublogic wrote:
If the parent hangs around and never wait()s, zombies accumulate. Kill the parent!
Indeed. After I ran rc-config restart NetworkManager, the zombies got reaped.

So... the issue (probably) lies in NetworkManager, as it is the direct parent of the dhcpcd processes.
I could recompile NetworkManager to use dhclient instead of dhcpcd to rule out dhcpcd as the culprit.
GDH-gentoo
Veteran



Posted: Tue Jan 14, 2025 12:57 pm    Post subject: Re: openrc-init - zombie reaping

Zucca wrote:
*snip* of ps output:
  PID TTY      STAT   TIME COMMAND
 1519 ?        Ssl    5:21 /usr/sbin/NetworkManager --pid-file /run/NetworkManager/NetworkManager.pid
 1905 ?        Z      0:00  \_ [dhcpcd] <defunct>
 2375 ?        Z      0:00  \_ [dhcpcd] <defunct>
 4412 ?        Z      0:00  \_ [dhcpcd] <defunct>
...

That's a lot of zombie child processes indeed, which process 1519 does not seem to be reaping (at least not in a timely manner).

Zucca wrote:
OK, thanks. I'm familiar with break, continue and so on. The (;;) part was what I was unsure about. Would while () do the same?

Okay. Yes, for (;;) is valid C syntax that means "loop forever", for the reasons given by sublogic; while (), however, is not. The condition can't be omitted, so you have to write while (1) or an equivalent condition to loop forever.
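A small illustration of the two spellings (the function names are hypothetical). Both loops have no real condition and leave only through break; while (1) needs the explicit constant, and while (true) with <stdbool.h> would be equivalent.

```c
/* Hypothetical demo functions: both count up to `limit` using a loop with
 * no real condition; break is the only way out. */
int count_with_for(int limit)
{
    int i = 0;
    for (;;) {                 /* empty test: treated as always true */
        if (i >= limit)
            break;
        i++;
    }
    return i;
}

int count_with_while(int limit)
{
    int i = 0;
    while (1) {                /* "while ()" would not compile */
        if (i >= limit)
            break;
        i++;
    }
    return i;
}
```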

Zucca wrote:
... and I assume waitpid() is some standard libc function? Or even a syscall?

It is a "system interface" specified by the POSIX standard that, on amd64, the libc implements as a wrapper around a Linux system call named wait4().
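To see the wrapper relationship directly, this Linux-specific sketch skips waitpid() and issues the wait4 system call through syscall(2). The wait4(2) man page documents waitpid(pid, &status, options) as equivalent to wait4(pid, &status, options, NULL). It assumes an architecture that defines SYS_wait4, such as amd64 (it may not exist on every architecture, hence the #ifdef); reap_via_raw_wait4() is a hypothetical name.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: forks a child that exits with `code`, then reaps it
 * with a raw wait4 system call instead of the libc waitpid() wrapper.
 * Returns the child's exit code, or -1 on error or without SYS_wait4. */
int reap_via_raw_wait4(int code)
{
#ifdef SYS_wait4
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0)
        _exit(code & 0xff);            /* only the low 8 bits are reported */

    int status = 0;
    /* Same as waitpid(pid, &status, 0): no options, no rusage buffer. */
    long r = syscall(SYS_wait4, (long)pid, &status, 0L, (void *)0);
    if (r != (long)pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
#else
    (void)code;
    return -1;
#endif
}
```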

Zucca wrote:
So... the issue (probably) lies in NetworkManager, as it is the direct parent of the dhcpcd processes.

Yeah, it looks like it.
Hu
Administrator



Posted: Tue Jan 14, 2025 2:59 pm    Post subject: Re: openrc-init - zombie reaping

Zucca wrote:
So... the issue (probably) lies in NetworkManager, as it is the direct parent of the dhcpcd processes.
I could recompile NetworkManager to use dhclient instead of dhcpcd to rule out dhcpcd as the culprit.
From the output shown, dhcpcd cannot be the culprit, because it exited in a presumably normal manner. Zombie cleanup is the responsibility of the survivor, not the zombie, so no bug in dhcpcd could explain the observed results. At worst, a dhcpcd bug might let it die and be replaced "too often", but even then, if NetworkManager were reaping zombies properly, you wouldn't see this accumulation. To me, the only question is whether NetworkManager has separate code paths for dhclient and dhcpcd, with only one of them buggy (in which case using the other would work around the problem), or whether it uses equivalent code paths for both, in which case you will just see the same problem with dhclient zombies instead of dhcpcd zombies.
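The "survivor's job" point holds however the child dies. In this sketch (hypothetical name and return encoding), a child either exits normally or is killed by SIGKILL, standing in for a crash; in both cases it stays a zombie until the parent calls waitpid(), and the status merely records how it died.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: returns the exit code for a normal exit, -signo for
 * death by signal, or -1000 on error (an arbitrary, illustrative encoding). */
int reap_after_death(int killed_by_signal)
{
    pid_t pid = fork();
    if (pid == -1)
        return -1000;
    if (pid == 0) {
        if (killed_by_signal)
            raise(SIGKILL);            /* abnormal death, like a crash */
        _exit(0);                      /* normal exit */
    }

    /* Wait until the child is dead but leave it as a zombie (WNOWAIT):
     * nothing the child did removes this entry; only the parent can. */
    siginfo_t info;
    if (waitid(P_PID, (id_t)pid, &info, WEXITED | WNOWAIT) == -1)
        return -1000;

    int status;
    if (waitpid(pid, &status, 0) != pid)   /* the actual reaping */
        return -1000;
    if (WIFEXITED(status))
        return WEXITSTATUS(status);
    if (WIFSIGNALED(status))
        return -WTERMSIG(status);
    return -1000;
}
```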
All times are GMT