How can a Python daemon tell OpenRC that startup failed?

GDH-gentoo · Posted: Tue Sep 10, 2024 10:12 pm Post subject:

By the way, it doesn't look like the python-daemon package by itself does any forking (i.e. "backgrounding") at all, so OpenRC's start-stop-daemon doesn't even seem usable with the program as it is now.

l3u · Posted: Wed Sep 11, 2024 5:35 am Post subject:

flexibeast · Posted: Wed Sep 11, 2024 6:38 am Post subject:

Reading through this thread, i'm not sure i completely understand what you're wrestling with, so i'll just make some general comments.

Different service supervision and service management systems have different ways of handling things, such that there's no "one-size-fits-all" approach that daemons can provide.

Supervision systems based on a daemontools-style approach, such as s6, require the daemon not to fork, but to run in the foreground as a child process; this avoids having to deal with PID files and their issues, as the supervising process knows the status of its child. In the context of s6 in particular, readiness notification is done via a file descriptor; refer to this draft wiki page for some specifics. Note also, however, the s6-notifyoncheck documentation.
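
As a rough illustration of the file-descriptor readiness notification mentioned above: under s6, the daemon writes a newline to an agreed-upon descriptor once it is ready and then closes it. The sketch below assumes the descriptor number is handed to the program somehow (for instance, matching the number in the service's notification-fd file); the function name notify_ready is made up for this example.

```python
import os


def notify_ready(fd):
    """Signal readiness by writing a newline to fd, then close it.

    Assumption: fd is the descriptor the supervisor listens on.
    How the program learns that number is up to the service setup.
    """
    os.write(fd, b"\n")
    os.close(fd)
```

The daemon would call this once, after its setup (HTTP server bound, configuration loaded) has succeeded and before entering its main loop.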

l3u · Posted: Wed Sep 11, 2024 7:36 am Post subject:

Okay, I'll try again to explain the situation, as clearly and in as structured a way as possible.

I have a Python daemon/service/program/whatever you want to call it. It has to run in the background and I want to start it via OpenRC, on Gentoo, Artix and Devuan. I don't use other init systems like Systemd, S6, or runit. I would be completely happy if it worked with OpenRC.

The program can either background itself using python-daemon and its DaemonContext, or I let OpenRC do this (either through start-stop-daemon or through supervise-daemon).

When starting up, the program has to do some HTTP communication, and it has to start an HTTP server. Both can fail, in which case the program can't run.
What I want to know is how I can tell OpenRC that the startup failed, and the process can't run.

The situation is the following:

If the program backgrounds itself, I can't exit with non-0 outside of the DaemonContext, because if I set up the HTTP server outside, it is not reachable anymore.
As soon as I enter the DaemonContext, a pidfile is created, and OpenRC counts this as a successful startup – and doesn't care about the backgrounded process exiting with non-0 anymore.
If I let OpenRC background my process using start-stop-daemon and the process exits (no matter if it's with 0 or non-0), OpenRC doesn't care at all, no matter where I do it.
The process is simply not there anymore. Apparently, there's no way at all to tell the init system that the startup failed in this case.
If I let OpenRC background my process using supervise-daemon and the process exits with non-0, supervise-daemon assumes the process crashed and tries to restart it.
But it shouldn't, as the process didn't crash but failed to start in the first place. Here, too, there's apparently no way to tell the init system.
Using OpenRC's --wait parameter (which seems to simply wait for a given time and then check whether the process pointed to in the pidfile still exists) doesn't seem to be an option, as it's only present in Gentoo's OpenRC, but not in the Devuan/dpkg OpenRC.
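
For reference, the kind of check described for --wait (wait a while, then see whether the process named in the pidfile is still there) can be sketched in a few lines of Python. This is not OpenRC's implementation, only an illustration of the idea; the function names and the pidfile path passed in are hypothetical.

```python
import os
import time


def pid_from_file(path):
    """Return the PID stored in the pidfile, or None if unreadable."""
    try:
        with open(path) as handle:
            pid = int(handle.read().strip())
    except (OSError, ValueError):
        return None
    return pid if pid > 0 else None


def is_alive(pid):
    """Check for a process with this PID without actually signalling it."""
    try:
        os.kill(pid, 0)  # signal 0 only performs the existence/permission check
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def started_ok(pidfile_path, wait_seconds):
    """Wait, then report whether the pidfile's process still exists."""
    time.sleep(wait_seconds)
    pid = pid_from_file(pidfile_path)
    return pid is not None and is_alive(pid)
```

A wrapper could call started_ok("/run/mydaemon.pid", 2.0) after launching the daemon and exit with non-0 if it returns False. Note that this can only detect failures that happen within the waiting time.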

I hope I explained the problem well enough?

logrusx · Advocate Joined: 22 Feb 2018 Posts: 2199

l3u · Posted: Wed Sep 11, 2024 8:29 am Post subject:

So a daemon/service is always supposed to start and never to fail to do so, e.g. due to wrong configuration? Like when the port an HTTP server wants to run on is already in use, or something like that?

The program should run all the time (it's a server and a controller), without a console, and without a user login. In the background. How else than starting it via an init system would I do this?

logrusx · Advocate Joined: 22 Feb 2018 Posts: 2199

flexibeast · Posted: Wed Sep 11, 2024 8:40 am Post subject:

So what you want is to ensure that, if there's some issue that prevents the daemon from starting correctly, the supervisor process doesn't keep trying to start it indefinitely?

If that's the case, then it depends on the program being used to supervise the process; what should be done with failures is the supervisor's decision (which in turn will be configured by the sysadmin according to what they want to happen given various factors[a]). So, for example, in the case of supervise-daemon(8):

l3u · Posted: Wed Sep 11, 2024 9:36 am Post subject:

flexibeast · Posted: Wed Sep 11, 2024 10:08 am Post subject:

szatox · Advocate Joined: 27 Aug 2013 Posts: 3340

Alright, slow down everyone....

Basically it looks like you should make use of the "inactive" state. It's not the same as "Startup failed". Daemons are not supposed to fail, and if they do, they're buggy, and the best an init system can do is either restart the daemon or flag it as crashed.
Your daemon, which needs some setup dependent on external factors, should enter the inactive state, then perform whatever setup is necessary, possibly waiting for some condition and retrying, and once it's ready, switch to "started".

NetworkManager is an example of such a service: it waits for interfaces to have IPs assigned, which may depend on a cable being plugged in and an external DHCP server running.

l3u · Posted: Wed Sep 11, 2024 11:23 am Post subject:

I got it :-)

The trick was to make the HTTP server run even though it wasn't instantiated inside the DaemonContext. This was actually a Python-specific problem. I had to add a files_preserve parameter to the context, using the server's descriptor (inspired by a Stack Overflow question specifically about a Python HTTP server and DaemonContext). Now I can set up my backend classes and also the HTTP server outside of the DaemonContext, and the HTTP server is still functional when started later inside of the DaemonContext.

This now makes it possible to set up my stuff, including the HTTP server, before backgrounding. When anything goes wrong, OpenRC sees the non-0 exit code and reports the startup as failed.
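
For anyone finding this later, here is a minimal sketch of the pattern described above. It is not the actual startup script, just an illustration under a few assumptions: the handler, the function names and the pidfile path are made up, and it assumes python-daemon's DaemonContext with its files_preserve and pidfile parameters and the TimeoutPIDLockFile class from daemon.pidfile.

```python
import sys
from http.server import BaseHTTPRequestHandler, HTTPServer


class StatusHandler(BaseHTTPRequestHandler):
    """Placeholder handler; a real daemon would serve its own API here."""

    def do_GET(self):
        body = b"ok\n"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def build_server(host, port):
    """Bind the HTTP server in the foreground; exit with 1 on failure."""
    try:
        return HTTPServer((host, port), StatusHandler)
    except OSError as error:
        # Still attached to the caller here, so the init system
        # gets to see this exit code.
        print(f"Cannot start HTTP server: {error}", file=sys.stderr)
        sys.exit(1)


def run(host, port, pidfile_path, daemonize):
    """Do all setup first, then optionally background and serve."""
    server = build_server(host, port)
    if not daemonize:
        server.serve_forever()
        return

    # Imported lazily so that foreground mode works without python-daemon.
    import daemon
    from daemon.pidfile import TimeoutPIDLockFile

    context = daemon.DaemonContext(
        # Keep the listening socket open across the fork; without this,
        # DaemonContext closes it and the server becomes unreachable.
        files_preserve=[server.fileno()],
        pidfile=TimeoutPIDLockFile(pidfile_path),
    )
    with context:
        server.serve_forever()
```

Calling something like run("127.0.0.1", 8080, "/run/mydaemon.pid", daemonize=True) from the entry point means that a port already in use leads to exit code 1 before any fork happens and before a pidfile is written.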

See the updated startup script at https://gitlab.com/l3u/go-e-pvsd/-/blob/6978de6c9fa46a90198d78a10d182956f3b93e42/go-e-pvsd

Still, one can choose not to daemonize (by simply not passing the -d option). The process will then run in the foreground, for testing and/or debugging purposes. I hope this also makes users of other init systems, or of other ways to fire up the daemon, happy.

That was a hard one though. Thanks for all input :-)

Ralphred · Guru Joined: 31 Dec 2013 Posts: 566

l3u · Posted: Wed Sep 11, 2024 12:51 pm Post subject:

But I think that now, after the tweaks I luckily found, my start script does not have to wait anymore – as soon as the setup (in the non-forked state) has succeeded, the daemon will be able to run. I now see this before forking – so why would I have to wait?

I think this would be a feasible solution for the earlier state, where I thought I could not check for a successful setup before the fork?

Ralphred · Guru Joined: 31 Dec 2013 Posts: 566

l3u · Posted: Wed Sep 11, 2024 1:40 pm Post subject:

Ah, okay. Thanks for sharing your solution anyway; I'm pretty sure somebody will need this some time :-)