Gentoo Forums
How can a Python daemon tell OpenRC that startup failed?
GDH-gentoo
Veteran

Joined: 20 Jul 2019
Posts: 1686
Location: South America

Posted: Tue Sep 10, 2024 10:12 pm

By the way, it doesn't look like package python-daemon by itself does any forking (i. e. "backgrounding") at all, so OpenRC's start-stop-daemon doesn't even seem usable with the program as it is now.
_________________
NeddySeagoon wrote:
I'm not a witch, I'm a retired electronics engineer :)
Ionen wrote:
As a packager I just don't want things to get messier with weird build systems and multiple toolchains requirements though :)
l3u
Advocate

Joined: 26 Jan 2005
Posts: 2610
Location: Konradsreuth (Germany)

Posted: Wed Sep 11, 2024 5:35 am

GDH-gentoo wrote:
By the way, it doesn't look like package python-daemon by itself does any forking (i. e. "backgrounding") at all, so OpenRC's start-stop-daemon doesn't even seem usable with the program as it is now.

Well, it does work quite well, and has done so for over a year, at least for my setup. The only problem is that I can't tell OpenRC if startup failed. I'm not a Unix guru, so it may be that a service is something different from a daemon or whatever. The scholars can argue about this. What I see is that, using python-daemon, I can start a process that then runs detached from the shell it was started in. That is what I thought one would call a daemonized process … but I may be plain wrong.

However, I have now learned that I don't even need this, because start-stop-daemon or supervise-daemon can background the process and take care of creating a pidfile etc.

But again: this is not what this thread is about. It's about how I can tell the init system that startup failed …

szatox wrote:
Well, yes, that's what supervisor does: it restarts the service when it crashes.
Is there some permanent error condition for which you'd rather it was flagged as failed and stayed down than have supervisor restart it?

It's quite a simple thing: when starting up, the program has to communicate with other hosts, get an answer, set up some classes and start an HTTP server. If any of this fails, it can't run. So it does not crash, it simply can't start up, and thus it would also not be meaningful to retry it over and over again in this case.

As I said: I'm just searching for a possibility to inform the init system that startup failed …


Last edited by l3u on Wed Sep 11, 2024 5:40 am; edited 1 time in total
flexibeast
Guru

Joined: 04 Apr 2022
Posts: 440
Location: Naarm/Melbourne, Australia

Posted: Wed Sep 11, 2024 6:38 am

Reading through this thread, i'm not sure i completely understand what you're wrestling with, so i'll just make some general comments.

Different service supervision and service management systems have different ways of handling things, such that there's no "one-size-fits-all" approach that daemons can provide.

Supervision systems based on a daemontools-style approach, such as s6, require the daemon to not fork, but to run in the foreground as a child process; this avoids having to deal with PID files and their issues, as the supervising process knows the status of its child. Then, in the context of s6 in particular, readiness notification is done via a file descriptor; refer to this draft wiki page for some specifics. Note, also, however, the s6-notifyoncheck documentation:

Quote:
s6-notifyoncheck is a chain-loading program meant to be used in run scripts, in a service that has been declared to honor readiness notification. It implements a policy of running a user-provided executable in the background that polls the service currently being launched, in order to check when it becomes ready. It feeds the result of this check into the s6 notification mechanism.

s6-notifyoncheck should only be used with daemons that can be polled from the outside to check readiness, and that do not implement readiness notification themselves.

On the other hand, systemd - which i don't use myself - takes a different approach, involving sd_notify(3) or a wrapper for it.

The upshot of these sorts of differences is that your daemon will, at the very least, need to provide mechanisms to support different systems' different approaches. sshd(8), for example, has the `-D` option to prevent detaching and becoming a daemon; cf. e.g. this service file for the '66' system, which provides a declarative-style syntax built on top of s6. If you provide such mechanisms - possibly in addition to providing 'built-in' support for certain systems (e.g. for systemd, including providing a default `.service` file, or an openrc-run(8) script for OpenRC) - others will be able to create the relevant service configuration for their systems as necessary.
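
To make the two readiness mechanisms mentioned above a bit more concrete, here is a minimal Python sketch. It is not taken from any project in this thread; the function names are made up for illustration. It assumes the s6 notification fd number is handed to the daemon by whoever wrote the service definition, and that a systemd-style manager exports the NOTIFY_SOCKET environment variable.

```python
import os
import socket


def notify_s6(fd: int) -> None:
    """s6-style readiness: write a newline to the notification fd, then close it.

    The fd number is an assumption here; it has to match what the service
    definition declares for this daemon.
    """
    os.write(fd, b"\n")
    os.close(fd)


def notify_systemd(message: bytes = b"READY=1") -> bool:
    """sd_notify-style readiness: send a datagram to the socket in NOTIFY_SOCKET.

    Returns False if the variable is unset, i.e. nobody asked for notification.
    """
    path = os.environ.get("NOTIFY_SOCKET")
    if not path:
        return False
    if path.startswith("@"):
        # A leading "@" denotes a socket in the Linux abstract namespace.
        path = "\0" + path[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(message, path)
    return True
```

A daemon would call one of these only after its setup (network checks, HTTP server bind, and so on) has succeeded, which is what lets the supervisor distinguish "started" from "ready".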
l3u
Advocate

Joined: 26 Jan 2005
Posts: 2610
Location: Konradsreuth (Germany)

Posted: Wed Sep 11, 2024 7:36 am

Okay, I'll try again to explain the situation, as clearly and in as structured a way as possible.

I have a Python daemon/service/program/whatever you want to call it. It has to run in the background and I want to start it via OpenRC, on Gentoo, Artix and Devuan. I don't use other init systems like Systemd, S6, or runit. I would be completely happy if it worked with OpenRC.

The program can either background itself using python-daemon and its DaemonContext, or I let OpenRC do this (either through start-stop-daemon or through supervise-daemon).

When starting up, the program has to do some HTTP communication, and it has to start an HTTP server. Both can fail, in which case the program can't run.
What I want to know is how I can tell OpenRC that the startup failed, and the process can't run.

The situation is the following:
  • If the program backgrounds itself, I can't exit with non-0 outside of the DaemonContext, because if I set up the HTTP server outside, it is not reachable anymore.
    As soon as I enter the DaemonContext, a pidfile is created, and OpenRC counts this as a successful startup – and doesn't care about the backgrounded process exiting with non-0 anymore.

  • If I let OpenRC background my process using start-stop-daemon and the process exits (no matter if it's with 0 or non-0), OpenRC doesn't care at all, no matter where I do it.
    The process is simply not there anymore. Apparently, there's no way at all to tell the init system that the startup failed in this case.

  • If I let OpenRC background my process using supervise-daemon and the process exits with non-0, supervise-daemon assumes the process crashed and tries to restart it.
    But it shouldn't, as the process didn't crash but failed to start in the first place. Also, apparently no way to tell the init system.

  • Using OpenRC's --wait parameter (which seems to simply wait for a given time and then check if the process pointed to in the pidfile still exists) seems to be no option, as it's only present in Gentoo's OpenRC, but not in Devuan/dpkg OpenRC.

I hope I could explain the problem well enough?
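
The first case in the list above (the exit code of the backgrounded process is never seen) can be reproduced without OpenRC or python-daemon. The following is only a small demonstration using a bare os.fork(): the parent returns 0 right after forking, the way a naive daemonization does, and the child then fails.

```python
import subprocess
import sys

# Script run as a separate process: the parent exits 0 right after forking,
# the forked child then "fails to start" and exits with 1.
SCRIPT = """
import os, sys
if os.fork() > 0:
    sys.exit(0)   # parent: this is all the caller ever sees
sys.exit(1)       # child: startup failed, but nobody is waiting for this
"""

result = subprocess.run([sys.executable, "-c", SCRIPT])
print("exit status seen by the caller:", result.returncode)
```

The caller only ever gets the parent's exit status, so anything that can fail has to be checked before the fork, or reported back to the parent by some other channel.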
logrusx
Advocate

Joined: 22 Feb 2018
Posts: 2400

Posted: Wed Sep 11, 2024 7:52 am

l3u wrote:
If I let OpenRC background my process using supervise-daemon and the process exits with non-0, supervise-daemon assumes the process crashed and tries to restart it.
But it shouldn't, as the process didn't crash but failed to start in the first place. Also, apparently no way to tell the init system.


I believe you're under a wrong impression. That is the desired behavior. When a service stops, it should be started again. That's why it's called a service. It's intended to be available.

Once OpenRC or whatever init system starts it, it'll always try to restart it if it fails for some reason.

What you're trying to create is not a service. You might as well launch it manually.

What you can do is put it in your bashrc or something like that.

The only possible way I see to inform OpenRC that it didn't start is to do it during the time OpenRC is waiting for it to start. Other users have already pointed this out, but you're overlooking it because you're stuck on your wrong impression of how init systems and services work. This way it'll fail startup, and OpenRC (and other init systems, for that matter) won't try to start it again. At least this is what I observed during the times when everybody used script-based init systems.

But again, what you want to do does not fit the concept of a service. Your concept is wrong to begin with. You should look for other ways to do what you want your program to do.

You can start with why you want it to be a daemon/service/started by OpenRC and what its purpose is.

Best Regards,
Georgi
l3u
Advocate

Joined: 26 Jan 2005
Posts: 2610
Location: Konradsreuth (Germany)

Posted: Wed Sep 11, 2024 8:29 am

So a daemon/service is always supposed to start and never to fail to do so, e.g. due to wrong configuration? Like when the port an HTTP server wants to run on is already in use, or the like?

The program should run all the time (it's a server and a controller), without a console, and without a user login. In the background. How else than starting it via an init system would I do this?
logrusx
Advocate

Joined: 22 Feb 2018
Posts: 2400

Posted: Wed Sep 11, 2024 8:36 am

l3u wrote:
So a daemon/service is always supposed to start and never to fail to do so, e.g. due to wrong configuration?


No. Otherwise you wouldn't see the red FAILED messages on startup when it doesn't. But you're mixing not necessarily compatible ideas here.

l3u wrote:
The program should run all the time (it's a server and a controller), without a console, and without a user login. In the background. How else than starting it via an init system would I do this?


Well you yourself said it may be unavailable, so I don't understand what you're trying to achieve here. As I suggested earlier, you should start with getting a clearer idea of what you want to achieve and what's possible.

If you want to inform OpenRC that it failed, that must happen during the startup of the service. Daemonization and going into the background are already ruled out. Someone suggested postponing the creation of the PID file. You might write a blocking script that checks by itself whether the service started and fails if it didn't. You already have a lot of suggestions to work with. You can also check other examples of services, not necessarily written in Python. Just check their rc files. You should have plenty of them on your system already.

I understand you might not want to disclose all of your work, it might be covered by an NDA, trade secrets or whatever, but you should start with why you want it, how you want it, what the desired effects are and so on. Try formulating the problem as freely as your circumstances allow and I'm convinced the knowledgeable folks here will come up with at least a few ideas that suit your needs. Or they will at least help you change your perspective towards a solution.

Best Regards,
Georgi


Last edited by logrusx on Wed Sep 11, 2024 8:41 am; edited 1 time in total
flexibeast
Guru

Joined: 04 Apr 2022
Posts: 440
Location: Naarm/Melbourne, Australia

Posted: Wed Sep 11, 2024 8:40 am

So what you want is to ensure that, if there's some issue that prevents the daemon from starting correctly, the supervisor process doesn't keep trying to start it indefinitely?

If that's the case, then it depends on the program being used to supervise the process; what should be done with failures is the supervisor's decision (which in turn will be configured by the sysadmin according to what they want to happen given various factors[a]). So, for example, in the case of supervise-daemon(8):

Quote:
-m, --respawn-max count
Sets the maximum number of times a daemon will be respawned. If a daemon crashes more than this number of times, supervise-daemon will give up and exit. The default is 10 and 0 means unlimited.

If respawn-period is also set, more than respawn-max crashes must occur during respawn-period seconds to cause supervise-daemon to give up and exit.

A daemon should have a way of starting it in the foreground, with debug output (to the console or a file) enabled, so that, should the daemon be unable to start properly, there's a way to directly examine what the daemon is doing when it fails. (Ideally it would also have an option to manually check whether it's able to load the relevant config file, and whether there are any syntax errors.)

[a] For example: a service might be running in the context of regular but brief network connectivity failures. In that case, the sysadmin might not want the service to fail completely, but to try again after, say, 10 minutes; and only after it's failed to restart, say, three times, should the supervisor give up. If network failure were to result in immediate abandonment of attempts to restart the service, the sysadmin would forever be having to restart the service manually.
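
As a toy model of the respawn-max / respawn-period text quoted above, here is a short Python function. It is a literal reading of the man page wording, with a sliding window as one possible interpretation of "during respawn-period seconds"; it is not how supervise-daemon is actually implemented, and the function name is made up.

```python
def gives_up(crash_times, respawn_max=10, respawn_period=None):
    """Return True once the quoted give-up condition is met.

    crash_times: ascending timestamps (in seconds) of observed crashes.
    respawn_max: 0 means unlimited respawns, as in the man page quote.
    respawn_period: if set, only crashes within this many seconds count together.
    """
    if respawn_max == 0:
        return False
    if respawn_period is None:
        return len(crash_times) > respawn_max
    # Slide a window over the crash history and count crashes inside it.
    for i, start in enumerate(crash_times):
        in_window = [t for t in crash_times[i:] if t - start <= respawn_period]
        if len(in_window) > respawn_max:
            return True
    return False
```

With the footnote's scenario in mind, `gives_up([0, 600, 1200, 1800], respawn_max=3, respawn_period=60)` is False under this reading, because the crashes are spread out, while four crashes within a few seconds would make it give up.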
l3u
Advocate

Joined: 26 Jan 2005
Posts: 2610
Location: Konradsreuth (Germany)

Posted: Wed Sep 11, 2024 9:36 am

logrusx wrote:
Well you yourself said it may be unavailable, so I don't understand what you're trying to achieve here

It will be available, as soon as the startup has succeeded! It's only the startup that may fail. Once it runs, it will run.
logrusx wrote:
I understand you might not want to disclose all of your work

Hey, I posted the GitLab repo of the whole thing, "master" is what I did until now, "test" is with letting OpenRC do the backgrounding – that's as disclosed as it gets, no?! ;-) I just added a minimal example so that one would not have to download and setup the whole thing to test it (or check out the problem).

Just to explain it a bit better: this is about charging an EV with PV surplus. The process has to query the inverter and the charger repeatedly to check how much surplus we have, tweak the charger settings, start and stop the charging, and monitor whether a car is connected at all. Additionally, there's an HTTP server, so that one can see what's happening and change settings, i.e. communicate with the process at runtime. That's it. I thought doing this in Python would be a good idea, so that anybody interested in it (but maybe not too fit with compiling stuff etc.) could simply download the thing and run it to check it out. On a Raspberry Pi, or wherever. Also, all this does not have to be too optimized (it functions very well btw. ;-).

It seems like the only way to achieve what I want is to do a "test" setup inside the start script, but not inside the (forked) daemon context. I think I'll have to check if I can set up my backend communication classes (whether they are properly configured, whether the remote hosts are reachable) and if I can start an HTTP server for the given IP and port. And if that succeeds, I start the real thing inside the DaemonContext, which will cause the fork.
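
A rough sketch of such a pre-fork check, limited to the "can I listen on this IP and port" part. The names can_bind and preflight are made up for this example and do not come from the project's repository.

```python
import socket
import sys


def can_bind(host: str, port: int) -> bool:
    """Return True if a TCP socket can currently be bound to host:port."""
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.bind((host, port))
        return True
    except OSError:
        return False


def preflight(host: str, port: int) -> int:
    """Run the startup checks before forking; return a process exit code."""
    if not can_bind(host, port):
        print(f"cannot bind {host}:{port}", file=sys.stderr)
        return 1
    return 0
```

Note that a test like this closes the socket again, so something else could still grab the port between the check and the real bind after the fork. Keeping the already bound socket open across the fork avoids that gap.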
flexibeast
Guru

Joined: 04 Apr 2022
Posts: 440
Location: Naarm/Melbourne, Australia

Posted: Wed Sep 11, 2024 10:08 am

l3u wrote:
if that succeeds, I start the real thing inside the DaemonContext, which will cause the fork.

You continue to write as though every OpenRC system expects a forking daemon. This is incorrect. OpenRC can use s6-supervise - refer to the openrc-run(8) man page:

Quote:
supervisor
Supervisor to use to monitor this daemon. If this is unset or invalid, start-stop-daemon will be used. Currently, we support s6 from skarnet software, and supervise-daemon which is a light-weight supervisor internal to OpenRC. To use s6, set supervisor=s6. or set supervisor=supervise-daemon to use supervise-daemon.

Using s6-supervise, as i wrote above, requires that the daemon not fork. And there are definitely OpenRC users who prefer to use s6 as the OpenRC supervisor. You need to account for this.
szatox
Advocate

Joined: 27 Aug 2013
Posts: 3420

Posted: Wed Sep 11, 2024 10:43 am

Alright, slow down everyone....

Basically it looks like you should make use of the "inactive" state. It's not the same as "startup failed". Daemons are not supposed to fail, and if they do, they're buggy, and the best an init system can do is either restart them or flag them as crashed.
A daemon like yours, which needs some setup that depends on external factors, should enter the inactive state, then perform whatever setup is necessary, possibly waiting for some condition and retrying, and once it's ready, turn "started".

NetworkManager is an example of such service, it waits for interfaces to have IPs assigned, which may depend on a cable being plugged in and external dhcp server running.
Code:
start() {
        # If we are re-called by a dispatcher event, we want to mark the service
        # as started without starting the daemon again
        yesno "${IN_BACKGROUND}" && return 0

        [ -z "${INACTIVE_TIMEOUT}" ] && INACTIVE_TIMEOUT="1"

        ebegin "Starting NetworkManager"
        start-stop-daemon --start --quiet --pidfile /run/NetworkManager/NetworkManager.pid \
                --exec /usr/sbin/NetworkManager -- --pid-file /run/NetworkManager/NetworkManager.pid
        local _retval=$?
        eend "${_retval}"
        if [ "x${_retval}" = 'x0' ] && ! nm-online -t "${INACTIVE_TIMEOUT}"; then
                einfo "Marking NetworkManager as inactive. It will automatically be marked"
                einfo "as started after a network connection has been established."
                mark_service_inactive
        fi
        return "${_retval}"
}

This looks like a good start, though I currently don't know how it signals changing state from inactive to started afterwards (it does).
_________________
Make Computing Fun Again
l3u
Advocate

Joined: 26 Jan 2005
Posts: 2610
Location: Konradsreuth (Germany)

Posted: Wed Sep 11, 2024 11:23 am

I got it :-)

The trick was to make the HTTP server run, although it wasn't instantiated inside the DaemonContext. This was actually a Python-specific problem. I had to add a files_preserve parameter to the context, using the server's descriptor (inspired by a Stack Overflow question specifically about a Python HTTP server and DaemonContext). Now I can set up my backend classes and also the HTTP server outside of the DaemonContext, and the HTTP server is still functional when started later inside of the DaemonContext.

This now makes it possible to set up my stuff, including the HTTP server, before backgrounding. When anything goes wrong, OpenRC sees the non-0 exit code and reports the startup as failed.

See the updated startup script at https://gitlab.com/l3u/go-e-pvsd/-/blob/6978de6c9fa46a90198d78a10d182956f3b93e42/go-e-pvsd

Still, one can choose not to daemonize (by simply not passing the -d option). This will then cause the process to run in the foreground, for testing and/or debugging purposes. I hope this also makes users of other init systems, or of other ways to fire up the daemon, happy.

That was a hard one though. Thanks for all input :-)
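
For readers who don't want to open the repository: the general shape of this approach might look like the sketch below. This is not the linked startup script, just a minimal illustration assuming the third-party python-daemon package and the standard library's HTTPServer; build_server and main are made-up names, and pidfile handling is left out.

```python
import sys
from http.server import HTTPServer, SimpleHTTPRequestHandler


def build_server(host: str, port: int) -> HTTPServer:
    """Bind the HTTP server while still in the foreground (OSError on failure)."""
    return HTTPServer((host, port), SimpleHTTPRequestHandler)


def main(host: str = "127.0.0.1", port: int = 8080, daemonize: bool = False) -> int:
    try:
        server = build_server(host, port)
    except OSError as error:
        print(f"startup failed: {error}", file=sys.stderr)
        return 1  # not forked yet, so the init script sees this exit code
    if daemonize:
        import daemon  # third-party package "python-daemon"

        # DaemonContext closes open files on entry; files_preserve keeps the
        # listening socket alive across the fork.
        with daemon.DaemonContext(files_preserve=[server.fileno()]):
            server.serve_forever()
    else:
        server.serve_forever()  # foreground mode for testing or supervisors
    return 0
```

A script would end with something like `sys.exit(main(daemonize=True))`, so that a failed bind turns into a non-0 exit status before any backgrounding has happened.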
Ralphred
l33t

Joined: 31 Dec 2013
Posts: 652

Posted: Wed Sep 11, 2024 12:37 pm

l3u wrote:
And if that succeeds, I start the real thing inside the DaemonContext, which will cause the fork.

You need to stop the unconditional exit of the parent after the fork and point it towards some "monitoring" code. Some very quick and dirty PoC "monitoring"
Code:
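#!/usr/bin/env python
# PoC monitor: argv[1] is the PID of the (prospective) daemon process.
import os
from sys import exit, argv
from signal import signal, SIGUSR1, SIGUSR2
from time import sleep

from psutil import Process, NoSuchProcess

print("running as %s" % os.getpid())


def clean_exit(sig, frame):
    # SIGUSR1 from the child: startup succeeded
    print("clean exit")
    exit(0)


def failed_exit(sig, frame):
    # SIGUSR2 from the child, child gone, or timeout: startup failed
    print("failed exit")
    exit(1)


def main(pid):
    signal(SIGUSR1, clean_exit)
    signal(SIGUSR2, failed_exit)
    timeout = 40  # seconds
    while timeout > 0:
        try:
            running = Process(int(pid)).is_running()
        except NoSuchProcess:
            # Catch only this, so a SystemExit raised by a signal handler
            # is not swallowed and turned into a failure
            running = False
        if not running:
            print("process %s is not running" % pid)
            failed_exit(None, None)
        sleep(1)
        timeout -= 1
    failed_exit(None, None)


main(argv[1])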

So it waits up to 40 seconds (the timeout value in the code) for the (prospective) child to send SIGUSR1|SIGUSR2 to it on success|failure, and times out with exit(1) if it doesn't.
It'd be cleaner to use signal.sigtimedwait(), but it's just a PoC.
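
For comparison, a sketch of what the signal.sigtimedwait() variant could look like. This is an assumption about the intended design, not the poster's code: it only waits for the two signals and drops the "is the child still alive" polling, and the function name is made up. It needs a platform that provides sigtimedwait (e.g. Linux).

```python
import signal

SIGNALS = {signal.SIGUSR1, signal.SIGUSR2}


def wait_for_child_verdict(timeout: float) -> int:
    """Wait for SIGUSR1 (success) or SIGUSR2 (failure) from the child.

    Returns an exit code: 0 on SIGUSR1, 1 on SIGUSR2 or on timeout.
    """
    info = signal.sigtimedwait(SIGNALS, timeout)
    if info is None:
        return 1  # nothing arrived within the timeout
    return 0 if info.si_signo == signal.SIGUSR1 else 1


# Block both signals first, so they stay pending until sigtimedwait()
# collects them instead of triggering their default action.
signal.pthread_sigmask(signal.SIG_BLOCK, SIGNALS)
```

The parent would block the signals before forking and then finish with `sys.exit(wait_for_child_verdict(40))`, so no handler functions and no sleep loop are needed.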
l3u
Advocate

Joined: 26 Jan 2005
Posts: 2610
Location: Konradsreuth (Germany)

Posted: Wed Sep 11, 2024 12:51 pm

But I think, now, after the tweaks I luckily found, my start script does not have to wait anymore – as soon as the setup (in non-forked state) succeeded, the daemon will be able to run. Now I see this before forking – why would I have to wait?

I think this would be a feasible solution for the state before, where I thought I could not check for a successful setup before the fork?
Ralphred
l33t

Joined: 31 Dec 2013
Posts: 652

Posted: Wed Sep 11, 2024 1:16 pm

l3u wrote:
I think this would be a feasible solution for the state before, where I thought I could not check for a successful setup before the fork?

I have a "generic python daemon" class lying around, so you piqued my interest in "fixing" the lazy if os.fork():exit(0) daemonisation.

I started to reply and throw in the PoC code before you posted that you'd fixed it, but got waylaid by phone calls before I hit the Submit button.
l3u
Advocate

Joined: 26 Jan 2005
Posts: 2610
Location: Konradsreuth (Germany)

Posted: Wed Sep 11, 2024 1:40 pm

Ah, okay. Thanks for sharing your solution anyway, I'm pretty sure somebody will be able to use it some time :-)