l3u Advocate
Joined: 26 Jan 2005 Posts: 2610 Location: Konradsreuth (Germany)
|
Posted: Sun Sep 08, 2024 6:48 am Post subject: How can a Python daemon tell OpenRC that startup failed? |
|
|
Hi all,
I wrote a Python daemon that I start using an OpenRC init script. Starting and stopping work fine; however, there's one problem: the daemon startup may fail, and OpenRC ignores that and always assumes the daemon started correctly.
I use a start script that uses a helper class called ProcessManager that does the actual startup. Stripped down, it looks like this:
Code: | import sys
import signal
import daemon
import daemon.pidfile
from ProcessManager.ProcessManager import ProcessManager

def setupProcessManager(args):
    if not processManager.setup():
        sys.exit(1)
    signal.signal(signal.SIGTERM, processManager.terminate)
    signal.signal(signal.SIGINT, processManager.terminate)
    processManager.start()

processManager = ProcessManager(args)
with daemon.DaemonContext(pidfile = daemon.pidfile.PIDLockFile(args.p)):
    setupProcessManager(args)
    processManager.finished.wait() |
The interesting part is that
Code: | if not processManager.setup(): |
(The whole code can be found on GitLab, with the init script and the startup script)
No matter whether I do sys.exit(1) in there or raise some exception: OpenRC always thinks everything is fine.
So: How can I tell OpenRC that my daemon could not start up?
Thanks for all help!
Last edited by l3u on Sun Sep 08, 2024 7:55 am; edited 1 time in total |
logrusx Advocate
Joined: 22 Feb 2018 Posts: 2392
|
Posted: Sun Sep 08, 2024 6:56 am Post subject: |
|
|
Have you checked your program actually returns something different than zero?
Best Regards,
Georgi |
l3u Advocate
Joined: 26 Jan 2005 Posts: 2610 Location: Konradsreuth (Germany)
|
Posted: Sun Sep 08, 2024 7:49 am Post subject: |
|
|
Yes, of course – you can start the daemon either daemonized or simply running in the foreground, for debugging.
If you run it in foreground, the sys.exit(1) of course makes the program exit with non-0 ($? is 1). I assume that the exit code is also non-0 if it's started daemonized, no?! |
logrusx Advocate
Joined: 22 Feb 2018 Posts: 2392
|
Posted: Sun Sep 08, 2024 9:02 am Post subject: |
|
|
l3u wrote: | I assume that the exit code is also non-0 if it's started daemonized, no?! |
I'm not a pythoneer, but it's hard to believe OpenRC would ignore the return code. That's why I'm asking if you have verified your program returns non-zero value when exiting due to failure or you're just thinking it does.
Best Regards,
Georgi |
grknight Retired Dev
Joined: 20 Feb 2015 Posts: 1902
|
Posted: Sun Sep 08, 2024 1:18 pm Post subject: |
|
|
OpenRC, in this script, is relying on the default start-stop-daemon process (/lib/rc/sh/start-stop-daemon.sh) as there is no supervisor nor start function defined.
OpenRC's start-stop-daemon considers a service started once the pidfile has been created and the pid exists.
First, do not create the pidfile in the daemon until the daemon is ready.
If pid creation must come first, try setting SSD_STARTWAIT=1000 (ms) in the script to delay the pid check.
Alternatively consider supervisor="supervise-daemon" which does not fork the process. |
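The first suggestion (no pidfile until the daemon is ready) could look roughly like this in Python. This is a minimal sketch, not code from the daemon in question: start_when_ready, the setup callable and the pidfile path are hypothetical names. The pidfile only appears after setup has succeeded, so a pidfile-based check never sees a half-started daemon.

```python
import os
import tempfile

def start_when_ready(setup, pidfile_path):
    """Run setup() first; write the pidfile only if it succeeded.

    `setup` is a hypothetical callable returning True on success.
    Returns the exit code the process should use.
    """
    if not setup():
        # No pidfile is created, so a pidfile-based check finds no daemon.
        return 1
    # Write to a temporary file and rename it, so that a reader never
    # sees a partially written pidfile.
    directory = os.path.dirname(pidfile_path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as tmp:
        tmp.write(f"{os.getpid()}\n")
    os.replace(tmp_path, pidfile_path)
    return 0
```

In a daemon that forks, the PID written must be that of the long-running process, so a helper like this would have to run after the fork.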
Hu Administrator
Joined: 06 Mar 2007 Posts: 22608
|
Posted: Sun Sep 08, 2024 1:37 pm Post subject: |
|
|
ProcessManager does not appear to be part of the standard library, so this is not a Minimal Reproducible Example.
Normally, when a process converts itself to a daemon, it will fork, the parent will exit 0, and the child will run to do the real work. If this process converts to a daemon before it detects a problem, then yes, the supervisor will consider it a successful startup. For that reason, I like grknight's suggestion. |
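The effect described here can be reproduced without any daemon library. The following is only an illustration with a made-up stand-in script (DAEMON_LIKE_SCRIPT and exit_code_seen_by_launcher are hypothetical names): the parent exits 0 right after the fork, the child fails afterwards, and the launcher only ever sees the parent's status.

```python
import subprocess
import sys

# A tiny stand-in for a self-daemonizing program: the parent exits 0
# right after the fork, the child fails afterwards with exit code 1.
DAEMON_LIKE_SCRIPT = """
import os, sys
if os.fork() == 0:
    os._exit(1)   # child: "startup failed" after the fork
sys.exit(0)       # parent: reports success and goes away
"""

def exit_code_seen_by_launcher():
    """Return the exit status that the launching process observes."""
    result = subprocess.run([sys.executable, "-c", DAEMON_LIKE_SCRIPT])
    return result.returncode
```

The launcher here plays the role of start-stop-daemon: it waits for the process it started, and that process is the parent, not the forked child.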
l3u Advocate
Joined: 26 Jan 2005 Posts: 2610 Location: Konradsreuth (Germany)
|
Posted: Mon Sep 09, 2024 6:51 pm Post subject: |
|
|
No, that ProcessManager class is a part of the daemon (cf. the linked gitlab repo).
It seems the exit code gets lost in the "daemon context". I guess I have to dive a bit deeper into how to write a proper Python daemon …
So the solution would be to set up the daemon in the foreground, possibly exiting with non-0 there, and somehow daemonize it afterwards, right? |
l3u Advocate
Joined: 26 Jan 2005 Posts: 2610 Location: Konradsreuth (Germany)
|
Posted: Tue Sep 10, 2024 2:55 pm Post subject: |
|
|
As I could not really work it out, I posted a question on Stack Overflow, also containing a complete minimal example.
There, one guy asked why I care about daemonizing my stuff at all if I use OpenRC to supervise it … so: Do I have to?!
I never wrote a daemon before, and esp. not a Python one. So can OpenRC do the work for me? And will this work on other distros, too? The daemon currently runs on Devuan/OpenRC, and even though it's OpenRC, it's not fully compatible with Gentoo and I had to tweak the init script a bit …
Thanks for all help on this! |
Hu Administrator
Joined: 06 Mar 2007 Posts: 22608
|
Posted: Tue Sep 10, 2024 3:25 pm Post subject: |
|
|
l3u wrote: | No, that ProcessManager class is a part of the daemon (cf. the linked gitlab repo). |
GitLab's browser fails to render due to a JavaScript error.
l3u wrote: | Seems like the exit code gets lost through the "daemon context". I guess I have to dive a bit deeper in how to write a proper Python daemon … |
By definition, a daemon exits 0 and leaves a child running. The child's exit status is only visible to init, which neither knows nor cares what it means.
l3u wrote: | So the solution would be to setup the daemon in foreground and maybe exit with non-0 there, and somehow daemonize it afterwards, right? |
Not quite. Daemons are not in the foreground. That's why daemonize moves the caller to the background. You could run your initialization code before moving to daemon status, or you could just remain in the foreground under supervision of a process that knows not to block other activity while waiting for you to exit (which you would only do when an administrative process tells you to halt completely).
l3u wrote: | There, one guy asked why I care about daemonizing my stuff at all if I use OpenRC to supervise it … so: Do I have to?! |
No. That is why grknight told you to use supervise-daemon.
l3u wrote: | I never wrote a daemon before, and esp. not a Python one. So can OpenRC do the work for me? |
As I read the manual page for supervise-daemon, yes.
l3u wrote: | And will this work on other distros, too? | If they use openrc, or have an equivalent feature, yes. |
l3u Advocate
Joined: 26 Jan 2005 Posts: 2610 Location: Konradsreuth (Germany)
|
Posted: Tue Sep 10, 2024 4:11 pm Post subject: |
|
|
Hu wrote: | GitLab's browser fails to render due to a JavaScript error. |
Using the latest stable Firefox here, everything is fine with GitLab …
Maybe I should ask a bit more generically … if one writes a daemon nowadays (may it be implemented in Python or not) – is it expected to background itself, or is this an init system's task?
The only other Python daemon I know is Radicale, and it seems like the backgrounding and PID file management is up to start-stop-daemon there:
Code: | start() {
    ebegin "Starting radicale"
    start-stop-daemon --start --quiet --background \
        --user radicale \
        --umask 0027 \
        --stderr-logger /usr/bin/logger \
        --pidfile ${PIDFILE} --make-pidfile \
        --exec /usr/bin/radicale
    eend $?
}

stop() {
    ebegin "Stopping radicale"
    start-stop-daemon --stop --quiet \
        --pidfile ${PIDFILE}
    eend $?
} |
grknight Retired Dev
Joined: 20 Feb 2015 Posts: 1902
|
Posted: Tue Sep 10, 2024 4:38 pm Post subject: |
|
|
What Hu is basically suggesting is: in the startup script, move (and change if needed)
Code: | # Setup the process manager
if not processManager.setup():
    raise |
to just before the with daemon.DaemonContext(...) statement.
The basic configuration should come before any daemon forking call. If that configuration fails, the failure can then be communicated to the calling supervisor. |
GDH-gentoo Veteran
Joined: 20 Jul 2019 Posts: 1677 Location: South America
|
Posted: Tue Sep 10, 2024 5:03 pm Post subject: |
|
|
l3u wrote: | Maybe I should ask a bit more generically … if one writes a daemon nowadays (may it be implemented in Python or not) – is it expected to background itself, or is this an init system's task? |
Let's see if I can clarify.
The way that your OpenRC service script is written means that service startup will be delegated to OpenRC's start-stop-daemon program. As far as I can tell, start-stop-daemon does, in fact, consider the exit code of the program it is told to run (/usr/bin/go-e-pvsd here), and does 'forward' to OpenRC the success / failure state implied by it.
Now. I can't follow the Python code, but, from a design point of view, what go-e-pvsd with the -d option is supposed to do, is fork a child process —which would be the long-running process that does the actual work that the daemon is expected to do— and exit. Therefore, go-e-pvsd -d itself should be a short-lived process. In fact, OpenRC is a serial service manager, so until go-e-pvsd exits, if service startup is happening as part of entering an OpenRC (named) runlevel, then all services scheduled after go-e-pvsd will be delayed. So it also can't take too long to exit.
Therefore, the problem here is that, if you actually want OpenRC to consider service startup a failure, rather than continuing with other services and leaving it up to you to discover with rc-service that the child process isn't actually running, then you need to run enough startup code in go-e-pvsd to determine success or failure state before forking the child, so that it can exit without forking and with a suitable exit code in the failure case. This is subject to the constraint that it can't take long to do that.
Now compare this to whatever your Python code does. Note that any failure after forking won't be detected by OpenRC.
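If part of the setup can only happen in the forked child, one common way to still get a meaningful exit code is to let the parent wait for a verdict from the child over a pipe before it exits. Nobody in this thread proposed this; it is a different technique from "set up before forking", shown here as a minimal sketch with hypothetical names (daemonize_and_report, setup, run). It also leaves out what python-daemon normally takes care of, such as redirecting stdio and writing the pidfile.

```python
import os

def daemonize_and_report(setup, run):
    """Fork, run setup() in the child, and tell the parent how it went.

    `setup` and `run` are hypothetical callables: setup() returns True
    on success, run() is the long-running work. Returns the exit code
    for the parent; the child never returns from this function.
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: this is the process that would become the daemon.
        os.close(read_fd)
        try:
            ok = setup()
        except Exception:
            ok = False
        os.write(write_fd, b"1" if ok else b"0")
        os.close(write_fd)
        if not ok:
            os._exit(1)
        os.setsid()  # detach from the controlling terminal
        run()
        os._exit(0)
    # Parent: block until the child has reported its verdict.
    os.close(write_fd)
    verdict = os.read(read_fd, 1)
    os.close(read_fd)
    # An empty read means the child died before reporting.
    return 0 if verdict == b"1" else 1
```

The parent would then end with something like sys.exit(daemonize_and_report(...)), so that it stays short-lived but exits non-zero when the child's setup failed. The constraint that setup must not take long still applies, because the parent blocks until the verdict arrives.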
OpenRC's supervise-daemon is a different beast, it is a process supervisor.
By the way, this description applies to upstream OpenRC, which is what Gentoo packages. I believe that Debian and Devuan ship a modified version. And I don't know what Alpine or other OpenRC-based distributions ship exactly. _________________
NeddySeagoon wrote: | I'm not a witch, I'm a retired electronics engineer |
Ionen wrote: | As a packager I just don't want things to get messier with weird build systems and multiple toolchains requirements though |
Last edited by GDH-gentoo on Tue Sep 10, 2024 6:29 pm; edited 1 time in total |
l3u Advocate
Joined: 26 Jan 2005 Posts: 2610 Location: Konradsreuth (Germany)
|
Posted: Tue Sep 10, 2024 5:15 pm Post subject: |
|
|
Okay I now tried my luck with supervise-daemon.
I completely removed the daemonizing from my daemon, letting it always run in foreground.
The init script I use is now
Code: | depend() {
    need net
    use logger
}

supervisor="supervise-daemon"
command="/usr/bin/go-e-pvsd"
pidfile="/run/${RC_SVCNAME}.pid"
command_args="-s"
command_args_foreground="" |
That can start and stop the daemon as expected (which is nice, because apparently I can skip all that messing around with daemonizing :-)
But – I still can't tell OpenRC that the startup failed. If I exit with code 1, supervise-daemon simply says "/usr/bin/go-e-pvsd, pid XXX, exited with return code 1" and tries to restart it over and over again …
Same for the non-supervise-daemon variant:
Code: | depend() {
    need net
    use logger
}

command="/usr/bin/go-e-pvsd"
pidfile="/run/${RC_SVCNAME}.pid"
command_args="-s"

start() {
    ebegin "Starting ${RC_SVCNAME}"
    start-stop-daemon --start --background \
        --make-pidfile --pidfile ${pidfile} \
        --exec ${command} -- ${command_args}
    eend $?
}

stop() {
    ebegin "Stopping ${RC_SVCNAME}"
    start-stop-daemon --stop --pidfile ${pidfile}
    eend $?
} |
with the only difference that in this case, no restart is attempted. |
GDH-gentoo Veteran
Joined: 20 Jul 2019 Posts: 1677 Location: South America
|
Posted: Tue Sep 10, 2024 5:33 pm Post subject: |
|
|
l3u wrote: | But – I still can't tell OpenRC that the startup failed. If I exit with code 1, supervise-daemon simply says "/usr/bin/go-e-pvsd, pid XXX, exited with return code 1" and tries to restart it over and over again … |
Yeah, that's what a process supervisor does. |
l3u Advocate
Joined: 26 Jan 2005 Posts: 2610 Location: Konradsreuth (Germany)
|
Posted: Tue Sep 10, 2024 5:45 pm Post subject: |
|
|
So I simply can't tell OpenRC that my startup failed, because I can't exit with non-0 before the fork … |
GDH-gentoo Veteran
Joined: 20 Jul 2019 Posts: 1677 Location: South America
|
Posted: Tue Sep 10, 2024 5:49 pm Post subject: |
|
|
l3u wrote: | So I simply can't tell OpenRC that my startup failed, because I can't exit with non-0 before the fork … |
Can't you modify the code so that it does? I wish I could make a suggestion, but I can't follow that Python code. |
l3u Advocate
Joined: 26 Jan 2005 Posts: 2610 Location: Konradsreuth (Germany)
|
Posted: Tue Sep 10, 2024 5:56 pm Post subject: |
|
|
Maybe I could have, as long as the code forked itself using python-daemon (I don't know how; I could not get it to work, cf. my Stack Overflow post). But as soon as I let OpenRC fork the process, there's no way to exit before the fork, I think, no?! |
GDH-gentoo Veteran
Joined: 20 Jul 2019 Posts: 1677 Location: South America
|
Posted: Tue Sep 10, 2024 6:12 pm Post subject: |
|
|
l3u wrote: | But as soon as I let OpenRC fork the process, there's no way to exit before the fork I think, no?! |
With your original service script, OpenRC doesn't do any forking, go-e-pvsd is expected to, at least in the success case. And it should exit (rather quickly) with an appropriate exit code in any case.
Last edited by GDH-gentoo on Tue Sep 10, 2024 6:15 pm; edited 1 time in total |
grknight Retired Dev
Joined: 20 Feb 2015 Posts: 1902
|
Posted: Tue Sep 10, 2024 6:12 pm Post subject: |
|
|
l3u wrote: | there's no way to exit before the fork I think, no?! |
Sure there is.. do your processManager.setup() before the fork (the with daemon statement).
processManager.setup is where you are trying to bail from or am I wrong? |
l3u Advocate
Joined: 26 Jan 2005 Posts: 2610 Location: Konradsreuth (Germany)
|
Posted: Tue Sep 10, 2024 6:42 pm Post subject: |
|
|
Yeah, and that is exactly the point where it does not work (cf. the SO post). If I set up the HTTP server before forking, it's not accessible. And the HTTP server startup is one of the preconditions whose failure should make the whole startup fail. But as soon as I enter the daemon context, OpenRC thinks my startup was successful.
Just to post it also here: This is a minimal compatible example:
Code: | #!/usr/bin/env python3
import sys
import signal
import argparse
import daemon
import daemon.pidfile
from syslog import syslog
import threading
from http.server import HTTPServer, BaseHTTPRequestHandler
from time import strftime

parser = argparse.ArgumentParser()
parser.add_argument("-d", action = "store_true", help = "daemonize")
args = parser.parse_args()

class RequestHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"I'm here")

class ProcessManager:
    def __init__(self):
        self.timer = None
        self.server = None
        self.signalCatched = False
        self.finished = threading.Event()

    def setup(self) -> bool:
        syslog("Setting up different stuff")
        # All kind of stuff that could fail, returning False then
        syslog("Setting up the HTTP server")
        try:
            self.server = HTTPServer(("127.0.0.1", 8000), RequestHandler)
        except Exception as error:
            syslog("Failed to setup the HTTP server")
            return False
        return True

    def start(self):
        thread = threading.Thread(target = self.server.serve_forever)
        # The attribute is spelled "daemon"; "deamon" would be silently ignored
        thread.daemon = True
        thread.start()
        self.scheduleNextRun()

    def scheduleNextRun(self):
        if self.signalCatched:
            return
        syslog("Daemon running at {}".format(strftime("%Y-%m-%d %H:%M:%S")))
        self.timer = threading.Timer(3, self.scheduleNextRun)
        self.timer.start()

    def terminate(self, signum, frame):
        syslog("Catched signal, will now terminate")
        self.signalCatched = True
        if self.timer:
            self.timer.cancel()
        self.server.shutdown()
        self.finished.set()

def setupProcessManager():
    if not processManager.setup():
        sys.exit(1)
    signal.signal(signal.SIGTERM, processManager.terminate)
    signal.signal(signal.SIGINT, processManager.terminate)
    processManager.start()

processManager = ProcessManager()
if args.d:
    with daemon.DaemonContext(pidfile = daemon.pidfile.PIDLockFile("/run/test.pid")):
        syslog("Starting up in daemon mode")
        setupProcessManager()
        processManager.finished.wait()
else:
    syslog("Starting up in foreground mode")
    setupProcessManager() |
GDH-gentoo Veteran
Joined: 20 Jul 2019 Posts: 1677 Location: South America
|
Posted: Tue Sep 10, 2024 6:53 pm Post subject: |
|
|
l3u wrote: | Just to post it also here: This is a minimal compatible example: |
I think that what grknight is suggesting is something like this (not tested):
Code: | #!/usr/bin/env python3
# ...
def runProcessManager():
    signal.signal(signal.SIGTERM, processManager.terminate)
    signal.signal(signal.SIGINT, processManager.terminate)
    processManager.start()

processManager = ProcessManager()
if not processManager.setup():
    sys.exit(1)

if args.d:
    # Replace with forking code, make sure that the parent exits with code 0.
    pass
else:
    # Non-forking code
    syslog("Starting up in foreground mode")
    runProcessManager() |
l3u Advocate
Joined: 26 Jan 2005 Posts: 2610 Location: Konradsreuth (Germany)
|
Posted: Tue Sep 10, 2024 7:06 pm Post subject: |
|
|
As said, this does not work. If I setup the HTTP server outside of the daemon context, it's not accessible. Also, the signal connections have to be made inside the daemon context to work. |
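A likely reason for the inaccessible server is that python-daemon's DaemonContext closes all open file descriptors when it is entered, except those listed in its files_preserve option. A minimal sketch of binding the socket before the fork and keeping it open follows. It assumes that files_preserve is what is missing here, which has not been verified against the daemon in question; bind_server and main are hypothetical names, and the address and pidfile path are taken from the minimal example above.

```python
import sys
from http.server import HTTPServer, BaseHTTPRequestHandler

def bind_server(address):
    """Try to bind the listening socket; return the server or None."""
    try:
        return HTTPServer(address, BaseHTTPRequestHandler)
    except OSError:
        return None

def main(address=("127.0.0.1", 8000), pidfile_path="/run/test.pid"):
    server = bind_server(address)
    if server is None:
        # Still before the fork: the caller sees this exit code.
        sys.exit(1)
    # python-daemon is imported here so that bind_server() can be
    # tried without the package installed.
    import daemon
    import daemon.pidfile
    context = daemon.DaemonContext(
        pidfile=daemon.pidfile.PIDLockFile(pidfile_path),
        # Assumption: keeping the listening socket's descriptor open
        # is what makes the pre-bound server reachable after the fork.
        files_preserve=[server.fileno()],
    )
    with context:
        # Anything that starts threads belongs inside the context,
        # because threads do not survive a fork.
        server.serve_forever()
```

Calling main() would start the sketch. Signal handlers can still be installed inside the context, as in the original code; DaemonContext also accepts a signal_map option for that purpose.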
grknight Retired Dev
Joined: 20 Feb 2015 Posts: 1902
|
Posted: Tue Sep 10, 2024 7:37 pm Post subject: |
|
|
Since you don't like supervise-daemon, my original response includes a suggestion you haven't commented on:
grknight wrote: | If pid creation must come first, try setting SSD_STARTWAIT=1000 (ms) in the script to delay the pid check. | (the OpenRC init script)
This may also be set as start_stop_daemon_args="--wait 1000" supervise_daemon_args=""
Note that this does not work with Debian's start-stop-daemon program. |
l3u Advocate
Joined: 26 Jan 2005 Posts: 2610 Location: Konradsreuth (Germany)
|
Posted: Tue Sep 10, 2024 8:03 pm Post subject: |
|
|
I do like supervise-daemon a lot, but it did not make any difference to use it …
I can't use that wait feature either if it's Gentoo-specific :-( |
szatox Advocate
Joined: 27 Aug 2013 Posts: 3410
|
Posted: Tue Sep 10, 2024 9:35 pm Post subject: |
|
|
Quote: | But – I still can't tell OpenRC that the startup failed. If I exit with code 1, supervise-daemon simply says "/usr/bin/go-e-pvsd, pid XXX, exited with return code 1" and tries to restart it over and over again … |
Well, yes, that's what a supervisor does: it restarts the service when it crashes.
Is there some permanent error condition for which you'd rather it was flagged as failed and stayed down than have supervisor restart it? _________________ Make Computing Fun Again |