Gentoo Forums
How can a Python daemon tell OpenRC that startup failed?

l3u
Advocate

Joined: 26 Jan 2005
Posts: 2610
Location: Konradsreuth (Germany)

Posted: Sun Sep 08, 2024 6:48 am    Post subject: How can a Python daemon tell OpenRC that startup failed?

Hi all,

I wrote a Python daemon that I start using an OpenRC init script. Starting and stopping works fine. However, there's one problem: the daemon startup may fail, and OpenRC ignores that and always assumes the daemon started correctly.

I use a start script with a helper class called ProcessManager that does the actual startup. Stripped down, it looks like this:
Code:
import sys
import signal
import daemon
import daemon.pidfile
from ProcessManager.ProcessManager import ProcessManager

def setupProcessManager(args):
    if not processManager.setup():
        sys.exit(1)

    signal.signal(signal.SIGTERM, processManager.terminate)
    signal.signal(signal.SIGINT, processManager.terminate)

    processManager.start()

processManager = ProcessManager(args)

with daemon.DaemonContext(pidfile = daemon.pidfile.PIDLockFile(args.p)):
    setupProcessManager(args)
    processManager.finished.wait()

The interesting part is that
Code:
if not processManager.setup():

(The whole code can be found on GitLab, with the init script and the startup script)

No matter whether I call sys.exit(1) in there or raise some exception: OpenRC always thinks everything is fine.

So: How can I tell OpenRC that my daemon could not start up?

Thanks for all help!


Last edited by l3u on Sun Sep 08, 2024 7:55 am; edited 1 time in total

logrusx
Advocate

Joined: 22 Feb 2018
Posts: 2420

Posted: Sun Sep 08, 2024 6:56 am

Have you checked your program actually returns something different than zero?

Best Regards,
Georgi

l3u
Advocate

Joined: 26 Jan 2005
Posts: 2610
Location: Konradsreuth (Germany)

Posted: Sun Sep 08, 2024 7:49 am

Yes, of course – you can start the daemon either daemonized or simply running in the foreground, for debugging.

If you run it in foreground, the sys.exit(1) of course makes the program exit with non-0 ($? is 1). I assume that the exit code is also non-0 if it's started daemonized, no?!

logrusx
Advocate

Joined: 22 Feb 2018
Posts: 2420

Posted: Sun Sep 08, 2024 9:02 am

l3u wrote:
I assume that the exit code is also non-0 if it's started daemonized, no?!


I'm not a pythoneer, but it's hard to believe OpenRC would ignore the return code. That's why I'm asking if you have verified your program returns non-zero value when exiting due to failure or you're just thinking it does.

Best Regards,
Georgi

grknight
Retired Dev

Joined: 20 Feb 2015
Posts: 1919

Posted: Sun Sep 08, 2024 1:18 pm

OpenRC, in this script, is relying on the default start-stop-daemon process (/lib/rc/sh/start-stop-daemon.sh) as there is no supervisor nor start function defined.
OpenRC's start-stop-daemon considers a service started once the pidfile has been created and the pid exists.

First, do not create the pidfile in the daemon until the daemon is ready.

If pid creation must come first, try setting SSD_STARTWAIT=1000 (ms) in the script to delay the pid check.

Alternatively consider supervisor="supervise-daemon" which does not fork the process.

Hu
Administrator

Joined: 06 Mar 2007
Posts: 22650

Posted: Sun Sep 08, 2024 1:37 pm

ProcessManager does not appear to be part of the standard library, so this is not a Minimal Reproducible Example.

Normally, when a process converts itself to a daemon, it will fork, the parent will exit 0, and the child will run to do the real work. If this process converts to a daemon before it detects a problem, then yes, the supervisor will consider it a successful startup. For that reason, I like grknight's suggestion.
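
To illustrate the point about the exit status, here is a small sketch. It is not taken from the go-e-pvsd code, and the helper name `launch_and_get_status` is made up for this example. It runs a tiny script that forks, lets the parent exit 0 and the child exit 1, and shows that the caller only ever sees the parent's status, which is roughly the position start-stop-daemon is in.

```python
import subprocess
import sys
import textwrap

# Hypothetical stand-in for a self-daemonizing program: the parent exits 0
# right after fork(), and the child then "fails" with exit status 1.
FORKING_SCRIPT = textwrap.dedent("""
    import os
    pid = os.fork()
    if pid > 0:
        os._exit(0)   # parent: reports success to whoever launched it
    os._exit(1)       # child: only the process that reaps it sees this
""")

def launch_and_get_status() -> int:
    """Run the script and return the exit status the launcher observes."""
    result = subprocess.run([sys.executable, "-c", FORKING_SCRIPT])
    return result.returncode

print("launcher saw exit status:", launch_and_get_status())
```

The launcher gets 0 back even though the child exited with 1, because it only waits for the process it started itself.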

l3u
Advocate

Joined: 26 Jan 2005
Posts: 2610
Location: Konradsreuth (Germany)

Posted: Mon Sep 09, 2024 6:51 pm

No, that ProcessManager class is a part of the daemon (cf. the linked gitlab repo).

Seems like the exit code gets lost through the "daemon context". I guess I have to dive a bit deeper in how to write a proper Python daemon …

So the solution would be to setup the daemon in foreground and maybe exit with non-0 there, and somehow daemonize it afterwards, right?

l3u
Advocate

Joined: 26 Jan 2005
Posts: 2610
Location: Konradsreuth (Germany)

Posted: Tue Sep 10, 2024 2:55 pm

As I could not really work it out, I posted a question on Stack Overflow, also containing a complete minimal example.

There, one guy asked why I care about daemonizing my stuff at all if I use OpenRC to supervise it … so: Do I have to?!

I never wrote a daemon before, and esp. not a Python one. So can OpenRC do the work for me? And will this work on other distros, too? The daemon currently runs on Devuan/OpenRC, and even though it's OpenRC, it's not fully compatible with Gentoo and I had to tweak the init script a bit …

Thanks for all help on this!

Hu
Administrator

Joined: 06 Mar 2007
Posts: 22650

Posted: Tue Sep 10, 2024 3:25 pm

l3u wrote:
No, that ProcessManager class is a part of the daemon (cf. the linked gitlab repo).
GitLab's browser fails to render due to a JavaScript error.
l3u wrote:
Seems like the exit code gets lost through the "daemon context". I guess I have to dive a bit deeper in how to write a proper Python daemon …
By definition, a daemon exits 0 and leaves a child running. The child's exit status is only visible to init, which neither knows nor cares what it means.
l3u wrote:
So the solution would be to setup the daemon in foreground and maybe exit with non-0 there, and somehow daemonize it afterwards, right?
Not quite. Daemons are not in the foreground. That's why daemonize moves the caller to the background. You could run your initialization code before moving to daemon status, or you could just remain in the foreground under supervision of a process that knows not to block other activity while waiting for you to exit (which you would only do when an administrative process tells you to halt completely).
l3u wrote:
There, one guy asked why I care about daemonizing my stuff at all if I use OpenRC to supervise it … so: Do I have to?!
No. That is why grknight told you to use supervise-daemon.
l3u wrote:
I never wrote a daemon before, and esp. not a Python one. So can OpenRC do the work for me?
As I read the manual page for supervise-daemon, yes.
l3u wrote:
And will this work on other distros, too?
If they use openrc, or have an equivalent feature, yes.
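
For the supervised variant, the program itself can stay very plain: no fork, no pidfile, just run in the foreground, exit non-zero if setup fails, and stop on SIGTERM. A rough sketch, assuming a `setup` callable that stands in for whatever go-e-pvsd actually has to initialize (the function name `run_foreground` is invented here):

```python
import signal
import threading
from typing import Callable

def run_foreground(setup: Callable[[], bool], stop: threading.Event) -> int:
    """Run in the foreground; return the exit status the process should use."""
    if not setup():
        return 1                      # startup failed

    # The handler runs in the main thread and just wakes up the wait() below.
    def terminate(signum, frame):
        stop.set()

    signal.signal(signal.SIGTERM, terminate)
    signal.signal(signal.SIGINT, terminate)

    stop.wait()                       # real work would run in other threads
    return 0

# Typical entry point (commented out so the sketch can be imported):
# sys.exit(run_foreground(my_setup, threading.Event()))
```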

l3u
Advocate

Joined: 26 Jan 2005
Posts: 2610
Location: Konradsreuth (Germany)

Posted: Tue Sep 10, 2024 4:11 pm

Hu wrote:
GitLab's browser fails to render due to a JavaScript error.

Using the latest stable Firefox here, everything is fine with GitLab …

Maybe I should ask a bit more generically … if one writes a daemon nowadays (may it be implemented in Python or not) – is it expected to background itself, or is this an init system's task?

The only other Python daemon I know is Radicale, and it seems like the backgrounding and PID file management is up to start-stop-daemon there:
Code:
start() {
    ebegin "Starting radicale"
        start-stop-daemon --start --quiet --background \
        --user radicale \
        --umask 0027 \
        --stderr-logger /usr/bin/logger \
        --pidfile ${PIDFILE} --make-pidfile \
        --exec /usr/bin/radicale
    eend $?
}

stop() {
    ebegin "Stopping radicale"
        start-stop-daemon --stop --quiet \
        --pidfile ${PIDFILE}
    eend $?
}

grknight
Retired Dev

Joined: 20 Feb 2015
Posts: 1919

Posted: Tue Sep 10, 2024 4:38 pm

What Hu is basically suggesting is: in the startup script, move (and change if needed)
Code:
    # Setup the process manager
    if not processManager.setup():
        raise
to just before
Code:
if args.d:

The basic configuration should come before any daemon forking call. If that configuration fails, it can be communicated to the calling supervisor.
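
A minimal sketch of that ordering, using only the standard library rather than python-daemon. The helper names `bind_listener` and `daemonize_after_setup` are invented for this example, and the fork is a bare `os.fork()` without the session and file-descriptor handling a real daemon needs.

```python
import os
import socket
import sys
from typing import Optional

def bind_listener(host: str, port: int) -> Optional[socket.socket]:
    """Try to bind the listening socket; return None if that fails."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        sock.listen()
    except OSError:
        sock.close()
        return None
    return sock

def daemonize_after_setup(host: str, port: int) -> None:
    # 1. Everything that can fail runs first, still in the original process.
    listener = bind_listener(host, port)
    if listener is None:
        sys.exit(1)          # the caller (init script) sees this status

    # 2. Only now fork. The child inherits the already bound socket.
    if os.fork() > 0:
        os._exit(0)          # parent: startup succeeded

    # 3. Child: placeholder for the long-running work.
    conn, _ = listener.accept()
    conn.close()
```

If the port is already taken, the process exits with status 1 before any fork has happened, so the failure is visible to whatever started it.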

GDH-gentoo
Veteran

Joined: 20 Jul 2019
Posts: 1699
Location: South America

Posted: Tue Sep 10, 2024 5:03 pm

l3u wrote:
Maybe I should ask a bit more generically … if one writes a daemon nowadays (may it be implemented in Python or not) – is it expected to background itself, or is this an init system's task?

Let's see if I can clarify.

The way that your OpenRC service script is written means that service startup will be delegated to OpenRC's start-stop-daemon program. As far as I can tell, start-stop-daemon does, in fact, consider the exit code of the program it is told to run (/usr/bin/go-e-pvsd here), and does 'forward' to OpenRC the success / failure state implied by it.

Now. I can't follow the Python code, but, from a design point of view, what go-e-pvsd with the -d option is supposed to do, is fork a child process —which would be the long-running process that does the actual work that the daemon is expected to do— and exit. Therefore, go-e-pvsd -d itself should be a short-lived process. In fact, OpenRC is a serial service manager, so until go-e-pvsd exits, if service startup is happening as part of entering an OpenRC (named) runlevel, then all services scheduled after go-e-pvsd will be delayed. So it also can't take too long to exit.

Therefore, the problem here is that, if you actually want OpenRC to consider service startup a failure, rather than continuing with other services and leaving it up to you to discover with rc-service that the child process isn't actually running, then you need to run enough startup code in go-e-pvsd to determine success or failure state before forking the child, so that it can exit without forking and with a suitable exit code in the failure case. This is subject to the constraint that it can't take long to do that.

Now compare this to whatever your Python code does :) Note that any failure after forking won't be detected by OpenRC.
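
Failures that can only happen after the fork are not necessarily lost for good, though. A common pattern, which is not what python-daemon or go-e-pvsd do as far as this thread shows, is to let the child report its startup result to the parent through a pipe and have the parent exit with that result. A rough sketch (the function name `fork_and_report` is made up; a real parent would pass the returned value to `sys.exit()`):

```python
import os
from typing import Callable

def fork_and_report(setup: Callable[[], bool], run: Callable[[], None]) -> int:
    """Fork; return the exit status the parent should use (0 = started)."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()

    if pid == 0:
        # Child: do the setup that has to happen after the fork,
        # then tell the parent how it went with a single byte.
        os.close(read_fd)
        try:
            ok = setup()
        except Exception:
            ok = False
        os.write(write_fd, b"0" if ok else b"1")
        os.close(write_fd)
        if ok:
            run()            # long-running work
        os._exit(0 if ok else 1)

    # Parent: block until the child has reported, or died without reporting
    # (in which case read() returns an empty bytes object).
    os.close(write_fd)
    status = os.read(read_fd, 1)
    os.close(read_fd)
    return 0 if status == b"0" else 1
```

The parent blocks until the child has reported, so the constraint about not taking long still applies; a real implementation would probably want a timeout on the read.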

OpenRC's supervise-daemon is a different beast, it is a process supervisor.

By the way, this description applies to upstream OpenRC, which is what Gentoo packages. I believe that Debian and Devuan ship a modified version. And I don't know what Alpine or other OpenRC-based distributions ship exactly.
_________________
NeddySeagoon wrote:
I'm not a witch, I'm a retired electronics engineer :)
Ionen wrote:
As a packager I just don't want things to get messier with weird build systems and multiple toolchains requirements though :)


Last edited by GDH-gentoo on Tue Sep 10, 2024 6:29 pm; edited 1 time in total

l3u
Advocate

Joined: 26 Jan 2005
Posts: 2610
Location: Konradsreuth (Germany)

Posted: Tue Sep 10, 2024 5:15 pm

Okay I now tried my luck with supervise-daemon.

I completely removed the daemonizing from my daemon, letting it always run in foreground.

The init script I use is now
Code:
depend() {
    need net
    use logger
}

supervisor="supervise-daemon"
command="/usr/bin/go-e-pvsd"
pidfile="/run/${RC_SVCNAME}.pid"
command_args="-s"
command_args_foreground=""

That can start and stop the daemon as expected (which is nice, because apparently I can skip all that messing with daemonizing :-)

But – I still can't tell OpenRC that the startup failed. If I exit with code 1, supervise-daemon simply says "/usr/bin/go-e-pvsd, pid XXX, exited with return code 1" and tries to restart it over and over again …

Same for the non-supervise-daemon variant:
Code:
depend() {
    need net
    use logger
}

command="/usr/bin/go-e-pvsd"
pidfile="/run/${RC_SVCNAME}.pid"
command_args="-s"

start() {
    ebegin "Starting ${RC_SVCNAME}"
    start-stop-daemon --start --background \
    --make-pidfile --pidfile ${pidfile}  \
    --exec ${command} -- ${command_args}
    eend $?
}

stop() {
    ebegin "Stopping ${RC_SVCNAME}"
    start-stop-daemon --stop --pidfile ${pidfile}
    eend $?
}

with the only difference that in this case, no restart is attempted.

GDH-gentoo
Veteran

Joined: 20 Jul 2019
Posts: 1699
Location: South America

Posted: Tue Sep 10, 2024 5:33 pm

l3u wrote:
But – I still can't tell OpenRC that the startup failed. If I exit with code 1, supervise-daemon simply says "/usr/bin/go-e-pvsd, pid XXX, exited with return code 1" and tries to restart it over and over again …

Yeah, that's what a process supervisor does.

l3u
Advocate

Joined: 26 Jan 2005
Posts: 2610
Location: Konradsreuth (Germany)

Posted: Tue Sep 10, 2024 5:45 pm

So I simply can't tell OpenRC that my startup failed, because I can't exit with non-0 before the fork …

GDH-gentoo
Veteran

Joined: 20 Jul 2019
Posts: 1699
Location: South America

Posted: Tue Sep 10, 2024 5:49 pm

l3u wrote:
So I simply can't tell OpenRC that my startup failed, because I can't exit with non-0 before the fork …

Can't you modify the code so that it does? I wish I could make a suggestion, but I can't follow that Python code :P

l3u
Advocate

Joined: 26 Jan 2005
Posts: 2610
Location: Konradsreuth (Germany)

Posted: Tue Sep 10, 2024 5:56 pm

Maybe I could have, as long as the code forked itself using python-daemon (I don't know how, I could not get it to work … cf. my Stack Overflow post). But as soon as I let OpenRC fork the process, there's no way to exit before the fork I think, no?!

GDH-gentoo
Veteran

Joined: 20 Jul 2019
Posts: 1699
Location: South America

Posted: Tue Sep 10, 2024 6:12 pm

l3u wrote:
But as soon as I let OpenRC fork the process, there's no way to exit before the fork I think, no?!

With your original service script, OpenRC doesn't do any forking, go-e-pvsd is expected to, at least in the success case. And it should exit (rather quickly) with an appropriate exit code in any case.


Last edited by GDH-gentoo on Tue Sep 10, 2024 6:15 pm; edited 1 time in total

grknight
Retired Dev

Joined: 20 Feb 2015
Posts: 1919

Posted: Tue Sep 10, 2024 6:12 pm

l3u wrote:
there's no way to exit before the fork I think, no?!

Sure there is.. do your processManager.setup() before the fork (the with daemon statement).
processManager.setup is where you are trying to bail from or am I wrong?

l3u
Advocate

Joined: 26 Jan 2005
Posts: 2610
Location: Konradsreuth (Germany)

Posted: Tue Sep 10, 2024 6:42 pm

Yeah, and that is exactly the point where it is not working (cf. the SO post). If I set up the HTTP server before forking, it's not accessible. And the HTTP server startup is one of the preconditions that should lead to startup failure if it fails. But as soon as I enter the daemon context, OpenRC thinks my startup was successful.

Just to post it also here: This is a minimal compatible example:
Code:
#!/usr/bin/env python3

import sys
import signal
import argparse
import daemon
import daemon.pidfile
from syslog import syslog
import threading
from http.server import HTTPServer, BaseHTTPRequestHandler
from time import strftime

parser = argparse.ArgumentParser()
parser.add_argument("-d", action = "store_true", help = "daemonize")
args = parser.parse_args()

class RequestHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"I'm here")

class ProcessManager:
    def __init__(self):
        self.timer = None
        self.server = None
        self.signalCatched = False
        self.finished = threading.Event()

    def setup(self) -> bool:
        syslog("Setting up different stuff")
        # All kind of stuff that could fail, returning False then

        syslog("Setting up the HTTP server")
        try:
            self.server = HTTPServer(("127.0.0.1", 8000), RequestHandler)
        except Exception as error:
            syslog("Failed to setup the HTTP server")
            return False

        return True

    def start(self):
        thread = threading.Thread(target = self.server.serve_forever)
        thread.daemon = True
        thread.start()
        self.scheduleNextRun()

    def scheduleNextRun(self):
        if self.signalCatched:
            return

        syslog("Daemon running at {}".format(strftime("%Y-%m-%d %H:%M:%S")))

        self.timer = threading.Timer(3, self.scheduleNextRun)
        self.timer.start()

    def terminate(self, signum, frame):
        syslog("Catched signal, will now terminate")
        self.signalCatched = True

        if self.timer:
            self.timer.cancel()

        self.server.shutdown()

        self.finished.set()

def setupProcessManager():
    if not processManager.setup():
        sys.exit(1)

    signal.signal(signal.SIGTERM, processManager.terminate)
    signal.signal(signal.SIGINT, processManager.terminate)

    processManager.start()

processManager = ProcessManager()

if args.d:
    with daemon.DaemonContext(pidfile = daemon.pidfile.PIDLockFile("/run/test.pid")):
        syslog("Starting up in daemon mode")
        setupProcessManager()
        processManager.finished.wait()
else:
    syslog("Starting up in foreground mode")
    setupProcessManager()

GDH-gentoo
Veteran

Joined: 20 Jul 2019
Posts: 1699
Location: South America

Posted: Tue Sep 10, 2024 6:53 pm

l3u wrote:
Just to post it also here: This is a minimal compatible example:

I think that what grknight is suggesting is something like this (not tested):
Code:
#!/usr/bin/env python3

# ...

def runProcessManager():
    signal.signal(signal.SIGTERM, processManager.terminate)
    signal.signal(signal.SIGINT, processManager.terminate)

    processManager.start()

processManager = ProcessManager()

if not processManager.setup():
    sys.exit(1)

if args.d:
    # Replace with forking code, make sure that the parent exits with code 0.
    pass
else:
    # Non-forking code
    syslog("Starting up in foreground mode")
    runProcessManager()
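
One way the forking branch might be filled in while staying with python-daemon. As far as I know, `DaemonContext` closes every open file descriptor when it is entered, except those listed in its `files_preserve` argument, and installs its own signal handlers from its `signal_map` argument. That would explain both a pre-bound HTTP socket becoming unreachable and handlers set beforehand having no effect. Threads do not survive a fork either, so only the socket is created beforehand and `serve_forever` is started inside the context. This is an untested sketch; the helper name `daemon_context_kwargs` is invented, `BaseHTTPRequestHandler` is only a placeholder handler, and port 8000 and the pidfile path are taken from the minimal example above.

```python
import signal
import sys
import threading
from http.server import HTTPServer, BaseHTTPRequestHandler

finished = threading.Event()

def daemon_context_kwargs(server, terminate):
    """Arguments meant to keep the bound socket and our handlers in the daemon."""
    return {
        "files_preserve": [server.fileno()],
        "signal_map": {signal.SIGTERM: terminate, signal.SIGINT: terminate},
    }

def main():
    import daemon            # python-daemon (third-party)
    import daemon.pidfile

    try:
        # Binding happens before the fork, so a failure here still
        # produces a non-zero exit status for the init script.
        server = HTTPServer(("127.0.0.1", 8000), BaseHTTPRequestHandler)
    except OSError:
        sys.exit(1)

    def terminate(signum, frame):
        finished.set()

    pidfile = daemon.pidfile.PIDLockFile("/run/test.pid")
    with daemon.DaemonContext(pidfile=pidfile,
                              **daemon_context_kwargs(server, terminate)):
        # Threads are started only after the fork.
        threading.Thread(target=server.serve_forever, daemon=True).start()
        finished.wait()
        server.shutdown()
```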


l3u
Advocate

Joined: 26 Jan 2005
Posts: 2610
Location: Konradsreuth (Germany)

Posted: Tue Sep 10, 2024 7:06 pm

As said, this does not work. If I setup the HTTP server outside of the daemon context, it's not accessible. Also, the signal connections have to be made inside the daemon context to work.

grknight
Retired Dev

Joined: 20 Feb 2015
Posts: 1919

Posted: Tue Sep 10, 2024 7:37 pm

Since you don't like supervise-daemon, my original response includes a suggestion you haven't commented on:
grknight wrote:
If pid creation must come first, try setting SSD_STARTWAIT=1000 (ms) in the script to delay the pid check.
(the OpenRC init script)

This may also be issued like start_stop_daemon_args="--wait 1000" supervise_daemon_args=""

Note that this does not work with Debian's start-stop-daemon program.

l3u
Advocate

Joined: 26 Jan 2005
Posts: 2610
Location: Konradsreuth (Germany)

Posted: Tue Sep 10, 2024 8:03 pm

I do like supervise-daemon a lot, but using it did not make any difference.

I can't use that wait feature either if it's Gentoo-specific :-(

szatox
Advocate

Joined: 27 Aug 2013
Posts: 3431

Posted: Tue Sep 10, 2024 9:35 pm

Quote:
But – I still can't tell OpenRC that the startup failed. If I exit with code 1, supervise-daemon simply says "/usr/bin/go-e-pvsd, pid XXX, exited with return code 1" and tries to restart it over and over again …
Well, yes, that's what a supervisor does: it restarts the service when it crashes.
Is there some permanent error condition for which you'd rather have it flagged as failed and stay down than have the supervisor restart it?
_________________
Make Computing Fun Again

All times are GMT
Page 1 of 2