Author |
Message |
Bio99 n00b
Joined: 30 Apr 2004 Posts: 11
|
Posted: Mon Nov 08, 2004 4:46 pm Post subject: named monitoring? |
|
|
On my Gentoo system 'named' dies from time to time for no apparent reason.
Since a working nameserver is essential, I want to add a monitoring (cron-) script, that restarts named in case of failure.
Does anybody already have a solution ready?
Thanks
Bio |
|
|
|
smutt n00b
Joined: 23 Aug 2003 Posts: 51 Location: Utrecht, Netherlands
|
Posted: Mon Nov 08, 2004 5:04 pm Post subject: |
|
|
Here's something fast and dirty...
Code: |
#!/bin/sh
if [ -n "ps -ef|grep named|grep -v grep" ];
then
/etc/init.d/named stop >/dev/null 2>&1
/etc/init.d/named start >/dev/null 2>&1
fi
|
Put that in your crontab and smoke it |
|
|
|
[dmnd] n00b
Joined: 02 Nov 2003 Posts: 48 Location: Netherlands
|
Posted: Mon Nov 08, 2004 5:56 pm Post subject: Re: named monitoring? |
|
|
Bio99 wrote: | Does anybody already have a solution ready? |
Get rid of bind and install powerdns with the bind backend? |
|
|
|
Bio99 n00b
Joined: 30 Apr 2004 Posts: 11
|
Posted: Tue Nov 09, 2004 2:18 pm Post subject: |
|
|
Thanks smutt, that's a really fast solution.
A followup to my question:
"/etc/init.d/named stop" fails when the named process has died. I have this problem with other init scripts too.
"status" reports that the daemon is still running, since there is a pid file or socket lying around in /var/run. "stop" refuses to stop it (well, it's not running), but "start" fails too.
In such cases I often point the pid file at another running process. Then "stop" succeeds, and a restart is possible.
There should be a "--force" flag for the init script or something similar. |
|
|
|
smutt n00b
Joined: 23 Aug 2003 Posts: 51 Location: Utrecht, Netherlands
|
Posted: Tue Nov 09, 2004 2:24 pm Post subject: |
|
|
You could try just deleting the pid file and then just executing /etc/init.d/named start. If you set the pid file to another running process you might end up killing that other process. Be careful.
--Smutt |
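A minimal sketch of how a stale pid file could be detected before deleting it, so that nothing has to be pointed at an unrelated process. The function name `pid_is_alive` and the pid file path in the usage comment are assumptions for illustration; check your init script for the real location.

```shell
#!/bin/sh
# pid_is_alive PIDFILE: succeed only if PIDFILE holds the PID of a live process.
# "kill -0" sends no signal; it only checks that the process exists
# (and that we are allowed to signal it).
pid_is_alive() {
    [ -r "$1" ] || return 1            # no readable pid file
    pid=$(cat "$1")
    case "$pid" in
        ''|*[!0-9]*) return 1 ;;       # empty or not a plain number
    esac
    kill -0 "$pid" 2>/dev/null
}

# Hypothetical usage (path is an assumption):
# PIDFILE=/var/run/named.pid
# pid_is_alive "$PIDFILE" || rm -f "$PIDFILE"
```

Note that a live PID only proves that some process has that number, not that it is named, so this is a sanity check rather than a guarantee.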
|
|
|
tuxmin l33t
Joined: 24 Apr 2004 Posts: 838 Location: Heidelberg
|
Posted: Tue Nov 09, 2004 2:29 pm Post subject: |
|
|
Do it the Gentoo way:
Code: |
/etc/init.d/service zap
|
maybe this works best (never tried):
Code: |
/etc/init.d/service stop || /etc/init.d/service zap
|
|
|
|
|
Bio99 n00b
Joined: 30 Apr 2004 Posts: 11
|
Posted: Tue Nov 09, 2004 11:55 pm Post subject: |
|
|
Thanks for your comments. I'm using a script like this now:
Code: |
#!/bin/sh
if [ -n "ps -ef|grep named|grep -v grep" ]; then
/etc/init.d/named stop >/dev/null 2>&1 || /etc/init.d/named zap >/dev/null 2>&1
/etc/init.d/named start >/dev/null 2>&1
fi
|
The "zap" part works well. The only problem is: the if clause is executed in any case, even if named is running.
The ps-grep expression seems to be correct, it returns a list of named processes or nothing, depending on the state of named.
The problem must be in the [ -n ... ] expression, and I'm really not a "sh" guru ... |
|
|
|
tuxmin l33t
Joined: 24 Apr 2004 Posts: 838 Location: Heidelberg
|
Posted: Wed Nov 10, 2004 6:36 am Post subject: |
|
|
Try this:
Code: |
if ( ! ps -ef | grep -v grep | grep named ); then
...
fi
|
|
|
|
|
sschlueter Guru
Joined: 26 Jul 2002 Posts: 578 Location: Dortmund, Germany
|
Posted: Wed Nov 10, 2004 8:13 am Post subject: Re: named monitoring? |
|
|
Bio99 wrote: | On my Gentoo system 'named' dies from time to time for no apparent reason.
|
This should not happen. Are there any logfile entries created before it crashes?
Bio99 wrote: |
Since a working nameserver is essential, I want to add a monitoring (cron-) script, that restarts named in case of failure.
|
Instead of using cron to monitor the service, you could also use
http://cr.yp.to/daemontools.html
sys-apps/daemontools |
|
|
|
forbjok Apprentice
Joined: 21 May 2004 Posts: 207 Location: Hordaland, Norge
|
Posted: Wed Nov 10, 2004 8:47 am Post subject: |
|
|
Bio99 wrote: | Thanks for your comments. I'm using a script like this now:
Code: |
#!/bin/sh
if [ -n "ps -ef|grep named|grep -v grep" ]; then
/etc/init.d/named stop >/dev/null 2>&1 || /etc/init.d/named zap >/dev/null 2>&1
/etc/init.d/named start >/dev/null 2>&1
fi
|
The "zap" part works well. The only problem is: the if clause is executed in any case, even if named is running.
The ps-grep expression seems to be correct, it returns a list of named processes or nothing, depending on the state of named.
The problem must be in the [ -n ... ] expression, and I'm really not a "sh" guru ... |
I think you'll want to use backticks, otherwise the string will just be set to the command itself, rather than the output of the command. Also, I believe the "-n" does the exact opposite of what you want - it would return true only if the daemon is running. "-z" should return true if the string is blank, so that should work.
Like this:
Code: |
#!/bin/sh
if [ -z "`ps -ef|grep 'named'|grep -v 'grep'`" ]; then
/etc/init.d/named stop >/dev/null 2>&1 || /etc/init.d/named zap >/dev/null 2>&1
/etc/init.d/named start >/dev/null 2>&1
fi
|
Note the added backticks inside the double quotes. Putting something in backticks tells the shell to run the content as a command and substitute the command's output. That's true for both Perl and shell scripts. |
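The difference between a quoted command and a substituted one can be sketched as follows. This is only an illustration: the variable names are made up, and `$(...)` is used as the POSIX equivalent of backticks.

```shell
#!/bin/sh
# A quoted command is just a string; nothing is executed.
# The string is non-empty, so -n is always true.
if [ -n "ps -ef|grep named|grep -v grep" ]; then
    literal_result="always true"
else
    literal_result="false"
fi

# Command substitution runs the command and yields its output.
# 'true' prints nothing, so the result is empty and -z succeeds.
no_output=$(true)
if [ -z "$no_output" ]; then
    empty_result="empty"
else
    empty_result="not empty"
fi

# A command that prints something yields a non-empty string.
some_output=$(echo "named 19541")
if [ -z "$some_output" ]; then
    filled_result="empty"
else
    filled_result="not empty"
fi

echo "$literal_result / $empty_result / $filled_result"
```

So `-z` combined with command substitution is the test that is true only when the pipeline prints nothing.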
|
|
|
Bio99 n00b
Joined: 30 Apr 2004 Posts: 11
|
Posted: Wed Nov 10, 2004 11:10 pm Post subject: Re: named monitoring? |
|
|
sschlueter wrote: |
This should not happen. Are there any logfile entries created before it crashes?
|
No. I had memory issues on that machine. I thought that I had fixed them with a slower memory timing, but maybe they are still the reason for named to die.
Thanks for the sys-apps/daemontools link, I'll look into that later. |
|
|
|
Bio99 n00b
Joined: 30 Apr 2004 Posts: 11
|
Posted: Wed Nov 10, 2004 11:23 pm Post subject: |
|
|
forbjok wrote: |
I think you'll want to use backticks, otherwise the string will just be set to the command itself, rather than the output of the command. Also, I believe the "-n" does the exact opposite of what you want - it would return true only if the daemon is running. "-z" should return true if the string is blank, so that should work.
Like this:
Code: |
#!/bin/sh
if [ -z "`ps -ef|grep 'named'|grep -v 'grep'`" ]; then
/etc/init.d/named stop >/dev/null 2>&1 || /etc/init.d/named zap >/dev/null 2>&1
/etc/init.d/named start >/dev/null 2>&1
fi
|
Note the added backticks inside the double quotes. Putting something in backticks tells the shell to run the content as a command and substitute the command's output. That's true for both Perl and shell scripts. |
It's hard to believe, but even the backtick solution doesn't work. With the "-z" it never restarts named. Below is the output of the expression, with and without named running.
Code: |
# echo "`ps -ef|grep 'named'|grep -v 'grep'`"
named 19541 1 0 00:16 ? 00:00:00 /usr/sbin/named -u named -n 1
named 19543 19541 0 00:16 ? 00:00:00 /usr/sbin/named -u named -n 1
named 19544 19543 0 00:16 ? 00:00:00 /usr/sbin/named -u named -n 1
named 19545 19543 0 00:16 ? 00:00:00 /usr/sbin/named -u named -n 1
named 19546 19543 0 00:16 ? 00:00:00 /usr/sbin/named -u named -n 1
# kill 19541
# echo "`ps -ef|grep 'named'|grep -v 'grep'`"
#
|
|
|
|
|
forbjok Apprentice
Joined: 21 May 2004 Posts: 207 Location: Hordaland, Norge
|
Posted: Thu Nov 11, 2004 10:14 am Post subject: |
|
|
I get those too, but only if named is running. Could it be that the named processes don't die, but simply hang for some reason or another? Did you try
Code: |
# killall named
|
or, if that fails,
Code: |
# killall -9 named
|
and then run the ps command?
If this is the case, checking for running processes won't do any good, as they will still be running, just not working. I did some testing on my DNS box, and when I stop named using the init script, it shuts down the named processes.
If the processes just stop responding but don't exit, you'll have to find a way to determine whether they've stopped working instead. For instance, have the script run some program that tries to use the DNS server or, if the server stops listening completely, just tries to connect to the DNS server's port to see if it's working. I'm not sure what programs can be used for that though.
Before going through the trouble of writing such a script, I'd recommend trying to recompile, if you haven't tried that already. Just make sure not to overwrite any configs with etc-update/dispatch-conf afterwards.
Good luck |
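One possible probe for the "still running but not answering" case, assuming `dig` (shipped with BIND's tools) is installed: dig exits with status 9 when it gets no reply from the server and with 0 when any reply arrives, even a negative one. The function name, the default server address, and the query name below are placeholders, not part of the original setup.

```shell
#!/bin/sh
# named_responds [SERVER] [NAME]: succeed if SERVER answers a query for NAME.
# +time=2 limits each attempt to two seconds, +tries=1 disables retries,
# so a dead server is detected quickly.
named_responds() {
    server=${1:-127.0.0.1}   # assumption: named listens on localhost
    name=${2:-localhost}     # assumption: any name the server should answer for
    dig +time=2 +tries=1 "@$server" "$name" >/dev/null 2>&1
}
```

In a cron watchdog this could replace the ps test, for example `named_responds || { /etc/init.d/named stop || /etc/init.d/named zap; /etc/init.d/named start; }`.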
|
|
|
Bio99 n00b
Joined: 30 Apr 2004 Posts: 11
|
Posted: Thu Nov 11, 2004 11:04 am Post subject: |
|
|
forbjok wrote: |
I get those too, but only if named is running. Could it be that the named processes don't die, but simply hang for some reason or another? Did you try
Code: |
# killall named
|
or, if that fails,
Code: |
# killall -9 named
|
and then run the ps command?
If this is the case, checking for running processes won't do any good, as they will still be running, just not working. I did some testing on my DNS box, and when I stop named using the init script, it shuts down the named processes.
If the processes just stop responding but don't exit, you'll have to find a way to determine whether they've stopped working instead. For instance, have the script run some program that tries to use the DNS server or, if the server stops listening completely, just tries to connect to the DNS server's port to see if it's working. I'm not sure what programs can be used for that though.
|
No, the processes are really gone, 'ps' doesn't list them anymore.
As you can see in my test sequence, I kill the named process. The 'ps-grep' command sequence returns an empty line.
The strange thing is that even after that, the watchdog script doesn't restart named. That's why I suspect the test is wrong.
I think I will migrate to djbdns soon. I hope the conversion of the config files will go smoothly. |
|
|
|
Bio99 n00b
Joined: 30 Apr 2004 Posts: 11
|
Posted: Sun Nov 14, 2004 9:07 pm Post subject: Solution |
|
|
I finally found out why my shell script didn't work. It's really funny, because it was so unexpected.
For reference, here is the final version of the monitoring script:
Code: |
#!/bin/sh
if [ -z "`ps -ef|grep 'named'|grep -v 'grep'`" ] ; then
    /etc/init.d/named stop >/dev/null 2>&1 || /etc/init.d/named zap >/dev/null 2>&1
    /etc/init.d/named start >/dev/null 2>&1
fi
|
I'm not very experienced with 'sh' scripts, so I first checked whether the 'if' statement might be incorrect. Then I wrote another script, called 'test.sh', which did the same but echoed some debug info. To my astonishment, this script worked. I removed the echo statements until both scripts were identical.
Crazy thing: 'test.sh' worked, but the original 'named_monitoring.sh' did not, though both had identical md5sums. - Oh, wait a minute, what was the name of the script? 'named_monitoring.sh' ... Argh!
The script's own name matches the 'grep named' pattern, so while the script is running, the ps output always contains a line with 'named' and the test never sees an empty string. Shooting myself in the foot.
Thanks to all who have contributed! |
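The self-match can be reproduced on canned ps output, which makes the effect easy to see. This is a sketch under assumptions: the sample lines and the script path /root/named_monitoring.sh are invented for illustration, and the stricter pattern relies on named showing up as /usr/sbin/named followed by its arguments, as in the ps listing earlier in the thread.

```shell
#!/bin/sh
# Simulated "ps -ef" output while named is NOT running (hypothetical lines).
ps_dead='root 2001 1 0 00:20 ? 00:00:00 /bin/sh /root/named_monitoring.sh
root 2002 2001 0 00:20 ? 00:00:00 ps -ef
root 2003 2001 0 00:20 ? 00:00:00 grep named'

# The same output with one real named process added.
ps_live="$ps_dead
named 19541 1 0 00:16 ? 00:00:00 /usr/sbin/named -u named -n 1"

# Loose test from the thread: any line containing "named", minus grep itself.
loose_count()  { printf '%s\n' "$1" | grep 'named' | grep -vc 'grep'; }
# Stricter test: match the daemon's full path followed by its arguments.
strict_count() { printf '%s\n' "$1" | grep -c '/usr/sbin/named '; }

echo "dead: loose=$(loose_count "$ps_dead") strict=$(strict_count "$ps_dead")"
echo "live: loose=$(loose_count "$ps_live") strict=$(strict_count "$ps_live")"
```

With named dead, the loose count is still 1 because of the script's own name, while the strict count is 0. Renaming the script works too, and where `pgrep` is available, `pgrep -x named` matches the exact process name and avoids the problem in a similar way.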
|
|
|
sgtrock Tux's lil' helper
Joined: 21 Feb 2003 Posts: 87
|
Posted: Fri Jun 17, 2005 7:46 am Post subject: |
|
|
As an aside, I've personally come to prefer start-stop-daemon to daemontools. Less fuss and muss, and a better fit to the LSB standards (logfiles are where you normally expect them, for example). Worth a look. |
|
|
|
|