Gentoo Forums
named monitoring?
Bio99
n00b


Joined: 30 Apr 2004
Posts: 11

Posted: Mon Nov 08, 2004 4:46 pm    Post subject: named monitoring?

On my Gentoo system 'named' dies from time to time for no apparent reason.

Since a working nameserver is essential, I want to add a monitoring (cron-) script, that restarts named in case of failure.

Does anybody already have a solution ready?

Thanks
Bio
smutt
n00b


Joined: 23 Aug 2003
Posts: 51
Location: Utrecht, Netherlands

Posted: Mon Nov 08, 2004 5:04 pm

Here's something fast and dirty...

Code:

#!/bin/sh
if [ -n "ps -ef|grep named|grep -v grep" ];
then
/etc/init.d/named stop >/dev/null 2>&1
/etc/init.d/named start >/dev/null 2>&1
fi


Put that in your crontab and smoke it :)
[dmnd]
n00b


Joined: 02 Nov 2003
Posts: 48
Location: Netherlands

Posted: Mon Nov 08, 2004 5:56 pm    Post subject: Re: named monitoring?

Bio99 wrote:
Does anybody already have a solution ready?


Get rid of bind and install powerdns with bind backend? :P
_________________
cold as ice...
Bio99
n00b


Joined: 30 Apr 2004
Posts: 11

Posted: Tue Nov 09, 2004 2:18 pm

Thanks smutt, that's really a fast solution :D .

A followup to my question:
"/etc/init.d/named stop" fails when the named process has died. I have this problem with other init scripts too.

"status" reports that the daemon is still running, since there is a pid file or socket lying around in /var/run. "stop" refuses to stop it (well it's not running), but "start" will fail too.

I often set the pid file to another running process in such a case. Then "stop" succeeds, and a restart is possible.

There should be a "--force" flag for the init script or something similar.
smutt
n00b


Joined: 23 Aug 2003
Posts: 51
Location: Utrecht, Netherlands

Posted: Tue Nov 09, 2004 2:24 pm

You could try just deleting the pid file and then just executing /etc/init.d/named start. If you set the pid file to another running process you might end up killing that other process. Be careful.
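
A more cautious variant of this idea is to check first whether the pid recorded in the file still belongs to a live process, and to remove the file only if it does not. The sketch below is just an illustration: the function name pidfile_is_stale is made up, the pid file path is passed in because the real location depends on your setup, and kill -0 only tells you that some process with that pid can be signalled, not that it is actually named.

```shell
#!/bin/sh
# Hypothetical helper, not part of any init script.
# Succeeds (exit 0) when the pid file exists but no live process has that pid.
pidfile_is_stale() {
    pidfile=$1
    [ -f "$pidfile" ] || return 1          # no pid file: nothing stale to clean up
    pid=$(cat "$pidfile" 2>/dev/null)
    case $pid in
        ''|*[!0-9]*) return 0 ;;           # empty or non-numeric content: treat as stale
    esac
    # kill -0 sends no signal; it only checks that the pid can be signalled.
    ! kill -0 "$pid" 2>/dev/null
}
```

A watchdog could call it with the pid file path as the argument, remove the file when the function succeeds, and then run /etc/init.d/named start.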

--Smutt
tuxmin
l33t


Joined: 24 Apr 2004
Posts: 838
Location: Heidelberg

Posted: Tue Nov 09, 2004 2:29 pm

Do it the Gentoo way:
Code:

/etc/init.d/service zap

maybe this works best (never tried):
Code:

/etc/init.d/service stop || /etc/init.d/service zap
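
For what it's worth, zap does not stop anything. It only resets the state recorded for the service to "stopped", which is why it helps when the daemon is already gone. Wrapped in a small function, the stop-or-zap idea might look like the sketch below. force_restart is a made-up name, and the init script path is passed in as a parameter rather than hard-coded.

```shell
#!/bin/sh
# Hypothetical wrapper: $1 is the path of a Gentoo init script, e.g. /etc/init.d/named.
force_restart() {
    init=$1
    # Try a regular stop first; if that fails (daemon already dead), reset the state.
    "$init" stop >/dev/null 2>&1 || "$init" zap >/dev/null 2>&1
    "$init" start >/dev/null 2>&1
}
```

The function returns the exit status of the final start, so a cron job can still notice when the restart itself fails.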

_________________
ALT-F4
Bio99
n00b


Joined: 30 Apr 2004
Posts: 11

Posted: Tue Nov 09, 2004 11:55 pm

Thanks for your comments. I'm using a script like this now:

Code:

#!/bin/sh
if [ -n "ps -ef|grep named|grep -v grep" ]; then
/etc/init.d/named stop >/dev/null 2>&1 || /etc/init.d/named zap >/dev/null 2>&1
/etc/init.d/named start >/dev/null 2>&1
fi


The "zap" part works well. The only problem is: the if clause is executed in any case, even if named is running.

The ps-grep expression seems to be correct, it returns a list of named processes or nothing, depending on the state of named.

The problem must be in the [ -n ... ] expression, and I'm really not a "sh" guru ...
tuxmin
l33t


Joined: 24 Apr 2004
Posts: 838
Location: Heidelberg

Posted: Wed Nov 10, 2004 6:36 am

Try this:
Code:

if ( ! ps -ef | grep -v grep | grep named ); then
  ...
fi

_________________
ALT-F4
sschlueter
Guru


Joined: 26 Jul 2002
Posts: 578
Location: Dortmund, Germany

Posted: Wed Nov 10, 2004 8:13 am    Post subject: Re: named monitoring?

Bio99 wrote:
On my Gentoo system 'named' dies from time to time for no apparent reason.


This should not happen. Are there any logfile entries created before it crashes?

Bio99 wrote:

Since a working nameserver is essential, I want to add a monitoring (cron-) script, that restarts named in case of failure.


Instead of using cron to monitor the service, you could also use

http://cr.yp.to/daemontools.html

sys-apps/daemontools
forbjok
Apprentice


Joined: 21 May 2004
Posts: 207
Location: Hordaland, Norge

Posted: Wed Nov 10, 2004 8:47 am

Bio99 wrote:
Thanks for your comments. I'm using a script like this now:

Code:

#!/bin/sh
if [ -n "ps -ef|grep named|grep -v grep" ]; then
/etc/init.d/named stop >/dev/null 2>&1 || /etc/init.d/named zap >/dev/null 2>&1
/etc/init.d/named start >/dev/null 2>&1
fi


The "zap" part works well. The only problem is: the if clause is executed in any case, even if named is running.

The ps-grep expression seems to be correct, it returns a list of named processes or nothing, depending on the state of named.

The problem must be in the [ -n ... ] expression, and I'm really not a "sh" guru ...


I think you'll want to use backticks, otherwise the string will just be set to the command itself, rather than the output of the command. Also, I believe the "-n" does the exact opposite of what you want - it would return true only if the daemon is running. "-z" should return true if the string is blank, so that should work.

Like this:
Code:

#!/bin/sh
if [ -z "`ps -ef|grep 'named'|grep -v 'grep'`" ]; then
/etc/init.d/named stop >/dev/null 2>&1 || /etc/init.d/named zap >/dev/null 2>&1
/etc/init.d/named start >/dev/null 2>&1
fi


Note the added backticks inside the double quotes. Putting something in backticks tells the script to run the content as a shell command and return the command's output. That's true both for Perl and shell scripts.
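
To see the difference in isolation, here is a small sketch (the function names are invented for the demonstration). A quoted string without backticks is just text and is never empty, so -n on it is always true. With command substitution the test looks at what the command printed. $(...) is the POSIX spelling of backticks and is easier to nest.

```shell
#!/bin/sh
# The pipeline is never run here: the test only sees a non-empty literal string.
literal_is_nonempty() {
    [ -n "ps -ef|grep named|grep -v grep" ]
}

# Runs the command given as arguments and succeeds if it printed nothing.
prints_nothing() {
    [ -z "$("$@")" ]
}

literal_is_nonempty && echo "literal string: always true"
prints_nothing true && echo "'true' prints nothing: -z is true"
prints_nothing echo hello || echo "'echo hello' prints something: -z is false"
```

All three messages are printed, which matches the behaviour seen in this thread: the version without backticks restarted named on every run.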
Bio99
n00b


Joined: 30 Apr 2004
Posts: 11

Posted: Wed Nov 10, 2004 11:10 pm    Post subject: Re: named monitoring?

sschlueter wrote:

This should not happen. Are there any logfile entries created before it crashes?


No. I had memory issues on that machine. I thought that I had fixed them with a slower memory timing, but maybe they are still the reason for named to die.

Thanks for the sys-apps/daemontools link, I'll look into that later.
Bio99
n00b


Joined: 30 Apr 2004
Posts: 11

Posted: Wed Nov 10, 2004 11:23 pm

forbjok wrote:

I think you'll want to use backticks, otherwise the string will just be set to the command itself, rather than the output of the command. Also, I believe the "-n" does the exact opposite of what you want - it would return true only if the daemon is running. "-z" should return true if the string is blank, so that should work.

Like this:
Code:

#!/bin/sh
if [ -z "`ps -ef|grep 'named'|grep -v 'grep'`" ]; then
/etc/init.d/named stop >/dev/null 2>&1 || /etc/init.d/named zap >/dev/null 2>&1
/etc/init.d/named start >/dev/null 2>&1
fi


Note the added backticks inside the double quotes. Putting something in backticks tells the script to run the content as a shell command and return the command's output. That's true both for Perl and shell scripts.


It's hard to believe, but even the backtick solution doesn't work. With the "-z" it never restarts named. Below is the output of the expression, with and without named running.

Code:

# echo "`ps -ef|grep 'named'|grep -v 'grep'`"
named    19541     1  0 00:16 ?        00:00:00 /usr/sbin/named -u named -n 1
named    19543 19541  0 00:16 ?        00:00:00 /usr/sbin/named -u named -n 1
named    19544 19543  0 00:16 ?        00:00:00 /usr/sbin/named -u named -n 1
named    19545 19543  0 00:16 ?        00:00:00 /usr/sbin/named -u named -n 1
named    19546 19543  0 00:16 ?        00:00:00 /usr/sbin/named -u named -n 1
# kill 19541
# echo "`ps -ef|grep 'named'|grep -v 'grep'`"

#
forbjok
Apprentice


Joined: 21 May 2004
Posts: 207
Location: Hordaland, Norge

Posted: Thu Nov 11, 2004 10:14 am

Bio99 wrote:
forbjok wrote:

I think you'll want to use backticks, otherwise the string will just be set to the command itself, rather than the output of the command. Also, I believe the "-n" does the exact opposite of what you want - it would return true only if the daemon is running. "-z" should return true if the string is blank, so that should work.

Like this:
Code:

#!/bin/sh
if [ -z "`ps -ef|grep 'named'|grep -v 'grep'`" ]; then
/etc/init.d/named stop >/dev/null 2>&1 || /etc/init.d/named zap >/dev/null 2>&1
/etc/init.d/named start >/dev/null 2>&1
fi


Note the added backticks inside the double quotes. Putting something in backticks tells the script to run the content as a shell command and return the command's output. That's true both for Perl and shell scripts.


It's hard to believe, but even the backtick solution doesn't work. With the "-z" it never restarts named. Below is the output of the expression, with and without named running.

Code:

# echo "`ps -ef|grep 'named'|grep -v 'grep'`"
named    19541     1  0 00:16 ?        00:00:00 /usr/sbin/named -u named -n 1
named    19543 19541  0 00:16 ?        00:00:00 /usr/sbin/named -u named -n 1
named    19544 19543  0 00:16 ?        00:00:00 /usr/sbin/named -u named -n 1
named    19545 19543  0 00:16 ?        00:00:00 /usr/sbin/named -u named -n 1
named    19546 19543  0 00:16 ?        00:00:00 /usr/sbin/named -u named -n 1
# kill 19541
# echo "`ps -ef|grep 'named'|grep -v 'grep'`"

#


I get those too, but only if named is running. Could it be that the named processes don't exit, but simply hang for some reason or another? Did you try to
Code:
# killall named
or if that fails,
# killall -9 named

and then run the ps command?

If this is the case, checking for running processes won't do any good, as they will still be running, just not working. I did some testing on my DNS box, and when I stop named using the initscript, it shuts down the named processes.

If the processes just stop responding but don't exit, you'll have to find a way to determine whether they've stopped working instead. For instance, have the script run some program that tries to use the DNS server or, if it stops listening completely, one that just tries to connect to the DNS server's port to see if it's working. I'm not sure what programs can be used for that though.
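
One program that can be used for such a check is dig from the BIND tools. It exits with a non-zero status when it gets no reply from the server, so its exit status can drive the watchdog. A minimal sketch, assuming dig is installed; the function name dns_answers is made up, and the server address and lookup name are whatever you pass in:

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if the DNS server at $1 replies to a query for $2.
# +time=2 limits the wait to two seconds, +tries=1 sends a single query.
dns_answers() {
    server=$1
    name=$2
    dig +time=2 +tries=1 "@$server" "$name" >/dev/null 2>&1
}
```

In the cron script the ps test could then be replaced by a call such as dns_answers 127.0.0.1 localhost. Any reply counts as "alive" here, even an error reply such as NXDOMAIN, so this detects a dead or hung server rather than wrong zone data.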

Before going through the trouble of writing such a script, I'd recommend trying to recompile, if you haven't tried that already. Just make sure not to overwrite any configs with etc-update/dispatch-conf afterwards.

Good luck :wink:
Bio99
n00b


Joined: 30 Apr 2004
Posts: 11

Posted: Thu Nov 11, 2004 11:04 am

forbjok wrote:

I get those too, but only if named is running. Could it be that the named processes don't exit, but simply hang for some reason or another? Did you try to
Code:
# killall named
or if that fails,
# killall -9 named

and then run the ps command?

If this is the case, checking for running processes won't do any good, as they will still be running, just not working. I did some testing on my DNS box, and when I stop named using the initscript, it shuts down the named processes.

If the processes just stop responding but don't exit, you'll have to find a way to determine whether they've stopped working instead. For instance, have the script run some program that tries to use the DNS server or, if it stops listening completely, one that just tries to connect to the DNS server's port to see if it's working. I'm not sure what programs can be used for that though.


No, the processes are really gone, 'ps' doesn't list them anymore.

As you can see in my test sequence, I kill the named process. The 'ps-grep' command sequence returns an empty line.

The strange thing is that even after that, the watchdog script doesn't restart named. That's why I suspect the test is wrong.

I think I will migrate to djbdns soon. I hope the conversion of the config files will go smoothly.
Bio99
n00b


Joined: 30 Apr 2004
Posts: 11

Posted: Sun Nov 14, 2004 9:07 pm    Post subject: Solution

I finally found out why my shell script didn't work. It's really funny, because it was so unexpected.

For reference, here is the final version of the monitoring script:

Code:
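#!/bin/sh
# Restart named when no process matching "named" shows up in the ps listing.
# Note: the file name of this script must not contain "named", or grep matches the script itself.
if [ -z "`ps -ef|grep 'named'|grep -v 'grep'`" ] ; then
    # If "stop" fails because of stale state, reset the service with "zap".
    /etc/init.d/named stop >/dev/null 2>&1 || /etc/init.d/named zap >/dev/null 2>&1
    /etc/init.d/named start >/dev/null 2>&1
fi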


I'm not very experienced with 'sh' scripts, so I first checked whether the 'if' statement might be incorrect. Then I wrote another script, called 'test.sh', which did the same thing but echoed some debug info. To my astonishment, this script worked. I removed the echo statements until both scripts were the same.

The crazy thing: 'test.sh' worked, but the original 'named_monitoring.sh' did not, though both had identical md5sums. Oh, wait a minute, what was the name of the script? 'named_monitoring.sh' ... Argh!

The script's own name contains 'named', so its own process matched the 'grep' and the check always saw a "named" process. I shot myself in the foot.
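
The trap is easy to reproduce without a real named. The sketch below feeds a made-up ps -ef listing (only the watchdog script is "running") through the original grep chain and through a stricter filter that compares just the basename of the command column. has_process is an invented name, and the filter assumes the ps -ef layout shown earlier in this thread, where the command is the 8th column. Where procps is available, pgrep -x named should achieve much the same with less code.

```shell
#!/bin/sh
# Reads ps -ef output on stdin; succeeds if some command (8th column)
# has exactly the basename given as $1.
has_process() {
    awk -v want="$1" '
        { cmd = $8; sub(/.*\//, "", cmd) }   # strip the directory part
        cmd == want { found = 1 }
        END { exit !found }
    '
}

# Made-up listing: named is dead, only the watchdog itself is running.
listing='root     20001  1999  0 00:20 ?        00:00:00 /bin/sh /root/named_monitoring.sh'

# Original test: the script name contains "named", so grep still finds a line.
if [ -z "$(printf '%s\n' "$listing" | grep 'named' | grep -v 'grep')" ]; then
    echo "grep chain: named is down"
else
    echo "grep chain: named seems to be running (false match)"
fi

# Stricter test: /bin/sh is not named, so the outage is detected.
if printf '%s\n' "$listing" | has_process named; then
    echo "has_process: named is running"
else
    echo "has_process: named is down"
fi
```

Renaming the script, as done here, fixes the symptom. Matching on the command name alone removes the dependency on how the script happens to be called.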

Thanks to all who have contributed!
sgtrock
Tux's lil' helper


Joined: 21 Feb 2003
Posts: 87

Posted: Fri Jun 17, 2005 7:46 am

As an aside, I've personally come to prefer start-stop-daemon to daemontools. Less fuss and muss, and a better fit to the LSB standards (logfiles are where you normally expect them, for example). Worth a look. :)
All times are GMT