knight77 n00b
Joined: 29 Jun 2009 Posts: 25
Posted: Mon Jun 29, 2009 9:24 am Post subject: unexpected server halt
Hello.
We are managing 2 Gentoo servers and one of them just shut down without anyone with root access telling it to do so.
Checking /var/log/everything/current we found the following lines:
[...]
Jun 29 10:21:39 [postfix/qmgr] 6EED11CAE7: removed
Jun 29 10:22:10 [vol_id] no device_
- Last output repeated 8 times -
Jun 29 10:22:13 [shutdown] shutting down for system halt
Jun 29 10:22:13 [init] Switching to runlevel: 0
Jun 29 10:22:16 [snmpd] Received TERM or STOP signal... shutting down...
[...]
As far as I can see, the server started to shut down right after the [vol_id] messages. I have no idea what caused the vol_id entries; all I could find on the Internet is that they may be related to udev, but nothing that mentions both vol_id and "no device_".
Has anybody run into a similar problem? Any information about the server is available on request.
The Gentoo server is fully up to date and serves as a mail and web server (postfix + apache (vhosts)).
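To make the "what was logged just before the shutdown" question repeatable, here is a minimal sketch that parses lines in the format shown in the excerpt above and lists whatever was logged in a short window before the [shutdown] entry. The function names, the 60-second default window and the year 2009 (the log lines carry no year, so it is taken from the post date) are assumptions for illustration only.

```python
import re
from datetime import datetime, timedelta

# Excerpt copied from the post above (elided and "repeated" lines left out).
LOG_EXCERPT = """\
Jun 29 10:21:39 [postfix/qmgr] 6EED11CAE7: removed
Jun 29 10:22:10 [vol_id] no device_
Jun 29 10:22:13 [shutdown] shutting down for system halt
Jun 29 10:22:13 [init] Switching to runlevel: 0
Jun 29 10:22:16 [snmpd] Received TERM or STOP signal... shutting down...
"""

# "Mon DD HH:MM:SS [program] message", as seen in the excerpt.
LINE_RE = re.compile(r"^(\w{3} +\d+ \d{2}:\d{2}:\d{2}) \[([^\]]+)\] (.*)$")


def parse(text, year=2009):
    """Return a list of (timestamp, program, message) tuples."""
    entries = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # skips "[...]" and "- Last output repeated N times -"
        stamp = datetime.strptime(f"{year} {match.group(1)}", "%Y %b %d %H:%M:%S")
        entries.append((stamp, match.group(2), match.group(3)))
    return entries


def before_shutdown(entries, window_seconds=60):
    """Return the entries logged within the window before the first [shutdown] line."""
    shutdown = next((e for e in entries if e[1] == "shutdown"), None)
    if shutdown is None:
        return []
    start = shutdown[0] - timedelta(seconds=window_seconds)
    return [e for e in entries if start <= e[0] < shutdown[0]]


if __name__ == "__main__":
    for stamp, program, message in before_shutdown(parse(LOG_EXCERPT)):
        print(stamp.time(), program, message)
```

On the real server the text would come from /var/log/everything/current instead of the pasted excerpt. Narrowing the window to 10 seconds leaves only the vol_id entry, which matches the observation that it was the last thing logged before the halt, though that alone does not show it caused the halt.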
Thank you for your time or hints.
audiodef Watchman
Joined: 06 Jul 2005 Posts: 6639 Location: The soundosphere
knight77 n00b
Joined: 29 Jun 2009 Posts: 25
Posted: Tue Jun 30, 2009 5:28 am Post subject:
Nope, it didn't happen before; it's the first time I have ever seen this behaviour. That's why I'm confused too, since I can't figure out why the server just decided to shut down by itself.
It's located in a provider's datacenter, so another possibility, as far as I can guess, is that somebody mistook it for another server and, maybe believing it was a Windows box, hit Ctrl-Alt-Del and made it shut down. Still, if so, it should have rebooted, not halted.
What I still can't explain is what caused the 9 [vol_id] messages. They may or may not be related to the shutdown, but they are the closest thing we have to an explanation of why it shut down.
Does anybody have any idea what might have caused the [vol_id] syslog messages? If any data about the server is needed, please ask.
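The "it should have rebooted, not halted" reasoning can be checked against the server's own configuration. On a sysvinit system, what init runs on Ctrl-Alt-Del is set by the ctrlaltdel entry in /etc/inittab, whose lines have the form id:runlevels:action:process. The sketch below pulls that entry out and classifies it; the sample line is an assumption about what a typical inittab contains, not a copy of this server's file, and the function names are made up for illustration.

```python
# Assumed sample line; the real file on the server may differ.
SAMPLE_INITTAB = """\
# What to do at the "Three Finger Salute".
ca:12345:ctrlaltdel:/sbin/shutdown -r now
"""


def ctrlaltdel_commands(inittab_text):
    """Return the process field of every ctrlaltdel entry."""
    commands = []
    for line in inittab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        fields = line.split(":", 3)  # id:runlevels:action:process
        if len(fields) == 4 and fields[2] == "ctrlaltdel":
            commands.append(fields[3])
    return commands


def classify(command):
    """Rough classification of a shutdown command line by its -r / -h flag."""
    args = command.split()
    if "-r" in args:
        return "reboot"
    if "-h" in args:
        return "halt"
    return "unknown"


if __name__ == "__main__":
    for command in ctrlaltdel_commands(SAMPLE_INITTAB):
        print(command, "->", classify(command))
```

Feeding it the contents of the real /etc/inittab would show whether a keypress at the console could have produced a halt at all; if the entry says reboot, the Ctrl-Alt-Del theory does not fit the "shutting down for system halt" line in the log.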
Thank you for your time.
audiodef Watchman
Joined: 06 Jul 2005 Posts: 6639 Location: The soundosphere
knight77 n00b
Joined: 29 Jun 2009 Posts: 25
Posted: Wed Jul 01, 2009 9:32 am Post subject:
No, it hasn't happened since.
I have rechecked the server and the other Gentoo server we manage (similar role, but without an MTA), and the vol_id message hasn't shown up again in the last 5 days at least.
The datacenter provider keeps nearly everything under lock, so we believe there is only a very slim possibility that somebody physically interfered with the server.
As far as we can tell, there are 2 possibilities:
1. the vol_id is related to the shutdown being initiated, but we don't know how.
2. something else still unknown caused the halt.
The only "strange" thing about that server is that the temperature of a hard drive was nearing its upper limit (it was running at 53 degrees C for a few days, against a maximum of 55 in the vendor specs). Right now it's running at 47.
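For what it's worth, the margin those readings leave can be put into a tiny check. This is only a sketch: the 5-degree warning margin is an arbitrary assumption, the function names are made up, and the readings are typed in by hand from the figures above rather than read from the drive.

```python
def thermal_headroom(current_c, max_c):
    """Degrees C left before the vendor maximum is reached."""
    return max_c - current_c


def describe(current_c, max_c, warn_margin_c=5):
    """Label a reading as OK or WARN; warn_margin_c is an assumed threshold."""
    headroom = thermal_headroom(current_c, max_c)
    status = "WARN" if headroom < warn_margin_c else "OK"
    return f"{status}: {current_c} C, {headroom} C below the {max_c} C limit"


if __name__ == "__main__":
    print(describe(53, 55))  # reading during the days around the halt
    print(describe(47, 55))  # reading at the time of this post
```

At 53 degrees the drive had only 2 degrees of headroom, against 8 degrees now, so a thermal protection kicking in is at least plausible, although nothing in the log confirms it.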
The rc-status on the self-halted server is the following:
Runlevel: default
apache2
coldplug
courier-authlib
courier-imapd
courier-pop3d-ssl
fcron
fwinit
local
lsa
metalog
mysql
named
net.eth0
net.eth1
netmount
postfix
pure-ftpd
rngd
saslauthd
snmpd
sshd
uptimed
Note: The lsa service is a hardware resource (CPU, RAM, disks) monitoring agent we use.
So far we're still in the dark as to what caused the halt. If nobody else updates this thread with any ideas, I believe the thread can be closed; we'll reopen it (if possible) or create a new one if another unexpected shutdown happens.
Thank you for your time.
audiodef Watchman
Joined: 06 Jul 2005 Posts: 6639 Location: The soundosphere
unixbhaskar Tux's lil' helper
Joined: 29 Nov 2007 Posts: 119 Location: India
Posted: Fri Jul 03, 2009 5:48 pm Post subject: Hope this is the right place for apache problem
When I tried to emerge apache I got this error:
checking if POSIX sems affect threads in the same process... no
checking if SysV sems affect threads in the same process... no
checking if fcntl locks affect threads in the same process... no
checking if flock locks affect threads in the same process... no
checking for entropy source... configure: error: /dev/urandom not found or unreadable.
!!! Please attach the following file when seeking support:
!!! /var/tmp/portage/dev-libs/apr-1.3.5/work/apr-1.3.5/config.log
*
* ERROR: dev-libs/apr-1.3.5 failed.
* Call stack:
* ebuild.sh, line 49: Called src_configure
* environment, line 2653: Called econf '--enable-layout=gentoo' '--enable-nonportable-atomics' '--enable-threads' '--with-devrandom=/dev/urandom'
* ebuild.sh, line 534: Called die
* The specific snippet of code:
* die "econf failed"
* The die message:
* econf failed
*
* If you need support, post the topmost build error, and the call stack if relevant.
* A complete build log is located at '/var/tmp/portage/dev-libs/apr-1.3.5/temp/build.log'.
* The ebuild environment file is located at '/var/tmp/portage/dev-libs/apr-1.3.5/temp/environment'.
*
>>> Failed to emerge dev-libs/apr-1.3.5, Log file:
>>> '/var/tmp/portage/dev-libs/apr-1.3.5/temp/build.log'
* Messages for package dev-libs/apr-1.3.5:
*
* ERROR: dev-libs/apr-1.3.5 failed.
* Call stack:
* ebuild.sh, line 49: Called src_configure
* environment, line 2653: Called econf '--enable-layout=gentoo' '--enable-nonportable-atomics' '--enable-threads' '--with-devrandom=/dev/urandom'
* ebuild.sh, line 534: Called die
* The specific snippet of code:
* die "econf failed"
* The die message:
* econf failed
*
* If you need support, post the topmost build error, and the call stack if relevant.
* A complete build log is located at '/var/tmp/portage/dev-libs/apr-1.3.5/temp/build.log'.
* The ebuild environment file is located at '/var/tmp/portage/dev-libs/apr-1.3.5/temp/environment'.
*
Any clear-cut solution would be appreciated. Thanks in advance.
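Since the configure step fails with "/dev/urandom not found or unreadable", a first diagnostic is to look at the device node itself in the environment where emerge runs. The sketch below checks that the path exists, is a character device and can be read; the function name is made up for illustration, and the idea that the node is missing (for example inside a chroot where /dev was not populated) is only an assumption about the cause, not something the build log proves.

```python
import os
import stat


def check_entropy_source(path="/dev/urandom"):
    """Return (ok, reason) describing whether path is a readable character device."""
    try:
        info = os.stat(path)
    except OSError as exc:
        return False, f"cannot stat {path}: {exc.strerror}"
    if not stat.S_ISCHR(info.st_mode):
        return False, f"{path} exists but is not a character device"
    try:
        with open(path, "rb") as handle:
            handle.read(1)  # a single byte is enough to prove readability
    except OSError as exc:
        return False, f"cannot read {path}: {exc.strerror}"
    return True, f"{path} is a readable character device"


if __name__ == "__main__":
    ok, reason = check_entropy_source()
    print("OK" if ok else "PROBLEM", "-", reason)
```

If this reports a problem, the fix lies with /dev rather than with the apr ebuild; if it reports OK, config.log from the failed build is the next place to look.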
knight77 n00b
Joined: 29 Jun 2009 Posts: 25
Posted: Tue Jul 07, 2009 10:35 am Post subject: unexpected server halt
Hello again.
We checked with the datacenter provider and they confirmed nobody had accessed the room where the server is located in the timeframe when the server started to halt. As such, an accidental halt by somebody in the datacenter has been ruled out.
We'll try to stop the server for a few minutes in order to check the BIOS settings for any hardware temperature protection that might be enabled. Also, as a side note, we'll try removing the cover of the tower, hoping the A/C will cool it better than the fans already installed in the case (there are too few of them).
I will post here again in case we find out something new.
Thank you for your time.
PS. What does the apache emerge error from the previous post have to do with this thread? Doesn't unixbhaskar know how to open a new thread?
audiodef Watchman
Joined: 06 Jul 2005 Posts: 6639 Location: The soundosphere
unixbhaskar Tux's lil' helper
Joined: 29 Nov 2007 Posts: 119 Location: India
Posted: Tue Jul 07, 2009 2:54 pm Post subject:
Ignore the apache thing, I have rectified it.
Knight, have you read the subject line of my post???
_________________
Musing with GNU/Linux
Lenovo Thinkpad x250
x86_64 Intel(R) Core(TM) i5-5200U CPU @ 2.20GHz GenuineIntel GNU/Linux
RAM : 8 GB
Kernel :Latest customized kernel
OS: Gentoo/Arch/Slackware/Debian/openSUSE/Fedora
Intel 965GM Chipset