passive Tux's lil' helper
Joined: 31 Dec 2004 Posts: 105
Posted: Fri Oct 05, 2007 11:22 pm Post subject: Diagnosing mysterious server freezing
Hi All,
I ran a system update the other day, and the server locked up halfway through. Since then it has been locking up regularly, every couple of hours, and I've been unable to find the culprit. Can someone recommend a good guide for tracking down system freezes on a remote system?
Failing that, here are all of the packages I emerged:
sys-devel/gettext-0.16.1-r1
sys-devel/gnuconfig-20070118
sys-devel/m4-1.4.9-r1
dev-libs/openssl-0.9.8e-r3
sys-devel/gcc-config-1.3.16
sys-apps/man-1.6e-r3
sys-apps/man-pages-2.64
sys-libs/ncurses-5.6-r1
dev-util/unifdef-1.20
sys-kernel/linux-headers-2.6.21
sys-fs/e2fsprogs-1.39-r2
sys-devel/autoconf-2.61-r1
sys-devel/libtool-1.5.24
app-shells/bash-3.2_p17
sys-libs/readline-5.2_p7
sys-libs/db-4.5.20_p2
dev-libs/expat-2.0.1
dev-lang/python-2.4.4-r5
sys-libs/cracklib-2.8.10
sys-apps/shadow-4.0.18.1-r1
sys-devel/bison-2.3
dev-util/pkgconfig-0.21-r1
sys-devel/binutils-config-1.9-r4
sys-devel/binutils-2.17-r1
dev-libs/gmp-4.2.1-r1
dev-libs/mpfr-2.2.1_p5
sys-libs/timezone-data-2007g
sys-apps/util-linux-2.12r-r7
app-arch/gzip-1.3.12
sys-apps/busybox-1.6.1
sys-apps/gawk-3.1.5-r3
app-arch/tar-1.18-r2
sys-process/psmisc-22.5-r2
sys-apps/file-4.21-r1
net-misc/rsync-2.6.9-r3
sys-apps/debianutils-2.17.5
app-editors/nano-2.0.6
app-arch/cpio-2.9
sys-apps/coreutils-6.9-r1
sys-apps/hdparm-7.7
net-misc/openssh-4.7_p1-r1
sys-fs/udev-114
sys-apps/findutils-4.3.8-r1
sys-apps/less-406
sys-apps/diffutils-2.8.7-r2
Of those, the only one that stands out to me is sys-kernel/linux-headers-2.6.21, because I was running kernel 2.6.18 when the freezes began, and although I have since upgraded the kernel, it was only to 2.6.20.
Any help would really be appreciated.
Abraxas l33t
Joined: 25 May 2003 Posts: 814
Posted: Sat Oct 06, 2007 1:23 pm
Have you thought about the possibility of bad hardware? I would check the memory and the hard drive first.
passive Tux's lil' helper
Joined: 31 Dec 2004 Posts: 105
Posted: Sat Oct 06, 2007 6:39 pm
It's definitely a possibility, unfortunately, but the fact that the first freeze occurred during an emerge makes me doubt it somewhat. Also, it seems to freeze regardless of the load level, and I usually associate hardware failures with freezing under certain loads. Finally, since the server is in San Jose and I'm in Nova Scotia, I'm not sure what the best way to check for hardware failures would be without shelling out a fair amount of money to my colocation company and having the server offline for a while. It currently serves 25,000 visitors a day, so I'm hoping it's something I can diagnose while it's running.
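One way to gather evidence while the server stays online is a heartbeat log: a small process that appends a timestamp and the load average to a file every few seconds and forces each line to disk. After a freeze, the last line shows roughly when the machine stopped and how busy it was. The following is only a sketch in Python 3; the log path and the interval are assumptions, not anything taken from this server.

```python
import os
import time


def format_heartbeat(now, loadavg):
    """Build one log line: a UTC timestamp plus the 1/5/15-minute load averages."""
    stamp = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime(now))
    one, five, fifteen = loadavg
    return f"{stamp} load={one:.2f},{five:.2f},{fifteen:.2f}\n"


def write_heartbeat(path, line):
    """Append a line and fsync it so it is on disk before a possible freeze."""
    with open(path, "a") as handle:
        handle.write(line)
        handle.flush()
        os.fsync(handle.fileno())


def run(path="/var/log/heartbeat.log", interval=10):
    """Loop forever, writing one heartbeat per interval.

    Both defaults are hypothetical; pick a path and interval that suit the host.
    """
    while True:
        write_heartbeat(path, format_heartbeat(time.time(), os.getloadavg()))
        time.sleep(interval)
```

Calling run() from a small script started in the background would be enough to try this. The fsync on every line costs a little I/O, but without it the final lines before a freeze may never reach the disk.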
Mantaar Apprentice
Joined: 17 May 2007 Posts: 219
Posted: Sat Oct 06, 2007 6:50 pm
Are there any suspicious messages in /var/log? Like dmesg, lastlog, mail, kernel, faillog, whatever. I would just grab the whole subfolder and look through it.
What services are you running on the server? Maybe you could even pastebin the output of ps aux. (Be careful when doing this, as command-line arguments to certain programs may reveal critical information. The same holds true for posting logs.)
_________________
Error compiling committee.c: too many arguments to function.
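The caution about command-line arguments can be partly automated before pasting anything publicly. The sketch below masks IPv4 addresses and the values of a few password-like options. The option names in the pattern are illustrative assumptions, not a complete list, so the result still needs a read-through by eye for hostnames, usernames and short options such as -p.

```python
import re

# Four dot-separated groups of 1-3 digits. This also masks harmless values such
# as 0.0.0.0 or a four-part version number, which is the safer direction to err.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

# Long or short options whose name suggests a secret, followed by "=" or a space.
# The option names here are examples only.
SECRET_OPT = re.compile(
    r"(--?(?:password|passwd|pass|token|secret)[= ])(\S+)", re.IGNORECASE
)


def redact(text):
    """Return text with IPv4 addresses and secret-looking option values masked."""
    text = IPV4.sub("xxx.xxx.xxx.xxx", text)
    text = SECRET_OPT.sub(r"\1REDACTED", text)
    return text
```

For example, redact(open("ps.txt").read()) would clean a saved listing, where ps.txt is a hypothetical file holding the ps aux output.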
passive Tux's lil' helper
Joined: 31 Dec 2004 Posts: 105
Posted: Sat Oct 06, 2007 8:46 pm
Here's the ps aux output, without the Apache2 processes. I've gone through everything I can think of in /var/log, but I haven't been able to find anything from the times the server freezes.
Code:
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.0 0.0 2692 584 ? Ss 10:24 0:01 init [3]
root 2 0.0 0.0 0 0 ? S 10:24 0:00 [migration/0]
root 3 0.0 0.0 0 0 ? SN 10:24 0:00 [ksoftirqd/0]
root 4 0.0 0.0 0 0 ? S 10:24 0:00 [watchdog/0]
root 5 0.0 0.0 0 0 ? S 10:24 0:00 [migration/1]
root 6 0.0 0.0 0 0 ? SN 10:24 0:00 [ksoftirqd/1]
root 7 0.0 0.0 0 0 ? S 10:24 0:00 [watchdog/1]
root 8 0.0 0.0 0 0 ? S 10:24 0:00 [migration/2]
root 9 0.0 0.0 0 0 ? SN 10:24 0:00 [ksoftirqd/2]
root 10 0.0 0.0 0 0 ? S 10:24 0:00 [watchdog/2]
root 11 0.0 0.0 0 0 ? S 10:24 0:00 [migration/3]
root 12 0.0 0.0 0 0 ? SN 10:24 0:00 [ksoftirqd/3]
root 13 0.0 0.0 0 0 ? S 10:24 0:00 [watchdog/3]
root 14 0.0 0.0 0 0 ? S< 10:24 0:00 [events/0]
root 15 0.0 0.0 0 0 ? S< 10:24 0:00 [events/1]
root 16 0.0 0.0 0 0 ? S< 10:24 0:00 [events/2]
root 17 0.0 0.0 0 0 ? S< 10:24 0:00 [events/3]
root 18 0.0 0.0 0 0 ? S< 10:24 0:00 [khelper]
root 19 0.0 0.0 0 0 ? S< 10:24 0:00 [kthread]
root 81 0.0 0.0 0 0 ? S< 10:24 0:00 [kblockd/0]
root 82 0.0 0.0 0 0 ? S< 10:24 0:00 [kblockd/1]
root 83 0.0 0.0 0 0 ? S< 10:24 0:00 [kblockd/2]
root 84 0.0 0.0 0 0 ? S< 10:24 0:00 [kblockd/3]
root 85 0.0 0.0 0 0 ? S< 10:24 0:00 [kacpid]
root 158 0.0 0.0 0 0 ? S< 10:24 0:00 [ata/0]
root 159 0.0 0.0 0 0 ? S< 10:24 0:00 [ata/1]
root 160 0.0 0.0 0 0 ? S< 10:24 0:00 [ata/2]
root 161 0.0 0.0 0 0 ? S< 10:24 0:00 [ata/3]
root 162 0.0 0.0 0 0 ? S< 10:24 0:00 [ata_aux]
root 163 0.0 0.0 0 0 ? S< 10:24 0:00 [ksuspend_usbd]
root 166 0.0 0.0 0 0 ? S< 10:24 0:00 [khubd]
root 168 0.0 0.0 0 0 ? S< 10:24 0:00 [kseriod]
root 219 0.0 0.0 0 0 ? S 10:24 0:00 [pdflush]
root 220 0.0 0.0 0 0 ? S 10:24 0:06 [pdflush]
root 221 0.0 0.0 0 0 ? S< 10:24 0:00 [kswapd0]
root 222 0.0 0.0 0 0 ? S< 10:24 0:00 [aio/0]
root 223 0.0 0.0 0 0 ? S< 10:24 0:00 [aio/1]
root 224 0.0 0.0 0 0 ? S< 10:24 0:00 [aio/2]
root 225 0.0 0.0 0 0 ? S< 10:24 0:00 [aio/3]
root 863 0.0 0.0 0 0 ? S< 10:24 0:00 [scsi_eh_0]
root 864 0.0 0.0 0 0 ? S< 10:24 0:00 [scsi_eh_1]
root 888 0.0 0.0 0 0 ? S< 10:24 0:00 [scsi_eh_2]
root 889 0.0 0.0 0 0 ? S< 10:24 0:00 [scsi_eh_3]
root 890 0.0 0.0 0 0 ? S< 10:24 0:00 [scsi_eh_4]
root 891 0.0 0.0 0 0 ? S< 10:24 0:00 [scsi_eh_5]
root 923 0.0 0.0 0 0 ? S< 10:24 0:00 [kpsmoused]
root 932 0.0 0.0 0 0 ? S< 10:24 0:01 [md3_raid1]
root 935 0.0 0.0 0 0 ? S< 10:24 0:00 [md2_raid1]
root 938 0.0 0.0 0 0 ? S< 10:24 0:02 [md1_raid1]
root 940 0.0 0.0 0 0 ? S< 10:24 0:00 [md0_raid1]
root 942 0.0 0.0 0 0 ? S< 10:24 0:00 [reiserfs/0]
root 943 0.0 0.0 0 0 ? S< 10:24 0:00 [reiserfs/1]
root 944 0.0 0.0 0 0 ? S< 10:24 0:00 [reiserfs/2]
root 945 0.0 0.0 0 0 ? S< 10:24 0:00 [reiserfs/3]
root 1041 0.0 0.0 7176 644 ? S<s 10:24 0:00 /sbin/udevd --daemon
root 3871 0.0 0.0 7400 596 ? Ss 10:25 0:00 /usr/sbin/syslog-ng
mysql 4405 2.7 0.8 275396 34520 ? Ssl 10:25 5:03 /usr/sbin/mysqld --defaults-file=/etc/mysql/my.cnf --basedir=/usr --datadir=/var/lib/mysql --socket=/var/run/mysqld/mysqld.sock
root 4473 0.0 0.0 19184 852 ? Ss 10:25 0:00 /usr/sbin/sshd
root 4529 0.0 0.0 2732 404 ? Ss 10:25 0:00 /usr/bin/svscan /service
root 4533 0.0 0.0 2552 372 ? S 10:25 0:00 supervise qmail-pop3d
root 4534 0.0 0.0 2576 368 ? S 10:25 0:00 supervise log
root 4535 0.0 0.0 2568 368 ? S 10:25 0:00 supervise qmail-smtpd
root 4536 0.0 0.0 2564 368 ? S 10:25 0:00 supervise log
qmaild 4537 0.0 0.0 3816 588 ? S 10:25 0:00 /usr/bin/tcpserver -p -v -R -x /etc/tcprules.d/tcp.qmail-smtp.cdb -c 40 -u 201 -g 200 0.0.0.0 smtp /var/qmail/bin/qmail-smtpd
qmaill 4538 0.0 0.0 2704 428 ? S 10:25 0:00 /usr/bin/multilog t s2500000 n10 /var/log/qmail/qmail-smtpd
qmaill 4539 0.0 0.0 2708 424 ? S 10:25 0:00 /usr/bin/multilog t s2500000 n10 /var/log/qmail/qmail-pop3d
root 4540 0.0 0.0 2572 368 ? S 10:25 0:00 supervise tinydns
root 4541 0.0 0.0 2536 368 ? S 10:25 0:00 supervise log
root 4542 0.0 0.0 2544 368 ? S 10:25 0:00 supervise qmail-send
root 4543 0.0 0.0 2556 368 ? S 10:25 0:00 supervise log
tinydns 4544 0.0 0.0 2816 440 ? S 10:25 0:00 /usr/bin/tinydns
dnslog 4545 0.0 0.0 2704 424 ? S 10:25 0:00 multilog t ./main
root 4546 0.0 0.0 3812 576 ? S 10:25 0:00 /usr/bin/tcpserver -p -v -x /etc/tcprules.d/tcp.qmail-pop3.cdb -c 40 0.0.0.0 pop3 /var/qmail/bin/qmail-popup server1.hostname.com /var/vpopmail/bin/vchkpw /var/qmail/bin/qmail-pop3d .maildir
root 4550 0.0 0.0 2560 368 ? S 10:25 0:00 supervise axfrdns
root 4551 0.0 0.0 2580 368 ? S 10:25 0:00 supervise log
root 4552 0.0 0.0 2540 368 ? S 10:25 0:00 supervise dnscache
root 4553 0.0 0.0 2564 368 ? S 10:25 0:00 supervise log
root 4554 0.0 0.0 2600 388 ? S 10:25 0:00 tcpserver -vDRHl0 -x tcp.cdb -- xxx.xxx.xxx.xxx 53 /usr/bin/axfrdns
dnslog 4555 0.0 0.0 2676 424 ? S 10:25 0:00 multilog t ./main
qmaill 4556 0.0 0.0 2684 428 ? S 10:25 0:00 /usr/bin/multilog t s2500000 n10 /var/log/qmail/qmail-send
qmails 4557 0.0 0.0 2720 480 ? S 10:25 0:00 qmail-send
dnscache 4559 0.0 0.0 4332 1996 ? S 10:25 0:00 /usr/bin/dnscache
dnslog 4560 0.0 0.0 2716 428 ? S 10:25 0:00 multilog t ./main
root 4565 0.0 0.0 2700 412 ? S 10:25 0:00 qmail-lspawn # Uncomment the next line for .forward support?#|dot-forward .forward?./.maildir/
qmailr 4566 0.0 0.0 2704 484 ? S 10:25 0:00 qmail-rspawn
qmailq 4567 0.0 0.0 2704 404 ? S 10:25 0:00 qmail-clean
proftpd 4827 0.0 0.0 22776 1448 ? Ss 10:25 0:00 proftpd: (accepting connections)
root 4886 0.0 0.0 10300 744 ? Ss 10:25 0:00 /usr/sbin/cron
root 5063 0.0 0.1 35612 4676 ? Ss 10:25 0:00 /usr/bin/python2.3 /usr/lib/zope-2.8.1/lib/python/zdaemon/zdrun.py -S /usr/lib/zope-2.8.1/lib/python/Zope2/Startup/zopeschema.xml -b 10 -d -s /var/lib/zope/zope-ud/var/zopectlsock -x 0,2 -z /var/lib/zope/zope-ud /var/lib/zope/zope-ud/bin/runzope
zope-ud 5064 0.0 1.9 191288 80496 ? Sl 10:25 0:10 /usr/bin/python2.3 /usr/lib/zope-2.8.1/lib/python/Zope2/Startup/run.py -C /var/lib/zope/zope-ud/etc/zope.conf
root 5124 0.0 0.0 5936 780 tty1 Ss+ 10:25 0:00 /sbin/agetty 38400 tty1 linux
root 5126 0.0 0.0 5892 784 tty2 Ss+ 10:25 0:00 /sbin/agetty 38400 tty2 linux
root 5127 0.0 0.0 5944 780 tty3 Ss+ 10:25 0:00 /sbin/agetty 38400 tty3 linux
root 5130 0.0 0.0 5888 780 tty4 Ss+ 10:25 0:00 /sbin/agetty 38400 tty4 linux
root 5131 0.0 0.0 5908 780 tty5 Ss+ 10:25 0:00 /sbin/agetty 38400 tty5 linux
root 5132 0.0 0.0 5924 780 tty6 Ss+ 10:25 0:00 /sbin/agetty 38400 tty6 linux
root 5746 0.0 0.0 31236 2708 ? Ss 11:42 0:00 sshd: username [priv]
1001 5767 0.0 0.0 31540 1604 ? S 11:42 0:00 sshd: username@pts/0
1001 5769 0.0 0.0 11956 2188 pts/0 Ss 11:42 0:00 -bash
root 6296 0.0 0.0 22072 1336 pts/0 S 11:43 0:00 su
root 6329 0.0 0.0 11720 2212 pts/0 S+ 11:43 0:00 bash
root 9546 0.0 0.0 31208 2708 ? Ss 13:29 0:00 sshd: username [priv]
1001 9555 0.0 0.0 31344 1392 ? S 13:29 0:00 sshd: username@pts/1
1001 9559 0.0 0.0 11988 2188 pts/1 Ss 13:29 0:00 -bash
root 9600 0.0 0.0 22088 1332 pts/1 S 13:29 0:00 su
root 9613 0.0 0.0 11700 2176 pts/1 S 13:29 0:00 bash
root 9817 0.0 0.0 8820 1076 pts/1 R+ 13:30 0:00 ps aux
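When /var/log holds nothing from the moment of a freeze, the silence itself can still be measured: a long gap between consecutive log lines, ending at the next boot, brackets the time the machine stopped. The sketch below finds such gaps. It assumes classic syslog lines that start with a "Mon DD HH:MM:SS" stamp and English month names, and since that format carries no year, the caller has to supply one. The five-minute threshold is an arbitrary choice.

```python
from datetime import datetime, timedelta


def parse_stamp(line, year):
    """Parse the leading 'Mon DD HH:MM:SS' of a syslog line, or return None."""
    try:
        return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
    except ValueError:
        return None


def find_gaps(lines, year, min_gap=timedelta(minutes=5)):
    """Return (last_seen, next_seen) pairs where the log was silent for min_gap or more."""
    gaps = []
    previous = None
    for line in lines:
        stamp = parse_stamp(line, year)
        if stamp is None:
            continue  # skip lines that do not start with a timestamp
        if previous is not None and stamp - previous >= min_gap:
            gaps.append((previous, stamp))
        previous = stamp
    return gaps
```

It could be run as find_gaps(open("/var/log/messages"), 2007), where the file name is an assumption about where syslog-ng writes on this machine. A quiet server logs little anyway, so a busy source such as cron output makes the gaps more meaningful.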
Abraxas l33t
Joined: 25 May 2003 Posts: 814
Posted: Sun Oct 07, 2007 2:10 pm
passive wrote:
It's definitely a possibility, unfortunately, but the fact that the first freeze occurred during an emerge makes me doubt it somewhat. Also, it seems to freeze regardless of the load level, and I usually associate hardware failures with freezing under certain loads. Finally, since the server is in San Jose and I'm in Nova Scotia, I'm not sure what the best way to check for hardware failures would be without shelling out a fair amount of money to my colocation company and having the server offline for a while. It currently serves 25,000 visitors a day, so I'm hoping it's something I can diagnose while it's running.
Hardware failures can cause problems under any load, especially if the memory is failing.
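Of the two components suggested earlier in the thread, the disks at least can be checked remotely without downtime. The ps listing shows four md RAID1 arrays (md0 to md3), and the kernel reports their state in /proc/mdstat, where a healthy two-disk mirror appears as [UU] and a mirror with a missing member as [U_] or [_U]. The following sketch lists degraded arrays from that text. It only covers the software RAID state and says nothing about memory.

```python
import re


def degraded_arrays(mdstat_text):
    """Return the names of md arrays whose member map contains '_' (a missing disk)."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            # Start of a new array stanza, e.g. "md0 : active raid1 sda1[0]"
            current = header.group(1)
            continue
        status = re.search(r"\[([U_]+)\]", line)
        if current and status and "_" in status.group(1):
            degraded.append(current)
    return degraded
```

On the server this would be called as degraded_arrays(open("/proc/mdstat").read()). An empty list means every array reports all members present, which rules out a dropped disk but not a disk that is slowly failing.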
passive Tux's lil' helper
Joined: 31 Dec 2004 Posts: 105
Posted: Sun Oct 14, 2007 6:21 pm
OK, an interesting addendum: I rebuilt PHP4, which depends on expat, and that seems to have almost solved the problem. Whereas I was lucky to get 6 hours of uptime before, it has now gone down only once in the last week. Even that one freeze is a concern, however.
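If rebuilding one expat consumer helped, it may be worth checking whether other binaries still link against a library that the update replaced. That is a guess about the cause, not something the thread confirms. The sketch below runs ldd on a binary and lists any shared libraries reported as "not found". On Gentoo, revdep-rebuild from gentoolkit does this kind of scan across the whole system, so this is only a way to spot-check individual files.

```python
import subprocess


def missing_libraries(ldd_output):
    """Return the library names that ldd reported as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        if "=> not found" in line:
            # Lines look like "\tlibexpat.so.0 => not found"
            missing.append(line.split("=>")[0].strip())
    return missing


def check_binary(path):
    """Run ldd on one binary and return its unresolved libraries."""
    result = subprocess.run(
        ["ldd", path],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return missing_libraries(result.stdout)
```

For instance, check_binary("/usr/sbin/apache2") would test the web server binary, with that path being an assumption about this installation. Only run ldd on binaries you trust, since on some systems it can execute code from the file it inspects.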