nunogt Tux's lil' helper
Joined: 08 Dec 2004 Posts: 134 Location: Lisbon, Portugal
Posted: Tue Sep 27, 2005 8:58 pm Post subject: Weirdest problem - predictable semi random crashes
Hi guys, I've got the weirdest problem, and despite being extremely annoying, it's been very difficult to understand exactly what's wrong, let alone solve it.
The thing is, my server crashes randomly. But not always. This machine is supposed to be on 24/7. It's a proxy server, web server, and FTP server. I use it mainly for development purposes and as a gateway for my small home network. Nearly every night, after some time idle (I know exactly how long, since my screensaver is a clock), the machine dies. Everything stops responding: every daemon, and even KDE and the mouse. It just freezes and stays on, wasting power. This behaviour is not constant, though. Sometimes (every 4 or 5 reboots) the machine does not crash and works flawlessly for several days in a row, until I have to reboot it for some reason. Then, on the next boot, the cycle restarts.
One thing I notice on one of these likely-to-freeze boots is that an 'emerge --sync' is very fast (it takes about 2 minutes syncing and updating the portage cache), while on a stable boot it always takes 15 minutes or more. There's also the small detail that a vanilla-sources 2.6 kernel won't boot at all on this machine. No idea why, and I see nothing useful in the system messages. The last message displayed when booting is "Initializing CPU#0", followed by intense HD activity. The one before it has something to do with APIC (not ACPI). I doubt I can provide the boot log, since the system hangs and I have to restart it. gentoo-sources-2.6 boots fine, though, and that's what I'm currently using. I've tested different versions of vanilla-sources and all of them suffer from the same problem.
I suspected this could be a HD problem, so I ran SeaTools (a Seagate maintenance tool) on both of my disks, and it found nothing unusual. I also ran memtest, which said everything was fine. I'm really lost here.
My specs:
Pentium3
Motherboard with VIA chipset
320 MB RAM
nVidia GeForce2 Ti (using nvidia-kernel and nvidia-glx as explained here)
/, /boot, and swap space on hda (Seagate 80GB)
/var, /tmp, and swap space on hdb (Seagate 20GB)
gentoo-sources-2.6.12-r10 (stable)
no ~arch packages and no weird gcc flags (in fact i'm even using O2)
Wtf should I do?
Last edited by nunogt on Wed Sep 28, 2005 8:22 am; edited 2 times in total |
PaulBredbury Watchman
Joined: 14 Jul 2005 Posts: 7310
Posted: Tue Sep 27, 2005 9:33 pm Post subject: Re: Weirdest problem - predictable semi random crashes
nunogt wrote: | no weird gcc flags (in fact i'm even using O2) |
You should be using -O2. Show your CFLAGS line from /etc/make.conf.
Do you use hdparm to potentially force the hard drive settings beyond their safe limits?
I'd recommend you run "make menuconfig", and play with some of the kernel settings, especially BLK_DEV_VIA82CXXX (hard drive support). |
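Before changing anything in menuconfig, it may help to see how the option is currently set. What follows is only a small sketch: it assumes the kernel tree sits at /usr/src/linux (the usual Gentoo symlink), and the function name `kconfig_state` is made up for illustration.

```shell
#!/bin/sh
# Sketch: report how a kernel option is set in a .config file.
# KCONFIG defaults to /usr/src/linux/.config, which is an assumption.
KCONFIG="${KCONFIG:-/usr/src/linux/.config}"

# kconfig_state FILE SYMBOL
# Prints the value (y, m, ...) of CONFIG_<SYMBOL>, or "not set" when the
# option is disabled or missing from the file.
kconfig_state() {
    line=$(grep -E "^CONFIG_$2=" "$1" 2>/dev/null)
    if [ -n "$line" ]; then
        echo "${line#*=}"      # strip everything up to the first '='
    else
        echo "not set"
    fi
}

# Only query the real kernel tree when explicitly asked to.
if [ "${1:-}" = "--check" ]; then
    sym="${2:-BLK_DEV_VIA82CXXX}"
    echo "CONFIG_$sym: $(kconfig_state "$KCONFIG" "$sym")"
fi
```

Saved under a hypothetical name such as check-kconfig.sh, running `sh check-kconfig.sh --check BLK_DEV_VIA82CXXX` would print the current state of the VIA IDE driver option.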
wally.hall n00b
Joined: 26 Sep 2005 Posts: 55 Location: England
Posted: Tue Sep 27, 2005 9:49 pm Post subject: Set a cron job running
Try setting a cron job running to FTP all your logs to another computer, say every 5 minutes, then set the machine going and see what comes out nearest the lockup (that way you don't have to know in advance when it goes down). If it is the HD, you may well see a lot of I/O errors or something.
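A minimal sketch of what such a cron job could run, assuming curl is installed and an FTP server is reachable on the other machine. The host, credentials, and script path below are placeholders, not details from this thread.

```shell
#!/bin/sh
# Sketch: bundle the logs and push them to another machine over FTP.
# All three defaults are hypothetical; override them from the environment.
LOG_DIR="${LOG_DIR:-/var/log}"
FTP_URL="${FTP_URL:-ftp://192.168.0.2/crashlogs/}"
FTP_USER="${FTP_USER:-loguser:secret}"

# archive_name STAMP: build a file name such as logs-20050927-2355.tar.gz
archive_name() {
    echo "logs-$1.tar.gz"
}

ship_logs() {
    stamp=$(date +%Y%m%d-%H%M)
    archive="${TMPDIR:-/tmp}/$(archive_name "$stamp")"
    # tar may warn about files that change while being read; that is
    # expected for live logs, so only give up if no archive was produced.
    tar -czf "$archive" -C "$LOG_DIR" . 2>/dev/null
    [ -s "$archive" ] || return 1
    # -T uploads the file; a URL ending in "/" keeps the local file name.
    curl -sS -T "$archive" --user "$FTP_USER" "$FTP_URL" || return 1
    rm -f "$archive"
}

# Example crontab entry (script path is hypothetical):
#   */5 * * * * /usr/local/sbin/ship-logs.sh --ship
if [ "${1:-}" = "--ship" ]; then
    ship_logs
fi
```

Because every upload carries a timestamp in its name, the last archive that made it to the other machine shows roughly when the box went down and what the logs looked like just before.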
Also, is your Swap OK? If there's some bad sectors on the HD, and the Swap is getting b0rk3d, anything could happen!
Also, check there's nothing odd in your BIOS turned on, no BIOS shadowing or anything, and check your CPU is clocked correctly.
Otherwise, I can't think of anything obvious as to why it'd be acting the way it is! I take it the machine is running KDE and the clock screensaver is freezing at lockups? Maybe not starting X, if it's something related to your Video hardware or something.
Hope you figure it out. Problems aren't such a bother when you know what's causing them and you're getting somewhere with fixing them. I know how you're feeling!
Take care,
wally _________________ I like Gentoo why?
Because it works how I want it to work. |
nunogt Tux's lil' helper
Joined: 08 Dec 2004 Posts: 134 Location: Lisbon, Portugal
Posted: Tue Sep 27, 2005 11:31 pm Post subject: Re: Weirdest problem - predictable semi random crashes
PaulBredbury wrote: | Show your CFLAGS line from /etc/make.conf.
Do you use hdparm to potentially force the hard drive settings beyond their safe limits?
I'd recommend you run "make menuconfig", and play with some of the kernel settings, especially BLK_DEV_VIA82CXXX (hard drive support). |
My CFLAGS section from make.conf:
Code: | CHOST="i686-pc-linux-gnu"
CFLAGS="-march=pentium3 -O2 -pipe -fomit-frame-pointer"
CXXFLAGS="${CFLAGS}" |
In fact, I did a little research before compiling my stage1 and used the exact same site you mentioned as a reference.
Now, about hdparm. I know I have DMA enabled on my HDDs, but I'm not loading hdparm at boot (rc-update add hdparm default) because my performance is quite good already.
I'll have a look at the kernel setting you mentioned. Thanks for the tip.
wally.hall wrote: | Try setting a cron job running to FTP all your logs to another computer, say every 5 minutes, then set the machine going and see what comes out nearest the lockup (that way you don't have to know in advance when it goes down). If it is the HD, you may well see a lot of I/O errors or something.
Also, is your Swap OK? If there's some bad sectors on the HD, and the Swap is getting b0rk3d, anything could happen!
Also, check there's nothing odd in your BIOS turned on, no BIOS shadowing or anything, and check your CPU is clocked correctly.
Otherwise, I can't think of anything obvious as to why it'd be acting the way it is! I take it the machine is running KDE and the clock screensaver is freezing at lockups? Maybe not starting X, if it's something related to your Video hardware or something. |
Great suggestions, thanks. What logs do you reckon I should monitor? /var/log/messages and anything else? I'll also remove xdm from the default runlevel and leave the computer on, so I can check if it still crashes. I hope it does, cause that would probably mean I have something enabled in the kernel I shouldn't. But then again, this behaviour may not occur if the machine is not being stressed enough with X and KDE. Don't know. I'll report here the results later.
Thanks again. |
wally.hall n00b
Joined: 26 Sep 2005 Posts: 55 Location: England
Posted: Tue Sep 27, 2005 11:52 pm Post subject: Anything and everything!
I'd just log everything you can find which might be relevant! Write a lil script to copy the output of "ps aux" and "top", also copy all of your /var/log, then you've got everything available. I'm a bit tired right now (nearly 00:01am!), so I'll do some thinking and research tomorrow and post back what logs I think will be of use for us to look at. Feel free to E-Mail me a copy of your logs (just tar them up and E-Mail them, don't worry about the filesize) if you get a crash before I post on here again, I'll happily have a look!
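Something along these lines might do as the little script. It is a rough sketch only: the snapshot directory is a made-up path, and `top -b -n 1` assumes the procps version of top.

```shell
#!/bin/sh
# Sketch: save a timestamped snapshot of process state and logs.
# SNAP_DIR is a hypothetical location; LOG_DIR defaults to /var/log.
SNAP_DIR="${SNAP_DIR:-/root/crash-snapshots}"
LOG_DIR="${LOG_DIR:-/var/log}"

# take_snapshot DEST STAMP
# Writes ps, top, and dmesg output plus a tarball of LOG_DIR into DEST.
take_snapshot() {
    dest="$1"
    stamp="$2"
    mkdir -p "$dest" || return 1
    ps aux      > "$dest/ps-$stamp.txt"    2>&1
    top -b -n 1 > "$dest/top-$stamp.txt"   2>&1   # batch mode, one iteration
    dmesg       > "$dest/dmesg-$stamp.txt" 2>&1
    # Warnings about log files changing mid-read are expected and ignored.
    tar -czf "$dest/logs-$stamp.tar.gz" -C "$LOG_DIR" . 2>/dev/null
    sync    # flush to disk so the files have a chance to survive a freeze
    return 0
}

# Only write to the real snapshot directory when explicitly asked to.
if [ "${1:-}" = "--snapshot" ]; then
    take_snapshot "$SNAP_DIR" "$(date +%Y%m%d-%H%M%S)"
fi
```

Run from cron every few minutes with `--snapshot`, the newest set of files in the directory is the closest picture of the system before the lockup. Old snapshots pile up, so the directory needs pruning now and then.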
Take care,
wally _________________ I like Gentoo why?
Because it works how I want it to work. |
nosatalian Tux's lil' helper
Joined: 09 Apr 2004 Posts: 98
Posted: Wed Sep 28, 2005 5:43 am Post subject: What graphics?
Are you using a radeon card with the ATI drivers? I've had issues like this with ATI cards using fglrx drivers, particularly when in screensavers (even non-3d ones) |
nunogt Tux's lil' helper
Joined: 08 Dec 2004 Posts: 134 Location: Lisbon, Portugal
Posted: Wed Sep 28, 2005 8:20 am Post subject: Re: Anything and everything!
wally.hall wrote: | if you get a crash before I post on here again, I'll happily have a look! |
Of course I did. I just rebooted. It crashed tonight a couple of minutes before 04:00 AM.
nosatalian wrote: | Are you using a radeon card with the ATI drivers? I've had issues like this with ATI cards using fglrx drivers, particularly when in screensavers (even non-3d ones) |
Sorry, forgot to mention I'm using nVidia. Ironically, I chose this brand because of the stability factor: I heard ATI cards tend to produce more heat, and that their Linux drivers are more unstable and don't support all the acceleration the nvidia driver does. |
jschellhaass Guru
Joined: 20 Jan 2004 Posts: 341
Posted: Wed Sep 28, 2005 4:20 pm Post subject:
Have you tried memtest86 and/or CPUburn?
jeff |
nunogt Tux's lil' helper
Joined: 08 Dec 2004 Posts: 134 Location: Lisbon, Portugal
Posted: Wed Sep 28, 2005 6:16 pm Post subject:
jschellhaass wrote: | Have you tried memtest86 and/or CPUburn?
jeff |
Yes, I've tried memtest86 and it reported nothing wrong. It took more than 7 hours, IIRC. Will google for CPUBurn, never heard of it before.
I made some major changes to my system in order to narrow down the problem. I moved /var and /tmp off my second HD (which is possibly less reliable), so I have only one HD now, which is working properly and is roughly 2 weeks old. I've also replaced my power supply, which was rather old and noisy (bought this one). What this means, essentially, is that my system should no longer have any possibly faulty hardware, and it'll be easier to understand what is making it crash.
Last edited by nunogt on Wed Sep 28, 2005 7:17 pm; edited 2 times in total |
jschellhaass Guru
Joined: 20 Jan 2004 Posts: 341
Posted: Wed Sep 28, 2005 6:51 pm Post subject:
you can just emerge cpuburn.
jeff |
darkdigger n00b
Joined: 11 Aug 2005 Posts: 1
Posted: Thu Sep 29, 2005 10:14 am Post subject: Me too!
I've been having the exact same problem as the original poster. About two weeks ago, my server (apache, ssh, sql, folding@home, no GUI) started randomly freezing up. No ping responses, no monitor output, no ssh, no keyboard response, nada. At first, I thought it may have had to do with the update to openssh I emerged around that time, but I disabled sshd and the system still froze up. I disabled single services one at a time and then moved on to disabling multiple services to determine the culprit, but it kept freezing. Next, I updated my kernel from 2.6.11-gentoo to 2.6.12-gentoo-r10 (genkernel) and it still kept freezing.
As of this evening, the server won't even POST anymore, so I guess hardware is the culprit. Not exactly sure which piece, but I'm betting it's either the CPU or the motherboard. Hopefully, my bad luck will help you diagnose your problem. The server is only 9 months old, too. *sigh*
Best,
Arash
P.S. Here are the server specs, for what it's worth:
Athlon 2400
1GB RAM
MB Video
MB Ethernet
All additional motherboard features disabled (extra ports, devices, etc)
200GB Western Digital
O2 and no ~arch packages |
nunogt Tux's lil' helper
Joined: 08 Dec 2004 Posts: 134 Location: Lisbon, Portugal
Posted: Fri Sep 30, 2005 9:29 am Post subject:
Downgrading from 2.6.12 to 2.6.9 without recycling the old settings did the trick. Tonight it didn't freeze.
Let's hope this wasn't a mere coincidence. |