Gentoo Forums
Computer repeatedly crashes.
johntramp
Guru
Joined: 03 Feb 2004
Posts: 457
Location: New Zealand

Posted: Sun Oct 03, 2004 7:17 pm    Post subject: Computer repeatedly crashes.

Hi, I am not sure what is causing this, but every so often my computer crashes to the point where nothing responds and even ssh is unreachable. It happens fairly often, but I can see no pattern to it. I can compile programs for hours with no problem, and then it will crash while I am using the internet with no other processes running. It has happened from a Knoppix live CD as well as my Gentoo install, so that means it is a hardware fault, right? I have run memtest86 for 13 hours with no errors; is this test 100% reliable? Are there any other similar tests, or anything else I can try to figure out the problem?

Thanks
-John

nlightn
Apprentice
Joined: 16 Sep 2003
Posts: 171

Posted: Sun Oct 03, 2004 7:35 pm

What kind of hardware are you running? What kind of "crash" are you experiencing? Hard locks? Random power-offs? I'd be particularly suspicious of the power supply and/or hard disk.

solarium_rider
Tux's lil' helper
Joined: 23 Jun 2003
Posts: 88
Location: San Francisco

Posted: Sun Oct 03, 2004 8:11 pm

I too have been experiencing this lately. Sometimes it happens while idle, sometimes while compiling, sometimes while browsing. I can't really figure it out. A few of the crashes actually spat out errors, and the error was related to an "Interrupt," so I think it's kernel related. I didn't write it down, so I forget the exact error message.

Last night I went to recompile Firefox, checked on it about 15 minutes later, and the video was completely corrupted and everything was locked up. Typically when it locks, the CPU also shoots to 100%, which isn't too weird for a lock, I suppose. If I have sound playing, sometimes it keeps playing for a short while but eventually stops. I'm not sure how long, though; I usually fall asleep streaming music, and sometimes I wake up and it's dead.

I think we should compile a list of people having the same issues and their relevant hardware/software configurations and see if we can find a pattern.

Hardware:
cpu - x86 athlon xp
video - nvidia
sound - audigy
input devices - gravis joystick, usb mouse
other - usb printer (though it was crashing before this was connected)

Software:
kernel - 2.6.8.1-ck8 (w/ power management enabled)
X - xorg-x11 6.7.0-r2
video - nvidia drivers
wm - fluxbox
browser - firefox

configs:
CFLAGS="-march=athlon-xp -m3dnow -msse -mfpmath=sse -mmmx -Os
-pipe -fforce-addr -fomit-frame-pointer -frerun-cse-after-loop
-frerun-loop-opt -maccumulate-outgoing-args -ffast-math"
USE="3dnow mmx -nls sse -kde -gnome tiff -arts alsa mozilla dvd dvdr divx4linux
joystick xvid directfb fbcon -cups -wmf -tcpd -esd cups ppds usb"

I also have the "cool" bit set on the KT333 chipset to allow the CPU HALT command when idling.

I'm quite suspicious of the kernel/kernel configuration; 2.6.8 and above seem to have started crashing a lot.

nxsty
Veteran
Joined: 23 Jun 2004
Posts: 1556
Location: .se

Posted: Sun Oct 03, 2004 8:25 pm

Update your kernel to ck9! Ck8 has a bug that can cause crashes.

johntramp
Guru
Joined: 03 Feb 2004
Posts: 457
Location: New Zealand

Posted: Sun Oct 03, 2004 10:17 pm

I have had some problems with this RAM in the past, but it started working again so I left it at that. My setup is as follows...

Linux odysseus 2.6.9-rc1-love1 #2 SMP Fri Oct 1 15:06:34 NZST 2004 i686 AMD Athlon(tm) XP 2600+ AuthenticAMD GNU/Linux

Albatron KX600s Pro motherboard
athlonxp 2600+
nvidia gfx
onboard via sound
x-org
fluxbox
firefox

...so quite similar to yours, solarium_rider.

The thing is, though, it has happened when running off Knoppix as well, so I think it has nothing to do with my software.

I am going to try installing something like Debian sid to see if the problem is still there.

jschellhaass
Guru
Joined: 20 Jan 2004
Posts: 341

Posted: Sun Oct 03, 2004 10:47 pm

What does cat /proc/interrupts show (any conflicts with the video card)?

You may want to try with acpi=off.

jeff

Mben
Guru
Joined: 29 Mar 2004
Posts: 465
Location: New York, USA

Posted: Sun Oct 03, 2004 10:58 pm

Do you have any dust in your case? Overheating gets me every few months if I don't blow out my case (I have cats :) )
good luck

firephoto
Veteran
Joined: 29 Oct 2003
Posts: 1612
Location: +48° 5' 23.40", -119° 48' 30.00"

Posted: Sun Oct 03, 2004 11:37 pm

I just read about the ck8 problem on Con's mailing list. It seems that is what hit me at random sometimes. I also had this happen with ck7 once or twice, but there was nothing in the logs, just a lockup. I guess it's related to having preempt turned on in the kernel. Is preempt good or bad? My system seems "quicker" with preempt on.

I ended up going back to plain 2.6.8.1 without reiser4 for now since the newer rc's have nvidia issues. :(

johntramp
Guru
Joined: 03 Feb 2004
Posts: 457
Location: New Zealand

Posted: Mon Oct 04, 2004 12:05 am

jschellhaass wrote:
What does cat /proc/interrupts show (any conflicts with the video card)?

You may want to try with acpi=off.

jeff
Quote:
[root /home/john]$ cat /proc/interrupts
           CPU0
  0:     224314    IO-APIC-edge  timer
  1:        127    IO-APIC-edge  i8042
  9:          0   IO-APIC-level  acpi
 12:       7647    IO-APIC-edge  i8042
 14:      11429    IO-APIC-edge  ide0
 15:       1435    IO-APIC-edge  ide1
 17:       2413   IO-APIC-level  eth0
 20:       1241   IO-APIC-level  libata
 21:          0   IO-APIC-level  ehci_hcd, uhci_hcd, uhci_hcd, uhci_hcd, uhci_hcd
 22:          0   IO-APIC-level  via82cxxx
NMI:          0
LOC:     224316
ERR:          0
MIS:          0
How would I go about trying with acpi=off?
What does that mean / do?

johntramp
Guru
Joined: 03 Feb 2004
Posts: 457
Location: New Zealand

Posted: Mon Oct 04, 2004 12:06 am

Mben wrote:
Do you have any dust in your case? Overheating gets me every few months if I don't blow out my case (I have cats :) )
good luck
I don't think it is heat, because my BIOS has a warning / alarm ~5 degrees before the hard shutdown. I will look into that though, maybe set up lm_sensors or whatever it is that measures the temps.

jschellhaass
Guru
Joined: 20 Jan 2004
Posts: 341

Posted: Mon Oct 04, 2004 1:04 am

I don't see nvidia listed anywhere. If you run cat /proc/interrupts within a terminal under X, the nvidia card should show up on one of the interrupts. I'm just wondering if you have an IRQ conflict between the nvidia card and something else.
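
A rough way to spot shared lines in that output, assuming a comma in the device column means more than one driver sits on the IRQ (the function name shared_irqs is made up for this sketch). Sharing is normal for level-triggered PCI interrupts, so a hit is only a place to start looking, not proof of a conflict:

```shell
# Print the IRQ lines that list more than one device.
# Reads /proc/interrupts-style text on stdin.
shared_irqs() {
    # Only numbered IRQ rows are considered (NMI, LOC, ERR, MIS are skipped);
    # a comma in the row means several drivers registered on that IRQ.
    awk '/^ *[0-9]+:/ && /,/'
}
```

Run it as `shared_irqs < /proc/interrupts` from a terminal under X, so the video driver has had a chance to register its interrupt.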

In order to boot without acpi add acpi=off to the kernel line of the boot manager. In grub.conf it would be something like this.

Code:

kernel /bzImage-2.6.8-gentoo-r5 root=/dev/hde3 vga=791 acpi=off
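
After rebooting, it may be worth confirming that the option actually reached the kernel. A minimal sketch, assuming /proc/cmdline is available on the running system (the helper name has_boot_param is hypothetical):

```shell
# Succeed if a parameter appears as a whole word on a kernel command line.
# $1: parameter to look for, e.g. acpi=off
# $2: command-line text to search (optional; defaults to /proc/cmdline)
has_boot_param() {
    cmdline=${2-$(cat /proc/cmdline)}
    # Pad with spaces so that only whole-word matches count.
    case " $cmdline " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example: `has_boot_param acpi=off && echo "ACPI is disabled for this boot"`.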


jeff

Mben
Guru
Joined: 29 Mar 2004
Posts: 465
Location: New York, USA

Posted: Mon Oct 04, 2004 1:31 am

johntramp wrote:
Mben wrote:
Do you have any dust in your case? Overheating gets me every few months if I don't blow out my case (I have cats :) )
good luck
I don't think it is heat, because my BIOS has a warning / alarm ~5 degrees before the hard shutdown. I will look into that though, maybe set up lm_sensors or whatever it is that measures the temps.


Blow it out anyway. My BIOS never classifies it as overheated, but the computer locks up anyway. Just take some compressed air or a vacuum to the fans and vents (air usually works better, but be careful not to use too high a pressure or let the fans overspeed).

good luck

johntramp
Guru
Joined: 03 Feb 2004
Posts: 457
Location: New Zealand

Posted: Mon Oct 04, 2004 2:48 am

I have had a look, and a little dust had come through the CPU fan and been blown onto the RAM. I have vacuumed this out, along with a little around the fans. I have also swapped the RAM with another computer's to see how that goes.

Quote:
I don't see nvidia listed anywhere. If you run cat /proc/interrupts within a terminal under X, the nvidia card should show up on one of the interrupts. I'm just wondering if you have an IRQ conflict between the nvidia card and something else.
Does me not having installed the nvidia drivers yet affect this? I am still running the 2d nv drivers that came with the install.

I will try that kernel line as well, to see if that makes a difference too.

Thanks

johntramp
Guru
Joined: 03 Feb 2004
Posts: 457
Location: New Zealand

Posted: Mon Oct 04, 2004 7:25 am

I have just noticed that when the computer 'hung', the music I was listening to started again about a minute later for a second or so, and then stopped. This happened about once every minute, so I was able to reboot the computer without a hard reset. It is still happening in Knoppix too, even with the RAM replaced.

Reading from the BIOS, after the computer had been idle for hours:
Quote:
System temp: 28 degrees C / 82 degrees F
CPU temp: 35 degrees C / 95 degrees F
Any ideas on what else this could be?

johntramp
Guru
Joined: 03 Feb 2004
Posts: 457
Location: New Zealand

Posted: Mon Oct 04, 2004 9:38 am

I realised that I put a UPS in line with the computer about a week ago, around the same time this started happening. I have now moved it out of the way and things seem to be looking up.

I had the UPS feeding the computer; maybe that was just too much for it?

I will see how it is in a couple of hours... hopefully it is sorted :D

Thanks for your help if so.

Incabulos
n00b
Joined: 14 Apr 2003
Posts: 28
Location: Sydney, Australia

Posted: Mon Oct 04, 2004 11:46 am

Sounds like your CPU is operating at a pretty normal temperature, so the shutdowns/crashing certainly aren't caused by overheating.

I'd check the load on your UPS too. Most have a serial cable via which you can monitor load, run time remaining, uptime, and so on. If it's overloaded, then I assume power will fluctuate to all connected devices in a fairly bad way; sudden shutdowns or lockups might be a power problem.

'dmesg | tail' will show you the last events the kernel has seen, this might help in diagnosing things. You might also want to tone down the more aggressive compiler optimisations in your make.conf too if they are set, and recompile the most crucial components with the more conservative settings ( glibc & kernel come to mind ).
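
As an illustration only, a conservative /etc/make.conf might look something like the fragment below. The -march value assumes an Athlon XP like the ones mentioned earlier in the thread, so adjust it for your own CPU. Flags such as -ffast-math relax floating-point correctness rules and are a common suspect when odd breakage appears, so they are left out here:

```shell
# Conservative compiler flags: plain -O2, no math or frame-pointer tricks.
# -march=athlon-xp is an assumption based on the CPUs named in this thread.
CFLAGS="-march=athlon-xp -O2 -pipe"
CXXFLAGS="${CFLAGS}"
```

Changing the flags only affects packages built afterwards, which is why glibc and the rest of the toolchain would need to be re-emerged for this to make any difference.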

HTH.

Mben
Guru
Joined: 29 Mar 2004
Posts: 465
Location: New York, USA

Posted: Mon Oct 04, 2004 8:49 pm

If you have a regular power strip, try just taking the UPS out of the system.

johntramp
Guru
Joined: 03 Feb 2004
Posts: 457
Location: New Zealand

Posted: Mon Oct 04, 2004 11:08 pm

Yes, I have taken the UPS out, and now it has been up for about 4 hours and seems to be fine :)

There is no serial port on the UPS; I assume it is fairly old, as it was given to me for free.

Thanks for your help,

-John

johntramp
Guru
Joined: 03 Feb 2004
Posts: 457
Location: New Zealand

Posted: Tue Oct 05, 2004 12:45 am

:cry: it's happening again now without the UPS :(

Moloch
Apprentice
Joined: 17 Mar 2003
Posts: 293
Location: Albuquerque, NM, US

Posted: Tue Oct 05, 2004 1:06 am

For ages I was having problems with my system crashing when using athcool to set the cool bit for my KT333.
I kept it off, until one day I got tired of having a hot CPU and listening to that damn temperature-sensitive processor fan whine at almost full speed.
So I spent about a week going through kernel settings and found nothing. Then I moved on to BIOS settings; after I set my BIOS to safe mode defaults, athcool worked. I believe the problem lies in a couple of settings. First, the enhance performance setting for both RAM and AGP caused the lockups. Second, the CPU decode setting: it has three values, normal, fast, and ultra. Normal and fast work fine; ultra locks it up.
I really don't notice any performance change between all these settings, so I'm happy to have found the issue.
I've also heard that some boards turn the cooling bit on by default; you can use athcool to turn it off and see if that makes a difference.
_________________
Understanding is a three-edged sword: your side, their side, and the truth. --Kosh
1010011010

johntramp
Guru
Joined: 03 Feb 2004
Posts: 457
Location: New Zealand

Posted: Tue Oct 05, 2004 2:24 am

I have had a little look in my bios, I will go and look a little deeper. I have not done any overclocking tho or changed anything like that in the bios.

Another thing is that I can leave the computer on its own, downloading or compiling or whatever, and it is fine. As soon as I jump back on the internet or anything, it locks up again :S

Moloch
Apprentice
Joined: 17 Mar 2003
Posts: 293
Location: Albuquerque, NM, US

Posted: Tue Oct 05, 2004 2:49 am

Well, if it definitely seems internet-related, then how are you connected? Ethernet, dial-up, some USB crap, etc.? What drivers are you using? Kernel modules, something from portage, etc.?

johntramp
Guru
Joined: 03 Feb 2004
Posts: 457
Location: New Zealand

Posted: Tue Oct 05, 2004 7:17 am

Well, the thing is, I don't think it is software, as it happens in Knoppix as well. I can also leave my computer on the internet downloading on DC, and that can run flawlessly for hours.
I will try installing another distro somewhere else and see if it still happens there, maybe a stable debian.

johntramp
Guru
Joined: 03 Feb 2004
Posts: 457
Location: New Zealand

Posted: Tue Oct 05, 2004 7:22 am

Incabulos wrote:
'dmesg | tail' will show you the last events the kernel has seen, this might help in diagnosing things.
Quote:
[root /var/log]$ dmesg | tail
ReiserFS: sda1: found reiserfs format "3.6" with standard journal
ReiserFS: sda1: using ordered data mode
ReiserFS: sda1: journal params: device sda1, size 8192, journal first block 18, max trans len 1024, max batch 900, max commit age 30, max trans age 30
ReiserFS: sda1: checking transaction log (sda1)
ReiserFS: sda1: replayed 1 transactions in 0 seconds
ReiserFS: sda1: Using r5 hash to sort names
nvidia: module license 'NVIDIA' taints kernel.
ACPI: PCI interrupt 0000:01:00.0[A] -> GSI 16 (level, low) -> IRQ 16
NVRM: loading NVIDIA Linux x86 NVIDIA Kernel Module 1.0-6111 Tue Jul 27 07:55:38 PDT 2004
r8169: eth0: link up
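
Worth noting for anyone trying the same thing: dmesg reads a buffer held in RAM, so after a reboot it only shows the current boot, which is why the output above is all startup messages. Anything logged just before a lockup would have to come from the syslog file on disk, and a hard lock may leave nothing there either. A rough sketch for scanning such a file, assuming kernel lines are tagged with "kernel:" (the function name kernel_trouble and the keyword list are only guesses at what might be useful):

```shell
# Filter syslog-style text on stdin down to kernel lines that look like trouble.
kernel_trouble() {
    # Keep kernel messages only, then match a few common failure keywords.
    grep 'kernel:' | grep -iE 'oops|panic|machine check|lockup|nobody cared'
}
```

Something like `kernel_trouble < /var/log/messages | tail` would show the most recent hits; the log path is an assumption and depends on how the system logger is configured.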