Gentoo Forums
Kernel Panics suddenly start to happen...
Neilo
n00b


Joined: 15 Apr 2005
Posts: 15
Location: Shiremoor, UK

Posted: Sun Jul 17, 2005 3:07 pm    Post subject: Kernel Panics suddenly start to happen...

Today my machine started crashing randomly. It freezes as if someone had hit the "pause" button, with whatever was on screen left intact. Occasionally it hangs at the initial BIOS screen as well, and on one boot attempt it hung and left me a friendly note:

cpu 0: machine check exception: 4 bank 4: b200001000010c0f
tsc 1872ea1dc0
kernel panic - not syncing: machine check

Does anyone here know what this means? I thought the crashing was originally my graphics card, but I think it could be my RAM now, running memtest86 atm to find out. Any help appreciated :-)

System: AMD64 3000+ Newcastle, 2x512 PC2700 Corsair, Nvidia GF6600GT, Abit KV8Pro Motherboard, running Gentoo, kernel 2.6.11.
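
For anyone who wants to pull the numbers out of a panic line like the one above, here is a minimal Python sketch. It assumes the line has exactly the layout quoted in this post, and that the first hex number is the global machine-check status (this agrees with the "MCGSTATUS 4" that mcelog prints later in the thread). The names `MCE_LINE`, `parse_mce_line` and `decode_mcg_status` are made up for illustration and are not part of any tool.

```python
import re

# Assumed layout: "cpu N: machine check exception: <global status> bank N: <bank status>"
MCE_LINE = re.compile(
    r"cpu\s+(?P<cpu>\d+):\s*machine check exception:\s*(?P<mcg>[0-9a-f]+)\s+"
    r"bank\s+(?P<bank>\d+):\s*(?P<status>[0-9a-f]{16})",
    re.IGNORECASE,
)

def parse_mce_line(line):
    """Split the panic line into integer fields; return None if it does not match."""
    m = MCE_LINE.search(line)
    if m is None:
        return None
    return {
        "cpu": int(m.group("cpu")),
        "mcg_status": int(m.group("mcg"), 16),
        "bank": int(m.group("bank")),
        "status": int(m.group("status"), 16),
    }

def decode_mcg_status(mcg):
    """Name the three low bits of the global machine-check status."""
    return {
        "RIPV": bool(mcg & 0x1),  # restart instruction pointer valid
        "EIPV": bool(mcg & 0x2),  # error instruction pointer valid
        "MCIP": bool(mcg & 0x4),  # machine check in progress
    }

fields = parse_mce_line("cpu 0: machine check exception: 4 bank 4: b200001000010c0f")
print(fields)
print(decode_mcg_status(fields["mcg_status"]))
```

With a global status of 4, only the "machine check in progress" bit is set, so the CPU is not reporting a valid restart point.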
_________________
My All-purpose Gaming PC and Web/fileserver: A64 3000+ on a Abit NF8, 2x512MB Corsair RAM, Geforce 6600GT. Running on Gentoo, formatted my Windows partition in a fit of rage.
NeddySeagoon
Administrator


Joined: 05 Jul 2003
Posts: 54810
Location: 56N 3W

Posted: Sun Jul 17, 2005 3:41 pm

Neilo,

Sounds like hardware problems or overheating.

Is it any better if you take the lid off ?
Can you see the fans running ?
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.
Apopatos
Guru


Joined: 17 Oct 2004
Posts: 512
Location: Hellas

Posted: Sun Jul 17, 2005 3:41 pm

As far as I know, machine check is a kernel option that automatically shuts the system down if it overheats.
Check the temperatures in your BIOS; the fan may be faulty, leaving your processor running too hot.
Neilo
n00b


Joined: 15 Apr 2005
Posts: 15
Location: Shiremoor, UK

Posted: Sun Jul 17, 2005 3:45 pm

My system temperature is 45°C and the CPU is at 47°C, measured in the BIOS just after the panic WITH the case open. All fans are running at full speed: CPU, case rear, case front, graphics card, PSU and hard drive fans. The room is pretty warm, although the PC has survived hotter days. memtest86 hung on the first pass, so I'm running it again to check the memory. Any suggestions on what I should do are welcome, preferably without having to buy watercooling :-P Also, the CPU has not been overclocked; it has been left untweaked.
NeddySeagoon
Administrator


Joined: 05 Jul 2003
Posts: 54810
Location: 56N 3W

Posted: Sun Jul 17, 2005 4:23 pm

Neilo,

With the power off at the wall socket, make sure the memory modules and PCI cards are properly seated.
If you can remove one or more sticks of memory and the PC will still work, try each stick on its own.

Do not attempt to reseat the processor. You will need new heatsink compound if you disturb it.
Neilo
n00b


Joined: 15 Apr 2005
Posts: 15
Location: Shiremoor, UK

Posted: Sun Jul 17, 2005 7:38 pm

It seems to be OK now. I've set up a fan to pull air in from outside and blow it in the general direction of the PC, and my room has cooled down a fair bit. It *seems* to be stable again and memtest seemed fine, but I'm going to invest in better cooling for my system: a bigger CPU heatsink and some RAM with a heatsink. Better safe than sorry; it's the only decent working system I have :-)
Neilo
n00b


Joined: 15 Apr 2005
Posts: 15
Location: Shiremoor, UK

Posted: Tue Jul 19, 2005 8:49 am

OK, it's started happening again... but I managed to catch what one of the kernel panics said on boot, and fed it into mcelog:

CPU 0 4 northbridge TSC 1872ea1dc0
Northbridge CRC error
link number = 1
bit57 = processor context corrupt
bit61 = error uncorrected
bus error 'local node observed, request didn't time out
generic error mem transaction
generic access, level generic'
STATUS b200001000010c0f MCGSTATUS 4
Kernel panic - not syncing: Machine check

I really hope this doesn't mean a new motherboard. Am I right in thinking I need a new mobo, or should I stick a bigger cooler on my northbridge?

Edit: Ordered a chipset cooler with a fan for my NB, and an exhaust for my GFX card. Let's see if this solves the problem.
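
The "bus error" lines in the mcelog output above come from the low 16 bits of the STATUS value. As a rough sketch, assuming the standard bus-error bit layout `0000 1PPT RRRR IILL`, the fields can be unpacked like this. The lookup strings are my own wording rather than mcelog's exact text, and `decode_bus_error` is a hypothetical helper, not an mcelog function.

```python
# Field meanings for a bus-error code laid out as 0000 1PPT RRRR IILL.
# The description strings are illustrative wording, not mcelog's own.
PARTICIPATION = ["originated request", "responded to request",
                 "observed as third party", "generic"]
REQUEST = {0: "generic error", 1: "generic read", 2: "generic write",
           3: "data read", 4: "data write", 5: "instruction fetch",
           6: "prefetch", 7: "eviction", 8: "snoop"}
MEM_OR_IO = ["memory", "reserved", "I/O", "generic"]
LEVEL = ["level 0", "level 1", "level 2", "generic"]

def decode_bus_error(status):
    """Decode the low 16 bits of a bank status word as a bus error.

    Returns None when the code does not have the bus-error prefix.
    """
    code = status & 0xFFFF
    if (code >> 11) != 0b00001:
        return None
    return {
        "participation": PARTICIPATION[(code >> 9) & 0x3],
        "timed_out": bool((code >> 8) & 0x1),
        "request": REQUEST.get((code >> 4) & 0xF, "reserved"),
        "mem_or_io": MEM_OR_IO[(code >> 2) & 0x3],
        "level": LEVEL[code & 0x3],
    }

print(decode_bus_error(0xB200001000010C0F))
```

For this status word the fields come out as an observed (third-party) transaction that did not time out, with generic request, access and level values, which lines up with what mcelog printed.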
Apopatos
Guru


Joined: 17 Oct 2004
Posts: 512
Location: Hellas

Posted: Tue Jul 19, 2005 1:10 pm

Try lowering the RAM frequency in the BIOS. I had a motherboard with similar problems, and I believe it will work after that. It will run at a lower frequency, but at least it will work :wink:
NeddySeagoon
Administrator


Joined: 05 Jul 2003
Posts: 54810
Location: 56N 3W

Posted: Tue Jul 19, 2005 6:21 pm

Neilo,

The Northbridge mediates the CPU/memory transfers. The message means that something went wrong there, and that the error was detected and found to be uncorrectable.

That narrows it down to the CPU, the Northbridge, the memory, or maybe the PSU.
Unfortunately, the message does not appear to give the address of the failed transaction.
Code:
bit57 = processor context corrupt
means the error may have corrupted the processor's state, so the kernel cannot safely resume and panics instead.

Unless you have ECC memory and a motherboard to make use of it, most errors like this will go undetected. You may find you get things crashing with a SIG 11 (SIGSEGV) too.
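
The flag bits discussed here can be read straight off the top of the 64-bit STATUS word. Below is a small Python sketch, assuming the usual machine-check bank status layout (bits 63 down to 57); `STATUS_FLAGS` and `set_flags` are illustrative names only.

```python
# Flag bits in the upper part of a 64-bit machine-check bank status word.
STATUS_FLAGS = [
    (63, "VAL"),    # the register holds a valid error
    (62, "OVER"),   # an earlier error was overwritten
    (61, "UC"),     # the error was not corrected by hardware
    (60, "EN"),     # reporting of this error was enabled
    (59, "MISCV"),  # the companion MISC register is valid
    (58, "ADDRV"),  # the companion ADDR register holds an address
    (57, "PCC"),    # processor context may be corrupt
]

def set_flags(status):
    """Return the names of the flag bits that are set, highest bit first."""
    return [name for bit, name in STATUS_FLAGS if (status >> bit) & 1]

flags = set_flags(0xB200001000010C0F)
print(flags)                               # ['VAL', 'UC', 'EN', 'PCC']
print("address recorded:", "ADDRV" in flags)
```

ADDRV is clear in this status word, which is why no address for the failed transaction shows up in the report.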
Neilo
n00b


Joined: 15 Apr 2005
Posts: 15
Location: Shiremoor, UK

Posted: Wed Jul 20, 2005 11:52 pm

Lowering it by a few MHz didn't work at all. I could only drop it by 4, from 204 MHz to 200 MHz. I'm worried; I hope it's just my motherboard.
All times are GMT