Gentoo Forums
Random computer lockups

panderiz
n00b
Joined: 10 Dec 2008
Posts: 50

Posted: Mon Sep 20, 2010 11:45 am    Post subject: Random computer lockups

Lately I've been having a lot of computer lockups and can't figure out why. So far I've only ruled out a temperature issue. Are there any utilities I can install to help me figure out what's causing these crashes? The last thing in the messages log before my last crash was a cron message that is logged every ten minutes. This time around the crash happened while I was sleeping (my desktop was powered off). Could a keyboard cord with exposed wires that is short-circuiting cause lockups or power-downs? I'd try to verify or fix this myself, but I just can't figure out what causes these crashes.

n3r0
n00b
Joined: 27 Feb 2007
Posts: 25
Location: Western Australia

Posted: Tue Sep 21, 2010 4:51 am

Hi Panderiz,

It's good that you have ruled out temperature. As for utilities you can install, there isn't one that I am aware of that would be able to diagnose these lockups.
Unfortunately the logs may not show the source of the lockup, as more often than not, by the time the system locks up it is already too late to capture that info.
It is possible for an electrical short to power down your machine, however it is still reasonably unlikely.

From personal experience I have found that most unexplained lockups tend to be caused by a kernel configuration issue.
There are of course exceptions to this rule but 9/10 times it's due to a dodgy kernel config.

Moving forwards, the best bet is to try to eliminate the usual causes by first checking your hardware for obvious physical damage and, if possible, running any diagnostic tools, e.g. Memtest to check for RAM problems.
Assuming the hardware diagnostics come back fine, which they more than likely will, it gets a bit more difficult.

If you are able to intentionally cause the lockup in a repeatable situation, it will help to determine/test for the cause.
For example: running a particular program, utilising a specific bit of hardware, or stressing the system in general a bit, e.g. video playback, a 3D application, compilation, or switching between multiple programs.

Alternatively you could try running from a livecd for a bit to see if it also suffers from random freezes, as this will help to further distinguish from your own kernel (which I assume is reasonably specific to your machine) and one that is compiled to work on lots of different setups.

krinn
Watchman
Joined: 02 May 2003
Posts: 7470

Posted: Tue Sep 21, 2010 8:26 am

as n3r0 said: run a livecd to really check the hardware against another environment
most people have temp issues with the cpu, but a bad case with bad cooling can affect other parts of your system (video, ram, and even the north or southbridge chip)
cables: it's always a bad idea to put an electric cable (especially from a bad/cheap psu) close to an unprotected chip, because of the noise
cheap/bad psu: a psu that is about to die (or a cheap one) can generate noise & distortion on its outputs beyond what the atx standard allows, hanging a sword over your components
and also, an underpowered psu can fail to provide enough electricity to all components, and either the psu itself or the components might fail randomly (especially under load, when everything wants energy)


if hardware is ok
did you update your kernel? if yes, go back to the previous one
did you update programs, especially ones in system? go back to the previous versions (look in the docs, hints & tips), mask the newer ones, and unmask them one by one until you catch the culprit

panderiz
n00b
Joined: 10 Dec 2008
Posts: 50

Posted: Tue Sep 21, 2010 12:25 pm

Bah sorry, I forgot to mention these lockups started happening recently on both my old kernel (which had no problems before) and then "2.6.35-gentoo-r5". My friend mentioned that a cheap PSU could cause some issues as well... Wouldn't a faulty/cheap PSU make my computer shut down, not just lock up? Or would it lock up initially and then after some time have no more power? Since I was awake most of the time, I just rebooted when I saw I couldn't do anything.

n3r0
n00b
Joined: 27 Feb 2007
Posts: 25
Location: Western Australia

Posted: Tue Sep 21, 2010 2:36 pm

I was just trying to think of how to narrow this down a little.

Personally I've seen faulty PSUs cause some interesting and incredibly random faults, however they typically result in the computer either restarting or shutting down completely, as opposed to locking up.
If you're concerned about the power supply, you can pick up a relatively cheap one (cheap != bad, at least not always) nowadays and try that in the machine.

Are you sure the whole system has locked up and it isn't just X that is frozen?
You can check this by using a second machine to ssh across into the locked up machine.
Assuming X is the problem you should be able to remote in and check out the logs or even give X a kick.
Alternatively you could try disabling "DontZap" in your xorg config (if it's not already) and try hitting ctrl-alt-bkspace to zap the current X session.

The only other thing that I can think to keep an eye on is your available system memory.
You can check this by typing the following into a console.
Code:
free -m

If your system has a particularly nasty memory leak (I used to see lots of these when running KDE) it will eventually fall in a great heap.
Just execute that line every now and then and check if the used mem is increasing by much.
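
A rough sketch of automating that check, assuming a POSIX shell and the usual `free -m` layout where the third column of the `Mem:` row is used memory. The function names and the log path `/var/tmp/mem-usage.log` are made up for illustration. Writing each sample to disk means the last readings should survive a hard lockup. Note that on older versions of `free` the used figure includes buffers and cache, so a slow rise is not necessarily a leak.

```shell
#!/bin/sh
# Print the "used" column (in MiB) of the Mem: row from `free -m` output on stdin.
mem_used_mb() {
    awk '$1 == "Mem:" { print $3 }'
}

# Append one timestamped sample to a log file.
# The default path is a hypothetical choice; pass another as the first argument.
log_mem_sample() {
    logfile=${1:-/var/tmp/mem-usage.log}
    used=$(free -m | mem_used_mb)
    printf '%s used_mb=%s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$used" >> "$logfile"
}

# Example: sample once a minute until interrupted, flushing to disk each time.
# while true; do log_mem_sample; sync; sleep 60; done
```

Run the commented loop in a spare terminal and look at the tail of the log after the next freeze.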

Like Krinn asked, have you recently updated any packages?
You seem to have updated your kernel so is it possible that you also updated some of the other system packages?
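
One way to answer that is to look at Portage's own merge log. A minimal sketch, assuming the log lives at `/var/log/emerge.log` and uses the usual `<epoch>:  ::: completed emerge (N of M) category/package-version to /` line format; the function name is hypothetical. If `app-portage/genlop` is installed, `genlop -l` gives a friendlier listing of the same history.

```shell
#!/bin/sh
# List completed merges from a Portage emerge.log as "<epoch> <package>".
# Usage: recent_merges [logfile] [count]
recent_merges() {
    logfile=${1:-/var/log/emerge.log}
    count=${2:-20}
    # Field 1 is the timestamp with a trailing colon, field 8 the package atom.
    awk '/::: completed emerge/ { sub(/:$/, "", $1); print $1, $8 }' "$logfile" |
        tail -n "$count"
}

# Example: show the last 20 packages merged on this machine.
# recent_merges
```

Comparing those timestamps with the date the lockups started narrows down which updates are worth rolling back.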

Finally were you able to work out if there is something you can do to cause the lockups intentionally?

NeddySeagoon
Administrator
Joined: 05 Jul 2003
Posts: 54317
Location: 56N 3W

Posted: Tue Sep 21, 2010 6:15 pm

panderiz,

As well as the PSU in the metal case (get a mid price range one with some spare capacity), you also have the CPU Vcore PSU on the motherboard.
When this begins to fail you get all sorts of strange symptoms, from resets and lockups to random errors.

Open the case and look at the tall cylindrical parts close to the CPU. If the tops are domed or fluid is escaping from the bottom, you have found the problem.
All of these capacitors in the Vcore regulator need to be replaced with high quality parts. There will be 10 or 12 or so.

Low cost motherboards have got selection of these parts down to a fine art, so they fail just outside the warranty period.
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.

panderiz
n00b
Joined: 10 Dec 2008
Posts: 50

Posted: Tue Sep 21, 2010 8:25 pm

Well the good thing about my desktop environment is I got a system monitor to show me my ram usage/cpu. The lock ups aren't happening cause I'm out of RAM so that isn't an issue. The first few times this happened I tried to SSH into the machine and got no response. I'll poke my head into my case in an hour or so to check. Will report back about that.
No bulged tops or liquids coming out.
I've run my weekly updates from portage... though I haven't updated in about a month... I'll run that now.


Last edited by panderiz on Tue Sep 21, 2010 8:59 pm; edited 1 time in total

NeddySeagoon
Administrator
Joined: 05 Jul 2003
Posts: 54317
Location: 56N 3W

Posted: Tue Sep 21, 2010 8:29 pm

panderiz,

When you are out of RAM, the kernel swaps - that does not always mean using your swap partition. That's only for RAM that has no permanent home on disk, i.e. dynamically allocated RAM.
When you are out of RAM and there is no scope for swapping, the kernel calls the Out Of Memory (OOM) killer. The OOM killer kills tasks so that the kernel can continue.
When the OOM killer kicks in, you tend to notice. More importantly, it does not cause lockups.
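
If in doubt, the system log can confirm whether the OOM killer has run. A small sketch, assuming kernel messages end up in `/var/log/messages` (this depends on the syslog setup) and that OOM events are logged with the usual "Out of memory" or "invoked oom-killer" wording; the function name is hypothetical.

```shell
#!/bin/sh
# Count log lines that mention the kernel OOM killer.
# Usage: count_oom_events [logfile]
count_oom_events() {
    logfile=${1:-/var/log/messages}
    # grep -c prints 0 and exits non-zero when nothing matches; ignore that status.
    grep -ciE 'out of memory|oom-killer' "$logfile" || true
}

# Example: a result of 0 means no OOM kills were logged in that file.
# count_oom_events
```

A count of zero around the time of a freeze supports the point that memory exhaustion is not what is locking the machine up.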

krinn
Watchman
Joined: 02 May 2003
Posts: 7470

Posted: Tue Sep 21, 2010 11:20 pm

voltage irregularities can cause lockups.

the motherboard can still be alive because its capacitors need a few secs to empty, while the cpu itself might not have that chance. You'll end up with a cpu freeze while the mb didn't even see a problem (you can see this if you have a led on your mb: switch the power off at the psu, and the led will stay lit for a few moments before going off)

also, a bad psu can fail to deliver on one line, while the other lines keep working.

if you have mb sensors, you can check your voltages and see if they are all below their nominal values (psu about to die), or moving quickly up and down (psu delivering random power), or all ok except the cpu voltage (the psu has a bad line, or too much power is drawn on that line)

watch sensors
might give you nice hints, such as
Code:
atk0110-acpi-0
Adapter: ACPI interface
Vcore Voltage:      +0.95 V  (min =  +0.80 V, max =  +1.60 V)
 +3.3 Voltage:      +3.33 V  (min =  +2.97 V, max =  +3.63 V)
 +5 Voltage:        +5.24 V  (min =  +4.50 V, max =  +5.50 V)
 +12 Voltage:      +12.14 V  (min = +10.20 V, max = +13.80 V)
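
A minimal sketch of checking such output automatically, assuming lines in the `label: value V (min = ..., max = ...)` shape shown above; other chips format their readings differently, and the function name is made up. It only compares each reading with the limits the driver reports, so it can show sustained drift but not short spikes.

```shell
#!/bin/sh
# Read `sensors` output on stdin and flag voltages outside their reported min/max.
check_voltages() {
    awk -F: '/min =/ && /max =/ {
        label = $1; rest = $2
        sub(/^[ \t]+/, "", label)
        gsub(/[(),=]/, " ", rest)          # leave only numbers, units and min/max words
        n = split(rest, t, " ")
        v = t[1] + 0
        for (i = 1; i < n; i++) {
            if (t[i] == "min") lo = t[i + 1] + 0
            if (t[i] == "max") hi = t[i + 1] + 0
        }
        status = (v < lo || v > hi) ? "OUT OF RANGE" : "ok"
        printf "%-14s %6.2f V  %s\n", label, v, status
    }'
}

# Example: take one reading and check it.
# sensors | check_voltages
```

Piping `sensors` through this every few seconds and appending the result to a file would leave a record to look at after a freeze.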

panderiz
n00b
Joined: 10 Dec 2008
Posts: 50

Posted: Wed Sep 22, 2010 11:43 am

krinn wrote:

watch sensors
might give you nice hints, such as
Code:
atk0110-acpi-0
Adapter: ACPI interface
Vcore Voltage:      +0.95 V  (min =  +0.80 V, max =  +1.60 V)
 +3.3 Voltage:      +3.33 V  (min =  +2.97 V, max =  +3.63 V)
 +5 Voltage:        +5.24 V  (min =  +4.50 V, max =  +5.50 V)
 +12 Voltage:      +12.14 V  (min = +10.20 V, max = +13.80 V)

Guessing that's obviously an acpi thing but how do I bring that up?
NeddySeagoon wrote:
panderiz,

When you are out of RAM, the kernel swaps - that does not always mean using your swap partition. That's only for RAM that has no permanent home on disk, i.e. dynamically allocated RAM.
When you are out of RAM and there is no scope for swapping, the kernel calls the Out Of Memory (OOM) killer. The OOM killer kills tasks so that the kernel can continue.
When the OOM killer kicks in, you tend to notice. More importantly, it does not cause lockups.

I don't ever see my RAM at 100%. It normally is around 80% when the crashes occur, but that's where it normally sits either way, so I really don't think that's the reason.

krinn
Watchman
Joined: 02 May 2003
Posts: 7470

Posted: Wed Sep 22, 2010 4:34 pm

panderiz wrote:
Guessing that's obviously an acpi thing but how do I bring that up?


Hmm, hmmm, as I said (not my fault both commands also mean that in English), I get it by typing:
Code:
watch sensors

dermund
Apprentice
Joined: 28 Aug 2007
Posts: 205
Location: Sprawl

Posted: Wed Sep 22, 2010 5:20 pm

Hi panderiz,

In case you don't have
Code:
sensors
you can
Code:
emerge -av lm_sensors
and follow the on-screen instructions of emerge to get it up and running.
You might need to install a few hardware-specific kernel modules, but the installation routine asks about that.

NeddySeagoon
Administrator
Joined: 05 Jul 2003
Posts: 54317
Location: 56N 3W

Posted: Wed Sep 22, 2010 7:10 pm

panderiz,

lm-sensors will only help if your voltages are out long enough to detect *and* your CPU does not lock up as the result of the problem you hope to detect.
It's almost certain that lm-sensors is too slow.

Consider your Vcore for a moment. To keep the arithmetic easy, we will assume that your CPU has a core clock of 1GHz (faster makes it worse).
Further, your CPU can go from idle to full power in under 10 CPU clocks.

Your CPU clock ticks every nanosecond and your CPU can go from close to zero to around 100W in 10 nanoseconds (ns). When that step happens, the Vcore regulator relies on the stored energy in the capacitors to keep Vcore within limits. A voltage undershoot will cause CPU issues. Going the other way, from full power to idle, causes a voltage overshoot, which can cause permanent CPU damage.

How fast does lm-sensors react?
Let's say it uses the 33MHz PCI clock and a nice low cost successive approximation analogue to digital converter, so at best you get 1 bit every clock. To get 10 bits takes 10 clocks; add at least 2 more clocks for a start conversion and a read. (It will be more.) A 33MHz clock ticks once every 30ns and you need over 12 ticks, that's 360ns.
There is no way you can see a 10ns event with a 360ns conversion time. Well, there is: you could have a really fast sample and hold, so that conversion time is not an issue. That's expensive, and the power supplies are supposed to be DC. PCs won't use sample and hold at all.
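
The same arithmetic as a small script, using only the figures assumed above (1GHz core clock, a 10-clock load step, a 33MHz converter clock, 12 ticks per conversion); the function name is made up, and none of these numbers are measured values.

```shell
#!/bin/sh
# Compare the CPU load-step time with the assumed ADC conversion time.
# Arguments: cpu_hz step_clocks adc_hz adc_ticks
sample_vs_step() {
    awk -v cpu="$1" -v step="$2" -v adc="$3" -v ticks="$4" 'BEGIN {
        ns = 1000000000                   # nanoseconds per second
        step_ns = step * ns / cpu         # duration of the load step
        tick_ns = ns / adc                # one converter clock period
        conv_ns = ticks * tick_ns         # one full conversion
        printf "load step:  %.1f ns\n", step_ns
        printf "ADC tick:   %.1f ns\n", tick_ns
        printf "conversion: %.1f ns (%.0fx the step)\n", conv_ns, conv_ns / step_ns
    }'
}

sample_vs_step 1000000000 10 33000000 12
```

With an exact 33MHz clock the tick comes out at 30.3ns and the conversion at 363.6ns, which the text rounds to 30ns and 360ns. Either way one conversion lasts roughly 36 load steps.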

panderiz
n00b
Joined: 10 Dec 2008
Posts: 50

Posted: Thu Sep 23, 2010 1:02 am

Neddy, I thought it might be useful to use; guess not. I do have my clock constantly going from 1.35GHz to 2.7GHz because my CPU frequency scaling is set to on demand. It hasn't locked up since I started this topic, but from what you just said it's likely that the lockups are because of the CPU constantly changing its clock speed; I'm just not sure why it's deciding to act up now... I'll just leave it at 1.35 and wait it out. I could also lower the Vcore from my BIOS, if I recall. Next time it locks up I'll drop my Vcore down a bit from the BIOS (if it's possible).

NeddySeagoon
Administrator
Joined: 05 Jul 2003
Posts: 54317
Location: 56N 3W

Posted: Thu Sep 23, 2010 5:30 pm

panderiz,

Leave Vcore at its nominal value. Reducing it will probably make lockups worse.

If it is a Vcore related issue, forcing the use of a lower CPU core frequency may help, as the value of 'full power' is clock speed related.
It may not be tied to CPU core speed switching.
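
A minimal sketch for checking what the frequency scaling is currently doing, assuming the kernel exposes cpufreq under `/sys/devices/system/cpu`; the function name is hypothetical, and the optional argument exists only so the function can be pointed at another directory.

```shell
#!/bin/sh
# Print the cpufreq governor and current frequency (kHz) for every CPU.
# Usage: show_cpufreq [sysfs_cpu_root]
show_cpufreq() {
    root=${1:-/sys/devices/system/cpu}
    for dir in "$root"/cpu[0-9]*/cpufreq; do
        # Skip CPUs without cpufreq support (or an unmatched glob).
        [ -r "$dir/scaling_governor" ] || continue
        cpu=${dir%/cpufreq}
        cpu=${cpu##*/}
        printf '%s governor=%s cur_khz=%s\n' "$cpu" \
            "$(cat "$dir/scaling_governor")" \
            "$(cat "$dir/scaling_cur_freq" 2>/dev/null)"
    done
}

# Example: list every core's governor and clock.
# show_cpufreq
```

To hold the lower clock for a test, one option is to write `powersave` to each CPU's `scaling_governor` file as root, provided that governor is built into the kernel; `cpufreq-set -c 0 -g powersave` from cpufrequtils does the same for CPU 0.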

It's easy to spot problems by inspection with an unaided Mk1 eyeball. Post a few pictures of the area around your CPU and I'll look them over.

panderiz
n00b
Joined: 10 Dec 2008
Posts: 50

Posted: Fri Sep 24, 2010 5:59 am

NeddySeagoon wrote:
panderiz,

Leave Vcore at its nominal value. Reducing it will probably make lockups worse.

If it is a Vcore related issue, forcing the use of a lower CPU core frequency may help, as the value of 'full power' is clock speed related.
It may not be tied to CPU core speed switching.

It's easy to spot problems by inspection with an unaided Mk1 eyeball. Post a few pictures of the area around your CPU and I'll look them over.

Pardon my ignorance, but what is a Mk1 eyeball? I'm guessing you mean a normal eye? I'll grab a picture when I get back from school; heading off to bed in a few minutes.

NeddySeagoon
Administrator
Joined: 05 Jul 2003
Posts: 54317
Location: 56N 3W

Posted: Fri Sep 24, 2010 8:20 pm

panderiz,

The normal unaided eye.

panderiz
n00b
Joined: 10 Dec 2008
Posts: 50

Posted: Sun Oct 03, 2010 11:23 pm

Sorry for such a delay in my response, had something come up... Anyway, here's a picture I grabbed from around my CPU. I can take another if you need to see elsewhere. Can't do much about the quality, it was taken with my phone.
http://img7.imageshack.us/img7/3968/101002132700.jpg

krinn
Watchman
Joined: 02 May 2003
Posts: 7470

Posted: Mon Oct 04, 2010 12:55 am

a bit blurry for me :)
i will just note that you have one 5v connector empty (next to the hybrid card slot, below the big "black something" sign)

Did you check the m/b manual for its usage?
Because leaving it without power could be a reason why you lack power
(but putting power on it while the hybrid slot is empty could also burn a poorly designed m/b, if the connector is for that slot; so don't just put power on it, read the manual for its purpose/limitations first)

panderiz
n00b
Joined: 10 Dec 2008
Posts: 50

Posted: Mon Oct 04, 2010 3:15 am

krinn wrote:
a bit blurry for me :)
i will just note that you have one 5v connector empty (next to the hybrid card slot, below the big "black something" sign)

Did you check the m/b manual for its usage?
Because leaving it without power could be a reason why you lack power
(but putting power on it while the hybrid slot is empty could also burn a poorly designed m/b, if the connector is for that slot; so don't just put power on it, read the manual for its purpose/limitations first)

I don't need both of the 5V slots there filled... Also my desktop has been running fairly stably (aside from random software breakages) for well over a year now, so I know it isn't the lack of an extra connection. Sorry about the blurry photo... my phone is pretty bad at taking detailed pictures.

Chiitoo
Administrator
Joined: 28 Feb 2010
Posts: 2581
Location: Here and Away Again

Posted: Mon Oct 04, 2010 10:53 pm    Post subject: Hmm...

Be greeted~


Is this MOBO by any chance the ECS GF8200A?

Model Version V1.0
Product Serial No V46100G92901001
BIOS Version 09/08/27

Environment Information

Code:
 
       CPU
  AMD Athlon 64 X2 Dual-Core 7750 BlackEdition
       Memory
  Kingston HyperX 2GB 1066MHz DDR2 Non-ECC CL5 kit2 nVidia-SL
       HDD
  WD CAVIAR GP 500GB SATA2 32MB INTELLIPOW
       Operation System
  WinXP Pro
       External VGA Card
  XFX GF8800GT
       Power Supply
  CHIEFTEC ATX 500W
       Other PCI card / device
  Creative X-Fi XtremeGamer

I had this a while ago, and pretty much from the beginning it would damn randomly freeze the OS completely, sometimes any sound being played back at the time would loop or the sound would remind me of a Commodore 64 game sound effect or some such... I had previously been using all the other hardware with another MOBO and never had any problems really, so after a lengthy discussion with the manufacturer, I sent the MOBO back to the retailer as they suggested, got a new one (I think it was new, either way identical board) and the problem would still appear whenever it wanted to, it seemed. I could never reproduce it when I tried, sometimes the system would run perfectly fine for many, MANY days and then, out of the blue, FREEZE! I hoped to at least have a deep blue screen to have something to work with but no, just a freeze with no trails whatsoever.

So I sent it back again and they could not reproduce it so they gave _some_ of my money back but never found out the reason to the lockupses.

As you can see from the specifications at the time, it was under windozexP, so perhaps you had the problem always but it just never hit in so bad. ^^


Anyhoo, I was just strolling by and saw your picture of the MOBO and it instantly brought this into my mind.
I hope this helps, in one way or another!
_________________
Kindest of regardses.

panderiz
n00b
Joined: 10 Dec 2008
Posts: 50

Posted: Tue Oct 05, 2010 12:31 am

My motherboard is... ECS A790GXM-A
If my motherboard is causing the lock ups I'll keep the faulty motherboard rather than get _some_ of my money back. Thanks for your input as well. When I get a chance I'll see if I can get the local computer shop I go to, to check it out for free. I believe it might be the PSU but I'm also a hardware idiot :P

Chiitoo
Administrator
Joined: 28 Feb 2010
Posts: 2581
Location: Here and Away Again

Posted: Tue Oct 05, 2010 4:11 am

OK so it's not the exact same board but it does seem very similar, as does the problem.

And yeah, retailers here can be kinda unfair with stuff like this. If they cannot reproduce a problem encountered, they either charge you for their time they spent and send the junk back or, as in this case, be a bit more nice about it and at least give SOME kind of a refund.

This is in Finland.

My hardware changes often and it's usually from the low/mid price range, so for example PSUs have proven to be guilty of a LOT of freezy problems. Mostly they make the computer restart/shut down, but it can be almost anything. I often start troubleshooting from there, the PSU, as you can find OK ones cheap and just toss one in, well, almost like that. I don't ever recommend cheap PSUs, but sometimes I get one for like 20€ just to see whether the problem goes away or changes. It's so much easier than going for a new MOBO, for example.

Actually, my current 800W PSU I have is from the mid'ish price range and it keeps making weird BZZZ noises. >.>
I hate these bastards in metal boxes, with a passion...
The previous one started to make an ungodly high-pitched noise. I even took it apart to see if I could find the faulty part, but the noise is so high and all over the place that it's nearly impossible to tell which part makes it, not to mention you can't exactly put your ear against it due to the... yeah, the shocking factor involved.


I hate them!

panderiz
n00b
Joined: 10 Dec 2008
Posts: 50

Posted: Tue Oct 05, 2010 10:57 pm

Well, right now I can't even afford to buy a cheap temporary PSU to test. This is so random I can't put one in and say "Alright, so after 5 days of no lockup I'm safe to say it's the PSU". The times it happens are random. Since I started this thread it's only happened once or twice, and if I sent in the motherboard and the company tried to charge me, they'd more than likely receive a very vulgar and detailed letter explaining my displeasure at them trying to charge me ;) But I'll save up to buy a power supply to see if that changes anything.

Chiitoo
Administrator
Joined: 28 Feb 2010
Posts: 2581
Location: Here and Away Again

Posted: Wed Oct 06, 2010 6:04 pm

Yes that sounds very, VERY familiar.
The more random they are, the more annoying it gets for sure...

Well, I wish you good luck in finding the cause soon!

All times are GMT
Page 1 of 2