ExecutorElassus Veteran
Joined: 11 Mar 2004 Posts: 1471 Location: Berlin, Germany
|
Posted: Sat Apr 13, 2019 1:19 pm Post subject: |
Well, I've been running now for a week or two with a new PSU, and it hasn't frozen yet. So I don't have any further information, except for this: I saw in dmesg the following error messages:
Code:
[ 8410.034472] mce: [Hardware Error]: Machine check events logged
[ 8410.034478] mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 2: 98254000000c0176
[ 8410.034482] mce: [Hardware Error]: TSC 0 MISC c008000100000000
[ 8410.034488] mce: [Hardware Error]: PROCESSOR 2:600f20 TIME 1555132314 SOCKET 0 APIC 0 microcode 6000822
[15258.199168] mce: [Hardware Error]: Machine check events logged
[15258.199173] mce: [Hardware Error]: CPU 1: Machine Check: 0 Bank 2: dc25407000040136
[15258.199176] mce: [Hardware Error]: TSC 0 ADDR 7b039ad38 MISC c008000300000000
[15258.199178] mce: [Hardware Error]: PROCESSOR 2:600f20 TIME 1555139162 SOCKET 0 APIC 1 microcode 6000822
[15569.479162] mce: [Hardware Error]: Machine check events logged
[15569.479164] mce: [Hardware Error]: CPU 1: Machine Check: 0 Bank 2: dc2540e000040136
[15569.479168] mce: [Hardware Error]: TSC 0 ADDR 705799f78 MISC c008000700000000
[15569.479171] mce: [Hardware Error]: PROCESSOR 2:600f20 TIME 1555139474 SOCKET 0 APIC 1 microcode 6000822
Any idea what that is? The CPU is an AMD FX-9590, from 2016.
Cheers,
EE |
NeddySeagoon Administrator
Joined: 05 Jul 2003 Posts: 54834 Location: 56N 3W
|
Posted: Sat Apr 13, 2019 4:07 pm Post subject: |
ExecutorElassus,
At face value, it's a CPU problem.
However, if you have ECC RAM, it can be a RAM problem too.
The CPU has ECC on its internal caches; without ECC, errors go undetected.
The good news is that the error was detected and corrected.
Detected uncorrectable errors get you a panic.
_________________
Regards,
NeddySeagoon
Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail. |
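The "detected and corrected" reading above can be checked against the status words in the dmesg output. What follows is a minimal sketch, not a full decoder: it only looks at the top flag bits of the machine check status register (VAL, OVER, UC, EN, MISCV, ADDRV, PCC), whose positions are the same on AMD and Intel. The function name `decode_status` and the returned dictionary layout are made up for illustration. The low 16 bits are returned raw, because their meaning is bank- and family-specific and needs the vendor documentation to interpret.

```python
# Top-level flag bits of an x86 machine check status register (MCi_STATUS).
# Only these architectural flags are decoded here; everything else in the
# 64-bit word is model-specific and deliberately left alone.
FLAGS = {
    63: "VAL",    # register holds a valid error record
    62: "OVER",   # an earlier record was overwritten (overflow)
    61: "UC",     # error was NOT corrected by hardware
    60: "EN",     # reporting was enabled for this error
    59: "MISCV",  # the MISC register holds valid extra information
    58: "ADDRV",  # the ADDR register holds a valid address
    57: "PCC",    # processor context may be corrupt
}


def decode_status(status_hex):
    """Decode the flag bits of a status word given as a hex string.

    Hypothetical helper for illustration; takes the value printed after
    'Bank N:' in the dmesg lines.
    """
    status = int(status_hex, 16)
    flags = [name for bit, name in FLAGS.items() if (status >> bit) & 1]
    return {
        "flags": flags,
        "corrected": "UC" not in flags,   # UC clear means hardware fixed it
        "error_code": status & 0xFFFF,    # raw MCA error code, not interpreted
    }


if __name__ == "__main__":
    # The three status words from the dmesg output quoted in this thread.
    for word in ("98254000000c0176", "dc25407000040136", "dc2540e000040136"):
        info = decode_status(word)
        print(word, info["flags"], "corrected" if info["corrected"] else "UNCORRECTED")
```

For all three logged events the UC bit is clear, which matches the corrected-error reading. The two later events also have OVER set, meaning at least one earlier record in that bank was overwritten before it was read, and ADDRV set, which is why those entries carry an ADDR field.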
JustAnother Apprentice
Joined: 23 Sep 2016 Posts: 197
|
Posted: Sun Apr 14, 2019 5:12 am Post subject: |
I had my own little freezing issue lately.
-- A dual core AMD machine, ~2008.
-- The "cpu fan bracket" has two hooks that live under a lot of stress.
-- About three years ago I heard a loud sound like a bolt being thrown against the case. Then the machine started shutting down. I realized that a fan bracket hook had broken with great gusto, and the cpu was overheating. For a few bucks I got it running. Case closed - or so I thought.
-- Recently the same machine started freezing without warning. No messages. No pings. No ssh's. No nothing. Just stone silence.
-- Aha! It must be firefox - it had just upgraded. Turned out not to be the problem.
-- Aha! I had just dockerized the kernel. I must have messed up a working kernel with all those fancy schmancy docker switches. Turned out not to be the case.
-- Aha! Electromigration -- people say the processor is only good for about 10 years, and those damn atoms have jostled around one too many times. Nope.
-- Aha! A cosmic ray nailed the cpu. Nope.
-- Aha! It must be the memory going senile because it done wore out. I had never seen anything special happen when I ran memtest, but "they" said to do this. To my surprise, the computer seemed to freeze during memtest. After several more runs memtest failed, saying there was a bogus hardware interrupt on cpu 1, shutting it down. Okay, so it's the hardware, not the dockerized kernel.
-- It's only 11 years old, so maybe it's time to upgrade. But I can't stand the thought of rebuilding this thing from scratch.
-- So I figured it might be wise to open the case and look for anything obvious, like some sparks or some ugly black stains.
-- Aha! I found something. The heat sink and fan assembly didn't feel right -- it was too loose. Apparently one of the two fan bracket plastic hooks had failed, but unlike the previous failure where the hook went flying like a bullet, two sides failed and it pivoted along the third side.
-- The net effect was that one side of the cpu was held too loosely, and the other side was held way too loosely. This puts a gradient onto the thermal conductance per unit area, leading to an asymmetric cpu failure. This sneaky little problem was making the computer freeze, not shut down.
-- A new part for $5 seemed to fix everything. I just smeared around that nasty grease with my finger. |