snIP3r l33t
Joined: 21 May 2004 Posts: 853 Location: germany
Posted: Sat Jan 05, 2008 12:37 pm Post subject: what is this log entry "Machine check events logged"
hi all!
recently i checked my /var/log/messages and i found these entries:
Code: |
Jan 1 17:48:58 area52 Machine check events logged
Jan 1 17:55:13 area52 Machine check events logged
Jan 1 18:01:19 area52 Machine check events logged
Jan 1 18:15:41 area52 Machine check events logged
Jan 1 18:21:56 area52 Machine check events logged
Jan 1 18:29:26 area52 Machine check events logged
Jan 1 18:38:10 area52 Machine check events logged
Jan 1 18:45:40 area52 Machine check events logged
Jan 1 18:53:10 area52 Machine check events logged
Jan 1 18:58:47 area52 Machine check events logged
Jan 1 19:05:02 area52 Machine check events logged
|
there are no other log entries like these, neither before nor after. i don't know what they mean or what produced them. i also checked my emerge log: i haven't installed any package at or before these timestamps that could produce these messages.
can someone describe what this means? do i have to worry?
thx in advance
snIP3r _________________ Intel i3-4130T on ASUS P9D-X
Kernel 5.15.88-gentoo SMP
-----------------------------------------------
if your problem is fixed please add something like [solved] to the topic! |
bunder Bodhisattva
Joined: 10 Apr 2004 Posts: 5947
Posted: Sat Jan 05, 2008 1:07 pm
those come from the kernel's machine check exception (MCE) option.
http://en.wikipedia.org/wiki/Machine_Check_Exception
Quote: | MCE: The hardware reports a non fatal, correctable incident occurred on CPU 0.
Bank 1: 9400000000000151
MCE: The hardware reports a non fatal, correctable incident occurred on CPU 0.
Bank 2: 940040000000017a
|
not sure why it doesn't give you more info compared to the one i pasted above though. i'd check for overheating or ram errors.
cheers _________________
Neddyseagoon wrote: | The problem with leaving is that you can only do it once and it reduces your influence. |
banned from #gentoo since sept 2017 |
snIP3r l33t
Joined: 21 May 2004 Posts: 853 Location: germany
Posted: Sat Jan 05, 2008 1:48 pm
is this because i don't have the mcelog package installed?
snIP3r l33t
Joined: 21 May 2004 Posts: 853 Location: germany
Posted: Sat Jan 05, 2008 2:07 pm
ok, i installed mcelog and found numerous errors like this:
Code: |
MCE 31
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 1 0 data cache TSC 1c39af3c77c74b
ADDR 4e74af80
Data cache ECC error (syndrome 97)
bit46 = corrected ecc error
bit62 = error overflow (multiple errors)
memory/cache error 'data read mem transaction, data transaction, level 2'
STATUS d44bc00000000136 MCGSTATUS 0
|
does this mean that my cpu is broken? or my ram? i have encountered no problems with the cpu or ram. everything runs fine so far. i have a _very_ stable system, no lockups so far. it's a pity that no timestamp is printed, so i cannot determine exactly when the errors occurred.
perhaps someone can give me some advice!
EDIT: something strange is happening: these messages are now accumulating. in the last 45 minutes i got 7 messages!
EDIT2: i did a little research on my gentoo box and found that the load increased constantly while no one was doing anything on it. then i saw a bash process with 100% cpu usage. the cpu temperature had also increased. after killing this bash process, everything went back to normal and the messages stopped as well. for 1 hour i got no more "Machine check events logged" messages
but nevertheless i still want to know if these messages mean a hardware error?
greets
snIP3r
bunder Bodhisattva
Joined: 10 Apr 2004 Posts: 5947
Posted: Sat Jan 05, 2008 6:54 pm
Quote: | but nevertheless i still want to know if these messages mean an hardware error? |
yep. if you read the wiki link i pasted, it gives many causes of these errors.
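For anyone who wants to check the STATUS value from the mcelog output by hand, here is a minimal Python sketch. It assumes the usual x86 machine-check status layout (bit 63 = valid, bit 62 = overflow, bit 61 = uncorrected error, low 16 bits = error code) and takes the meaning of bit 46 (corrected ECC error) from what mcelog itself printed for this CPU; that bit is model-specific and may mean something else on other processors. The function name `decode_status` is made up for this example.

```python
# Sketch only: decode a few fields of a machine-check STATUS word.
# Assumed bit layout (not taken from a vendor manual in this thread):
#   bit 63 = entry valid, bit 62 = overflow, bit 61 = uncorrected error,
#   bit 46 = corrected ECC error (as reported by mcelog for this CPU),
#   bits 15..0 = error code.

def decode_status(status: int) -> dict:
    """Return selected flags of a 64-bit machine-check STATUS value."""

    def bit(n: int) -> bool:
        return bool((status >> n) & 1)

    return {
        "valid": bit(63),
        "overflow": bit(62),
        "uncorrected": bit(61),
        "corrected_ecc": bit(46),
        "error_code": status & 0xFFFF,
    }


if __name__ == "__main__":
    # STATUS value copied from the mcelog output posted earlier in the thread.
    fields = decode_status(int("d44bc00000000136", 16))
    for name, value in fields.items():
        print(f"{name}: {value}")
```

For the posted value this reports a valid entry with the overflow and corrected-ECC flags set and the uncorrected flag clear, which matches mcelog's "corrected ecc error" and "error overflow (multiple errors)" lines: the hardware did see errors, but it corrected them.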
cheers
snIP3r l33t
Joined: 21 May 2004 Posts: 853 Location: germany
Posted: Tue Jan 08, 2008 4:45 pm
hi again!
i tried to reproduce the error messages by compiling the kernel with MAKEOPTS="-j3" while unraring a big file on an encrypted filesystem in parallel (overall i got a load of 3, core0 temp 30° and core1 temp 45°). no error messages were displayed. i also checked the logs again and noticed that the messages appear from 03:00 to 06:30 in the morning. at this time my cron.daily scripts are executed:
Code: |
area52 cron.daily # ls -la
total 32
drwxr-x--- 2 root root 115 Jan 8 16:29 .
drwxr-xr-x 67 root root 8192 Jan 8 17:24 ..
-rw-r--r-- 1 root root 0 Apr 17 2007 .keep
-rw-r--r-- 1 root root 0 Sep 28 21:15 .keep_sys-process_cronbase-0
-r-xr-xr-x 1 root root 7386 Jul 5 2007 dccd
-rwxr-xr-x 1 root root 52 Jul 5 2007 logrotate.cron
-rwxr-xr-x 1 root root 115 Jul 5 2007 makewhatis
-rwxr-xr-x 1 root root 121 Jan 8 16:29 mcelog
|
but checking the munin and hotsanic data i cannot see any stress on the cpu (no load, no higher cpu usage or temperature), so i do not know which script triggers the messages... ok, i ran every script manually and none of them produces the messages... back to searching from the beginning...
and again: the machine runs as normal, everything seems to be ok.
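To see at a glance in which hours the messages cluster, and so compare them with the cron.daily start time, a small Python sketch like the one below may help. It is only an illustration: the function name `mce_events_per_hour` is made up, the regex assumes the syslog line format shown in the first post, and the sample lines are the ones from that post rather than the 03:00 to 06:30 entries mentioned here.

```python
import re
from collections import Counter

# Matches syslog lines such as:
#   Jan 1 17:48:58 area52 Machine check events logged
# Assumption: "<month> <day> <hh:mm:ss> <host> ..." as in the first post.
MCE_LINE = re.compile(
    r"^(?P<month>\w{3})\s+(?P<day>\d{1,2})\s+(?P<hour>\d{2}):\d{2}:\d{2}"
    r"\s+\S+\s+.*?Machine check events logged"
)


def mce_events_per_hour(lines):
    """Count 'Machine check events logged' entries per (month, day, hour)."""
    counts = Counter()
    for line in lines:
        match = MCE_LINE.match(line)
        if match:
            key = (match["month"], int(match["day"]), int(match["hour"]))
            counts[key] += 1
    return counts


# Sample data: the log lines quoted in the first post of this thread.
SAMPLE = """\
Jan 1 17:48:58 area52 Machine check events logged
Jan 1 17:55:13 area52 Machine check events logged
Jan 1 18:01:19 area52 Machine check events logged
Jan 1 18:15:41 area52 Machine check events logged
Jan 1 18:21:56 area52 Machine check events logged
Jan 1 18:29:26 area52 Machine check events logged
Jan 1 18:38:10 area52 Machine check events logged
Jan 1 18:45:40 area52 Machine check events logged
Jan 1 18:53:10 area52 Machine check events logged
Jan 1 18:58:47 area52 Machine check events logged
Jan 1 19:05:02 area52 Machine check events logged
""".splitlines()

if __name__ == "__main__":
    # Counter keeps insertion order, so the output follows the log order.
    for (month, day, hour), n in mce_events_per_hour(SAMPLE).items():
        print(f"{month} {day:2d} {hour:02d}:00  {n} event(s)")
```

To run it against the real log, pass an open file object for /var/log/messages to the function instead of `SAMPLE`.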
greets
snIP3r
snIP3r l33t
Joined: 21 May 2004 Posts: 853 Location: germany
Posted: Sat Jan 12, 2008 9:47 pm
hi all!
i still hope that someone can help me with this. i found that this issue has something to do with the cron jobs running on my system. today i changed the hour at which the cron.daily jobs run, and ~10 minutes after the cron.daily jobs started, the messages appeared in my log.
but still, executing the scripts manually does not cause the messages, so i don't know what else might be the reason for them.
can anyone help?
btw: calling amd technical support did not bring any solution either
greets
snIP3r