Gentoo Forums
what is this log entry "Machine check events logged"
snIP3r
l33t


Joined: 21 May 2004
Posts: 853
Location: germany

Posted: Sat Jan 05, 2008 12:37 pm    Post subject: what is this log entry "Machine check events logged"

hi all!

Recently I checked my /var/log/messages and found these entries:

Code:

Jan  1 17:48:58 area52 Machine check events logged
Jan  1 17:55:13 area52 Machine check events logged
Jan  1 18:01:19 area52 Machine check events logged
Jan  1 18:15:41 area52 Machine check events logged
Jan  1 18:21:56 area52 Machine check events logged
Jan  1 18:29:26 area52 Machine check events logged
Jan  1 18:38:10 area52 Machine check events logged
Jan  1 18:45:40 area52 Machine check events logged
Jan  1 18:53:10 area52 Machine check events logged
Jan  1 18:58:47 area52 Machine check events logged
Jan  1 19:05:02 area52 Machine check events logged


There are no other entries like these, neither before nor after. I don't know what they mean or what produced them. I also checked my emerge log: I hadn't installed any package at or before that time that could produce these messages.

Can someone explain what this means? Do I have to worry?

thx in advance
snIP3r
_________________
Intel i3-4130T on ASUS P9D-X
Kernel 5.15.88-gentoo SMP
-----------------------------------------------
if your problem is fixed please add something like [solved] to the topic!
bunder
Bodhisattva


Joined: 10 Apr 2004
Posts: 5947

Posted: Sat Jan 05, 2008 1:07 pm

Those come from the kernel's machine check exception (MCE) support.

http://en.wikipedia.org/wiki/Machine_Check_Exception

Quote:
MCE: The hardware reports a non fatal, correctable incident occurred on CPU 0.
Bank 1: 9400000000000151
MCE: The hardware reports a non fatal, correctable incident occurred on CPU 0.
Bank 2: 940040000000017a


Not sure why yours doesn't give more detail than the output I pasted above, though. I'd check for overheating or RAM errors.
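The `Bank N:` numbers in the quote are raw MCi_STATUS register values. As a minimal sketch, assuming the architectural bit layout from the Intel/AMD machine-check documentation and ignoring vendor-specific fields such as the model-specific error code in bits 16-31, the flag bits and the 16-bit compound error code can be decoded like this. `decode_status` and the lookup tables are hypothetical helpers, not part of mcelog or the kernel:

```python
# Architectural flag bits at the top of an x86 MCi_STATUS register.
STATUS_FLAGS = {
    63: "VAL (register contents valid)",
    62: "OVER (error overflow)",
    61: "UC (uncorrected error)",
    60: "EN (error reporting enabled)",
    59: "MISCV (MCi_MISC valid)",
    58: "ADDRV (MCi_ADDR valid)",
    57: "PCC (processor context corrupt)",
}

# Sub-fields of a memory-hierarchy error code: 000F 0001 RRRR TTLL.
REQUEST = {
    0b0000: "generic", 0b0001: "read", 0b0010: "write",
    0b0011: "data read", 0b0100: "data write", 0b0101: "instruction fetch",
    0b0110: "prefetch", 0b0111: "eviction", 0b1000: "snoop",
}
TXN_TYPE = {0b00: "instruction", 0b01: "data", 0b10: "generic"}
LEVEL = {0b00: "L0", 0b01: "L1", 0b10: "L2", 0b11: "generic level"}


def decode_status(status):
    """Split a raw MCi_STATUS value into flag names and the MCA error code."""
    flags = [name for bit, name in sorted(STATUS_FLAGS.items(), reverse=True)
             if (status >> bit) & 1]
    code = status & 0xFFFF
    result = {"flags": flags, "mca_code": code}
    # Memory-hierarchy errors: bit 8 set, bits 9-11 and 13-15 clear.
    if (code & 0xEF00) == 0x0100:
        result["request"] = REQUEST.get((code >> 4) & 0xF, "reserved")
        result["type"] = TXN_TYPE.get((code >> 2) & 0x3, "reserved")
        result["level"] = LEVEL[code & 0x3]
    return result


# The two values from the quoted kernel messages.
for raw in (0x9400000000000151, 0x940040000000017A):
    print(hex(raw), decode_status(raw))
```

For both quoted values the UC bit (61) is clear, which fits the kernel's "non fatal, correctable" wording; mcelog performs the same kind of decoding with much more vendor-specific detail.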

cheers
_________________
Neddyseagoon wrote:
The problem with leaving is that you can only do it once and it reduces your influence.

banned from #gentoo since sept 2017
snIP3r

Posted: Sat Jan 05, 2008 1:48 pm

Is this because I haven't installed the mcelog package?
snIP3r

Posted: Sat Jan 05, 2008 2:07 pm

OK, I installed mcelog and found numerous errors like this:

Code:

MCE 31
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 1 0 data cache TSC 1c39af3c77c74b
ADDR 4e74af80
  Data cache ECC error (syndrome 97)
       bit46 = corrected ecc error
       bit62 = error overflow (multiple errors)
  memory/cache error 'data read mem transaction, data transaction, level 2'
STATUS d44bc00000000136 MCGSTATUS 0

Does this mean that my CPU is broken? Or my RAM? I haven't noticed any problems with the CPU or RAM; everything runs fine so far. I have a _very_ stable system with no lockups. It's a pity that no timestamp is printed, so I cannot determine exactly when the errors occurred.
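The STATUS value in the mcelog output can be cross-checked by hand. A minimal sketch, assuming the AMD K8 layout that mcelog applies to this record (bit 45 = uncorrected ECC, bit 46 = corrected ECC, ECC syndrome in bits 47-54); for this value the syndrome field works out to 0x97, matching the "syndrome 97" line. The `TSC` field is the time-stamp counter value the kernel recorded when it logged the event, so it can serve as a rough timestamp: dividing it by the TSC frequency gives the time since the counter started. The 2.0 GHz rate below is a hypothetical placeholder, and on many older AMD CPUs the TSC rate changes with frequency scaling, so treat the result as an estimate at best. The helper names are hypothetical:

```python
# ASSUMPTION: AMD K8 MCi_STATUS layout as decoded by mcelog for this record:
# bit 45 = uncorrected ECC, bit 46 = corrected ECC, bits 47-54 = ECC syndrome.
def decode_k8_ecc(status):
    """Extract the K8-specific ECC fields from a raw MCi_STATUS value."""
    return {
        "uncorrected_ecc": bool((status >> 45) & 1),
        "corrected_ecc": bool((status >> 46) & 1),
        "syndrome": (status >> 47) & 0xFF,
        "overflow": bool((status >> 62) & 1),  # architectural OVER bit
    }


def tsc_to_seconds(tsc, tsc_hz):
    """Convert a TSC reading to seconds, assuming a constant counter rate."""
    return tsc / tsc_hz


fields = decode_k8_ecc(0xD44BC00000000136)  # STATUS from the mcelog output
print(fields, "syndrome =", hex(fields["syndrome"]))  # syndrome = 0x97

# HYPOTHETICAL 2.0 GHz TSC rate: substitute the CPU's real nominal clock.
secs = tsc_to_seconds(0x1C39AF3C77C74B, 2.0e9)
print(f"~{secs / 86400:.1f} days after the TSC started counting")
```

Bit 46 set with bit 45 clear is what mcelog reports as "corrected ecc error": the L2 cache detected and fixed the bad bits, and bit 62 says more than one error was seen. The TSC estimate only gives time since the counter started (normally around power-on or reset), which would still have to be added to the boot time.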

Perhaps someone can give me some advice!

EDIT: Something strange is happening: the messages are accumulating now. In the last 45 minutes I got 7 of them!?!?!?

EDIT2: I did a little research on my Gentoo box and found that the load had been increasing steadily while nobody was using it. Then I saw a bash process at 100% CPU usage, and the CPU temperature had also risen. After I killed that bash process, everything returned to normal, and the messages disappeared too: for 1 hour I got no more "Machine check events logged" messages ;)

But nevertheless, I still want to know: do these messages mean a hardware error?

greets
snIP3r
bunder

Posted: Sat Jan 05, 2008 6:54 pm

Quote:
But nevertheless, I still want to know: do these messages mean a hardware error?


Yep. If you read the wiki link I pasted, it lists many possible causes of these errors.

cheers
snIP3r

Posted: Tue Jan 08, 2008 4:45 pm

hi again!

I tried to reproduce the error messages by compiling the kernel with MAKEOPTS="-j3" while extracting a big RAR archive on an encrypted filesystem in parallel (overall I got a load of 3, with core0 at 30°C and core1 at 45°C), but no error messages appeared. I also checked the logs again and noticed that the messages appear between 03:00 and 06:30 in the morning. That is when my cron.daily scripts are executed:

Code:

area52 cron.daily # ls -la
total 32
drwxr-x---  2 root root  115 Jan  8 16:29 .
drwxr-xr-x 67 root root 8192 Jan  8 17:24 ..
-rw-r--r--  1 root root    0 Apr 17  2007 .keep
-rw-r--r--  1 root root    0 Sep 28 21:15 .keep_sys-process_cronbase-0
-r-xr-xr-x  1 root root 7386 Jul  5  2007 dccd
-rwxr-xr-x  1 root root   52 Jul  5  2007 logrotate.cron
-rwxr-xr-x  1 root root  115 Jul  5  2007 makewhatis
-rwxr-xr-x  1 root root  121 Jan  8 16:29 mcelog

But checking the munin and hotsanic data, I cannot see any stress on the CPU (no load, no higher CPU usage or temperature), so I do not know which script triggers the messages... OK, I ran every script manually and none of them produces the messages... back to square one...
And again: the machine runs normally, and everything seems to be OK.

greets
snIP3r
snIP3r

Posted: Sat Jan 12, 2008 9:47 pm

hi all!

I still hope that someone can help me with this. I found that this issue has something to do with the cron jobs running on my system. Today I changed the hour at which the cron.daily jobs run, and ~10 minutes after the cron.daily jobs started, the messages appeared in my log.
But still, executing the scripts manually does not cause the messages, so I don't know what else might be causing them.
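To check the cron correlation more systematically, the syslog lines can be bucketed by hour and measured against the cron.daily start time. A minimal sketch, assuming the plain syslog line format shown at the top of the thread (optionally with a tag such as `kernel:` before the message); the regex, the function names and the 17:45 start time in the example are hypothetical:

```python
import re
from collections import Counter

# Matches plain syslog lines such as
# "Jan  1 17:48:58 area52 Machine check events logged",
# optionally with a tag like "kernel: " before the message.
MCE_LINE = re.compile(
    r"^\w{3}\s+\d{1,2}\s+(?P<hh>\d{2}):(?P<mm>\d{2}):\d{2}\s+\S+\s+"
    r"(?:\S+:\s+)?Machine check events logged"
)


def mce_times(lines):
    """Yield (hour, minute) for every 'Machine check events logged' line."""
    for line in lines:
        m = MCE_LINE.match(line)
        if m:
            yield int(m.group("hh")), int(m.group("mm"))


def mce_events_per_hour(lines):
    """Count the messages per hour of the day."""
    return Counter(hh for hh, _ in mce_times(lines))


def minutes_after(lines, start_hh, start_mm):
    """Minutes from a start time (e.g. cron.daily) to each message.

    Ignores the date, so it only makes sense for messages on the same day;
    negative values mean the message came before the start time.
    """
    start = start_hh * 60 + start_mm
    return [hh * 60 + mm - start for hh, mm in mce_times(lines)]


sample = [
    "Jan  1 17:48:58 area52 Machine check events logged",
    "Jan  1 17:55:13 area52 Machine check events logged",
    "Jan  1 18:01:19 area52 Machine check events logged",
]
print(sorted(mce_events_per_hour(sample).items()))  # [(17, 2), (18, 1)]
print(minutes_after(sample, 17, 45))  # HYPOTHETICAL start time -> [3, 10, 16]
```

On a real system, pass an open file instead of the sample list, e.g. `mce_events_per_hour(open("/var/log/messages", errors="replace"))`. If the offsets cluster at a fixed delay after cron.daily starts, that points at whatever runs at that moment rather than at a random fault.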

can anyone help?

BTW: calling AMD technical support didn't bring a solution either :(

greets
snIP3r
All times are GMT