Gentoo Forums
Is MCE log still in use?
midnite
Guru


Joined: 09 Apr 2006
Posts: 451
Location: Hong Kong

Posted: Wed Jun 29, 2022 10:21 am    Post subject: Is MCE log still in use?

The Handbook says:

https://wiki.gentoo.org/wiki/Handbook:AMD64/Full/Installation#Manual_configuration
Quote:
Next select the exact processor type. It is also recommended to enable MCE features (if available) so that users are able to be notified of any hardware problems. On some architectures (such as x86_64), these errors are not printed to dmesg, but to /dev/mcelog. This requires the app-admin/mcelog package.


However, the app-admin/mcelog package says:
Quote:
Starting with version 2.6.4, the Linux kernel for x86-64 no longer decodes and logs recoverable Machine Check Exception events to the kernel log on its own.


We are using 5.x.x kernels in 2022. Does this mean that we will never get error messages for MCEs?

- - - -

On the other hand, in the kernel, the description of X86_MCELOG_LEGACY says:

Quote:
CONFIG_X86_MCELOG_LEGACY:

Enable support for /dev/mcelog which is needed by the old mcelog
userspace logging daemon. Consider switching to the new generation
rasdaemon solution.


If I do not enable X86_MCELOG_LEGACY, the device file /dev/mcelog does not exist. So I think X86_MCELOG_LEGACY is needed for the kernel to write events to /dev/mcelog, which app-admin/mcelog then parses.
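
The reasoning above can be checked on a running system. Here is a minimal sketch in Python, assuming the kernel configuration is available as text, for example from /proc/config.gz (which only exists when CONFIG_IKCONFIG_PROC is enabled) or from /usr/src/linux/.config. The function names are made up for illustration and are not part of mcelog or the kernel.

```python
import gzip
import os


def parse_kernel_config(text):
    """Return a dict of CONFIG_* options from .config-style text.

    Lines such as '# CONFIG_FOO is not set' are recorded as 'n'.
    """
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# CONFIG_") and line.endswith(" is not set"):
            name = line[len("# "):-len(" is not set")]
            options[name] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
    return options


def legacy_mcelog_enabled(options):
    """True if the option that provides /dev/mcelog is built in."""
    return options.get("CONFIG_X86_MCELOG_LEGACY", "n") == "y"


def load_running_config(path="/proc/config.gz"):
    """Read the running kernel's config (assumes CONFIG_IKCONFIG_PROC=y)."""
    with gzip.open(path, "rt") as handle:
        return handle.read()


if __name__ == "__main__":
    print("/dev/mcelog present:", os.path.exists("/dev/mcelog"))
    if os.path.exists("/proc/config.gz"):
        options = parse_kernel_config(load_running_config())
        print("X86_MCELOG_LEGACY enabled:", legacy_mcelog_enabled(options))
```

If the device file is missing and the option is reported as not enabled, that matches the observation in this post.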

- - - -

For your information, I was able to test mcelog by following the instructions at http://mcelog.org/README.html . mce-inject is available at https://github.com/andikleen/mce-inject . Be warned that this test will halt your system: it hangs for a few seconds and then reboots automatically. You will lose all unsaved files.

- - - -

To summarise,

  1. Will we still get MCE logs in case of hardware problems?
  2. If not, and MCE logs are no longer in use (as hinted by app-admin/mcelog and the kernel's X86_MCELOG_LEGACY), how do we configure "rasdaemon"? (The Handbook may need updating if so.)

_________________
- midnite.
slaterson
Guru


Joined: 26 Feb 2003
Posts: 313

Posted: Fri Nov 04, 2022 3:59 pm

I have the same question. I get MCE messages on my console and in syslog; however, mcelog fails to start with
Code:
mcelog: Cannot open `/dev/mcelog': No such file or directory


I can't figure out how to get the device created. MCE, MCELOG, and MCE_INTEL are all enabled in my kernel.
Hu
Administrator


Joined: 06 Mar 2007
Posts: 23075

Posted: Fri Nov 04, 2022 4:47 pm

git grep mcelog -- '*Kconfig' leads to arch/x86/Kconfig, which has:
Code:
config X86_MCELOG_LEGACY
   bool "Support for deprecated /dev/mcelog character device"
   depends on X86_MCE
   help
     Enable support for /dev/mcelog which is needed by the old mcelog
     userspace logging daemon. Consider switching to the new generation
     rasdaemon solution.
You could enable this deprecated feature, or switch to the recommended new approach.
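
The lookup above can also be scripted. Here is a rough sketch in Python, assuming a deliberately simplified view of Kconfig syntax: an entry starts with a `config SYMBOL` line at column zero, and `menuconfig`, `choice`, `if` blocks and `source` directives are ignored. The helper names are hypothetical, and the `EXAMPLE_OTHER` entry in the sample is invented purely as a non-matching case.

```python
import re

# Matches the first line of a plain "config SYMBOL" entry.
CONFIG_LINE = re.compile(r"^config\s+(\w+)\s*$")

# The X86_MCELOG_LEGACY entry is the one quoted in this post;
# EXAMPLE_OTHER is a hypothetical entry used only for contrast.
SAMPLE = """\
config EXAMPLE_OTHER
\tbool "Hypothetical unrelated option"

config X86_MCELOG_LEGACY
\tbool "Support for deprecated /dev/mcelog character device"
\tdepends on X86_MCE
\thelp
\t  Enable support for /dev/mcelog which is needed by the old mcelog
\t  userspace logging daemon. Consider switching to the new generation
\t  rasdaemon solution.
"""


def split_kconfig_entries(kconfig_text):
    """Split Kconfig text into (symbol, body_lines) pairs."""
    entries = []
    for line in kconfig_text.splitlines():
        found = CONFIG_LINE.match(line)
        if found:
            entries.append((found.group(1), []))
        elif entries:
            entries[-1][1].append(line)
    return entries


def find_kconfig_symbols(kconfig_text, keyword):
    """Return symbols whose name or entry body mentions keyword."""
    needle = keyword.lower()
    return [
        symbol
        for symbol, body in split_kconfig_entries(kconfig_text)
        if needle in symbol.lower()
        or any(needle in line.lower() for line in body)
    ]


if __name__ == "__main__":
    print(find_kconfig_symbols(SAMPLE, "mcelog"))
```

To run it against a real tree, read arch/x86/Kconfig from the kernel sources and pass its contents in place of `SAMPLE`.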
toralf
Developer


Joined: 01 Feb 2004
Posts: 3942
Location: Hamburg

Posted: Fri Nov 04, 2022 11:15 pm

I tried rasdaemon a year ago and could not get it up and running on an i5.
And it crashes constantly here on an AMD 5950.
mike155
Advocate


Joined: 17 Sep 2010
Posts: 4438
Location: Frankfurt, Germany

Posted: Sat Nov 05, 2022 9:21 pm

I'm afraid I haven't understood what mcelog does and how it works.

I remember that I once had a server with faulty RAM. There was an error message in /var/log/mcelog. That made me think that the mcelog daemon is something like syslog for hardware errors. I thought that the kernel generates an error message and the mcelog daemon writes it to /var/log/mcelog.

But I guess that's wrong. Right now I think that the mcelog daemon (or rasdaemon) detects an error and writes a message to a log file. Furthermore, it informs the kernel about the issue.

Does anyone know a good introduction to mcelog and rasdaemon? I would like to know how this really works and on what kinds of machines it is recommended to install mcelog or rasdaemon.
All times are GMT