hanj Veteran
Joined: 19 Aug 2003 Posts: 1500
Posted: Wed Aug 16, 2006 3:15 pm Post subject: kernel panic on emerge [URGENT]
Hello All
For the last few days I've been experiencing this problem during emerges. When I emerge a package, it intermittently freezes around the md5 check and unpack stage. On the console I get this message:
Code: | ...
[<c016600f>]
[<c0127a29>]
Code: Bad EIP value.
<0>Kernel panic - not syncing: Fatal exception in interrupt |
At first I thought it was caused by a recent kernel change: I had added SMP support. Yesterday I built a new kernel with that feature set removed, and today it happened again. This morning I rolled back to an older kernel and have not seen any problems so far.
My question: could this be the kernel version or a config problem, or am I looking at a hardware problem?
The kernel config has been the same for about 2 years with the exception of SMP, which I just added and removed. I feel like the config should be 'good'.
Here is my relevant info.
The kernel version that I was running was:
Code: | 2.6.16-hardened-r11 |
I just rolled back to
Code: | 2.6.16-hardened-r10 |
Code: | emerge --info
Portage 2.1-r2 (default-linux/x86/2006.0, gcc-3.4.6, glibc-2.3.6-r4, 2.6.16-hardened-r10 i686)
=================================================================
System uname: 2.6.16-hardened-r10 i686 Intel(R) Pentium(R) 4 CPU 2.80GHz
Gentoo Base System version 1.12.4
app-admin/eselect-compiler: [Not Present]
dev-lang/python: 2.3.5-r2, 2.4.3-r1
dev-python/pycrypto: 2.0.1-r5
dev-util/ccache: [Not Present]
dev-util/confcache: [Not Present]
sys-apps/sandbox: 1.2.17
sys-devel/autoconf: 2.13, 2.59-r7
sys-devel/automake: 1.4_p6, 1.5, 1.6.3, 1.7.9-r1, 1.8.5-r3, 1.9.6-r2
sys-devel/binutils: 2.16.1-r3
sys-devel/gcc-config: 1.3.13-r3
sys-devel/libtool: 1.4.3-r4, 1.5.22
virtual/os-headers: 2.6.11-r2
ACCEPT_KEYWORDS="x86"
AUTOCLEAN="yes"
CBUILD="i686-pc-linux-gnu"
CFLAGS="-O2 -march=pentium4 -funroll-loops -fprefetch-loop-arrays -pipe"
CHOST="i686-pc-linux-gnu"
CONFIG_PROTECT="/etc /var/bind"
CONFIG_PROTECT_MASK="/etc/env.d /etc/gconf /etc/revdep-rebuild /etc/terminfo"
CXXFLAGS="-O2 -march=pentium4 -funroll-loops -fprefetch-loop-arrays -pipe"
DISTDIR="/usr/portage/distfiles"
FEATURES="autoconfig distlocks metadata-transfer sandbox sfperms strict"
GENTOO_MIRRORS="http://distfiles.gentoo.org http://distro.ibiblio.org/pub/linux/distributions/gentoo"
MAKEOPTS="-j2"
PKGDIR="/usr/portage/packages"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --delete-after --stats --timeout=180 --exclude='/distfiles' --exclude='/local' --exclude='/packages'"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/usr/local/portage"
SYNC="rsync://rsync.gentoo.org/gentoo-portage"
USE="x86 alsa apache2 berkdb bitmap-fonts cli crypt dlloader dri eds emboss esd foomaticdb fortran gdbm gif gstreamer imlib isdnlog jpeg libg++ libwww mp3 ncurses nptl ogg pam pcre pdflib perl php png pppd python qt3 qt4 readline reflection sasl session snortsam spell spl ssl tcpd truetype-fonts type1-fonts udev vorbis xml xorg zlib elibc_glibc input_devices_keyboard input_devices_mouse input_devices_evdev kernel_linux userland_GNU"
Unset: CTARGET, EMERGE_DEFAULT_OPTS, INSTALL_MASK, LANG, LC_ALL, LDFLAGS, LINGUAS, PORTAGE_RSYNC_EXTRA_OPTS |
Code: | cat /proc/meminfo
MemTotal: 767020 kB
MemFree: 672240 kB
Buffers: 9320 kB
Cached: 45536 kB
SwapCached: 0 kB
Active: 50824 kB
Inactive: 28756 kB
HighTotal: 0 kB
HighFree: 0 kB
LowTotal: 767020 kB
LowFree: 672240 kB
SwapTotal: 979956 kB
SwapFree: 979956 kB
Dirty: 0 kB
Writeback: 0 kB
Mapped: 39844 kB
Slab: 10444 kB
CommitLimit: 1363464 kB
Committed_AS: 210700 kB
PageTables: 936 kB
VmallocTotal: 253944 kB
VmallocUsed: 2100 kB
VmallocChunk: 251844 kB |
Code: | cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 15
model : 4
model name : Intel(R) Pentium(R) 4 CPU 2.80GHz
stepping : 1
cpu MHz : 2793.507
cache size : 1024 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 3
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx constant_tsc pni monitor ds_cpl cid xtpr
bogomips : 5597.60 |
[update] It just happened again with 2.6.16-hardened-r10! I know I ran this kernel for a while in the past without any problems.
This is as far as I get on emerge:
Code: | emerge -v baselayout
Calculating dependencies... done!
>>> Emerging (1 of 1) sys-apps/baselayout-1.12.4-r6 to /
>>> checking ebuild checksums ;-)
>>> checking auxfile checksums ;-)
>>> checking miscfile checksums ;-)
>>> checking baselayout-1.12.4.tar.bz2 ;-) |
Let me know if I can supply any additional information
Thanks!
hanji |
NeddySeagoon Administrator
Joined: 05 Jul 2003 Posts: 54824 Location: 56N 3W
Posted: Wed Aug 16, 2006 3:26 pm
hanj,
Intermittent problems like this are normally indicative of a hardware problem.
The unpack is very CPU and RAM intensive, so the CPU temperature rises.
Check your cooling - use a stiff brush to clean your fan/heatsink assembly.
Run lm-sensors to keep an eye on temperatures (there are other ways).
Run memtest86 from the liveCD.
If that doesn't help, it's time to begin removing things.
Operate with one stick of RAM, and test each one individually this way.
_________________
Regards,
NeddySeagoon
Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail. |
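For keeping an eye on temperatures while trying to reproduce the freeze, a small logging loop can help. This is only a rough sketch: it assumes lm-sensors is installed and that its `sensors` command works on the board in question. The function name `log_temps`, the log path and the sample counts are illustrative choices, not something from the thread.

```sh
# Append timestamped sensor readings to a log file.
# Usage: log_temps LOGFILE SAMPLES INTERVAL_SECONDS [COMMAND...]
# COMMAND defaults to "sensors" (from lm-sensors); it is a parameter so
# another reader can be swapped in on boards lm-sensors does not support.
log_temps() (
    logfile=$1
    samples=$2
    interval=$3
    shift 3
    [ "$#" -gt 0 ] || set -- sensors
    i=0
    while [ "$i" -lt "$samples" ]; do
        {
            date '+%Y-%m-%d %H:%M:%S'
            "$@" 2>&1
        } >> "$logfile"
        # Flush to disk so the last reading has a chance to survive a panic.
        sync
        i=$((i + 1))
        if [ "$i" -lt "$samples" ]; then
            sleep "$interval"
        fi
    done
)

# Example: one reading every 5 seconds for 10 minutes, started in a
# second terminal just before the emerge begins to unpack.
# log_temps /var/log/temps.log 120 5
```

If the last readings before a panic show a steep climb, cooling becomes the prime suspect; a flat log points elsewhere.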
hanj Veteran
Posted: Wed Aug 16, 2006 3:34 pm
So there's no easy way to narrow it down to a CPU or RAM problem? Could this be the drive as well? This is one of my development servers, so I really can't have it down for hours doing memtest, etc., but if I have to, I have to.
I'll try to get lm_sensors on there. I don't think the board supports it very well. It's a Dell PowerEdge SC420.
Any tips to help narrow down the problem? I looked at the fan, and it looks good and clean.
Thanks for the reply
hanji |
hanj Veteran
Posted: Wed Aug 16, 2006 3:48 pm
NeddySeagoon wrote: | Intermittent problems like this are normally indicative of a hardware problem.
The unpack is very CPU and RAM intensive, so the CPU temperature rises. |
It's odd that I can compile a kernel though.. seems like that should be torkin' on the CPU/RAM harder than an unpack. I would assume that it's pushing RAM.. right? Any quick way to push on RAM?
Thanks again.. I'm just thinking out loud.
hanji |
NeddySeagoon Administrator
Posted: Wed Aug 16, 2006 4:18 pm
hanj,
Since you can compile a kernel, it's probably not the CPU. Kernel compiles do not push RAM; they involve lots of small files. An unpack does push RAM, but it uses that RAM for data rather than program code, and your crash comes from program code. Executing data normally leads to different errors.
I doubt it's a drive or a data cable. Reading corrupt data from the disk surface would be detected by the drive's CRC checks.
Program code that was written to disk incorrectly would fail repeatably: it would be wrong every time it was loaded into RAM.
You can run memtest86 without using the liveCD, but it's not very useful. It can get swapped out and moved around in physical RAM; also, it can't test all of RAM because some things can't be swapped or moved.
Reduce the box to one stick of memory and see if it still happens. Try each stick in turn, on its own.
Pulling a memory stick and seeing the problem go away does not imply the stick you pulled is faulty.
Disturbing the contacts between the RAM and motherboard socket can fix it too. |
hanj Veteran
Posted: Wed Aug 16, 2006 4:31 pm
NeddySeagoon wrote: | You can run memtest86 without using the liveCD, but it's not very useful. It can get swapped out and moved around in physical RAM; also, it can't test all of RAM because some things can't be swapped or moved.
Reduce the box to one stick of memory and see if it still happens. Try each stick in turn, on its own.
Pulling a memory stick and seeing the problem go away does not imply the stick you pulled is faulty.
Disturbing the contacts between the RAM and motherboard socket can fix it too. |
Thanks again. Seems like we're pointing to RAM initially then. I think I'll try some tar'ing/untar'ing to see if I can reproduce this. Then jump to removing RAM.. if still nothing, I'll crank on memtest tonight with all the sticks in there.
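The tar/untar idea could be scripted roughly as follows. This is a sketch under assumptions: GNU tar and `md5sum` are available, and `unpack_loop` is a made-up helper name. The reasoning is that unpacking the same archive repeatedly must give identical per-file checksums every time, so a mismatch (or a panic partway through) points at the hardware rather than at the archive.

```sh
# Unpack the same tarball ROUNDS times and compare the per-file checksums
# of every round against round 1. Healthy hardware gives identical results.
# Usage: unpack_loop TARBALL ROUNDS
unpack_loop() (
    tarball=$1
    rounds=$2
    work=$(mktemp -d) || exit 1
    status=0
    n=1
    while [ "$n" -le "$rounds" ]; do
        rm -rf "$work/out"
        mkdir "$work/out"
        if ! tar -xf "$tarball" -C "$work/out"; then
            echo "round $n: unpack failed" >&2
            status=1
            break
        fi
        # Checksum every unpacked file; sort so the order is stable.
        (cd "$work/out" && find . -type f -exec md5sum {} + | sort) > "$work/sums.$n"
        if ! cmp -s "$work/sums.1" "$work/sums.$n"; then
            echo "round $n: checksums differ from round 1" >&2
            status=1
            break
        fi
        n=$((n + 1))
    done
    rm -rf "$work"
    if [ "$status" -eq 0 ]; then
        echo "$rounds rounds OK"
    fi
    exit "$status"
)

# Example, using the distfile from the failing emerge:
# unpack_loop /usr/portage/distfiles/baselayout-1.12.4.tar.bz2 50
```

A larger archive exercises more RAM per round, so a big distfile is probably a better stress source than a small one.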
You mentioned CRC checks. Is there any way I can check for CRC errors? hdparm? I'm not too familiar with the process of testing whether the drive is okay.
Thanks for the help, it's much appreciated.
hanji |
NeddySeagoon Administrator
Posted: Wed Aug 16, 2006 6:18 pm
hanj,
The CRC checks are all internal to the drive. It's more complex than just CRCs, because the drive is also able to recover many bad bits in a sector.
If you have an IDE drive, the easiest way to see if it's OK is to ask it with smartmontools.
SATA drives support SMART too, but libata in the kernel doesn't yet. There is a patch, but I don't think it's in the vanilla kernel yet.
All modern drives hide bad sectors from the OS by 'on the fly' remapping to spares, so you should never see a bad sector until the drive is end of life. You can dd the entire drive to /dev/null to read the surface. It will stop on errors.
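Both checks can be wrapped in a couple of small functions. Treat this as a sketch: it assumes GNU `dd` and, for the SMART part, that smartmontools is installed. The device name `/dev/hda` in the example is an assumption, since the thread does not say which device the drive is, and `read_surface` and `smart_report` are made-up names.

```sh
# Read a whole device (or file) sequentially and discard the data.
# dd stops with a non-zero status at the first unreadable block.
# Usage: read_surface DEVICE
read_surface() {
    if dd if="$1" of=/dev/null bs=1M 2>/dev/null; then
        echo "read OK: $1"
    else
        echo "read error on $1" >&2
        return 1
    fi
}

# Ask the drive for its own health verdict and error log, if smartctl
# (from smartmontools) is installed.
# Usage: smart_report DEVICE
smart_report() {
    if ! command -v smartctl >/dev/null 2>&1; then
        echo "smartctl not found; install sys-apps/smartmontools" >&2
        return 1
    fi
    smartctl -H -l error "$1"
}

# Example, as root (the device name is an assumption):
# smart_report /dev/hda
# read_surface /dev/hda
```

The full surface read takes a long time on a large drive and competes with normal I/O, so it is best run when the box is otherwise idle.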
I think it's either memory, the PSU (as in the PSU box) or the Vcore PSU on the motherboard. |
troymc Guru
Joined: 22 Mar 2006 Posts: 553
Posted: Wed Aug 16, 2006 9:36 pm
You can try enabling Machine Check Exception support for your processor in your kernel. The hardware tracks its own errors in a special log. This will catch CPU, memory and other mainboard-related errors.
Once you have support enabled in the kernel, you will see messages in /var/log/messages stating something like "a machine check exception was logged" if your hardware detects an error. You can install app-admin/mcelog to view these errors.
troymc |
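Checking for this can be done with two greps. A minimal sketch, assuming the option is named `CONFIG_X86_MCE` in the kernel config and that kernel messages end up in /var/log/messages; both paths in the example are typical defaults rather than details from the thread, and the function names are made up.

```sh
# Report whether Machine Check Exception support is set in a kernel config.
# Usage: mce_enabled CONFIG_FILE   (plain text, e.g. /usr/src/linux/.config)
mce_enabled() {
    if grep -q '^CONFIG_X86_MCE=y' "$1"; then
        echo "MCE support built in"
    else
        echo "MCE support not enabled"
        return 1
    fi
}

# Show machine-check related lines from a log file, if any.
# Usage: mce_log_lines LOGFILE
mce_log_lines() {
    grep -i 'machine check' "$1"
}

# Example (adjust the paths as needed):
# mce_enabled /usr/src/linux/.config
# mce_log_lines /var/log/messages
```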
NeddySeagoon Administrator
Posted: Wed Aug 16, 2006 10:38 pm
troymc,
That's worth a try; however, many hardware errors prevent the log from being written, so no errors in the log does not mean the hardware is OK. |
hanj Veteran
Posted: Fri Aug 18, 2006 2:39 pm
Hello
I received a different error today.. just wanted to post it in case it provides an additional clue...
Code: | Unable to handle kernel NULL pointer dereference at virtual address 00000078
printing eip:
c0127f54
*pdg = 0
*pmd = 0
Recursive die() failure, output suppressed
<0>Kernel panic - not syncing: Fatal exception in interrupt |
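The `eip` value in a report like this can be matched against the kernel's symbol table to see which function was executing. A rough sketch, assuming a System.map that belongs to the exact kernel that crashed (commonly found under /boot or in the kernel source tree) and that the file is sorted by address; `nearest_symbol` is a hypothetical helper, not a standard tool.

```sh
# Print the last symbol in SYSTEM_MAP whose address is <= ADDR_HEX.
# Each map line has the form "address type name", sorted by address.
# Usage: nearest_symbol ADDR_HEX SYSTEM_MAP
nearest_symbol() (
    target=$((0x$1))
    best=
    while read -r addr kind name; do
        # Stop at the first symbol that starts beyond the target address.
        [ "$((0x$addr))" -le "$target" ] || break
        best=$name
    done < "$2"
    [ -n "$best" ] && echo "$best"
)

# Example with the address from the report above (map path is an assumption):
# nearest_symbol c0127f54 /boot/System.map
```

If successive panics land in unrelated functions, that fits a hardware fault better than a kernel bug, which would tend to hit the same code path each time.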
This time it happened at the configure stage when it was checking for compile options. I blew out the computer real good the other day and things weren't giving me a problem until this morning. I'll pop out a stick of RAM today and start that test. I want to do one thing at a time.
Quote: | You can try enabling Machine Check Exception support for your processor in your kernel. The hardware tracks its own errors in a special log. This will catch CPU, memory and other mainboard-related errors. |
I'll do this too.
Thanks everyone.
hanji |
hanj Veteran
Posted: Fri Aug 18, 2006 2:46 pm
troymc wrote: | You can try enabling Machine Check Exception support for your processor in your kernel. The hardware tracks its own errors in a special log. This will catch CPU, memory and other mainboard-related errors.
Once you have support enabled in the kernel, you will see messages in /var/log/messages stating something like "a machine check exception was logged" if your hardware detects an error. You can install app-admin/mcelog to view these errors.
troymc |
Hmm. I just checked my kernel config, and I already had that built in. I grep'd the logs, and I only see it 'enabled' on the CPU, with no errors logged.
Code: | Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0 |
Bummer.
hanji |
hanj Veteran
Posted: Mon Aug 21, 2006 3:18 am
Hello
After 36 hours and 114 passes with no errors, I had to shut off memtest. Does this mean that the memory is okay, or do I have to completely finish the test? I can't believe it's taking this long.
I'm going to reseat the RAM next.
Thanks!
hanji |
drescherjm Advocate
Joined: 05 Jun 2004 Posts: 2790 Location: Pittsburgh, PA, USA
Posted: Mon Aug 21, 2006 3:22 am
Quote: | or do I have to completely finish the test? |
It is a continuous test, so it ends when you feel that you have waited long enough...
_________________
John
My gentoo overlay
Instructions for overlay |
mope Apprentice
Joined: 23 Feb 2003 Posts: 206
Posted: Wed Nov 01, 2006 5:53 pm
Did you ever pinpoint the problem?
I'm getting the same thing on my Presario 1710nx.
I'm on 2.6.18-r1 kernel.
I'll check memtest this morning and report back this afternoon, but maybe it's due to heat and the stock heatsink/fan? |
lynnlinux n00b
Joined: 03 Mar 2007 Posts: 39 Location: Shanghai
Posted: Wed May 02, 2007 11:48 pm Post subject: Kernel panic - not syncing : Fatal exception in interrupt
Hi all, after I installed the base Gentoo system (2006.0), I began to emerge KDE.
During this long build, an error happened when building kdelibs.
The error message is the one in the title:
"Kernel panic - not syncing : Fatal exception in interrupt"
Thank you _________________ Gentoo,OpenEmbedded,linux-vserver |
didymos Advocate
Joined: 10 Oct 2005 Posts: 4798 Location: California
Posted: Thu May 03, 2007 1:12 am
Was there no more to the message? Sounds like it might be a disk error, but without more info, I couldn't say. Have you tried to build kdelibs again? _________________ Thomas S. Howard |
nixnut Bodhisattva
Joined: 09 Apr 2004 Posts: 10974 Location: the dutch mountains
Posted: Thu May 03, 2007 4:27 pm
Merged the above two posts here. _________________ Please add [solved] to the initial post's subject line if you feel your problem is resolved. Help answer the unanswered
talk is cheap. supply exceeds demand |