Gentoo Forums
kernel panic on emerge [URGENT]
hanj
Veteran
Joined: 19 Aug 2003
Posts: 1500

Posted: Wed Aug 16, 2006 3:15 pm    Post subject: kernel panic on emerge [URGENT]

Hello All

For the last few days I've been running into this problem during emerges. When I emerge a package, it intermittently freezes around the md5 check and unpack. On the console I get this message:

Code:
...
[<c016600f>]
[<c0127a29>]
Code:  Bad EIP value.
 <0>Kernel panic - not syncing: Fatal exception in interrupt


At first I thought it was a recent change in the kernel I made. I added SMP support in there. Yesterday, I built a new kernel and removed that feature set. Today it happened again. I just rolled back to an older kernel this morning.. and have not seen any problems as of yet.

My question: could this be the kernel version or a config problem, or am I looking at a hardware problem?

The kernel config has been the same for about 2 years with the exception of SMP which I just added and removed. I feel like the config should be 'good'.

Here is my relevant info.

The kernel version that I was running was..
Code:
2.6.16-hardened-r11


I just rolled back to
Code:
2.6.16-hardened-r10


Code:
emerge --info
Portage 2.1-r2 (default-linux/x86/2006.0, gcc-3.4.6, glibc-2.3.6-r4, 2.6.16-hardened-r10 i686)
=================================================================
System uname: 2.6.16-hardened-r10 i686 Intel(R) Pentium(R) 4 CPU 2.80GHz
Gentoo Base System version 1.12.4
app-admin/eselect-compiler: [Not Present]
dev-lang/python:     2.3.5-r2, 2.4.3-r1
dev-python/pycrypto: 2.0.1-r5
dev-util/ccache:     [Not Present]
dev-util/confcache:  [Not Present]
sys-apps/sandbox:    1.2.17
sys-devel/autoconf:  2.13, 2.59-r7
sys-devel/automake:  1.4_p6, 1.5, 1.6.3, 1.7.9-r1, 1.8.5-r3, 1.9.6-r2
sys-devel/binutils:  2.16.1-r3
sys-devel/gcc-config: 1.3.13-r3
sys-devel/libtool:   1.4.3-r4, 1.5.22
virtual/os-headers:  2.6.11-r2
ACCEPT_KEYWORDS="x86"
AUTOCLEAN="yes"
CBUILD="i686-pc-linux-gnu"
CFLAGS="-O2 -march=pentium4 -funroll-loops -fprefetch-loop-arrays -pipe"
CHOST="i686-pc-linux-gnu"
CONFIG_PROTECT="/etc /var/bind"
CONFIG_PROTECT_MASK="/etc/env.d /etc/gconf /etc/revdep-rebuild /etc/terminfo"
CXXFLAGS="-O2 -march=pentium4 -funroll-loops -fprefetch-loop-arrays -pipe"
DISTDIR="/usr/portage/distfiles"
FEATURES="autoconfig distlocks metadata-transfer sandbox sfperms strict"
GENTOO_MIRRORS="http://distfiles.gentoo.org http://distro.ibiblio.org/pub/linux/distributions/gentoo"
MAKEOPTS="-j2"
PKGDIR="/usr/portage/packages"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --delete-after --stats --timeout=180 --exclude='/distfiles' --exclude='/local' --exclude='/packages'"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/usr/local/portage"
SYNC="rsync://rsync.gentoo.org/gentoo-portage"
USE="x86 alsa apache2 berkdb bitmap-fonts cli crypt dlloader dri eds emboss esd foomaticdb fortran gdbm gif gstreamer imlib isdnlog jpeg libg++ libwww mp3 ncurses nptl ogg pam pcre pdflib perl php png pppd python qt3 qt4 readline reflection sasl session snortsam spell spl ssl tcpd truetype-fonts type1-fonts udev vorbis xml xorg zlib elibc_glibc input_devices_keyboard input_devices_mouse input_devices_evdev kernel_linux userland_GNU"
Unset:  CTARGET, EMERGE_DEFAULT_OPTS, INSTALL_MASK, LANG, LC_ALL, LDFLAGS, LINGUAS, PORTAGE_RSYNC_EXTRA_OPTS


Code:
cat /proc/meminfo
MemTotal:       767020 kB
MemFree:        672240 kB
Buffers:          9320 kB
Cached:          45536 kB
SwapCached:          0 kB
Active:          50824 kB
Inactive:        28756 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:       767020 kB
LowFree:        672240 kB
SwapTotal:      979956 kB
SwapFree:       979956 kB
Dirty:               0 kB
Writeback:           0 kB
Mapped:          39844 kB
Slab:            10444 kB
CommitLimit:   1363464 kB
Committed_AS:   210700 kB
PageTables:        936 kB
VmallocTotal:   253944 kB
VmallocUsed:      2100 kB
VmallocChunk:   251844 kB


Code:
cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 15
model           : 4
model name      : Intel(R) Pentium(R) 4 CPU 2.80GHz
stepping        : 1
cpu MHz         : 2793.507
cache size      : 1024 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 3
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx constant_tsc pni monitor ds_cpl cid xtpr
bogomips        : 5597.60




[update] It just happened again with 2.6.16-hardened-r10! I know I've run this kernel for a while in the past without any problems.

This is as far as I get on emerge...
Code:
emerge -v baselayout
Calculating dependencies... done!
>>> Emerging (1 of 1) sys-apps/baselayout-1.12.4-r6 to /
>>> checking ebuild checksums ;-)
>>> checking auxfile checksums ;-)
>>> checking miscfile checksums ;-)
>>> checking baselayout-1.12.4.tar.bz2 ;-)



Let me know if I can supply any additional information

Thanks!
hanji

NeddySeagoon
Administrator
Joined: 05 Jul 2003
Posts: 54824
Location: 56N 3W

Posted: Wed Aug 16, 2006 3:26 pm

hanj,

Intermittent problems like this are normally indicative of a hardware problem.
The unpack is very CPU and RAM intensive, so the CPU temperature rises.

Check your cooling - use a stiff brush to clean your fan/heatsink assembly.
Run lm-sensors to keep an eye on temperatures (there are other ways).
Run memtest86 from the liveCD.

If that doesn't help, it's time to begin removing things.
Operate with one stick of RAM, test each one individually this way.
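
One way to keep an eye on temperatures is to log a reading at a fixed interval and flush it to disk, so the last values before a freeze are still there after the reboot. This is only a minimal sketch, assuming lm-sensors is installed and `sensors` actually works on the board; `log_readings` and the log path are made-up names for illustration.

```shell
#!/bin/sh
# Hypothetical helper: run a command COUNT times, INTERVAL seconds apart,
# appending timestamped output to LOGFILE.
# Usage: log_readings INTERVAL COUNT LOGFILE COMMAND [ARGS...]
log_readings() {
    interval=$1
    count=$2
    logfile=$3
    shift 3
    n=0
    while [ "$n" -lt "$count" ]; do
        {
            date '+%Y-%m-%d %H:%M:%S'
            "$@"
        } >> "$logfile" 2>&1
        # Flush to disk so the reading survives a sudden panic.
        sync
        n=$((n + 1))
        # Skip the sleep after the final reading.
        [ "$n" -lt "$count" ] && sleep "$interval"
    done
    return 0
}

# Example (not run here): one reading every 5 seconds for an hour,
# started in another terminal before the emerge.
# log_readings 5 720 /var/log/temps.log sensors
```

If the box freezes, the tail of the log shows the last temperatures that made it to disk, which is a hint rather than proof of overheating.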
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.

hanj
Veteran
Joined: 19 Aug 2003
Posts: 1500

Posted: Wed Aug 16, 2006 3:34 pm

So there's no easy way to narrow it down to a CPU or RAM problem? Could this be the drive as well? This is one of my development servers, so I really can't have it down for hours doing memtest, etc... but if I have to, I have to.

I'll try to get lm_sensors on there. I don't think the board supports it very well. It's a Dell PowerEdge SC420.

Any tips to help narrow down the problem? I looked at the fan, and it looks good and clean.

Thanks for the reply
hanji

hanj
Veteran
Joined: 19 Aug 2003
Posts: 1500

Posted: Wed Aug 16, 2006 3:48 pm

NeddySeagoon wrote:
Intermittent problems like this are normally indicative of a hardware problem.
The unpack is very CPU and RAM intensive, so the CPU temperature rises.


It's odd that I can compile a kernel though.. seems like that should be torkin' on the CPU/RAM harder than an unpack. I would assume that it's pushing RAM.. right? Any quick way to push on RAM?

Thanks again.. I'm just thinking out loud.

hanji

NeddySeagoon
Administrator
Joined: 05 Jul 2003
Posts: 54824
Location: 56N 3W

Posted: Wed Aug 16, 2006 4:18 pm

hanj,

Since you can compile a kernel, it's probably not the CPU. Kernel compiles do not push RAM; they work through lots of small files. Unpack does push RAM, but it uses that RAM for data rather than program code, and your crash comes from program code. Executing data normally leads to different errors.

I doubt it's a drive or a data cable. Reading corrupt data from the disk surface would be detected by the drive's CRC checks.
Getting program code into RAM that was written incorrectly would be repeatable - it would be wrong every time.

You can run memtest86 without using the liveCD, but it's not very useful. It can get swapped out and moved around in physical RAM. Also, it can't test all of RAM because some things can't be swapped or moved.

Reduce the box to one stick of memory - see if it still happens. Try each stick in turn, on its own.
Pulling a memory stick and seeing the problem go away does not imply the stick you pulled is faulty.
Disturbing the contacts between the RAM and motherboard socket can fix it too.
_________________
Regards,

NeddySeagoon


hanj
Veteran
Joined: 19 Aug 2003
Posts: 1500

Posted: Wed Aug 16, 2006 4:31 pm

NeddySeagoon wrote:
You can run memtest86 without using the liveCD, but it's not very useful. It can get swapped out and moved around in physical RAM. Also, it can't test all of RAM because some things can't be swapped or moved.

Reduce the box to one stick of memory - see if it still happens. Try each stick in turn, on its own.
Pulling a memory stick and seeing the problem go away does not imply the stick you pulled is faulty.
Disturbing the contacts between the RAM and motherboard socket can fix it too.


Thanks again. Seems like we're pointing to RAM initially then. I think I'll try some tar'ing/untar'ing to see if I can reproduce this. Then jump to removing RAM.. if still nothing, I'll crank on memtest tonight with all the sticks in there.
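
A rough sketch of what such a tar/untar loop could look like: unpack the same archive several times and compare a checksum of the result on every pass. It assumes GNU tar, find, sort, xargs and md5sum; `unpack_loop` is a made-up name, not an existing tool. A mismatch between passes would suggest data is being corrupted somewhere between disk and RAM.

```shell
#!/bin/sh
# Hypothetical helper: unpack ARCHIVE PASSES times and check that every
# pass produces identical file contents.
# Usage: unpack_loop ARCHIVE PASSES
unpack_loop() {
    archive=$1
    passes=$2
    reference=""
    i=1
    while [ "$i" -le "$passes" ]; do
        workdir=$(mktemp -d) || return 2
        # Unpack into a fresh directory each pass.
        tar -xf "$archive" -C "$workdir" || { rm -rf "$workdir"; return 2; }
        # Checksum every unpacked file in a stable order, then hash the list.
        sum=$(cd "$workdir" && find . -type f -print0 | sort -z | xargs -0 -r md5sum | md5sum)
        rm -rf "$workdir"
        if [ -z "$reference" ]; then
            reference=$sum
        elif [ "$sum" != "$reference" ]; then
            echo "pass $i: checksum mismatch" >&2
            return 1
        fi
        i=$((i + 1))
    done
    echo "$passes passes, checksums identical"
}

# Example (not run here; the archive name is only an illustration):
# unpack_loop /usr/portage/distfiles/baselayout-1.12.4.tar.bz2 50
```

A clean run proves little on its own, since the fault is intermittent, but a mismatch or a panic during the loop would at least make the problem reproducible without emerge.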

You mentioned CRC checks. Is there any way I can check for CRC errors? hdparm? I'm not too familiar with the process of testing whether the drive is okay.

Thanks for the help..it's much appreciated.

hanji

NeddySeagoon
Administrator
Joined: 05 Jul 2003
Posts: 54824
Location: 56N 3W

Posted: Wed Aug 16, 2006 6:18 pm

hanj,

The CRC checks are all internal to the drive - it's more complex than just CRCs, because the drive is able to recover so many bad bits in a sector too.

If you have an IDE drive, the easiest way to see if it's OK is to ask it with smartmontools.
SATA drives support SMART too, but libata in the kernel doesn't yet. There is a patch, but I don't think it's in the vanilla kernel yet.

All modern drives hide bad sectors from the OS by 'on the fly' remapping to spares, so you should never see a bad sector until the drive is at end of life. You can dd the entire drive to /dev/null to read the surface. It will stop on errors.
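
As a sketch of those two checks: the function below wraps the dd read, and the commented lines show the smartmontools queries. `read_surface` is a made-up name, /dev/hda is only an assumed device name for an IDE drive, and all of it needs root when pointed at a real disk.

```shell
#!/bin/sh
# Hypothetical helper: read SOURCE from start to end and discard the data.
# dd stops at the first read error and exits non-zero.
# Usage: read_surface SOURCE
read_surface() {
    if dd if="$1" of=/dev/null bs=1M; then
        echo "read completed without errors"
    else
        echo "read stopped on an error" >&2
        return 1
    fi
}

# Examples (not run here; the device name is an assumption):
# smartctl -H /dev/hda    # ask the drive for its overall health verdict
# smartctl -a /dev/hda    # full SMART report, including the error log
# read_surface /dev/hda
```

Reading a whole drive takes a while and loads the disk, so on a server it is probably best left for a quiet period.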

I think it's either memory, PSU (as in the PSU box) or the Vcore PSU on the motherboard.
_________________
Regards,

NeddySeagoon


troymc
Guru
Joined: 22 Mar 2006
Posts: 553

Posted: Wed Aug 16, 2006 9:36 pm

You can try enabling Machine Check Exception support for your processor in your kernel. The hardware tracks its own errors in a special log. This will catch CPU, memory and other mainboard-related errors.

Once you have support enabled in the kernel, you will see messages in /var/log/messages stating something like "a machine check exception was logged" if your hardware detects an error. You can install app-admin/mcelog to view these errors.
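
A small sketch of checking for those messages. `mce_lines` is a made-up helper name, and the paths in the comments are assumptions about a typical Gentoo setup rather than anything specific to this box.

```shell
#!/bin/sh
# Hypothetical helper: print log lines that mention machine checks.
# Usage: mce_lines LOGFILE
mce_lines() {
    grep -i 'machine check' "$1"
}

# Examples (not run here; paths are assumptions):
# grep CONFIG_X86_MCE /usr/src/linux/.config   # is MCE support configured?
# mce_lines /var/log/messages                  # any machine check messages?
```

Lines that only say machine check support is enabled are the normal boot messages; an actual hardware event would show up as a separate logged exception.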



troymc

NeddySeagoon
Administrator
Joined: 05 Jul 2003
Posts: 54824
Location: 56N 3W

Posted: Wed Aug 16, 2006 10:38 pm

troymc,

That's worth a try. However, many hardware errors prevent the log from being written, so no errors in the log does not mean the hardware is OK.
_________________
Regards,

NeddySeagoon


hanj
Veteran
Joined: 19 Aug 2003
Posts: 1500

Posted: Fri Aug 18, 2006 2:39 pm

Hello

I received a different error today.. just wanted to post it in case it provides an additional clue...

Code:
Unable to handle kernel NULL pointer dereference at virtual address 00000078
 printing eip:
c0127f54
*pdg =    0
*pmd =   0
Recursive die() failure, output suppressed
 <0>Kernel panic - not syncing: Fatal exception in interrupt


This time it happened at the configure stage, when it was checking for compile options. I blew out the computer real good the other day and things weren't giving me a problem until this morning. I'll pop out a stick of RAM today and start that test. I want to do one thing at a time.

Quote:
You can try enabling Machine Check Exception support for your processor in your kernel. The hardware tracks its own errors in a special log. This will catch CPU, memory and other mainboard-related errors.


I'll do this too.

Thanks everyone.

hanji

hanj
Veteran
Joined: 19 Aug 2003
Posts: 1500

Posted: Fri Aug 18, 2006 2:46 pm

troymc wrote:
You can try enabling Machine Check Exception support for your processor in your kernel. The hardware tracks its own errors in a special log. This will catch CPU, memory and other mainboard-related errors.

Once you have support enabled in the kernel, you will see messages in /var/log/messages stating something like "a machine check exception was logged" if your hardware detects an error. You can install app-admin/mcelog to view these errors.



troymc


hmm. I just checked my kernel config, and I already had that built in. I grep'd the logs.. and I just see it 'enabled' on the CPU, but no errors.

Code:
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0


Bummer.
hanji

hanj
Veteran
Joined: 19 Aug 2003
Posts: 1500

Posted: Mon Aug 21, 2006 3:18 am

Hello

After 36 hours and 114 passes with no errors, I had to shut off memtest. Does this mean that the memory is okay, or do I have to completely finish the test? I can't believe it's taking this long.

I'm going to reseat the RAM next.

Thanks!
hanji

drescherjm
Advocate
Joined: 05 Jun 2004
Posts: 2790
Location: Pittsburgh, PA, USA

Posted: Mon Aug 21, 2006 3:22 am

Quote:
or do I have to completely finish the test?


It is a continuous test so it ends when you feel that you have waited long enough...
_________________
John

My gentoo overlay
Instructions for overlay

mope
Apprentice
Joined: 23 Feb 2003
Posts: 206

Posted: Wed Nov 01, 2006 5:53 pm

Did you ever pinpoint the problem?

I'm getting the same thing on my Presario 1710nx.
I'm on 2.6.18-r1 kernel.

I'll check memtest this morning and report back this afternoon, but maybe it's due to heat and the stock heatsink/fan?

lynnlinux
n00b
Joined: 03 Mar 2007
Posts: 39
Location: Shanghai

Posted: Wed May 02, 2007 11:48 pm    Post subject: Kernel panic - not syncing : Fatal exception in interrupt

Hi all. After I installed the base Gentoo system (2006.0), I began to emerge KDE.
During this long build, an error happened while building kdelibs.
The error message is the one in the title:
"Kernel panic - not syncing : Fatal exception in interrupt"

thank you
_________________
Gentoo,OpenEmbedded,linux-vserver

didymos
Advocate
Joined: 10 Oct 2005
Posts: 4798
Location: California

Posted: Thu May 03, 2007 1:12 am

Was there no more to the message? Sounds like it might be a disk error, but without more info, I couldn't say. Have you tried to build kdelibs again?
_________________
Thomas S. Howard

nixnut
Bodhisattva
Joined: 09 Apr 2004
Posts: 10974
Location: the dutch mountains

Posted: Thu May 03, 2007 4:27 pm

merged above two posts here.
_________________
Please add [solved] to the initial post's subject line if you feel your problem is resolved. Help answer the unanswered

talk is cheap. supply exceeds demand

All times are GMT