rmrfslashstar n00b
Joined: 06 Jul 2006 Posts: 8
|
Posted: Wed Jul 19, 2006 6:57 pm Post subject: emerge xorg-server-1.1.0-r1 hard-locks alpha system (FIXED) |
|
|
I followed the Migrating to Modular X Howto, but I can't get past the xorg-server-1.1.0-r1 emerge, although all the packages before it emerged without trouble. The emerge starts and runs fine until the system becomes unresponsive: it stops responding to ssh and I have to reboot it manually. This has now happened 5-6 times, and I don't want to keep pushing my luck with hard-locking the system. The same thing happened once before when emerging kdepim-3.5.2-r2, but that got "pushed down" when Modular X came out, so it is not an issue at the moment.
I'm pretty sure an overflowing /var (the Portage work directory is at its default location) is not the problem, since after a reboot /var is only 12% full with 870MB available.
It *could* be due to flaky hardware, but I don't think so, since I have only had problems emerging two packages, namely xorg-x11 and kdepim. Here is what /var/log/messages looks like before a crash: there's a correctable ECC error and a kernel debug message (which I believe should be harmless). The ECC message is uncommon, while the debug message pops up more frequently. Stressful tasks like compiling make them crop up more often.
Code: | Jul 17 14:30:10 snorre -- MARK --
Jul 17 14:40:10 snorre -- MARK --
Jul 17 14:50:10 snorre -- MARK --
Jul 17 14:58:09 snorre kernel: TSUNAMI machine check: vector=0x630 pc=0x20000174bf8 code=0x86
Jul 17 14:58:09 snorre kernel: machine check type: correctable ECC error (retryable)
Jul 17 14:58:09 snorre kernel: pc = [<0000020000174bf8>] ra = [<0000000120058e18>] ps = 0008 Not tainted
Jul 17 14:58:09 snorre kernel: pc is at 0x20000174bf8
Jul 17 14:58:09 snorre kernel: ra is at 0x120058e18
Jul 17 14:58:09 snorre kernel: v0 = 0000000000000000 t0 = 00000001201956a0 t1 = 00000001201976f0
Jul 17 14:58:09 snorre kernel: t2 = 0000000000000000 t3 = 0000000000000000 t4 = 0000000000000000
Jul 17 14:58:09 snorre kernel: t5 = 00000000000000c0 t6 = 000000000000003f t7 = 000000000000007f
Jul 17 14:58:09 snorre kernel: a0 = 00000001201956a0 a1 = 0000000000000000 a2 = 0000000000000001
Jul 17 14:58:09 snorre kernel: a3 = 0000000000000000 a4 = 0000000000001bcb a5 = 000000011fe607bc
Jul 17 14:58:09 snorre kernel: t8 = 0000000000000040 t9 = 000002000017a298 t10= 0000000000000040
Jul 17 14:58:09 snorre kernel: t11= 000000000000000a pv = 0000020000174bf0 at = 000002000026a8b0
Jul 17 14:58:09 snorre kernel: gp = 000002000026df70 sp = fffffc0023c84000
Jul 17 16:48:23 snorre syslogd 1.4.1: restart. |
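To see how often each kind of machine check shows up over time, the log can be summarised with standard tools. This is only a minimal sketch, assuming the "machine check type:" line format shown in the excerpt above; the function name count_machine_checks is made up for illustration.

```shell
#!/bin/sh
# Summarise kernel machine-check reports in a syslog-style file.
# Assumption: reports look like "... kernel: machine check type: <description>".
count_machine_checks() {
    # $1: path to the log file, e.g. /var/log/messages
    grep 'kernel: machine check type:' "$1" |
        sed 's/.*machine check type: //' |   # keep only the description
        sort | uniq -c | sort -rn            # count per description, most frequent first
}
```

Calling count_machine_checks /var/log/messages before and after a heavy compile would show whether the count of correctable ECC errors grows under load.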
Every time, /var/log/emerge.log shows the same message:
Code: | 1152570311: >>> emerge (1 of 5) x11-base/xorg-server-1.1.0-r1 to /
1152570311: === (1 of 5) Cleaning (x11-base/xorg-server-1.1.0-r1::/usr/portage/x11-base/xorg-server/xorg-server-1.1.0-r1.ebuild)
1152570322: === (1 of 5) Compiling/Merging (x11-base/xorg-server-1.1.0-r1::/usr/portage/x11-base/xorg-server/xorg-server-1.1.0-r1.ebuild) |
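The leading number on each emerge.log line is a Unix timestamp (seconds since the epoch), which makes it awkward to line up with the dates in /var/log/messages. Below is a small sketch for decoding it, assuming GNU date (for its "-d @SECONDS" form); the function name decode_emerge_log is hypothetical.

```shell
#!/bin/sh
# Print emerge.log lines with the leading epoch-seconds field converted to UTC.
# Assumptions: GNU date, and lines of the form "<epoch>: <message>".
decode_emerge_log() {
    # $1: path to an emerge.log-style file
    while IFS= read -r line; do
        stamp=${line%%:*}    # text before the first colon
        rest=${line#*:}      # text after the first colon
        case $stamp in
            ''|*[!0-9]*)
                # Not a purely numeric timestamp: pass the line through unchanged.
                printf '%s\n' "$line" ;;
            *)
                printf '%s:%s\n' "$(date -u -d "@$stamp" '+%Y-%m-%d %H:%M:%S')" "$rest" ;;
        esac
    done < "$1"
}
```

Running decode_emerge_log /var/log/emerge.log | tail would show when the last "Compiling/Merging" entry was written before a lock-up.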
Here is the emerge that crashes my system:
Code: | emerge -DuNva xorg-x11
These are the packages that would be merged, in order:
Calculating dependencies... done!
[ebuild N ] x11-base/xorg-server-1.1.0-r1 USE="dri ipv6 sdl xorg -debug -dmx -kdrive -minimal -nptl -xprint" INPUT_DEVICES="keyboard mouse -acecad -aiptek -calcomp -citron -digitaledge -dmc -dynapro -elo2300 -elographics -evdev -fpit -hyperpen -jamstudio -joystick -magellan -microtouch -mutouch -palmax -penmount -spaceorb -summa -tek4957 -ur98 -void" VIDEO_CARDS="vga -cirrus -dummy -epson -fbdev -glint -mach64 -mga -nv -r128 -radeon -rendition -s3 -s3virge -savage -siliconmotion -sisusb -tdfx -tga -v4l -voodoo" 0 kB
[ebuild N ] x11-drivers/xf86-video-vga-4.1.0 USE="-debug" 228 kB
[ebuild N ] x11-drivers/xf86-input-mouse-1.1.1 USE="-debug" 261 kB
[ebuild N ] x11-drivers/xf86-input-keyboard-1.1.0 USE="-debug" 226 kB
[ebuild N ] x11-libs/libXv-1.0.1 USE="-debug" 219 kB
[ebuild N ] x11-base/xorg-x11-7.1 0 kB
Total size of downloads: 935 kB |
My make.conf
Code: | # These settings were set by the catalyst build script that automatically built this stage
# Please consult /etc/make.conf.example for a more detailed example
CFLAGS="-mieee -O2 -mcpu=ev56 -pipe"
CHOST="alpha-unknown-linux-gnu"
CXXFLAGS="${CFLAGS}"
MAKEOPTS="-j2"
USE="kde qt i128 -gnome gtk -alsa"
GENTOO_MIRRORS="http://distro.ibiblio.org/pub/linux/distributions/gentoo/ ftp://ftp.gtlib.cc.gatech.edu/pub/gentoo ftp://mirror.iawnet.sandia.gov/pub/gentoo/"
SYNC="rsync://rsync.us.gentoo.org/gentoo-portage"
INPUT_DEVICES="keyboard mouse"
VIDEO_CARDS="vga" |
Does anyone have any suggestions for how to troubleshoot this problem, for example ways I can get more information about what is going wrong? Or is it possible to emerge xorg with fewer flags, making the compile less strenuous? Anything is appreciated! Thanks.
Last edited by rmrfslashstar on Fri Aug 18, 2006 9:04 pm; edited 1 time in total |
|
mark_alec Bodhisattva
Joined: 11 Sep 2004 Posts: 6066 Location: Melbourne, Australia
|
Posted: Thu Jul 20, 2006 9:52 am Post subject: |
|
|
Moved from Portage & Programming to Gentoo on Alternative Architectures. |
|
jeffd n00b
Joined: 22 Apr 2004 Posts: 22 Location: Mass
|
Posted: Fri Jul 21, 2006 2:34 am Post subject: could be hardware |
|
|
I don't know for sure what your problem is, but I had very similar symptoms on one of my Alphas when one of my SCSI drives started going bad (bad blocks on the disk). Could your swap partition be developing bad blocks? That would explain why this only happens when emerging large packages: the bad blocks on your swap partition don't get touched until you really start paging. Like I said, I'm not sure this is your problem, but it is possible. |
|
rmrfslashstar n00b
Joined: 06 Jul 2006 Posts: 8
|
Posted: Fri Jul 21, 2006 6:49 pm Post subject: |
|
|
Thanks for the reply, that would make sense. I remember having a problem with one of my reiserfs partitions before (not swap, and it was apparently fixable), so it could be the hard drive.
How would I go about checking the swap partition for bad blocks, though? I can't seem to find out how on Google, but I'll keep looking.
Edit: is the solution to drop to single-user mode with init 1 and then run mkswap -c /dev/sdaXX ? |
|
Qdot Tux's lil' helper
Joined: 06 Jan 2005 Posts: 127
|
Posted: Sat Jul 22, 2006 6:06 pm Post subject: |
|
|
rmrfslashstar wrote: |
How would I go about checking the swap partition for bad blocks though? I can't seem to find out how on google but I'll keep looking.
Edit: is the solution to init 1 to single-user and then mkswap -c /dev/sdaXX ? |
I'd just run
Code: | swapoff -a && badblocks -wvv /dev/sdax | (this runs a *destructive* read-write test of the partition, so mkswap it afterwards) |
|
rmrfslashstar n00b
Joined: 06 Jul 2006 Posts: 8
|
Posted: Mon Jul 24, 2006 10:40 pm Post subject: |
|
|
Thanks! Ok, so I have 3 swap partitions, /dev/sda1, /dev/sdb1, /dev/sdc1. I did
Code: | swapoff /dev/sda1
badblocks -wvv /dev/sda1 |
Passed with 0 bad blocks. Then to remake my swap I did
Code: | mkswap -c /dev/sda1
swapon -p 1 /dev/sda1 |
Rinsed & repeated for /dev/sdb1 and /dev/sdc1: passed each time with 0 bad blocks.
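The per-partition sequence above could also be wrapped in a loop. The following is a rough sketch rather than a tested procedure: the DRY_RUN switch and the run and check_swap helpers are hypothetical additions for illustration, and by default the script only prints what it would do, because badblocks -w overwrites the partition.

```shell
#!/bin/sh
# Destructively test swap partitions, then recreate and re-enable them.
# DRY_RUN is a hypothetical safety switch for this sketch:
# leave it at 1 to only print the commands, set DRY_RUN=0 to really run them.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf 'would run: %s\n' "$*"
    else
        "$@"
    fi
}

check_swap() {
    # Arguments: one or more swap partitions, e.g. /dev/sda1 /dev/sdb1 /dev/sdc1
    for dev in "$@"; do
        run swapoff "$dev" || return 1          # stop as soon as a command fails
        run badblocks -wvv "$dev" || return 1   # -w is a destructive write test
        run mkswap -c "$dev" || return 1        # recreate the swap signature
        run swapon -p 1 "$dev" || return 1      # same priority as in the post
    done
}
```

For the three partitions mentioned here, that would be check_swap /dev/sda1 /dev/sdb1 /dev/sdc1, first with the default dry run to review the commands.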
I also ran reiserfsck --check /dev/sda5 on my /var partition (after unmounting it in single-user mode) and it passed with no errors. This is the partition that gave me trouble last December, although a rebuild-tree appears to have fixed that problem...
Any other ideas? In the meantime I will try setting PORTAGE_TMPDIR to a directory in /usr/local and see if that helps. |
|
rmrfslashstar n00b
Joined: 06 Jul 2006 Posts: 8
|
Posted: Fri Jul 28, 2006 8:37 pm Post subject: |
|
|
Ok, I tried setting PORTAGE_TMPDIR=/usr/local/portage_tmpdir, but the emerge still failed. Afterwards I ran reiserfsck --check on the /usr/local partition as well and it passed with no problems. This leads me to believe it's not a hard disk problem... is there any way I can force emerge to give me more info on what's going wrong? The hard-lock occurs somewhere in the compile, but I'm not sure if it's the same place every time... |
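One way to find out whether the lock-up hits at the same point each time would be to capture the build output on a second machine, so the last lines survive the crash. This is a sketch under assumptions: the stamp_lines helper is hypothetical, and the ssh invocation in the comment (root login, hostname snorre taken from the log above) is only an example.

```shell
#!/bin/sh
# Prefix every line read from stdin with the current UTC time,
# so the last line captured before a lock-up carries a timestamp.
stamp_lines() {
    while IFS= read -r line; do
        printf '%s %s\n' "$(date -u '+%H:%M:%S')" "$line"
    done
}

# Hypothetical usage, run from a second machine so the log is stored off the Alpha:
#   ssh root@snorre 'emerge --oneshot x11-base/xorg-server 2>&1' | stamp_lines | tee xorg-build.log
```

Comparing the tail of xorg-build.log across two or three failed runs would show whether the build dies on the same source file or at a random point, the latter pointing more towards hardware.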
|
rmrfslashstar n00b
Joined: 06 Jul 2006 Posts: 8
|
Posted: Tue Aug 08, 2006 5:02 pm Post subject: |
|
|
I apologize for continuing to bump my own thread.
I ran a test on the physical memory and it seems OK.
I couldn't figure out how to get the SRM memtest to run, but I downloaded memtester 4.0.3 and ran memtester 900 3, and it passed all the tests on all 3 iterations (using 900 MB out of 1024 total). It looks like I can push it to 950, which I'll try now, but above that memtester crashes.
If anyone has experience with the SRM memtest I'd be willing to give it another go, but for the life of me I couldn't get past the "invalid zone" error.
Update: it also passed memtester 950 3.
Edit:
It appears the problem was that the system was overheating. When I took the case off, the emerge succeeded. Thanks for the replies - it looks like this issue is resolved for now. |
|