simon_irl Guru
Joined: 07 Oct 2004 Posts: 403 Location: New Zealand
|
Posted: Sun Aug 14, 2005 11:04 am Post subject: amd64 instability [solved] |
|
|
Just finished a year-long project, all done on a beautiful rock solid 32-bit Gentoo system (stable despite ~x86). Now that I have a few days to spare, decided to have another go at an amd64 install (gave up a year ago due to endless problems). After many days of struggling, am about ready to give up again...maybe even switch to another distro to take advantage of my 64 bit system (although that would be extremely painful, since after Gentoo other distros would hardly be Linux at all).
Problem is random crashes...first thought they were video related (dvd playback consistently locked up the system requiring reboot) then after "solving" this decided they were alsa related (since audio playback also caused lockups) then after many more new kernels etc. decided they were simply load-related (crashed during compile) and now (after a crash during boot process) have given up trying to work it out...hence this last-ditch plea for advice. I don't see how it could be heat/power related (since x86 setup is fine) so I'm assuming it's something I've done wrong in the kernel, but I've tried so many changes now that I need help...I've tried acpi off & on, pci hotplugging off & on, different i/o schedulers, etc. etc., always trying for stability over performance, but nothing stops the crashes.
I'm not using ~amd64 either...just the "stable" builds. Do I need to rebuild the whole system with CFLAGS="-O2" instead of CFLAGS="-O3"? Perhaps this is a stupid question (handbook gives -O2 as an example, so I guess -O3 involves some risk) but I had no problems with my 32 bit system so I'm reluctant to rebuild the whole 64 bit system with reduced optimisation if (like the many other things I've tried) it's only going to knock performance down another notch and still leave the problem unsolved. So basically I'm asking, does anyone know of a problematic kernel setting I may not have considered? Something that can totally lock things up (no keyboard response...nothing) even early in the boot process (the boot crash was almost instantly after grub fired up the kernel)? Or can someone with amd64 experience testify to the fact that CFLAGS="-O3" is capable of doing such damage? Either way I'd be grateful for a little hope to continue trying...otherwise, it's back to the land of SuSE, Fedora et al that I will crawl.
The unstable 64-bit system (on another disk) is a stage 1 install with the latest stable everything. I tried GNOME, FVWM and KDE before realising that the instability didn't even need X to get me. It's been a long week!
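For anyone weighing the same rebuild, a more conservative make.conf in the spirit of the handbook's -O2 example might look like the sketch below. It is only an illustration, not a confirmed fix for these lockups: the exact flag set is an assumption, and the only values taken from this thread are the athlon64 target and -j2.

```shell
# /etc/make.conf fragment (illustrative sketch, not a confirmed fix)
# -march=athlon64 already implies MMX, SSE, SSE2 and 3DNow!, so the
# explicit -m3dnow -mmmx -msse -msse2 flags are left out as redundant.
# Assumption: -O2 replaces -O3; all other settings stay as they were.
CFLAGS="-O2 -march=athlon64 -pipe"
CXXFLAGS="${CFLAGS}"
MAKEOPTS="-j2"
```

Changing these only affects packages built afterwards, so a full rebuild would be needed before drawing any conclusion about stability.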
Last edited by simon_irl on Mon Aug 15, 2005 12:24 pm; edited 1 time in total |
|
|
|
nxsty Veteran
Joined: 23 Jun 2004 Posts: 1556 Location: .se
|
Posted: Sun Aug 14, 2005 11:47 am Post subject: |
|
|
What kind of motherboard do you have? And what's your emerge --info? It's hard to tell what's wrong when you don't include any info about your system. |
|
|
|
simon_irl Guru
Joined: 07 Oct 2004 Posts: 403 Location: New Zealand
|
Posted: Sun Aug 14, 2005 12:04 pm Post subject: |
|
|
sorry (also apologies for posting on this forum...i didn't see the amd64 forum lower down the page...i've asked for the post to be moved). thanks for replying anyway.
my motherboard is a K8N-E deluxe.
emerge --info is as follows:
Code: | emerge --info
Portage 2.0.51.22-r2 (default-linux/amd64/2005.0, gcc-3.4.3, glibc-2.3.5-r0, 2.6.12-gentoo-r6 x86_64)
=================================================================
System uname: 2.6.12-gentoo-r6 x86_64 AMD Athlon(tm) 64 Processor 3000+
Gentoo Base System version 1.6.13
dev-lang/python: 2.3.5, 2.4.1-r1
sys-apps/sandbox: 1.2.11
sys-devel/autoconf: 2.13, 2.59-r6
sys-devel/automake: 1.4_p6, 1.5, 1.6.3, 1.7.9-r1, 1.8.5-r3, 1.9.5
sys-devel/binutils: 2.15.92.0.2-r10
sys-devel/libtool: 1.5.18-r1
virtual/os-headers: 2.6.11-r2
ACCEPT_KEYWORDS="amd64"
AUTOCLEAN="yes"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-O3 -march=athlon64 -m3dnow -mmmx -msse -msse2 -fomit-frame-pointer -pipe"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/kde/2/share/config /usr/kde/3/share/config /usr/lib/X11/xkb /usr/lib64/mozilla/defaults/pref /usr/share/config /var/qmail/control"
CONFIG_PROTECT_MASK="/etc/gconf /etc/splash /etc/terminfo /etc/env.d"
CXXFLAGS="-O3 -march=athlon64 -m3dnow -mmmx -msse -msse2 -fomit-frame-pointer -pipe"
DISTDIR="/extra/portage/distfiles"
FEATURES="autoconfig distlocks sandbox sfperms strict"
GENTOO_MIRRORS="ftp://mirror.pacific.net.au/linux/Gentoo"
MAKEOPTS="-j2"
PKGDIR="/usr/portage/packages"
PORTAGE_TMPDIR="/extra/portage/tmp"
PORTDIR="/usr/portage"
SYNC="rsync://rsync.au.gentoo.org/gentoo-portage"
USE="X Xaw3d a52 aac alsa amd64 avi berkdb bitmap-fonts bonobo cdr crypt cups dbus dv dvd dvdr dvdread eds emul-linux-x86 encode esd evo fbcon ffmpeg flac foomatic foomaticdb fortran ftp gif gnome gpm gstreamer gtk gtk2 gtkhtml hal ieee1394 imlib ipv6 jpeg lzw lzw-tiff mikmod mime mozilla mp3 mpeg ncurses nls nptl nptlonly offensive ogg opengl pam pda pdflib perl png python quicktime readline scanner sdl spell ssl tcpd tiff truetype truetype-fonts type1-fonts unicode usb userlocales xine xml2 xmms xpm xv xvid yahoo zlib userland_GNU kernel_linux elibc_glibc"
Unset: ASFLAGS, CTARGET, LANG, LC_ALL, LDFLAGS, LINGUAS, PORTDIR_OVERLAY
|
|
|
|
|
dsd Developer
Joined: 30 Mar 2003 Posts: 2162 Location: nr London
|
Posted: Sun Aug 14, 2005 12:49 pm Post subject: |
|
|
you should test your memory with memtest and report the crashes you get as bugs (assuming memory is ok). you should also try 2.6.13-rc6 as there are some important x86_64 fixes included
you might want to try an "emerge mono" and see if you hit the same problem as in bug 101807. this will confirm whether you are seeing the pagetable fishiness as well... _________________ http://dev.gentoo.org/~dsd |
|
|
|
joaander Tux's lil' helper
Joined: 30 Apr 2004 Posts: 132
|
Posted: Sun Aug 14, 2005 2:24 pm Post subject: |
|
|
Also, dig around a bit in the amd64 forum. There are a lot of other people with similar problems. Some culprits have turned up: an undersized power supply, the nvidia drivers, and most recently a bug in kernel 2.6.12 (there is a sticky on this one with a workaround), among many others. |
|
|
|
simon_irl Guru
Joined: 07 Oct 2004 Posts: 403 Location: New Zealand
|
Posted: Sun Aug 14, 2005 9:12 pm Post subject: |
|
|
Memtest was fine. I thought I'd searched the forums reasonably carefully, but obviously not...I didn't know about the kernel bug, and haven't tried 2.6.13-rc6 yet. Thanks for the pointers...if I have any success I'll come back, mark this as solved and post a link to the solution. |
|
|
|
simon_irl Guru
Joined: 07 Oct 2004 Posts: 403 Location: New Zealand
|
Posted: Mon Aug 15, 2005 12:26 pm Post subject: |
|
|
yep, it was the kernel...downgraded to 2.6.9-gentoo-r9 and no more crashes. thanks. |
|
|
|
dsd Developer
Joined: 30 Mar 2003 Posts: 2162 Location: nr London
|
Posted: Mon Aug 15, 2005 7:43 pm Post subject: |
|
|
how about you upgrade to 2.6.13-rc and help us debug the problems, rather than being eternally doomed to 2.6.9? _________________ http://dev.gentoo.org/~dsd |
|
|
|
simon_irl Guru
Joined: 07 Oct 2004 Posts: 403 Location: New Zealand
|
Posted: Tue Aug 16, 2005 1:26 am Post subject: |
|
|
i will as soon as everything else is fully set up and working...i like to trust my kernel when i'm installing other stuff. once i know the rest is ok, i'll try 2.6.13-rc. |
|
|
|
pinr Apprentice
Joined: 26 Jan 2003 Posts: 241 Location: Monterrey, Mexico
|
Posted: Mon Oct 31, 2005 3:22 am Post subject: |
|
|
I also have a Gigabyte MB, mine being the K8NS. Since buying this motherboard back in May I have had similar lockups with all kernels from 2.6.10 to 2.6.13-rxx. My computer never locked up during boot, only when using X. Sometimes it would stay up for over 100 hours, but on other occasions it would lock up 2 or 3 times a day. During these lockups the only option was a hard reboot; nothing else worked and remote login was not possible. After reading this thread I downgraded to the 2.6.9 kernel and have not had a similar lockup since. However, I'm now experiencing a different type of lockup. It happens in X and has happened twice, both times while I was away from the machine; that is, I notice it when I come back. With this new lockup I can still move the mouse, but nothing else works and I can't switch to a console. I can get a remote login, and top tells me X is using 99.9% of the CPU. If I killall -9 X, the remote connection freezes and I can do nothing else with the machine. Any suggestions? If I could unfreeze the machine from the remote connection I'd be reasonably happy; what I really hate is having to hit the power switch. |
|
|
|
durian Guru
Joined: 16 Jul 2003 Posts: 312 Location: Margretetorp
|
Posted: Mon Oct 31, 2005 7:36 am Post subject: |
|
|
pinr wrote: | I also have a Gigabyte MB, mine being the K8NS. Since buying this motherboard back in May I have had similar lockups with all kernels from 2.6.10 to 2.6.13-rxx. My computer never locked up during boot, only when using X. Sometimes it would stay up for over 100 hours, but on other occasions it would lock up 2 or 3 times a day. During these lockups the only option was a hard reboot; nothing else worked and remote login was not possible. After reading this thread I downgraded to the 2.6.9 kernel and have not had a similar lockup since. However, I'm now experiencing a different type of lockup. It happens in X and has happened twice, both times while I was away from the machine; that is, I notice it when I come back. With this new lockup I can still move the mouse, but nothing else works and I can't switch to a console. I can get a remote login, and top tells me X is using 99.9% of the CPU. If I killall -9 X, the remote connection freezes and I can do nothing else with the machine. Any suggestions? If I could unfreeze the machine from the remote connection I'd be reasonably happy; what I really hate is having to hit the power switch. |
Just for the record, I have the same mobo, with a Sempron64 2800+ proc. I run kernel 2.6.12-gentoo-r6, never had a lock-up (knocks on wood). I don't run X though...
About the X lockup, maybe you should just kill the X process, not a killall? Is it not even possible to login on the machine with ssh after killing X?
-peter |
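A rough sketch of what killing just the X process could look like from the ssh session: send SIGTERM to one PID first, so the server gets a chance to shut down on its own, and fall back to SIGKILL only if it is still there after a short wait. The helper name stop_by_pid and the 5 second default are made up for illustration, and there is no guarantee this avoids the freeze described above if the driver or kernel is already wedged.

```shell
#!/bin/sh
# Sketch: stop one process by PID, politely first, forcefully only if needed.
# stop_by_pid is a hypothetical helper name, not a standard tool.
stop_by_pid() {
    pid="$1"
    grace="${2:-5}"                             # seconds to wait after SIGTERM

    kill -TERM "$pid" 2>/dev/null || return 0   # nothing to signal

    while [ "$grace" -gt 0 ]; do
        kill -0 "$pid" 2>/dev/null || return 0  # process has exited
        sleep 1
        grace=$((grace - 1))
    done

    kill -KILL "$pid" 2>/dev/null               # last resort
}

# Possible usage over ssh (process name "X" taken from the post above):
#   stop_by_pid "$(pgrep -o -x X)"
```

The difference from killall -9 X is that only one process is signalled, and SIGKILL is used only after SIGTERM has had time to work.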
|
|
|
pinr Apprentice
Joined: 26 Jan 2003 Posts: 241 Location: Monterrey, Mexico
|
Posted: Mon Oct 31, 2005 1:25 pm Post subject: |
|
|
Thanks for the reply, durian. Interesting that you've never had a lockup. Would you mind sending me a copy of your kernel config file so I can compare? I don't think the original lockups were X-related, as they happened with both the nv and nvidia drivers. Also, next time I get the lockup I'll try killing just the X process, as you suggested. And no, I can't log on via ssh once I've killed X: I get the connection, but the console just hangs. |
|
|
|
durian Guru
Joined: 16 Jul 2003 Posts: 312 Location: Margretetorp
|
Posted: Tue Nov 01, 2005 8:22 am Post subject: |
|
|
pinr wrote: | Thanks for the reply, durian. Interesting that you've never had a lockup. Would you mind sending me a copy of your kernel config file so I can compare? I don't think the original lockups were X-related, as they happened with both the nv and nvidia drivers. Also, next time I get the lockup I'll try killing just the X process, as you suggested. And no, I can't log on via ssh once I've killed X: I get the connection, but the console just hangs. |
Hi,
I use genkernel, so I guess it is totally standard because I never did the (x)config step on this machine! Lazy :-)
So I'm not sure if that would be useful. But you can of course have it if you want,
-peter |
|
|
|
pinr Apprentice
Joined: 26 Jan 2003 Posts: 241 Location: Monterrey, Mexico
|
Posted: Tue Nov 08, 2005 10:43 pm Post subject: |
|
|
Yippee, looks like it's finally fixed with kernel 2.6.14. Current uptime is 149:27 and no lockups! |
|
|
|
|