Gentoo Forums
amd64 instability [solved]
Gentoo Forums Forum Index -> Gentoo on AMD64
simon_irl
Guru

Joined: 07 Oct 2004
Posts: 403
Location: New Zealand

Posted: Sun Aug 14, 2005 11:04 am    Post subject: amd64 instability [solved]

Just finished a year-long project, all done on a beautiful rock solid 32-bit Gentoo system (stable despite ~x86). Now that I have a few days to spare, decided to have another go at an amd64 install (gave up a year ago due to endless problems). After many days of struggling, am about ready to give up again...maybe even switch to another distro to take advantage of my 64 bit system (although that would be extremely painful, since after Gentoo other distros would hardly be Linux at all).

Problem is random crashes...first thought they were video related (dvd playback consistently locked up the system requiring reboot) then after "solving" this decided they were alsa related (since audio playback also caused lockups) then after many more new kernels etc. decided they were simply load-related (crashed during compile) and now (after a crash during boot process) have given up trying to work it out...hence this last-ditch plea for advice. I don't see how it could be heat/power related (since x86 setup is fine) so I'm assuming it's something I've done wrong in the kernel, but I've tried so many changes now that I need help...I've tried acpi off & on, pci hotplugging off & on, different i/o schedulers, etc. etc., always trying for stability over performance, but nothing stops the crashes.

I'm not using ~amd64 either...just the "stable" builds. Do I need to rebuild the whole system with -O2 in CFLAGS instead of -O3? Perhaps this is a stupid question (the handbook gives -O2 as an example, so I guess -O3 involves some risk) but I had no problems with my 32-bit system, so I'm reluctant to rebuild the whole 64-bit system with reduced optimisation if (like the many other things I've tried) it's only going to knock performance down another notch and still leave the problem unsolved. So basically I'm asking: does anyone know of a problematic kernel setting I may not have considered? Something that can totally lock things up (no keyboard response...nothing) even early in the boot process (the boot crash came almost instantly after grub fired up the kernel)? Or can someone with amd64 experience confirm that -O3 is capable of doing such damage? Either way I'd be grateful for a little hope to continue trying...otherwise, it's back to the land of SuSE, Fedora et al that I will crawl.

The unstable 64-bit system (on another disk) is a stage 1 install with the latest stable everything. I tried GNOME, FVWM and KDE before realising that the instability didn't even need X to get me. It's been a long week!


Last edited by simon_irl on Mon Aug 15, 2005 12:24 pm; edited 1 time in total

nxsty
Veteran

Joined: 23 Jun 2004
Posts: 1556
Location: .se

Posted: Sun Aug 14, 2005 11:47 am

What kind of motherboard do you have? And what's your emerge --info? It's hard to tell what's wrong when you don't include any info about your system.

simon_irl
Guru

Joined: 07 Oct 2004
Posts: 403
Location: New Zealand

Posted: Sun Aug 14, 2005 12:04 pm

sorry (also apologies for posting on this forum...i didn't see the amd64 forum lower down the page...i've asked for the post to be moved). thanks for replying anyway.

my motherboard is a K8N-E deluxe.

emerge --info is as follows:

Code:
emerge --info
Portage 2.0.51.22-r2 (default-linux/amd64/2005.0, gcc-3.4.3, glibc-2.3.5-r0, 2.6.12-gentoo-r6 x86_64)
=================================================================
System uname: 2.6.12-gentoo-r6 x86_64 AMD Athlon(tm) 64 Processor 3000+
Gentoo Base System version 1.6.13
dev-lang/python:     2.3.5, 2.4.1-r1
sys-apps/sandbox:    1.2.11
sys-devel/autoconf:  2.13, 2.59-r6
sys-devel/automake:  1.4_p6, 1.5, 1.6.3, 1.7.9-r1, 1.8.5-r3, 1.9.5
sys-devel/binutils:  2.15.92.0.2-r10
sys-devel/libtool:   1.5.18-r1
virtual/os-headers:  2.6.11-r2
ACCEPT_KEYWORDS="amd64"
AUTOCLEAN="yes"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-O3 -march=athlon64 -m3dnow -mmmx -msse -msse2 -fomit-frame-pointer -pipe"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/kde/2/share/config /usr/kde/3/share/config /usr/lib/X11/xkb /usr/lib64/mozilla/defaults/pref /usr/share/config /var/qmail/control"
CONFIG_PROTECT_MASK="/etc/gconf /etc/splash /etc/terminfo /etc/env.d"
CXXFLAGS="-O3 -march=athlon64 -m3dnow -mmmx -msse -msse2 -fomit-frame-pointer -pipe"
DISTDIR="/extra/portage/distfiles"
FEATURES="autoconfig distlocks sandbox sfperms strict"
GENTOO_MIRRORS="ftp://mirror.pacific.net.au/linux/Gentoo"
MAKEOPTS="-j2"
PKGDIR="/usr/portage/packages"
PORTAGE_TMPDIR="/extra/portage/tmp"
PORTDIR="/usr/portage"
SYNC="rsync://rsync.au.gentoo.org/gentoo-portage"
USE="X Xaw3d a52 aac alsa amd64 avi berkdb bitmap-fonts bonobo cdr crypt cups dbus dv dvd dvdr dvdread eds emul-linux-x86 encode esd evo fbcon ffmpeg flac foomatic foomaticdb fortran ftp gif gnome gpm gstreamer gtk gtk2 gtkhtml hal ieee1394 imlib ipv6 jpeg lzw lzw-tiff mikmod mime mozilla mp3 mpeg ncurses nls nptl nptlonly offensive ogg opengl pam pda pdflib perl png python quicktime readline scanner sdl spell ssl tcpd tiff truetype truetype-fonts type1-fonts unicode usb userlocales xine xml2 xmms xpm xv xvid yahoo zlib userland_GNU kernel_linux elibc_glibc"
Unset:  ASFLAGS, CTARGET, LANG, LC_ALL, LDFLAGS, LINGUAS, PORTDIR_OVERLAY
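
For readers comparing the CFLAGS line in this output against the handbook's -O2 example, here is a minimal sketch of how the flag string could be made more conservative. conservative_cflags is a hypothetical helper name, not a Portage tool. It swaps -O3 for -O2 and drops -m3dnow, -mmmx, -msse and -msse2, which GCC already enables when -march=athlon64 is given. Nothing in this thread shows that the flags caused the crashes (the later posts point at the kernel), so treat this as tidying rather than a fix.

```shell
#!/bin/sh
# Hypothetical helper (not part of Portage): print a more conservative
# version of a CFLAGS string.
# Assumption: -march=athlon64 is present in the input, so the explicit
# -m3dnow/-mmmx/-msse/-msse2 flags are redundant and can be dropped.
conservative_cflags() {
    out=""
    for flag in $1; do
        case "$flag" in
            -O3) flag="-O2" ;;                           # lower the optimisation level
            -m3dnow|-mmmx|-msse|-msse2) continue ;;      # implied by -march=athlon64
        esac
        out="${out:+$out }$flag"                         # append with single spaces
    done
    printf '%s\n' "$out"
}

# The flags reported in the emerge --info output:
conservative_cflags "-O3 -march=athlon64 -m3dnow -mmmx -msse -msse2 -fomit-frame-pointer -pipe"
# prints: -O2 -march=athlon64 -fomit-frame-pointer -pipe
```

The printed string would go into CFLAGS in /etc/make.conf; packages only pick it up when they are next rebuilt.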

dsd
Developer

Joined: 30 Mar 2003
Posts: 2162
Location: nr London

Posted: Sun Aug 14, 2005 12:49 pm

you should test your memory with memtest and report the crashes you get as bugs (assuming memory is ok). you should also try 2.6.13-rc6 as there are some important x86_64 fixes included

you might want to try an "emerge mono" and see if you hit the same problem as in bug 101807. this will confirm if you are seeing the pagetable fishyness as well...
_________________
http://dev.gentoo.org/~dsd

joaander
Tux's lil' helper

Joined: 30 Apr 2004
Posts: 132

Posted: Sun Aug 14, 2005 2:24 pm

Also, dig around a bit in the amd64 forum. There are a lot of other people with similar problems. Some culprits have turned up: an undersized power supply, the nvidia drivers, and, most recently, a bug in kernel 2.6.12 (there is a sticky on this one with a workaround), among many others.

simon_irl
Guru

Joined: 07 Oct 2004
Posts: 403
Location: New Zealand

Posted: Sun Aug 14, 2005 9:12 pm

Memtest was fine. I thought I'd searched the forums reasonably carefully, but obviously not...I didn't know about the kernel bug, and haven't tried 2.6.13-rc6 yet. Thanks for the pointers...if I have any success I'll come back, mark this as solved and post a link to the solution.

simon_irl
Guru

Joined: 07 Oct 2004
Posts: 403
Location: New Zealand

Posted: Mon Aug 15, 2005 12:26 pm

yep, it was the kernel...downgraded to 2.6.9-gentoo-r9 and no more crashes. thanks.

dsd
Developer

Joined: 30 Mar 2003
Posts: 2162
Location: nr London

Posted: Mon Aug 15, 2005 7:43 pm

how about you upgrade to 2.6.13-rc and help us debug the problems, rather than being eternally doomed to 2.6.9? ;)
_________________
http://dev.gentoo.org/~dsd

simon_irl
Guru

Joined: 07 Oct 2004
Posts: 403
Location: New Zealand

Posted: Tue Aug 16, 2005 1:26 am

i will as soon as everything else is fully set up and working...i like to trust my kernel when i'm installing other stuff. once i know the rest is ok, i'll try 2.6.13-rc.

pinr
Apprentice

Joined: 26 Jan 2003
Posts: 241
Location: Monterrey, Mexico

Posted: Mon Oct 31, 2005 3:22 am

I also have a Gigabyte motherboard, mine being the K8NS. Since buying this motherboard back in May I have had similar lockups with all kernels from 2.6.10 to 2.6.13-rxx. My computer never locked up during boot, only when using X. Sometimes it would stay up for over 100 hours, but on other occasions it would lock up 2 or 3 times a day. During these lockups the only option was a hard reboot; nothing else worked and remote login was not possible. After reading this thread I downgraded to the 2.6.9 kernel and have not had a similar lockup since. However, I'm now experiencing a different type of lockup. It happens in X, and it has happened twice, both times while I was away from the machine, so I only noticed it when I came back. With this new lockup I can still move the mouse, but nothing else works and I can't switch to a console. I can get a remote login, and top tells me X is using 99.9% of the CPU. If I killall -9 X, the remote connection freezes and I can do nothing else with the machine. Any suggestions? If I could unfreeze the machine from the remote connection I'd be reasonably happy; what I really hate is having to hit the power switch.

durian
Guru

Joined: 16 Jul 2003
Posts: 312
Location: Margretetorp

Posted: Mon Oct 31, 2005 7:36 am

pinr wrote:
I also have a Gigabyte motherboard, mine being the K8NS. Since buying this motherboard back in May I have had similar lockups with all kernels from 2.6.10 to 2.6.13-rxx. My computer never locked up during boot, only when using X. Sometimes it would stay up for over 100 hours, but on other occasions it would lock up 2 or 3 times a day. During these lockups the only option was a hard reboot; nothing else worked and remote login was not possible. After reading this thread I downgraded to the 2.6.9 kernel and have not had a similar lockup since. However, I'm now experiencing a different type of lockup. It happens in X, and it has happened twice, both times while I was away from the machine, so I only noticed it when I came back. With this new lockup I can still move the mouse, but nothing else works and I can't switch to a console. I can get a remote login, and top tells me X is using 99.9% of the CPU. If I killall -9 X, the remote connection freezes and I can do nothing else with the machine. Any suggestions? If I could unfreeze the machine from the remote connection I'd be reasonably happy; what I really hate is having to hit the power switch.


Just for the record, I have the same mobo, with a Sempron64 2800+ proc. I run kernel 2.6.12-gentoo-r6, never had a lock-up (knocks on wood). I don't run X though...

About the X lockup, maybe you should just kill the X process, not a killall? Is it not even possible to login on the machine with ssh after killing X?

-peter
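
The suggestion above (signal the one X process instead of running killall -9) can be sketched as a small shell function that asks the process to exit with SIGTERM and only escalates to SIGKILL after a grace period. stop_gently is a hypothetical helper name and the grace period is an arbitrary choice. Whether this would have avoided the freeze described in this thread is unknown.

```shell
#!/bin/sh
# Sketch: stop a single process politely, escalating only if needed.
# stop_gently is a hypothetical helper, not a standard tool.
# Usage: stop_gently PID [GRACE_SECONDS]
stop_gently() {
    pid=$1
    grace=${2:-5}                                   # seconds to wait before SIGKILL
    # Ask the process to exit; give up if it cannot be signalled
    # (no such process, or no permission).
    kill -TERM "$pid" 2>/dev/null || return 0
    while [ "$grace" -gt 0 ]; do
        kill -0 "$pid" 2>/dev/null || return 0      # process has gone away
        sleep 1
        grace=$((grace - 1))
    done
    kill -KILL "$pid" 2>/dev/null                   # last resort
}

# Example, run as root from the ssh session, assuming pidof is installed:
#   stop_gently "$(pidof X)" 10
```

SIGTERM gives the X server a chance to restore the video mode and release the console before it exits, which SIGKILL does not.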

pinr
Apprentice

Joined: 26 Jan 2003
Posts: 241
Location: Monterrey, Mexico

Posted: Mon Oct 31, 2005 1:25 pm

Thanks for the reply, durian. Interesting that you've never had a lockup. Would you mind sending me a copy of your kernel config file so I can compare? I don't think the original lockups were X-related, as they happened with both the nv and nvidia drivers. Also, next time I get the lockup I'll try killing just the X process as you suggested. And no, I can't log in via ssh once I've killed X: I get the connection but the console just hangs.

durian
Guru

Joined: 16 Jul 2003
Posts: 312
Location: Margretetorp

Posted: Tue Nov 01, 2005 8:22 am

pinr wrote:
Thanks for the reply, durian. Interesting that you've never had a lockup. Would you mind sending me a copy of your kernel config file so I can compare? I don't think the original lockups were X-related, as they happened with both the nv and nvidia drivers. Also, next time I get the lockup I'll try killing just the X process as you suggested. And no, I can't log in via ssh once I've killed X: I get the connection but the console just hangs.


Hi,

I use genkernel, so I guess it is totally standard because I never did the (x)config step on this machine! Lazy :-)
So I'm not sure if that would be useful. But you can of course have it if you want,

-peter

pinr
Apprentice

Joined: 26 Jan 2003
Posts: 241
Location: Monterrey, Mexico

Posted: Tue Nov 08, 2005 10:43 pm

Yippee, looks like it's finally fixed with kernel 2.6.14: current uptime 149:27 and no lockups!
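
To summarise the kernel versions reported in this thread: lockups were seen on 2.6.10 through 2.6.13 release candidates, while 2.6.9 and 2.6.14 were reported stable. The sketch below checks a `uname -r` style string against that range. in_reported_range is a hypothetical helper, and the range is only the anecdotal reports above (one poster ran 2.6.12-gentoo-r6 without lockups), not an authoritative list of affected kernels.

```shell
#!/bin/sh
# Sketch: does a kernel release string fall in the 2.6.10-2.6.13 range
# that posters in this thread reported lockups with?
# in_reported_range is a hypothetical helper; the range is anecdotal.
in_reported_range() {
    ver=${1%%-*}                      # "2.6.12-gentoo-r6" -> "2.6.12"
    case "$ver" in
        2.6.*) patch=${ver#2.6.} ;;   # keep what follows "2.6."
        *) return 1 ;;                # not a 2.6 kernel
    esac
    patch=${patch%%.*}                # tolerate a fourth component, e.g. "2.6.12.3"
    case "$patch" in
        ''|*[!0-9]*) return 1 ;;      # not a plain number
    esac
    [ "$patch" -ge 10 ] && [ "$patch" -le 13 ]
}

if in_reported_range "$(uname -r)"; then
    echo "running kernel is in the range reported as problematic in this thread"
fi
```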