Gentoo Forums
[SOLVED] Gentoo dies for no reason
robak
Apprentice

Joined: 14 Jan 2004
Posts: 209
Location: Germany

Posted: Sun Aug 01, 2010 7:44 pm    Post subject: [SOLVED] Gentoo dies for no reason

Hi folks.

I've been having trouble with my Gentoo system for some weeks now. The system dies suddenly and randomly: the X server freezes and rebooting doesn't work (ALT-F1 -> CTRL-ALT-DEL); only the Magic SysRq keys still work. So it seems the kernel is still running, but syslog isn't logging any trouble. The problem appears every 3-4 days.

Here is some info about my system (as of today; I do daily updates):
hardware:
Code:

CPU: Core2Quad 9550 E0
MB: Gigabyte GA-G41M-ES2H
RAM: 1x2GB Corsair DDR2-800
GPU: Intel 4500 onboard
Sound: Creative Audigy 4

emerge --info
Code:

Portage 2.1.8.3 (default/linux/amd64/10.0, gcc-4.4.4, glibc-2.11.2-r0, 2.6.34-gentoo-r2 x86_64)
=================================================================
System uname: Linux-2.6.34-gentoo-r2-x86_64-Intel-R-_Core-TM-2_Quad_CPU_Q9550_@_2.83GHz-with-gentoo-2.0.1
Timestamp of tree: Sun, 01 Aug 2010 17:00:19 +0000
distcc 3.1 x86_64-pc-linux-gnu [disabled]
ccache version 2.4 [disabled]
app-shells/bash:     4.1_p7
dev-java/java-config: 2.1.11
dev-lang/python:     2.6.5-r3, 3.1.2-r4
dev-util/ccache:     2.4-r8
dev-util/cmake:      2.8.1-r2
sys-apps/baselayout: 2.0.1
sys-apps/openrc:     0.6.1-r1
sys-apps/sandbox:    2.2
sys-devel/autoconf:  2.13, 2.65-r1
sys-devel/automake:  1.4_p6-r1, 1.8.5-r4, 1.9.6-r3, 1.10.3, 1.11.1
sys-devel/binutils:  2.20.1-r1
sys-devel/gcc:       4.4.4-r1
sys-devel/gcc-config: 1.4.1
sys-devel/libtool:   2.2.10
virtual/os-headers:  2.6.34
ACCEPT_KEYWORDS="amd64 ~amd64"
ACCEPT_LICENSE="*"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-O2 -pipe -march=core2 -msse3 -mssse3 -msse4.1"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/share/X11/xkb"
CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/env.d /etc/env.d/java/ /etc/fonts/fonts.conf /etc/gconf /etc/gentoo-release /etc/php/apache2-php5/ext-active/ /etc/php/cgi-php5/ext-active/ /etc/php/cli-php5/ext-active/ /etc/revdep-rebuild /etc/sandbox.d /etc/terminfo"
CXXFLAGS="-O2 -pipe -march=core2 -msse3 -mssse3 -msse4.1"
DISTDIR="/usr/portage/distfiles"
EMERGE_DEFAULT_OPTS="--quiet"
FEATURES="assume-digests distlocks fixpackages news parallel-fetch protect-owned sandbox sfperms strict unmerge-logs unmerge-orphans userfetch"
GENTOO_MIRRORS="http://de-mirror.org/distro/gentoo/ http://gentoo.mneisen.org/ http://gentoo.tiscali.nl/"
LANG="de_DE.UTF-8"
LDFLAGS="-Wl,-O1 -Wl,--as-needed"
LINGUAS="de"
MAKEOPTS="-j5"
PKGDIR="/usr/portage/packages"
PORTAGE_CONFIGROOT="/"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --stats --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
SYNC="rsync://rsync.gentoo.org/gentoo-portage"
USE="X acl acpi alsa amd64 apache2 bash-completion berkdb bzip2 cli cracklib crypt cups cxx dbus dri dvb ffmpeg fortran fuse gd gdbm gnome gpm gtk hal iconv jack mmx mmxext modules mudflap multilib mysql ncurses nls nptl nptlonly nsplugin opengl openmp pam pcre perl php policykit pppd python readline reflection ruby session spl sse sse2 sse3 ssl ssse3 sysfs tcpd unicode xorg xvmc zlib" ALSA_CARDS="ali5451 als4000 atiixp atiixp-modem bt87x ca0106 cmipci emu10k1x ens1370 ens1371 es1938 es1968 fm801 hda-intel intel8x0 intel8x0m maestro3 trident usb-audio via82xx via82xx-modem ymfpci" ALSA_PCM_PLUGINS="adpcm alaw asym copy dmix dshare dsnoop empty extplug file hooks iec958 ioplug ladspa lfloat linear meter mmap_emul mulaw multi null plug rate route share shm softvol" APACHE2_MODULES="actions alias auth_basic authn_alias authn_anon authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache cgi cgid dav dav_fs dav_lock deflate dir disk_cache env expires ext_filter file_cache filter headers include info log_config logio mem_cache mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias" DVB_CARDS="usb-dib0700" ELIBC="glibc" INPUT_DEVICES="keyboard mouse evdev" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" LINGUAS="de" QEMU_SOFTMMU_TARGETS="i386 x86_64" QEMU_USER_TARGETS="i386 x86_64" RUBY_TARGETS="ruby18" USERLAND="GNU" VIDEO_CARDS="intel" XTABLES_ADDONS="quota2 psd pknock lscan length2 ipv4options ipset ipp2p iface geoip fuzzy condition tee tarpit sysrq steal rawnat logmark ipmark dhcpmac delude chaos account"
Unset:  CPPFLAGS, CTARGET, FFLAGS, INSTALL_MASK, LC_ALL, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS, PORTDIR_OVERLAY


syslog (system died at 16:22:47 on Jul 31st)
Code:

Jul 31 15:52:50 localhost su[29145]: Successful su for root by nurbs999
Jul 31 15:52:50 localhost su[29145]: + /dev/pts/9 nurbs999:root
Jul 31 15:52:50 localhost su[29145]: pam_unix(su:session): session opened for user root by nurbs999(uid=1000)
Jul 31 16:09:37 localhost sshd[16903]: Accepted keyboard-interactive/pam for nurbs999 from 95.222.25.59 port 59923 ssh2
Jul 31 16:09:37 localhost sshd[16903]: pam_unix(sshd:session): session opened for user nurbs999 by (uid=0)
Jul 31 16:12:39 localhost dbus-daemon: [system] Reloaded configuration
Jul 31 16:22:47 localhost kernel: [158121.739016] x86_64-pc-linux used greatest stack depth: 3096 bytes left
Aug  1 17:51:38 localhost syslog-ng[2259]: syslog-ng starting up; version='3.1.1'
Aug  1 17:51:38 localhost kernel: [    0.000000] Initializing cgroup subsys cpuset
Aug  1 17:51:38 localhost kernel: [    0.000000] Linux version 2.6.34-gentoo-r2 (root@localhost) (gcc version 4.4.4 (Gentoo 4.4.4-r1 p1.0, pie-0.4.5) ) #1 SMP Sun Jul 25 22:21:28 CEST 2010
Aug  1 17:51:38 localhost kernel: [    0.000000] Command line: root=/dev/sde2 rw gentoo=nodevfs rootdelay=6 quiet
Aug  1 17:51:38 localhost kernel: [    0.000000] BIOS-provided physical RAM map:



I was emerging some software updates via 'emerge -DNu world' when the system died, but there are no CPU temperature problems: the max core temp is about 44°C, logged through '/sys/devices/platform/coretemp.[0-3]/temp1_input'.
But sometimes the system dies even while idling.
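
For anyone who wants to log the same readings, here is a minimal Python sketch. It assumes the sysfs pattern quoted above matches on your kernel (the location differs between kernel versions and requires the coretemp driver) and that the files report millidegrees Celsius, as hwmon-style temp*_input files do. The function name read_core_temps is made up for this example.

```python
import glob
from pathlib import Path

# Pattern taken from the post above; whether it matches anything depends on
# the kernel version and on the coretemp driver being loaded (assumption).
CORETEMP_GLOB = "/sys/devices/platform/coretemp.[0-3]/temp1_input"


def read_core_temps(pattern=CORETEMP_GLOB):
    """Return {sensor file path: temperature in degrees Celsius}."""
    temps = {}
    for path in sorted(glob.glob(pattern)):
        raw = Path(path).read_text().strip()
        # hwmon-style temp*_input files hold millidegrees Celsius.
        temps[path] = int(raw) / 1000.0
    return temps


if __name__ == "__main__":
    for path, temp_c in read_core_temps().items():
        print(f"{path}: {temp_c:.1f} C")
```

Run from cron or a loop with a timestamp, this gives a temperature trail leading up to a freeze. An empty result just means the pattern did not match on that machine.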

I appreciate any tips to find the problem.

thanks in advance
robak
_________________
run this in your gentoo-bash:
"grep -R -i -A2 -B2 'on fire' /usr/src/linux/*"


Last edited by robak on Wed Aug 11, 2010 6:16 pm; edited 1 time in total

NeddySeagoon
Administrator

Joined: 05 Jul 2003
Posts: 54421
Location: 56N 3W

Posted: Sun Aug 01, 2010 8:06 pm

robak,

I suspect memory or motherboard problems.

Try a few cycles of memtest. If it finds something, it does not always mean RAM is faulty.

It could also be power related. Does the system dying correlate with power surges or do you have a proper UPS?
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.

robak
Apprentice

Joined: 14 Jan 2004
Posts: 209
Location: Germany

Posted: Sun Aug 01, 2010 8:21 pm

memtest ran for about 12 hrs without any errors. I have no UPS, but a 350W BeQuiet PSU. I took some power measurements, and the system draws about 90W at full load and about 50W at idle (that was some days ago, sorry I forgot to mention it).
By idle I mean all HDDs (except the system HDD) are in standby; by full load I mean all CPU cores are under heavy load and all HDDs are running.

I also tried another mainboard, same problems.

My last try today is to unplug all USB HDDs. I will see in a few days if that helps.
And oh, all HDDs are encrypted with truecrypt except the system HDD.

OK, just to be as clear as possible, here's a list of all HDDs:

System: external USB drive, power via USB (changed this today, internal now)
Data: four internal hdds
one external with own PSU (changed this today, internal now)

NeddySeagoon
Administrator

Joined: 05 Jul 2003
Posts: 54421
Location: 56N 3W

Posted: Sun Aug 01, 2010 9:49 pm

robak,

At first sight, a 350w PSU seems plenty, but it's not the headline total power that matters; it's the limit per voltage, per output and per combination of outputs, if any.
You normally hit the 3.3v/5v limit before you get anywhere near the headline power.

Also, many PSUs have several independent 12v supplies. At 350w, you will not have more than two. Each one must not be overloaded.
All the gory detail will be on the label on the PSU. The only way to check for the PSU running at or close to its limit is to look at each DC supply separately and in combination, as the label on the PSU says.

By external power surges, I had in mind 'brownouts'. This is when the supply to the PC drops out of voltage specification but does not cut out altogether.
Such events can cause the DC power supplies inside the PC to go out of spec and the system to behave in odd ways without the PWR OK signal going false and triggering a reset.

robak
Apprentice

Joined: 14 Jan 2004
Posts: 209
Location: Germany

Posted: Sun Aug 01, 2010 10:19 pm

Thanks for your explanation, NeddySeagoon.
Here is the data for my PSU (it's a 300W PSU, my mistake):
Output:
3.3V -> 23A
5V -> 23A
max combined power: 150W

12V1 -> 18A
12V2 -> 18A
max combined current: 21A
max combined power: 252W

-12V -> 0.5A (max power: 6W)
5VSB -> 2A (max power: 10W)

Each HDD is labeled with this power consumption:
5V -> 0.72A
12V -> 0.52A

if you are right, i'm wondering why the system crashes after several days.

Last edited by robak on Mon Aug 02, 2010 8:44 pm; edited 1 time in total

NeddySeagoon
Administrator

Joined: 05 Jul 2003
Posts: 54421
Location: 56N 3W

Posted: Mon Aug 02, 2010 6:43 pm

robak,

The actual PSU current limits will change with load and temperature.

Now we need to know how your system is wired to calculate the loads. We also need to know a little about your motherboard, RAM and how the CPU is powered.
If you are using the extra ATX +12v connector, that will provide power to the CPU core and nothing else. The drive spin motors will be operated from the other 12v supply. The motors will have a spin up surge at least twice the 0.52A advertised.

The -12v and 5vSTBY can be ignored. The -12v is used only by serial ports and the 5vSTBY for switching the system on/off and possibly powering a USB root hub, so the mouse or keyboard can bring the system out of standby.

When you add up all the max combined power figures, you get well past 300w ... I make it 408w, so there is a further implicit restriction: the total 3.3v, +5v and +12v power must not exceed 300w (ignoring the -12v and 5v STBY).

robak
Apprentice

Joined: 14 Jan 2004
Posts: 209
Location: Germany

Posted: Mon Aug 02, 2010 8:42 pm

Yes, I'm using the extra +12V ATX connector for the CPU. Two hard drives are connected to one +12V supply, the other three to the other +12V supply.
The CPU is a Core2Quad 9550, E0 stepping, as I mentioned in my first post. It's running at a fixed core voltage of 1.1375V.

(Just to be clear, I tested the CPU at 1.100V with the Mersenne Prime95 test for 48 hrs with two setups: first, one instance with four threads for 24 hrs (8k-6M FFT size), and second, four instances with one thread each with the same settings for another 24 hrs. So I think the CPU is not the cause of my crashes.)

I already posted the rest of my hardware setup in my first post. Everything else is at stock voltage.

mattmatteh
Guru

Joined: 10 Mar 2004
Posts: 449
Location: near chicago

Posted: Mon Aug 02, 2010 9:04 pm

Two things to try: 1) set the max CPU frequency to the lowest and see how long you can go; 2) go into the BIOS and underclock the RAM. I have had random freezes, and it took me a month to find out it was bad RAM. memtest kind of found problems, but nothing consistent.

NeddySeagoon
Administrator

Joined: 05 Jul 2003
Posts: 54421
Location: 56N 3W

Posted: Mon Aug 02, 2010 9:24 pm

robak,

Your CPU draws 100A at 1.1375v, that's 114W.
All that comes out of the extra 12v supply provided by the additional 4 pin ATX connector.
There is a DC-DC converter right next to the CPU to convert the 12v to the 1.1375v the CPU needs. That converter is not 100% efficient, so let's say the power out of the PSU is 120w; that's 10A on the +12v supply for the CPU.

We will count your hard drives at 1A each on the 12v, to allow for spin up. That's another 5A, or 15A total on the +12v, or 180W.
That leaves 6A more for other things within the 21A combined 12v limit. It leaves 120W of the 300W total for everything else in your system.

Let's count the 5v for the hard drives at 1A each too ... that's 5A or 25W. Now we are up to 205W.

The next power hungry device is the GPU, and after that any plug in cards.
The Intel 4500 chipset needs 24W ... that's 229W, and we have yet to take account of your RAM and any plug in cards, CDROM or floppy.

The 229W is a worst case figure (so far) but it does not provide the whole picture. That's static load.
The load in a PC is anything but static. For example, the CPU can go from next to nothing to full load in a few clock cycles. The PSU has to cope with this dynamic load too, or you get all sorts of odd behavior.

The only way to test for dynamic load is with specialist test equipment or by substitution. Do you have a spare PSU you can swap in to test?
PSU dynamic load capability tends to reduce with age too.
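
The static budget worked through in this post can be re-run as a short Python sketch. Every figure is one of the estimates quoted above (not a measurement), and the names static_budget and the constants are made up for this example. It only covers the static load, so it says nothing about the dynamic behaviour described above.

```python
# All inputs are the rough estimates from the post above, not measurements.
CPU_CURRENT_A = 100.0   # maximum CPU current quoted in the post
CPU_CORE_V = 1.1375     # fixed core voltage reported by robak
CPU_INPUT_W = 120.0     # CPU power rounded up for DC-DC converter losses
DRIVES = 5              # number of hard drives in the system
DRIVE_12V_A = 1.0       # per-drive 12v allowance, including spin up
DRIVE_5V_A = 1.0        # per-drive 5v allowance
GPU_W = 24.0            # figure quoted for the Intel 4500 chipset


def static_budget():
    """Return the intermediate and final figures of the static estimate."""
    cpu_core_w = CPU_CURRENT_A * CPU_CORE_V          # about 113.75 W
    cpu_12v_a = CPU_INPUT_W / 12.0                   # 10 A on the CPU rail
    drives_12v_a = DRIVES * DRIVE_12V_A              # 5 A for the drives
    total_12v_a = cpu_12v_a + drives_12v_a           # 15 A
    total_12v_w = total_12v_a * 12.0                 # 180 W
    drives_5v_w = DRIVES * DRIVE_5V_A * 5.0          # 25 W
    total_w = total_12v_w + drives_5v_w + GPU_W      # 229 W
    return {
        "cpu_core_w": cpu_core_w,
        "total_12v_a": total_12v_a,
        "total_12v_w": total_12v_w,
        "total_w": total_w,
    }


if __name__ == "__main__":
    for name, value in static_budget().items():
        print(f"{name}: {value:.2f}")
```

Changing DRIVE_12V_A or GPU_W shows how quickly the worst-case total moves toward the 300W label limit.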

robak
Apprentice

Joined: 14 Jan 2004
Posts: 209
Location: Germany

Posted: Mon Aug 02, 2010 10:23 pm

@mattmatteh: 1) I always do this before I test CPUs at the highest multiplier, to check whether I get crashes when the CPU is in idle mode. 2) Already done; I checked with memtest for 12 hrs with no errors, but the system still crashed randomly, so I set the RAM back to stock frequency.

@NeddySeagoon: 120W for the CPU, well, OK for the VERY worst case, since the Intel specs talk about 100A max for my CPU. But since the TDP is 95W, which is never reached, I would calculate 80W as the worst case for the CPU. But OK, let's take your 120W, just to be sure.
But the 180W (CPU + hard drives) is split across three 12V lines, or am I getting something wrong?

This 300W PSU is about one year old, but I have another 500W PSU that is 3 years old. If the system crashes again, I'll test with that one.

I have neither an optical drive nor a floppy.

The system has been up for 1d5h now since I unplugged the USB drives and built them into the case. *fingers crossed*

thanks for your great help so far.

dmpogo
Advocate

Joined: 02 Sep 2004
Posts: 3269
Location: Canada

Posted: Tue Aug 03, 2010 12:20 am

robak wrote:

Each HDD is labeled with this power consumption:
5V -> 0.72A
12V -> 0.52A


BTW, modern mechanical hard drives take around 2A on spinup, so you could budget around 25W per harddrive on 12V.

NeddySeagoon
Administrator

Joined: 05 Jul 2003
Posts: 54421
Location: 56N 3W

Posted: Tue Aug 03, 2010 5:13 pm

robak,

Your PSU label says
Code:
12V1 -> 18A
12V2 -> 18A
max combined current: 21A
max combined power: 252W
So you only have two separate +12v supplies from the PSU.
One will be the +12v ATX connector, which will be the CPU only; the other +12v will be all the hard drives, the motherboard (except the CPU) and your plug in cards.

You may have several physical wires emerging from the PSU for your HDDs, but they are not connected to separate power converters inside the PSU.
Having several wires reduces the voltage drop down the cables, which is a good thing.
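
As a rough illustration of the two-rail picture, the Python sketch below checks the label limits (18A per rail, 21A combined) against the estimates made earlier in the thread: 10A for the CPU rail and either 1A or 2A of spin-up current per drive. Treating the CPU as alone on one rail and all five drives on the other follows the description above. The motherboard and plug-in card loads are unknown and left out, so the real headroom is smaller than what this prints. The name check_12v is made up for this example.

```python
RAIL_LIMIT_A = 18.0       # per-rail limit from the PSU label (12V1, 12V2)
COMBINED_LIMIT_A = 21.0   # combined 12v current limit from the PSU label
CPU_RAIL_A = 10.0         # CPU estimate from earlier in the thread (120w / 12v)
DRIVES = 5                # all hard drives share the second rail


def check_12v(spinup_a_per_drive):
    """Compare estimated 12v currents with the label limits.

    Motherboard and plug-in card loads are NOT included (unknown).
    """
    drive_rail_a = DRIVES * spinup_a_per_drive
    combined_a = CPU_RAIL_A + drive_rail_a
    return {
        "drive_rail_a": drive_rail_a,
        "combined_a": combined_a,
        "rails_ok": CPU_RAIL_A <= RAIL_LIMIT_A and drive_rail_a <= RAIL_LIMIT_A,
        "combined_ok": combined_a <= COMBINED_LIMIT_A,
        "combined_headroom_a": COMBINED_LIMIT_A - combined_a,
    }


if __name__ == "__main__":
    # 1A per drive is NeddySeagoon's allowance, 2A is dmpogo's spin-up figure.
    for amps in (1.0, 2.0):
        print(amps, check_12v(amps))
```

With 2A per drive the estimate still fits under the combined limit, but only just, and only while all drives spin up at once with the CPU at its worst case.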

robak
Apprentice

Joined: 14 Jan 2004
Posts: 209
Location: Germany

Posted: Wed Aug 11, 2010 6:16 pm

Since I unplugged the USB HDDs and connected them internally, I have had no crash. So I think it was a power problem at the USB hub.
Nevertheless, thanks for your help!