Author |
Message |
passive Tux's lil' helper
Joined: 31 Dec 2004 Posts: 105
|
Posted: Tue Oct 30, 2007 7:57 pm Post subject: GCC intermittently failing |
|
|
Hi All,
I have a system that's had a variety of errors over the past few months. Very inconsistent, generally, but there has always seemed to be something a little "off".
I just did a fresh install of Gentoo, and while updating the core packages, I kept getting errors. I was unable to rebuild GCC or glibc, for example, but there were also failures in a variety of other packages. Trying to install the most basic KDE, which is about 100 packages, including the ones for X, resulted in about a dozen of them failing.
Some of the errors specifically mentioned GCC failing, and cautioned that it was probably a hardware or OS problem. This would make sense given the history I've seen, but I'm not exactly sure how to track it down. I don't think it's a hard drive problem: of the two drives in there, one is brand new and the other has been fairly thoroughly tested. I suspected memory, and am currently running Memtest86+, but it has completed 18 passes with 0 errors so far.
Are there any tools you could recommend to diagnose what might be failing? Thanks. |
|
|
|
eyoung100 Veteran
Joined: 23 Jan 2004 Posts: 1428
|
Posted: Tue Oct 30, 2007 8:56 pm Post subject: |
|
|
Try using an emerge wrapper first to correctly rebuild your toolchain. I use emwrap. For more information see this post. If there is a lot of cruft (file trash), or your system has been up for a while, the toolchain can sometimes become unstable. If emwrap also fails to rebuild the toolchain, you can safely rule out gcc itself as the cause. If this doesn't work, please reply with the manufacturer of your hard drives. _________________ The Birth and Growth of Science is the Death and Atrophy of Art -- Unknown
Registered Linux User #363735
Adopt a Post | Strip Comments| Emerge Wrapper |
|
|
|
tarpman Veteran
Joined: 04 Nov 2004 Posts: 1083 Location: Victoria, BC, Canada
|
Posted: Tue Oct 30, 2007 9:01 pm Post subject: |
|
|
Overheating is a major possibility. _________________ Saving the world, one kilobyte at a time. |
|
|
|
eyoung100 Veteran
Joined: 23 Jan 2004 Posts: 1428
|
Posted: Wed Oct 31, 2007 1:51 pm Post subject: |
|
|
tarpman wrote: | Overheating is a major possibility. |
With that in mind, check your BIOS's PC Health Section. Somewhere in that section will be a CPU Fan Speed indicator along with a CPU Temperature and a System Temperature. If your CPU Temp is >= 125 F, you may have a cracked CPU. If the System Temp is >= 110 F, you may have a blocked cooling fan or not enough fans. Heat does major damage, and the damage isn't slow; it can happen rather quickly.
Let us all know what you find out. |
|
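The thresholds in the post above are in Fahrenheit, while sensor tools such as lm_sensors usually report Celsius, so it helps to convert before comparing. A minimal sketch, assuming a POSIX shell with awk available; the function names `f_to_c` and `c_to_f` are made up for this example:

```shell
# Convert Fahrenheit to Celsius, rounded to one decimal place.
f_to_c() {
    awk -v f="$1" 'BEGIN { printf "%.1f\n", (f - 32) * 5 / 9 }'
}

# Convert Celsius to Fahrenheit, rounded to one decimal place.
c_to_f() {
    awk -v c="$1" 'BEGIN { printf "%.1f\n", c * 9 / 5 + 32 }'
}

# Example usage:
#   f_to_c 125
#   c_to_f 57
```

With these, 125 F works out to about 51.7 C and 110 F to about 43.3 C.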
|
|
passive Tux's lil' helper
Joined: 31 Dec 2004 Posts: 105
|
Posted: Wed Oct 31, 2007 4:03 pm Post subject: Update |
|
|
Ok, well here's where I'm at:
Memtest86+ went to 30 passes with 0 errors. Given how frequently I get errors when the system is running normally, I figure it's probably not the memory.
I tried:
emwrap.sh -t world
Which started with glibc, and segfaulted. I rebooted twice, and ran
emwrap.sh -r -t world
each time, with a segfault as my reward.
Now I'm looking into diagnosing the CPU. It's an Athlon 64 at 2.2 GHz, and I have a very new fan on it. I'm currently reading this guide:
http://www.gentoo.org/doc/en/articles/hardware-stability-p1.xml |
|
|
|
redgsturbo Apprentice
Joined: 24 Jun 2005 Posts: 283
|
Posted: Wed Oct 31, 2007 4:46 pm Post subject: Re: Update |
|
|
passive wrote: | Ok, well here's where I'm at:
Memtest86+ went to 30 passes with 0 errors. Given how frequently I get errors when the system is running normally, I figure it's probably not the memory.
I tried:
emwrap.sh -t world
Which started with glibc, and segfaulted. I rebooted twice, and ran
emwrap.sh -r -t world
each time, with a segfault as my reward.
Now I'm looking into diagnosing the CPU. It's an Athlon 64 at 2.2 GHz, and I have a very new fan on it. I'm currently reading this guide:
http://www.gentoo.org/doc/en/articles/hardware-stability-p1.xml |
Is the machine overclocked? Did you use a decent thermal paste? |
|
|
|
eyoung100 Veteran
Joined: 23 Jan 2004 Posts: 1428
|
Posted: Wed Oct 31, 2007 5:08 pm Post subject: |
|
|
eyoung100 wrote: | tarpman wrote: | Overheating is a major possibility. |
With that in mind, check your BIOS's PC Health Section. Somewhere in that section will be a CPU Fan Speed indicator along with a CPU Temperature and a System Temperature. If your CPU Temp is >= 125 F, you may have a cracked CPU. If the System Temp is >= 110 F, you may have a blocked cooling fan or not enough fans. Heat does major damage, and the damage isn't slow; it can happen rather quickly.
Let us all know what you find out |
The BIOS readout is much like what the lm_sensors package, described in the doc you are reading, reports. Check it before you put the system under load. If the temperature is high while the system is idling, you have a cooling problem. That's why we're all asking about things that normally put heat stress on a CPU, e.g. overclocking, cracking, etc. |
|
|
|
passive Tux's lil' helper
Joined: 31 Dec 2004 Posts: 105
|
Posted: Wed Oct 31, 2007 6:30 pm Post subject: |
|
|
I got lm_sensors working, or so it seems. Running CPUburn takes the k8temp sensor from 48 to 76 degrees in just a few minutes. I'm guessing this is the problem.
I don't think the CPU is overclocked, but I got the system as a whole from my brother, so it is a possibility. I'm not entirely sure of the thermal paste used, so I will double check that as well.
Thanks for all the help so far. |
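lm_sensors reads the kernel's hwmon interface, which exposes each temperature as a file holding millidegrees Celsius. A rough sketch for dumping those values by hand, for example in a loop while CPUburn is running. It assumes the sensor driver is loaded and that the files live under /sys/class/hwmon (the exact layout varies by kernel version); `read_temps` is a made-up name:

```shell
# Print every readable hwmon temperature input as "<file> <degrees C>".
# The optional first argument overrides the sysfs root (handy for testing).
read_temps() {
    hwmon_root="${1:-/sys/class/hwmon}"
    # Older kernels put the files under hwmonN/device/, newer ones under hwmonN/.
    for temp_file in "$hwmon_root"/hwmon*/temp*_input \
                     "$hwmon_root"/hwmon*/device/temp*_input; do
        # An unmatched glob stays literal, so skip anything that is not readable.
        [ -r "$temp_file" ] || continue
        milli=$(cat "$temp_file")
        celsius=$(awk -v m="$milli" 'BEGIN { printf "%.1f", m / 1000 }')
        printf '%s %s\n' "$temp_file" "$celsius"
    done
}

# Example usage, polling every 5 seconds during a stress test:
#   while :; do read_temps; sleep 5; done
```

If the readings climb steadily under load and never level off, that points at cooling rather than at the compiler.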
|
|
|
eyoung100 Veteran
Joined: 23 Jan 2004 Posts: 1428
|
|
|
|
passive Tux's lil' helper
Joined: 31 Dec 2004 Posts: 105
|
Posted: Wed Oct 31, 2007 7:29 pm Post subject: |
|
|
Ok, the CPU was overclocked. It's a 2800+, which is supposed to run at 1.8 GHz. Unfortunately, the only keyboard I have is an iMac USB model without a delete key, so thus far I am locked out of the BIOS. Fortunately, I've reset enough BIOSes in my time to be able to guess the correct jumper, and I'm now back to operating at 1.8 GHz.
I also cleaned off and reapplied the thermal paste, which I believe is Arctic Silver Céramique.
At this point, it idles around 44 degrees. I have a Thermaltake Silent Boost fan, which probably contributes to this relatively high idle temp. Running CPUburn, it's still getting remarkably hot (I've seen 82 degrees so far, though it's taken far longer than before to get to that point), so I wonder if my fan isn't working properly. I did a bit of research before purchasing this model, and it should have no problem cooling this CPU. The heatsink itself gets very hot, so I think it is absorbing heat very well.
Oh, here's my make.conf (I've barely done anything with this system yet):
Code: |
# These settings were set by the catalyst build script that automatically
# built this stage.
# Please consult /etc/make.conf.example for a more detailed example.
CFLAGS="-O2 -march=i686 -pipe"
CXXFLAGS="${CFLAGS}"
# This should not be changed unless you know exactly what you are doing. You
# should probably be using a different stage, instead.
ACCEPT_KEYWORDS="x86"
CHOST="i686-pc-linux-gnu"
USE="-ipv6"
|
At this point, I'm thinking either the fan on the heatsink isn't doing its job, or the CPU has started producing too much heat. |
|
|
|
redgsturbo Apprentice
Joined: 24 Jun 2005 Posts: 283
|
Posted: Wed Oct 31, 2007 7:43 pm Post subject: |
|
|
passive wrote: | Ok, the CPU was overclocked. It's a 2800+, which is supposed to run at 1.8 GHz. Unfortunately, the only keyboard I have is an iMac USB model without a delete key, so thus far I am locked out of the BIOS. Fortunately, I've reset enough BIOSes in my time to be able to guess the correct jumper, and I'm now back to operating at 1.8 GHz.
I also cleaned off and reapplied the thermal paste, which I believe is Arctic Silver Céramique.
At this point, it idles around 44 degrees. I have a Thermaltake Silent Boost fan, which probably contributes to this relatively high idle temp. Running CPUburn, it's still getting remarkably hot (I've seen 82 degrees so far, though it's taken far longer than before to get to that point), so I wonder if my fan isn't working properly. I did a bit of research before purchasing this model, and it should have no problem cooling this CPU. The heatsink itself gets very hot, so I think it is absorbing heat very well.
Oh, here's my make.conf (I've barely done anything with this system yet):
Code: |
# These settings were set by the catalyst build script that automatically
# built this stage.
# Please consult /etc/make.conf.example for a more detailed example.
CFLAGS="-O2 -march=i686 -pipe"
CXXFLAGS="${CFLAGS}"
# This should not be changed unless you know exactly what you are doing. You
# should probably be using a different stage, instead.
ACCEPT_KEYWORDS="x86"
CHOST="i686-pc-linux-gnu"
USE="-ipv6"
|
At this point, I'm thinking either the fan on the heatsink isn't doing its job, or the CPU has started producing too much heat. |
I have a Pentium D that runs perfectly at 2.66 GHz, and runs stable at 3.4 GHz except during very long compiles such as gcc, glibc, or a looped kernel recompile. It showed the same behaviour you are describing (gcc complaining about likely hardware issues, failures that are difficult to reproduce in the same part of a compile, etc.). |
|
|
|
eyoung100 Veteran
Joined: 23 Jan 2004 Posts: 1428
|
|
|
|
passive Tux's lil' helper
Joined: 31 Dec 2004 Posts: 105
|
Posted: Wed Oct 31, 2007 8:49 pm Post subject: |
|
|
Oh, all those temperatures are Celsius.
At the moment, I think everything is ok. I'm compiling GCC, and the temperature is stable around 57 degrees. I will change the -march setting and give that a try.
Thanks for all your help. |
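For anyone following along, the GCC option is spelled `-march` with a single dash. The value suggested earlier is not preserved in this thread, so the fragment below is only a sketch of one plausible setting: it assumes a 32-bit x86 install on an Athlon 64 and a GCC new enough to accept `-march=k8` (3.4 or later). Everything except the `-march` value is taken from the make.conf posted above:

```shell
# /etc/make.conf fragment (sketch).
# ASSUMPTION: -march=k8 is this example's choice, not a value given in the thread.
CFLAGS="-O2 -march=k8 -pipe"
CXXFLAGS="${CFLAGS}"
# Unchanged from the stage default: still a 32-bit i686 toolchain.
CHOST="i686-pc-linux-gnu"
```

Changing `-march` only affects packages built afterwards, and it does nothing for an overheating CPU, so it is worth confirming that the temperatures are stable first.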
|
|
|
NeddySeagoon Administrator
Joined: 05 Jul 2003 Posts: 54809 Location: 56N 3W
|
Posted: Wed Oct 31, 2007 9:16 pm Post subject: |
|
|
passive,
80 C is the top limit: hotter and you can expect problems, but it's not a hard limit.
The hot heatsink indicates that the heatsink is in good thermal contact with the CPU but that the heat is not being conducted to the air, or the hot air in the case is unable to get out. The PSU fan and the rear case fan should both be moving hot air out of the case, so air flows in at the front bottom, diagonally across the motherboard, over the CPU and out at the rear. _________________ Regards,
NeddySeagoon
Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail. |
|
|
|
tarpman Veteran
Joined: 04 Nov 2004 Posts: 1083 Location: Victoria, BC, Canada
|
Posted: Wed Oct 31, 2007 9:23 pm Post subject: |
|
|
To take NeddySeagoon's idea one step further, check that the vents at the front and back are clear of dust (and the CPU fan while you're at it). |
|
|
|
eyoung100 Veteran
Joined: 23 Jan 2004 Posts: 1428
|
Posted: Wed Oct 31, 2007 9:45 pm Post subject: |
|
|
Congrats on getting the compile started. Sorry for the confusion with the temperatures. I have a feeling the US will never be metric, oh well. Now that the temperature stays constant under load, you're where you need to be. Each system that you build will have its own unique characteristics, one of which is temperature. You couldn't trust your gut on what the temperature should be because you got this machine from your brother. If the temperature holds at around 57 C (135 F), it should never rise or fall more than 5 to 10 degrees in either direction. |
|
|
|
passive Tux's lil' helper
Joined: 31 Dec 2004 Posts: 105
|
Posted: Wed Oct 31, 2007 11:15 pm Post subject: |
|
|
Ok, that worked out pretty well. I think the CPU has simply decided not to run at that speed anymore. It's fine with me, it's a server box.
At this point, I've had many packages fail to compile. Is there an easy way to recompile everything?
Thanks again. |
|
|
|
eyoung100 Veteran
Joined: 23 Jan 2004 Posts: 1428
|
Posted: Thu Nov 01, 2007 3:04 pm Post subject: |
|
|
Type:
Code: |
emerge profuse
profuse
|
Set all the USE flags you like, and then:
Code: |
emerge --emptytree --newuse world
|
This should emerge anywhere from 300-450 packages. If it fails, use:
or, if you prefer the emerge wrapper I showed you, I believe it's:
This wrapper is the best tool, as it rebuilds your toolchain each time GCC or glibc has an update. This method takes the longest, because GCC is built twice. The first time, your old GCC builds the new GCC; the second time, the new GCC builds glibc, GCC, and all their associated packages again, which ensures your system compiler is always up to date. The -N or --newuse option ensures that all the flags you set are used. |
|
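A full rebuild of several hundred packages tends to stop at the first failure. One way to keep going, sketched here and not necessarily what was originally recommended, is to resume the interrupted run while dropping the package that failed, using emerge's `--resume` and `--skipfirst` options. The function name `rebuild_world` and its retry limit are made up for this example:

```shell
# Rebuild every package in world, skipping packages that fail to build.
# $1 is the maximum number of failed packages to skip (hypothetical knob, default 20).
rebuild_world() {
    max_skips="${1:-20}"
    skipped=0
    # First attempt: the full rebuild as given in the post above.
    emerge --emptytree --newuse world && return 0
    while [ "$skipped" -lt "$max_skips" ]; do
        skipped=$((skipped + 1))
        echo "build failed; skipping one package (skip #$skipped)" >&2
        # Resume the interrupted run, dropping the package at the head of the list.
        emerge --resume --skipfirst && return 0
    done
    return 1
}

# Example usage (as root):
#   rebuild_world 10
```

Any package skipped this way is still unbuilt, so it is worth noting which ones failed and emerging them individually afterwards.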
|
|
|
|
|
|