Gentoo Forums
GCC intermittently failing
passive
Tux's lil' helper

Joined: 31 Dec 2004
Posts: 105

Posted: Tue Oct 30, 2007 7:57 pm    Post subject: GCC intermittently failing

Hi All,

I have a system that's had a variety of errors over the past few months. Very inconsistent, generally, but there has always seemed to be something a little "off".
I just did a fresh install of Gentoo, and while updating the core packages, I kept getting errors. I was unable to rebuild GCC or glibc, for example, but there were also failures in a variety of other packages. Trying to install the most basic KDE, which is about 100 packages, including the ones for X, resulted in about a dozen of them failing.

Some of the errors specifically mentioned GCC failing, and cautioned that it was probably a hardware or OS problem. This would make sense given the history I've seen, but I'm not exactly sure how to track it down. I don't think it's a hard drive problem: of the two drives in there, one is brand new and the other has been fairly thoroughly tested. I suspected memory and am currently running Memtest86+, but it has completed 18 passes with 0 errors so far.

Are there any tools you could recommend to diagnose what might be failing? Thanks.

eyoung100
Veteran

Joined: 23 Jan 2004
Posts: 1428

Posted: Tue Oct 30, 2007 8:56 pm

Try using an emerge wrapper first to correctly rebuild your toolchain. I use emwrap. For more information, see this post. If there is a lot of cruft (file trash), or your system has been up a while, the toolchain can sometimes become unstable. If emwrap also fails to rebuild the toolchain, then you can safely rule out gcc as the cause. If this doesn't work, please reply with the manufacturer of your hard drives.
_________________
The Birth and Growth of Science is the Death and Atrophy of Art -- Unknown
Registered Linux User #363735

tarpman
Veteran

Joined: 04 Nov 2004
Posts: 1083
Location: Victoria, BC, Canada

Posted: Tue Oct 30, 2007 9:01 pm

Overheating is a major possibility.
_________________
Saving the world, one kilobyte at a time.

eyoung100
Veteran

Joined: 23 Jan 2004
Posts: 1428

Posted: Wed Oct 31, 2007 1:51 pm

tarpman wrote:
Overheating is a major possibility.


With that in mind, check the PC Health section of your BIOS. Somewhere in that section will be a CPU fan speed indicator along with a CPU temperature and a system temperature. If your CPU temperature is >= 125 F, you may have a cracked CPU. If the system temperature is >= 110 F, you may have a blocked cooling fan or not enough fans. Heat does major damage, and it isn't slow damage; it can be rather quick.

Let us all know what you find out.

passive
Tux's lil' helper

Joined: 31 Dec 2004
Posts: 105

Posted: Wed Oct 31, 2007 4:03 pm    Post subject: Update

Ok, well here's where I'm at:

Memtest86+ completed 30 passes with 0 errors. Given how frequently I get errors when the system is running normally, I figure it's probably not the memory.

I tried:

emwrap.sh -t world

Which started with glibc, and segfaulted. I rebooted twice, and ran

emwrap.sh -r -t world

each time, with a segfault as my reward.

Now I'm looking into diagnosing the CPU. It's an Athlon 64 at 2.2 GHz, and I have a very new fan on it. I'm currently reading this guide:

http://www.gentoo.org/doc/en/articles/hardware-stability-p1.xml

redgsturbo
Apprentice

Joined: 24 Jun 2005
Posts: 283

Posted: Wed Oct 31, 2007 4:46 pm    Post subject: Re: Update

Is the machine overclocked? Did you use a decent thermal paste?

eyoung100
Veteran

Joined: 23 Jan 2004
Posts: 1428

Posted: Wed Oct 31, 2007 5:08 pm

The BIOS PC Health section I described earlier is much like the lm_sensors package described in the doc you are reading. Check the temperatures there before you put the system under load. If the temperature is high while the system is idling, you have a cooling problem. That's why we're all asking about things that put heat stress on a CPU, i.e. overclocking, cracking, etc.

passive
Tux's lil' helper

Joined: 31 Dec 2004
Posts: 105

Posted: Wed Oct 31, 2007 6:30 pm

I got lm_sensors working, or so it seems. Running CPUburn takes the k8temp sensor from 48 to 76 degrees in just a few minutes. I'm guessing this is the problem.

I don't think the CPU is overclocked, but I got the system as a whole from my brother, so it is a possibility. I'm not entirely sure of the thermal paste used, so I will double check that as well.

Thanks for all the help so far.
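
The load test described above can also be watched from a script rather than by rerunning `sensors` by hand. Here is a minimal sketch in Python, assuming the kernel exposes the sensor through the hwmon sysfs interface as `temp*_input` files holding millidegrees Celsius (the exact path varies by kernel version and driver). The function names and the 80 C warning level are illustrative assumptions, not part of lm_sensors.

```python
import glob
import time


def parse_millidegrees(raw):
    """Convert a sysfs temperature string such as '48000' to degrees Celsius."""
    return int(raw.strip()) / 1000.0


def read_temps(pattern="/sys/class/hwmon/hwmon*/temp*_input"):
    """Return {path: degrees C} for every readable sensor file matching pattern.

    The default pattern is an assumption about where the driver puts its files.
    """
    temps = {}
    for path in sorted(glob.glob(pattern)):
        try:
            with open(path) as handle:
                temps[path] = parse_millidegrees(handle.read())
        except (OSError, ValueError):
            # Some sensor files exist but cannot be read or parsed; skip them.
            continue
    return temps


def watch(interval_s=5.0, samples=12, warn_at_c=80.0):
    """Print every sensor reading `samples` times, flagging readings at or above warn_at_c."""
    for _ in range(samples):
        for path, temp in read_temps().items():
            flag = "  <-- hot" if temp >= warn_at_c else ""
            print(f"{path}: {temp:.1f} C{flag}")
        time.sleep(interval_s)
```

Calling `watch()` in one terminal while the burn test runs in another would show how quickly the temperature climbs and where it levels off.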

eyoung100
Veteran

Joined: 23 Jan 2004
Posts: 1428

Posted: Wed Oct 31, 2007 6:59 pm

Would you please paste your /etc/make.conf? Let's make sure your compile settings are not overdone as well.

passive
Tux's lil' helper

Joined: 31 Dec 2004
Posts: 105

Posted: Wed Oct 31, 2007 7:29 pm

Ok, the CPU was overclocked. It's a 2800+, which is supposed to run at 1.8 GHz. Unfortunately, the only keyboard I have is an iMac USB model without a Delete key, so thus far I am locked out of the BIOS. Fortunately, I've reset enough BIOSes in my time to be able to guess the correct jumper, and I'm now back to operating at 1.8 GHz.

I also cleaned off and reapplied the thermal paste, which I believe is Arctic Silver Céramique.

At this point, it idles around 44 degrees. I have a Thermaltake Silent Boost fan, which probably contributes to this relatively high idle temp. Running CPUburn, it still gets remarkably hot (I've seen 82 degrees so far, though it took far longer than before to get to that point), so I wonder if my fan isn't working properly. I did a bit of research before purchasing this model, and it should have no problem cooling this CPU. The heatsink itself gets very hot, so I think it is absorbing heat very well.

Oh, here's my make.conf (I've barely done anything with this system yet):

Code:

# These settings were set by the catalyst build script that automatically
# built this stage.
# Please consult /etc/make.conf.example for a more detailed example.
CFLAGS="-O2 -march=i686 -pipe"
CXXFLAGS="${CFLAGS}"
# This should not be changed unless you know exactly what you are doing.  You
# should probably be using a different stage, instead.
ACCEPT_KEYWORDS="x86"
CHOST="i686-pc-linux-gnu"
USE="-ipv6"


At this point, I'm thinking either the fan on the heatsink isn't doing its job, or the CPU has started producing too much heat.

redgsturbo
Apprentice

Joined: 24 Jun 2005
Posts: 283

Posted: Wed Oct 31, 2007 7:43 pm

I have a Pentium D that runs perfectly at 2.66 GHz, and runs stably at 3.4 GHz except during very long compiles such as gcc, glibc, or a looped kernel recompile. It showed the same behaviour you are describing (gcc complaining about likely hardware issues, failures that are difficult to reproduce in the same part of a compile, etc.).
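
Because failures like these rarely hit the same spot twice, one way to tell a flaky machine from a genuinely broken build is to run the identical command several times and compare the outcomes. Here is a minimal Python sketch of that idea. The helper names `repeat_command` and `summarize` are made up for illustration and are not part of any Gentoo tool, and the trivial command at the end only stands in for a real long-running compile.

```python
import subprocess
import sys


def repeat_command(argv, runs=5):
    """Run the same command `runs` times and return the list of exit codes."""
    codes = []
    for _ in range(runs):
        # capture_output keeps the command's output off the terminal.
        result = subprocess.run(argv, capture_output=True)
        # A process killed by a signal (e.g. a segfault) has a negative code.
        codes.append(result.returncode)
    return codes


def summarize(codes):
    """Classify the exit codes collected from identical runs."""
    failures = sum(1 for code in codes if code != 0)
    if failures == 0:
        return "stable"
    if failures == len(codes):
        return "consistently failing"  # more likely a real build problem
    return "intermittent"  # same input, different outcome: suspect hardware


# Placeholder command that should always succeed.
codes = repeat_command([sys.executable, "-c", "pass"], runs=3)
print(summarize(codes))
```

In a real test the placeholder would be replaced by a long compile run from a scratch directory, since short commands may never heat the CPU enough to trigger the fault.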

eyoung100
Veteran

Joined: 23 Jan 2004
Posts: 1428

Posted: Wed Oct 31, 2007 8:38 pm

Below 100 F is perfectly acceptable. Try this:
Change -march=i686 to -march=k8 in the CFLAGS line of your make.conf, then run:
Code:

emerge --sync

followed by:
Code:

./emwrap.sh -wuDb


Let me know if it crashes...

passive
Tux's lil' helper

Joined: 31 Dec 2004
Posts: 105

Posted: Wed Oct 31, 2007 8:49 pm

Oh, all those temperatures are Celsius.

At the moment, I think everything is ok. I'm compiling GCC, and the temperature is stable around 57 degrees. I will change -march and give that a try.

Thanks for all your help.

NeddySeagoon
Administrator

Joined: 05 Jul 2003
Posts: 54809
Location: 56N 3W

Posted: Wed Oct 31, 2007 9:16 pm

passive,

80C is a top limit: hotter and you can expect problems, but it's not a hard limit.

The hot heatsink indicates that the heatsink is in good thermal contact with the CPU but that the heat is not being conducted to the air, or the hot air in the case is unable to get out. The PSU fan and the rear case fan should both be moving hot air out of the case, so air flows in at the front bottom, diagonally across the motherboard, over the CPU and out at the rear.
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.

tarpman
Veteran

Joined: 04 Nov 2004
Posts: 1083
Location: Victoria, BC, Canada

Posted: Wed Oct 31, 2007 9:23 pm

To take NeddySeagoon's idea one step further, check that vents at the front and back are clear of dust (and the CPU fan while you're at it).

eyoung100
Veteran

Joined: 23 Jan 2004
Posts: 1428

Posted: Wed Oct 31, 2007 9:45 pm

Congrats on getting the compile started. Sorry for the confusion with the temperatures. I have a feeling the US will never be metric, oh well. Now that the temperature stays constant under load, you're where you need to be. Each system that you build will have its own unique characteristics, one of which is temperature. The reason you could not trust your gut on what the temperature should be is that you received this system from your brother. If the temperature holds at around 57 C (135 F), it should not rise or fall by more than 5 to 10 degrees in either direction.
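
Since this thread mixed Fahrenheit and Celsius readings, the conversion is worth spelling out. Here is a small Python sketch using the standard formulas; the sample values are simply the figures quoted in the posts above.

```python
def c_to_f(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9.0 / 5.0 + 32.0


def f_to_c(fahrenheit):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (fahrenheit - 32.0) * 5.0 / 9.0


# Readings reported in Celsius in this thread, shown in Fahrenheit.
for celsius in (44, 57, 76, 82):
    print(f"{celsius} C = {c_to_f(celsius):.1f} F")

# Limits quoted in Fahrenheit in this thread, shown in Celsius.
for fahrenheit in (100, 110, 125):
    print(f"{fahrenheit} F = {f_to_c(fahrenheit):.1f} C")
```

By this arithmetic the 57 C load temperature is about 134.6 F, while the 100 to 125 F figures quoted earlier correspond to roughly 38 to 52 C.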

passive
Tux's lil' helper

Joined: 31 Dec 2004
Posts: 105

Posted: Wed Oct 31, 2007 11:15 pm

Ok, that worked out pretty well. I think the CPU has simply decided not to run at that speed anymore. It's fine with me, it's a server box.

At this point, I've had many packages fail to compile. Is there an easy way to recompile everything?

Thanks again.

eyoung100
Veteran

Joined: 23 Jan 2004
Posts: 1428

Posted: Thu Nov 01, 2007 3:04 pm

Type:
Code:

emerge profuse
profuse

Set all the USE flags you like, and then:
Code:

emerge --emptytree --newuse world


This should emerge anywhere from 300-450 packages. If it fails, use:
Code:

emerge --resume


Or, if you prefer the emerge wrapper I showed you, I believe it's:
Code:

emerge profuse
profuse

Set all the USE flags you like, and then:
Code:

./emwrap.sh -weuNb


This wrapper is the best tool, as it rebuilds your toolchain each time GCC or glibc has an update. This method takes the longest because GCC is built twice: the first time, your old GCC builds the new GCC; the second time, the new GCC builds glibc, GCC, and all associated packages again, which ensures your system compiler is always up to date. The N (--newuse) option ensures that all the USE flags you set are applied.

All times are GMT