bernd_b Tux's lil' helper
Joined: 25 Nov 2003 Posts: 148
|
Posted: Mon Feb 12, 2007 12:18 am Post subject: [solved - maybe] athcool causes more instability than before? |
|
|
I have used athcool on my desktop PC with an Athlon XP 2400+ T-Bred/Thorton CPU for about a year now.
For some months now, after some KDE and kernel updates but no changes to my hardware, my system has shown unexpected instability: after running for anywhere from a few minutes up to one or two days, the system suddenly freezes.
I suspected the upgrade from gcc 3 to gcc 4 and walked through several changes in my BIOS, with no success.
Now I have stopped athcool, and to my surprise the system seems to be stable again (well, I hope so, after running for two days and recompiling a lot to cause load...)
But is it possible at all that athcool has become a problem? Or is it just a hint that something else is broken?
Is there another power-saving method for my hardware (the CPU described above runs on an old VIA KT333a board)?
Last edited by bernd_b on Mon Feb 19, 2007 1:40 pm; edited 1 time in total |
|
|
|
Doogman Apprentice
Joined: 24 Sep 2004 Posts: 244
|
Posted: Mon Feb 12, 2007 9:04 pm Post subject: |
|
|
All that athcool does is throw a "switch" in the motherboard to activate the extra power-saving feature. It's not a daemon or anything, so I doubt gcc would have anything to do with it.
You can activate the power saving with a command line option, as described here:
http://tldp.org/HOWTO/Athlon-Powersaving-HOWTO/
Athcool is just a user-friendly version of this command. |
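To make the "switch" a bit more concrete: the setting lives in a configuration register of the chipset, and enabling it amounts to reading that register, setting one bit and writing the value back. Here is a minimal Python sketch of just that read-modify-write step. The starting register value and the bit mask are hypothetical placeholders, not the real values for any chipset; the actual registers differ per chipset and should be taken from the HOWTO linked above.

```python
# Hypothetical values for illustration only. Real register offsets and
# bit positions differ per chipset and must come from the HOWTO.
HYPOTHETICAL_DISCONNECT_MASK = 0x80  # assumed: bit 7 of an 8-bit register


def set_bits(register_value: int, mask: int) -> int:
    """Return the 8-bit register value with the bits in mask switched on."""
    return (register_value | mask) & 0xFF


def clear_bits(register_value: int, mask: int) -> int:
    """Return the 8-bit register value with the bits in mask switched off."""
    return register_value & ~mask & 0xFF


def bits_are_set(register_value: int, mask: int) -> bool:
    """True if every bit in mask is set in the register value."""
    return (register_value & mask) == mask


if __name__ == "__main__":
    before = 0x6B  # made-up current register content
    after = set_bits(before, HYPOTHETICAL_DISCONNECT_MASK)
    print(f"before=0x{before:02x} after=0x{after:02x}")
```

Only the one bit changes and the rest of the register is left alone, which is why the setting is lost on reboot and has to be applied again at every boot.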
|
|
|
Cyker Veteran
Joined: 15 Jun 2006 Posts: 1746
|
Posted: Mon Feb 12, 2007 10:45 pm Post subject: |
|
|
Oh wow, I haven't seen this for a while, brings back memories
Short Answer:
Assuming this is the same issue, it's not a software thing but a hardware issue.
The only known fix is to get an updated BIOS/Motherboard or stop using athcool.
(If you do get a new mobo, you won't need athcool, as both Intel Core and Athlon64 have a built-in ability to do a similar trick.)
Bit More Detail:
Personally, I never liked this hack, because that's what it is. You're making the system do something it wasn't designed to do - later boards and CPUs were made more tolerant, such that most people thought it was perfectly safe, but that doesn't change the fact that it is a nasty hack.
Unfortunately, aside from underclocking and undervolting (which is fixed), I never found any alternatives for Athlons.
There was a hack for dynamically altering the bus speed of some nForce boards, but this didn't really affect the temps much, and by the sound of things wasn't much more stable.
This is actually why I moved my server to an Athlon64. My AthlonXP 1800+ averaged a temperature of 46 degrees; my Athlon64 3000+ averages 28 degrees idle and 40 degrees at full load, in the same case and with the stock Athlon64 cooler.
(Unfortunately the chipset, which has a fan, runs a fair bit hotter than my old AthlonXP's chipset, which was passively cooled... guess you can't win them all.)
More Than You Probably Wanted To Know:
When I had a Pentium MMX and an AMD K6-III and was still using Win98, there was a program called Rain, which basically sent HLT instructions to the CPU during idle cycles, much like Linux and Windows NT do by default.
This basically 'paused' it, saving a lot of power (And thus heat!).
When AMD released Athlons, they used a 'new' bus architecture called EV6 (actually nicked from DEC Alphas) - there's a lot of guff with this, but the upshot is that it works differently enough from the old Intel-style bus that the old HLT trick didn't do anything.
The reason is that Athlons don't want to disconnect themselves from the bus, which is needed for them to go into the low-power state, so they essentially don't do anything when HLT is issued to them. (This isn't the truth, but as Lob-Sang would say, it's a very good lie.)
Some enterprising people found a way to re-enable the CPU bus-disconnect for HLT instructions (Which is what athcool enables), which gave the HLT instructions the same kind of cooling on Socket A Athlons that the Socket 7 up to Pentium III enjoyed.
Unfortunately, it *does* cause instability problems - They were pretty well documented back in the day, but I wouldn't even know where to look now for the supporting docs
Basically, the EV6 bus does *not* like having the CPU disconnected from it, and it takes a very significant time to execute the disconnect and re-connect operations.
This means that any I/O or instructions destined for the CPU would get stalled on the bus, waiting for the damned thing to wake up
You'd often notice things like stuttering in real-time operations (e.g. Sound, video), and data loss/corruption was not uncommon when doing heavy processing.
In early motherboards, this caused really bad problems, often resulting in kernel panics (Linux) and BSODs (Windows). Later boards and CPUs (mainly AthlonXP) were made more tolerant of the 'stalls', presumably by increasing buffering and bus time-outs so the CPU had enough time to wake up and re-connect to the bus.
Newer BIOS features also helped increase stability when using this hack - stuff like increasing PCI latency and wait states, and turning on things like Delayed Transaction, ISA Transaction Buffering, etc.
Modern CPUs have no use for athcool - Intel first introduced throttling in the original P4 (partly to stop it catching fire), which made the HLT trick obsolete. (I'm ignoring 'mobile' CPUs, which have always had things like SpeedStep and PowerNow!)
AMD followed a lot later with Cool'n'Quiet.
Both of these are vastly more effective at cooling the CPU than the old HLT instruction |
|
|
|
Doogman Apprentice
Joined: 24 Sep 2004 Posts: 244
|
Posted: Tue Feb 13, 2007 10:52 pm Post subject: |
|
|
As long as you have a newer Athlon, like one of the XP's, stability seems to vary depending on the motherboard implementation.
My own 'lil headless server is an Athlon XP 2800 on a VIA-based motherboard. I can't tell you the model offhand, but it was one of the last series available, as this board was a replacement for another that fried. I use Athcool all the time and stability is absolutely excellent: the last uptime was over 6 months, until the UPS battery ran out of juice.
I don't really care about CPU heat (as long as it's not high) but am more concerned about power usage. Measured with a Kill-a-Watt meter at the plug:
- F@H (100% CPU) 129W
- Idle (without Athcool) 103W
- Idle (with Athcool) 65W
It's definitely worth running on a 24/7 system if it doesn't cause problems.
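For a rough idea of what those readings add up to, here is a small Python sketch that turns the two idle figures above into a yearly energy difference. It assumes the machine sits idle 24 hours a day for a whole year, which is an upper bound rather than a measurement, and it deliberately leaves out any electricity price since none is given here.

```python
HOURS_PER_YEAR = 24 * 365  # assumption: the box idles around the clock

# Idle readings at the plug, taken from the list above.
IDLE_WITHOUT_ATHCOOL_W = 103
IDLE_WITH_ATHCOOL_W = 65


def annual_kwh(watts: float, hours: float = HOURS_PER_YEAR) -> float:
    """Energy in kWh used by a constant load of the given wattage."""
    return watts * hours / 1000.0


def idle_saving_watts() -> int:
    """Difference between the two idle readings, in watts."""
    return IDLE_WITHOUT_ATHCOOL_W - IDLE_WITH_ATHCOOL_W


if __name__ == "__main__":
    saving_w = idle_saving_watts()
    share = saving_w / IDLE_WITHOUT_ATHCOOL_W
    print(f"idle saving: {saving_w} W ({share:.0%} of idle draw)")
    print(f"upper bound per year: {annual_kwh(saving_w):.1f} kWh")
```

The real figure will be lower on a machine that spends part of its time under load, since the saving only applies while the CPU is idle.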
But this is way off the original poster's question: he had a system that was stable with Athcool but now isn't. I can't offer an explanation for why that is, but I doubt it's a software problem. |
|
|
|
widan Veteran
Joined: 07 Jun 2005 Posts: 1512 Location: Paris, France
|
Posted: Tue Feb 13, 2007 11:50 pm Post subject: |
|
|
Cyker wrote: | Oh wow, I haven't seen this for a while, brings back memories |
Old days... I remember when a kernel upgrade (which may have increased the clock tick frequency - the local APIC fast interrupt acknowledge for the clock was involved in the problem) turned a stable system into one that would last about 5 minutes before freezing.
Cyker wrote: | There was a hack for dynamically altering the bus speed of some nForce boards, but this didn't really affect the temps much, and by the sound of things wasn't much stabler. |
It was even worse. I experimented with that one too - it was quite sensitive not only to the bus speed you set, but also to the way you ramped it up or down (I had crashes while raising the frequency too fast).
Cyker wrote: | Unfortunately, it *does* cause instability problems - They were pretty well documented back in the day, but I wouldn't even know where to look now for the supporting docs |
One theory at the time was that if a timer interrupt initiated by the local APIC fired, it would cause a reconnect, and if there was nothing special to do at this clock tick, the IRQ would get ACKed very fast (faster than with the normal interrupt controller, since the APIC is on-CPU) and the system would go back to the idle thread, which would execute another hlt and cause a second disconnect too soon after the reconnect. This could in some cases confuse the CPU, the chipset or both, and hang the machine. Before the "proper" fix (setting an additional bit in the nForce2 that probably added some delay before the chipset would issue the disconnect signal) was found, there were patches that involved adding a delay before ACKing the timer interrupts.
Cyker wrote: | You'd often notice things like stuttering in real-time operations (e.g. Sound, video), and data loss/corruption was not uncommon when doing heavy processing. |
I think this one was a different problem, and was specific to VIA chipsets (and was solved with various PCI latency patches). Both problems were caused by timing issues, and athcool may have exacerbated the VIA problem too. The machine hangs were mostly (only?) on nForce2 chipsets.
Doogman wrote: | But this is way off the original poster's question, he had a system that was stable with Athcool but now isn't. I can't offer a solution on why that is, but I doubt it's a software problem. |
Recent kernels have the nForce2 halt disconnect fix (and I think more recent versions of athcool also have it). Some things that can be tried (since they are all part of the problem) are disabling the local APIC or reducing timer tick frequency. |
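Before changing either of those, it helps to know what the running kernel is currently using. The following is a minimal Python sketch, not a definitive tool: it looks for the `lapic` / `nolapic` options on the kernel command line and for the `CONFIG_HZ` value in a kernel config. It assumes `/proc/cmdline` is readable and that the kernel exposes its config at `/proc/config.gz`, which is only the case when that feature was built in; otherwise the config text has to be supplied from wherever the kernel config is kept.

```python
import gzip
import re
from typing import List, Optional


def lapic_options(cmdline: str) -> List[str]:
    """Return the lapic/nolapic tokens found on a kernel command line, in order."""
    return [tok for tok in cmdline.split() if tok in ("lapic", "nolapic")]


def config_hz(config_text: str) -> Optional[int]:
    """Return the CONFIG_HZ value from kernel config text, or None if absent."""
    match = re.search(r"^CONFIG_HZ=(\d+)$", config_text, flags=re.MULTILINE)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    try:
        with open("/proc/cmdline") as fh:
            print("lapic options:", lapic_options(fh.read()))
    except OSError as err:
        print("could not read /proc/cmdline:", err)

    try:
        # Assumption: only present if the kernel was built to expose its config.
        with gzip.open("/proc/config.gz", "rt") as fh:
            print("timer tick frequency (HZ):", config_hz(fh.read()))
    except OSError as err:
        print("could not read /proc/config.gz:", err)
```

An empty list from `lapic_options` just means neither option was passed at boot, so the kernel followed its default behaviour and the BIOS setting.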
|
|
|
bernd_b Tux's lil' helper
Joined: 25 Nov 2003 Posts: 148
|
Posted: Wed Feb 14, 2007 12:29 pm Post subject: |
|
|
Quote: | Some things that can be tried (since they are all part of the problem) are disabling the local APIC or reducing timer tick frequency. |
From everything I'm reading here, I fear that a lot of the stability problems I blamed on upgrading gcc from 3.* to 4.* may actually have come from using athcool rather than from the upgrade.
Yes, I too was surprised to see my power consumption drop from 120 Watt to 80 Watt with no programs running (apart from X/KDE itself) just because of this "hack" - the heat is something I could ignore.
But a system that freezes every one to eight hours is too high a price. Since I've already switched on all the PCI latency and buffer options on my ECS VIA KT333a board, I will look into the options described above (although Google will first have to tell me what exactly to do) - but there is not much hope.
Maybe it is still cheaper to live with higher power consumption than to buy a new board and CPU. |
|
|
|
bernd_b Tux's lil' helper
Joined: 25 Nov 2003 Posts: 148
|
Posted: Mon Feb 19, 2007 1:38 pm Post subject: |
|
|
Well,
enabling the local APIC seems to have restored stability:
For reasons I don't know, dmesg indicated that the local APIC was disabled by my BIOS. So I did what it suggested and booted my machine with the kernel option lapic.
Now I get this with dmesg:
Code: | Local APIC disabled by BIOS -- reenabling.
Found and enabled local APIC!
mapped APIC to ffffd000 (fee00000) |
And for four days, with two reboots (including three days of continuous uptime), everything has worked fine again. Let's hope for the best... |
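For anyone wanting to check the same thing on their own machine, here is a minimal Python sketch that scans dmesg text for the three messages quoted above. The message wording is taken from this post only; other kernel versions may phrase these lines differently, so treat the matching strings as assumptions. The sample text stands in for real output, which would normally come from running dmesg.

```python
import re
from typing import Dict, Optional, Tuple

# Sample text copied from the dmesg excerpt quoted in this post.
SAMPLE_DMESG = """Local APIC disabled by BIOS -- reenabling.
Found and enabled local APIC!
mapped APIC to ffffd000 (fee00000)
"""


def apic_mapping(dmesg_text: str) -> Optional[Tuple[str, str]]:
    """Return the two hex addresses from the 'mapped APIC to' line, or None."""
    match = re.search(r"mapped APIC to ([0-9a-f]+) \(([0-9a-f]+)\)", dmesg_text)
    return (match.group(1), match.group(2)) if match else None


def local_apic_status(dmesg_text: str) -> Dict[str, object]:
    """Summarise what the kernel log says about the local APIC."""
    return {
        "disabled_by_bios": "Local APIC disabled by BIOS" in dmesg_text,
        "enabled_by_kernel": "Found and enabled local APIC" in dmesg_text,
        "mapping": apic_mapping(dmesg_text),
    }


if __name__ == "__main__":
    for key, value in local_apic_status(SAMPLE_DMESG).items():
        print(f"{key}: {value}")
```

If `disabled_by_bios` is true and `enabled_by_kernel` is false, the kernel saw the BIOS setting and left the local APIC off, which is the situation the lapic boot option addresses.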
|