Entropy42 n00b
Joined: 05 Mar 2004 Posts: 56
Posted: Sun Mar 05, 2006 5:34 pm    Post subject: SMP Torture Test?
|
OK, I just recently bought a Dell Inspiron E1705 (same exact system as the Inspiron 9400), which has the new Intel Core Duo processor.
I'm pretty positive there is a hardware problem with the system. Under both Windows and Linux it intermittently freezes within 10-20 minutes of boot under even moderate stress, and under Windows it sometimes throws a Machine Check Exception instead. This only happens when the second core is enabled in the BIOS; if I disable it, the system is rock solid. Also, non-SMP-aware tests (such as memtest86, which as far as I know only uses one core) run fine. I haven't yet tried booting a non-SMP kernel or using the nosmp option with the second core enabled in the BIOS. (Edit: I plan to do that as soon as the system finishes a full Extended Test with the Dell Diagnostics utilities. Past experience with this particular problem tells me Dell Diagnostics won't find anything because it doesn't run the right tests, but I want to be able to tell Dell that I have run the complete diagnostics battery.)
What I would like is some sort of "SMP torture test" program (an existing app is fine, as long as it does what I want). What I want is something that:
- Loads both cores to 100% CPU usage.
- Has heavy communication between the processes on each core. (This is what I believe Dell Diagnostics fails to test.)
- Ideally, also runs multiple processes on a single core.
- Possibly, has a mode with low actual CPU usage but intense communication between the cores.
In short, I'm trying to find a test case that, instead of intermittently clobbering the system during normal use, will hopefully cause a freeze within an extremely short time of starting it. A repeatable test case like that will make it easier to convince Dell that they simply need to send me replacement hardware instead of wasting time investigating some sort of software fix. Given that machine check exceptions are raised by the CPU when it detects a hardware error, and that this problem exists under both Linux and Windows, I know it's the hardware, but the more ammo I have to convince Dell of that fact, the better.
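For anyone wanting to roll their own test along the lines of the list above, here is a rough sketch in Python (a choice made for illustration, not something from this thread) that pairs up processes and bounces small messages between them over pipes, so the cores spend their time communicating rather than just computing. It assumes Linux, because it uses the `fork` start method of Python's `multiprocessing` module; `ping_pong` and its parameters are hypothetical names. With `pairs=2` on a dual-core machine there are four busy processes, so each core also juggles more than one process, although which core each process lands on is left to the scheduler. An interpreter adds a lot of overhead, so a C version would stress the hardware harder.

```python
import multiprocessing as mp

def _echo(conn, rounds):
    # Bounce every message straight back to the other process.
    for _ in range(rounds):
        conn.send_bytes(conn.recv_bytes())

def _ping(conn, rounds, payload, results):
    # Send the payload, wait for the echo, and count the ones that came back intact.
    ok = 0
    for _ in range(rounds):
        conn.send_bytes(payload)
        if conn.recv_bytes() == payload:
            ok += 1
    results.put(ok)

def ping_pong(pairs=2, rounds=50_000, payload=bytes(range(64))):
    """Run `pairs` ping/echo process pairs; return the total intact round trips."""
    ctx = mp.get_context("fork")  # assumption: Linux/POSIX start method
    results = ctx.Queue()
    procs = []
    for _ in range(pairs):
        a, b = ctx.Pipe()  # duplex pipe: one end for each process of the pair
        procs.append(ctx.Process(target=_echo, args=(b, rounds)))
        procs.append(ctx.Process(target=_ping, args=(a, rounds, payload, results)))
    for p in procs:
        p.start()
    total = sum(results.get() for _ in range(pairs))  # drain the queue before joining
    for p in procs:
        p.join()
    return total

if __name__ == "__main__":
    pairs, rounds = 4, 20_000
    got = ping_pong(pairs, rounds)
    print(f"{got}/{pairs * rounds} round trips intact")
```

On healthy hardware every round trip should come back intact; raising `rounds` lengthens the run.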
|
radoslawc Tux's lil' helper
Joined: 08 Jun 2005 Posts: 112 Location: POLAND
Posted: Sun Mar 05, 2006 8:09 pm
|
Hi there!
First, run memtest (it's on the Gentoo CD, for example) and check the RAM timings in the BIOS setup (watch out: sometimes the default or "by SPD" setting causes trouble; try setting 2.5-6-3-3). That covers the RAM/chipset. Second, use Prime: for example, run two instances of it and choose different FFT options to test RAM + cache + CPU together, or each of them separately. So far it is the best program for catching CPU errors, and it produces a higher load than any other.
Good luck, cheers (:
|
Entropy42 n00b
Joined: 05 Mar 2004 Posts: 56
Posted: Sun Mar 05, 2006 9:13 pm
|
I'll try Prime. I have already run memtest86 and it found no errors. The RAM timings cannot be changed; Dell laptop BIOSes are pretty simplistic in this regard. That said, in single-core mode the memory never has any problems. Nothing has problems when the second core is disabled in the BIOS, and so far nothing has problems with non-SMP-aware apps.
I did sort of find a possible test. I read a tutorial on Linux threaded programming at http://yolinux.com/TUTORIALS/LinuxTutorialPosixThreads.html and modified one of the examples to create 16 threads that count to 320000000 using a single shared counter. It's pretty brutal on both my desktop machine and my A64 X2-based server. As soon as Dell's diagnostics suite finishes, I'll run it on the laptop, and after that I'll try Prime.
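The modified program itself isn't posted, so the following is only a hypothetical Python analogue of that kind of test, not the poster's C/pthreads code. It uses processes instead of threads (CPython threads share one interpreter lock and would not keep both cores busy), and every worker takes the lock on a shared 64-bit counter for each single increment. On sound hardware the final value must equal the requested total exactly; any shortfall means an increment was lost. The names `shared_counter` and `_count` are made up, the `fork` start method makes it Linux/POSIX-only, and the default total is far below 320,000,000 because a locked increment in Python is much slower than in C.

```python
import multiprocessing as mp

def _count(counter, increments):
    # Take the shared lock for every single increment so the workers
    # (and the cores they run on) constantly contend for the same lock and counter.
    for _ in range(increments):
        with counter.get_lock():
            counter.value += 1

def shared_counter(workers=16, total=160_000):
    """Have `workers` processes add 1 to one shared counter, `total` times in all."""
    ctx = mp.get_context("fork")  # assumption: Linux/POSIX start method
    counter = ctx.Value("q", 0)   # 64-bit signed int in shared memory, with its own lock
    shares = [total // workers] * workers
    shares[0] += total % workers  # hand any remainder to the first worker
    procs = [ctx.Process(target=_count, args=(counter, n)) for n in shares]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return counter.value

if __name__ == "__main__":
    total = 160_000  # the C version described in the post went to 320,000,000
    got = shared_counter(16, total)
    print("OK" if got == total else f"LOST INCREMENTS: {got} != {total}")
```

Run as a script, it prints `OK` or the mismatch; raising `total` makes the run longer.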
|
Abzstrak n00b
Joined: 20 Sep 2002 Posts: 33
Posted: Mon Mar 06, 2006 12:35 am
|
Yeah, memtest is good to try; I'd run it for 12+ hours. Also, start a prime95 instance on each core and run the torture test at the highest priority (I think it allows up to 10). Again, I'd run prime95 for 12+ hours to determine stability.
|
bollucks l33t
Joined: 27 Oct 2004 Posts: 606
Posted: Mon Mar 06, 2006 1:23 am
|
Try Hackbench. The code is out there somewhere; it generates intense inter-process communication. According to Rusty Russell, who wrote the benchmark, what matters is whether it completes rather than the actual values it returns.
|
Entropy42 n00b
Joined: 05 Mar 2004 Posts: 56
Posted: Mon Mar 06, 2006 3:39 am
|
I'm thinking less about proving stability and more about finding a definitive test case for hardware instability (or possibly a bug in the SMP implementation on the new Intel Core Duo that affects both Windows and Linux, but that's unlikely, or we would have heard tons of reports about it already).
I KNOW the system is unstable, and I'm positive it's the hardware, but I want to make sure I have no problems convincing Dell of that so that they replace the machine without too much screwing around. I can usually get it to crash within 5-10 minutes, but I would love to find a test case that can bring the machine down even sooner.