Gentoo Forums
SMP Torture Test?
Entropy42
n00b


Joined: 05 Mar 2004
Posts: 56

Posted: Sun Mar 05, 2006 5:34 pm    Post subject: SMP Torture Test?

OK, I just recently bought a Dell Inspiron E1705 (same exact system as the Inspiron 9400), which has the new Intel Core Duo processor.

I'm pretty sure there is a hardware problem with the system. Under both Windows and Linux it intermittently freezes (under Windows it sometimes throws a Machine Check Exception instead) within 10-20 minutes of boot under even moderate stress. This only occurs when the second core is enabled in BIOS; if I disable it, the system is rock solid. Running non-SMP-aware tests (such as memtest86, which only uses one core as far as I know) also seems to work fine. I haven't yet tried booting a non-SMP kernel or using the nosmp option with the second core enabled in BIOS. (Edit: I plan on doing that as soon as the system finishes a full Extended Test with the Dell Diagnostic utilities. Past experience with this particular problem leads me to expect that Dell Diagnostics won't find anything because it isn't running the right tests, but I want to be able to tell Dell that I have run the complete diagnostics battery.)

What I would like is some sort of "SMP torture test" program (an existing app is fine, as long as it can do what I want). What I want is a program that:

Loads both cores to 100% CPU usage.
Has heavy communication between the processes on each core. (This is what I believe Dell Diagnostics is failing to test.)
Ideally, also runs multiple processes on a single core.
Possibly, has a mode with low actual CPU usage but intense communication between the cores.
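
For illustration, the first two items in the list above could be sketched roughly like this. It is only a minimal sketch, not an existing tool: the names `run_pingpong`, `pingpong_worker`, `turn`, and `handoffs` are made up for the example, and it assumes a C11 compiler with POSIX threads. Two threads busy-wait on one shared atomic variable and hand a token back and forth, so both stay busy and the cache line holding the token keeps moving between them. It does not pin threads to cores; it assumes the scheduler places the two busy threads on separate cores.

```c
#include <pthread.h>
#include <stdatomic.h>

/* Shared token: 0 means it is thread 0's turn, 1 means thread 1's turn. */
static atomic_int turn;
/* Total number of hand-offs completed by both threads. */
static atomic_long handoffs;

struct pingpong_arg {
    int id;       /* 0 or 1 */
    long rounds;  /* number of hand-offs this thread performs */
};

static void *pingpong_worker(void *p)
{
    struct pingpong_arg *a = p;

    for (long i = 0; i < a->rounds; i++) {
        /* Busy-wait until the token is ours; this keeps the core loaded. */
        while (atomic_load(&turn) != a->id)
            ;
        atomic_fetch_add(&handoffs, 1);
        /* Hand the token to the other thread. */
        atomic_store(&turn, 1 - a->id);
    }
    return NULL;
}

/* Runs the ping-pong and returns the number of hand-offs, or -1 on error. */
long run_pingpong(long rounds)
{
    pthread_t t[2];
    struct pingpong_arg args[2] = { { 0, rounds }, { 1, rounds } };

    atomic_store(&turn, 0);
    atomic_store(&handoffs, 0);

    if (pthread_create(&t[0], NULL, pingpong_worker, &args[0]) != 0)
        return -1;
    if (pthread_create(&t[1], NULL, pingpong_worker, &args[1]) != 0) {
        /* Thread 0 only needs a partner for its hand-offs; play that role
         * here so it can finish and be joined. */
        pingpong_worker(&args[1]);
        pthread_join(t[0], NULL);
        return -1;
    }
    pthread_join(t[0], NULL);
    pthread_join(t[1], NULL);
    return atomic_load(&handoffs);
}
```

To try it, call `run_pingpong` with a large round count from a `main` function and build with something like `gcc -O2 -pthread`.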

In short, I'm trying to find a test case that, instead of intermittently clobbering the system during normal usage, will cause a freeze within a very short time of starting it. Such a repeatable test case will make it easier to convince Dell that they simply need to send me replacement hardware instead of wasting time investigating some sort of software fix. Given that machine check exceptions are inherently caused by bad hardware and that this problem exists under both Linux and Windows, I know it's the hardware, but the more ammo I have to convince Dell of that fact, the better.
radoslawc
Tux's lil' helper


Joined: 08 Jun 2005
Posts: 112
Location: POLAND

Posted: Sun Mar 05, 2006 8:09 pm    Post subject:

Hi there!
First, run memtest (it's on the Gentoo CD, for example) and check the RAM timings in BIOS setup. Watch out: sometimes the default or "by SPD" setting causes trouble, so try setting 2.5-6-3-3. That covers the RAM and chipset. Second, use prime: run, for example, two instances of it and choose different FFT options to test RAM + cache + CPU, or each of them separately. So far this is the best program for testing for CPU errors, and it produces a higher load than any other.
Good luck, cheers (:
_________________
“You know all those extra lines of code that we sometimes refer to as bloatware. It's actually very important and efficient stuff. There is some very substantial functionality built into the operating system.” Laura DiDio
Entropy42
n00b


Joined: 05 Mar 2004
Posts: 56

Posted: Sun Mar 05, 2006 9:13 pm    Post subject:

radoslawc wrote:
Hi there!
First, run memtest (it's on the Gentoo CD, for example) and check the RAM timings in BIOS setup. Watch out: sometimes the default or "by SPD" setting causes trouble, so try setting 2.5-6-3-3. That covers the RAM and chipset. Second, use prime: run, for example, two instances of it and choose different FFT options to test RAM + cache + CPU, or each of them separately. So far this is the best program for testing for CPU errors, and it produces a higher load than any other.
Good luck, cheers (:

I'll try prime. I have already run memtest86, and it found no errors. RAM timings cannot be changed; Dell laptop BIOSes are pretty simplistic in this regard. :( That said, in single-core mode the memory never has any problems. Nothing has problems when the second core is disabled in BIOS, and so far nothing has problems with non-SMP-aware apps.

I did sort of find a possible test. I read a tutorial on Linux threaded programming at http://yolinux.com/TUTORIALS/LinuxTutorialPosixThreads.html and modified one of the examples to create 16 threads that count to 320000000 using a single counter. It's pretty brutal on both my desktop machine and my A64 X2-based server. As soon as Dell's diagnostics suite finishes, I'll run it on the laptop. After that I'll try prime. :)
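
A test along those lines might look roughly like the sketch below. This is not the actual modified tutorial code, just a guess at its shape: it assumes the shared counter is protected by a pthread mutex, and the names `run_counter_test`, `count_worker`, `counter`, and `target` are invented for the example. Every thread fights over the same lock and the same counter until the counter reaches the target.

```c
#include <pthread.h>
#include <stdlib.h>

static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;
static long counter;  /* the single shared counter */
static long target;   /* value at which all threads stop */

static void *count_worker(void *unused)
{
    (void)unused;
    for (;;) {
        pthread_mutex_lock(&counter_lock);
        if (counter >= target) {
            /* Target reached: release the lock and stop. */
            pthread_mutex_unlock(&counter_lock);
            break;
        }
        counter++;
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Starts nthreads workers that together count to count_to.
 * Returns the final counter value, or -1 if allocation fails. */
long run_counter_test(int nthreads, long count_to)
{
    pthread_t *threads;
    int started = 0;

    if (nthreads <= 0)
        return -1;
    threads = malloc(sizeof(*threads) * (size_t)nthreads);
    if (threads == NULL)
        return -1;

    counter = 0;
    target = count_to;

    for (int i = 0; i < nthreads; i++) {
        if (pthread_create(&threads[i], NULL, count_worker, NULL) != 0)
            break;  /* carry on with the threads that did start */
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(threads[i], NULL);

    free(threads);
    return counter;
}
```

Calling `run_counter_test(16, 320000000)` from `main` would match the numbers in the post; a smaller count is enough to check that it works.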
Abzstrak
n00b


Joined: 20 Sep 2002
Posts: 33

Posted: Mon Mar 06, 2006 12:35 am    Post subject:

Yeah, memtest is good to try; I'd run it for 12+ hours. Also start up a thread of prime95 on each core and run the torture test at the highest priority (I think it allows up to 10). Again, I'd run prime95 for 12+ hours to determine stability.
_________________
L8R
-Abzstrak
bollucks
l33t


Joined: 27 Oct 2004
Posts: 606

Posted: Mon Mar 06, 2006 1:23 am    Post subject:

Hackbench. The code is out there somewhere, and it causes intense inter-process communication. According to Rusty Russell, who wrote the benchmark, it is more about whether it completes than about the actual values it returns.
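
To give a feel for what "intense inter-process communication" means here, the sketch below bounces a single byte between a parent and a child process over two pipes. It is not hackbench and does not reproduce its design; it is a much smaller stand-in, and `run_pipe_pingpong` is a name invented for the example. As with hackbench, the interesting result is whether it completes at all.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Sends one byte to a child process and waits for the echo, `rounds` times.
 * Returns the number of completed round trips, or -1 on setup failure. */
long run_pipe_pingpong(long rounds)
{
    int to_child[2], to_parent[2];
    char byte = 'x';
    long done = 0;
    pid_t pid;

    if (pipe(to_child) != 0)
        return -1;
    if (pipe(to_parent) != 0) {
        close(to_child[0]);
        close(to_child[1]);
        return -1;
    }

    pid = fork();
    if (pid < 0) {
        close(to_child[0]);
        close(to_child[1]);
        close(to_parent[0]);
        close(to_parent[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: echo every byte back until the parent closes its end. */
        close(to_child[1]);
        close(to_parent[0]);
        while (read(to_child[0], &byte, 1) == 1) {
            if (write(to_parent[1], &byte, 1) != 1)
                break;
        }
        _exit(0);
    }

    /* Parent: keep only the write end to the child and the read end back. */
    close(to_child[0]);
    close(to_parent[1]);
    for (long i = 0; i < rounds; i++) {
        if (write(to_child[1], &byte, 1) != 1)
            break;
        if (read(to_parent[0], &byte, 1) != 1)
            break;
        done++;
    }

    /* Closing the write end gives the child end-of-file, so it exits. */
    close(to_child[1]);
    close(to_parent[0]);
    waitpid(pid, NULL, 0);
    return done;
}
```

Running several of these pairs at once would move the load closer to what the original post asks for, with multiple processes per core.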
Entropy42
n00b


Joined: 05 Mar 2004
Posts: 56

Posted: Mon Mar 06, 2006 3:39 am    Post subject:

Abzstrak wrote:
Yeah, memtest is good to try; I'd run it for 12+ hours. Also start up a thread of prime95 on each core and run the torture test at the highest priority (I think it allows up to 10). Again, I'd run prime95 for 12+ hours to determine stability.

I'm thinking less in terms of proving stability and more in terms of producing a definitive test case for hardware instability (or possibly a bug in the SMP implementation that affects both Windows and Linux on the new Intel Core Duo, but that's not likely, or we would have heard tons of reports about it already).

I KNOW the system is unstable, I'm positive it's the hardware, but I want to make sure I have no problems convincing Dell of that so that they replace the machine without too much screwing around. I can usually get it to crash within 5-10 minutes, but I would love to find a test case that can bring the machine down even sooner.
All times are GMT