snkmoorthy Guru
Joined: 19 Nov 2002 Posts: 376
|
Posted: Fri Feb 28, 2003 6:04 am Post subject: P4 sse and sse2 |
|
|
hello,
I had an earlier post on moving a system from a P3 to a P4. Someone directed me to info on why, without SSE, a P4 performs badly compared to a P3.
Anyway, I put 'sse' and 'sse2' in my make.conf and re-emerged my system. In particular, when I emerged 'mplayer', it told me that mmx, sse and sse2 are enabled, but it hangs when playing stuff.
So my question is: can I get better performance from X, Gnome and gcc with sse2? Another question: how can I confirm for each program that it is indeed using SSE/SSE2 (using the CPU fully)?
thanks in advance. |
|
|
|
pjp Administrator
Joined: 16 Apr 2002 Posts: 20067
|
Posted: Fri Feb 28, 2003 6:12 am Post subject: |
|
|
Moved from Installing Gentoo. _________________ Quis separabit? Quo animo? |
|
|
|
Twist Guru
Joined: 03 Jan 2003 Posts: 414 Location: San Diego
|
Posted: Fri Feb 28, 2003 6:28 am Post subject: |
|
|
The mplayer build auto-detects the capabilities of your processor and turns SSE-style features on or off accordingly. I believe it ignores USE flags in this case. It's unlikely the USE flags are causing any issues with your build; it's probably something else (just FYI, I have a P4 with SSE2 enabled in my mplayer build and it runs fine).
According to the USE flags list there is no flag specifically for SSE2, so if you are setting that, it's not doing anything.
Instead of USE flags, what you are really looking to do is set your CFLAGS to enable SSE. For gcc on a P4 system this would be "-msse2 -mfpmath=sse". And yes, the -mfpmath value is correct that way: combined with -msse2, it WILL use SSE2. With those CFLAGS set, you then recompile gnome or what have you to see the "full use" of your processor.
However, there are caveats:
* The GCC 3.2 series has proven problematic in its use of SSE/SSE2. It seems to rarely optimize for it well even when the flag is present, and earlier versions had some fundamental bugs that made it almost impossible to use. Notably, SSE2 data would not be properly aligned, causing big ol' seg faults when it was used. ICC, for instance, has shown itself to optimize much better on P4s (not surprisingly, considering that Intel makes ICC).
* The "right thing" to do is not to force the SSE flags, but to set "-march=pentium4" instead, and allow GCC to choose for itself whether SSE is an appropriate optimization given that architecture.
It sounds like your friend is confused over the P4 versus P3 issue. At the *same cycle rate* a P3 would be faster than a P4 at common application usage (integer math, basic memory moves, etc). This is because the P4 was pipelined differently to deal with multimedia apps. At the *same cycle rate* a P4 will beat a P3 at multimedia, because of the pipelining design and, yes, because SSE2 is better than SSE.
However, this does not translate to "the P3 is faster than a P4". The truth is that P4s have scaled better than P3s on die, so the average P4 cycle rate is higher than that of the P3, which more than compensates for the pipelining/burst hit the P4 took.
The truth is that processor speed increases in general-case applications have proven to be surprisingly linear over the last decade. Most of the leaps were made early on (with the movement of L2 cache directly onto the processor, which was really a manufacturing improvement, not a big design advance) and since then we've been plodding along a pretty steady path. |
|
|
|
|
|
|
|