Author |
Message |
Paapaa l33t
Joined: 14 Aug 2005 Posts: 955 Location: Finland
|
Posted: Thu Dec 06, 2007 8:12 pm Post subject: |
|
|
If there were a magical flag that gave reliable speed-ups with no significant downsides, it would already be enabled by default.
squirrelfishfrog, I found the defaults for various arches but I didn't manage to find the defaults for "generic". I'll look a bit more later. I think this stuff should be clearly documented... _________________ Paludis, the way packages are meant to be managed. |
|
|
|
loftwyr l33t
Joined: 29 Dec 2004 Posts: 970 Location: 43°38'23.62"N 79°27'8.60"W
|
Posted: Fri Dec 07, 2007 12:45 am Post subject: |
|
|
mtune=generic turns off sse3 and 3dnow extensions. It's primarily for portability between AMD- and Intel-based x86-64 systems. _________________ My emerge --info
Have you run revdep-rebuild lately? It's in gentoolkit and it's worth a shot if things don't work well.
Celebrating 5 years of Gentoo-ing. |
|
|
|
darklegion Guru
Joined: 14 Nov 2004 Posts: 468
|
Posted: Fri Dec 07, 2007 3:47 am Post subject: |
|
|
loftwyr wrote: | mtune=generic turns off sse3 and 3dnow extensions. It's primarily for portability between AMD- and Intel-based x86-64 systems. |
-march=nocona -mtune=generic turns on sse3 support, as does -march=native on a core2 processor. It seems to me that sse3 support depends on the -march option, not -mtune. |
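One way to settle which option controls SSE3 is to ask the compiler which macros it predefines for a given set of flags. The sketch below is only an illustration: it assumes a `gcc` binary on the PATH and relies on GCC defining `__SSE3__` when SSE3 code generation is enabled. The helper names (`parse_defines`, `gcc_defines`, `enables_sse3`) are made up for this example.

```python
import os
import subprocess


def parse_defines(preprocessor_output: str) -> set[str]:
    """Collect macro names from `#define NAME value` lines."""
    names = set()
    for line in preprocessor_output.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "#define":
            names.add(parts[1])
    return names


def gcc_defines(flags: list[str]) -> set[str]:
    """Ask gcc which macros it predefines for the given flags.

    Assumes `gcc` is installed; -dM -E dumps the predefined macros
    while preprocessing an empty input.
    """
    cmd = ["gcc", *flags, "-dM", "-E", "-x", "c", os.devnull]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_defines(result.stdout)


def enables_sse3(flags: list[str]) -> bool:
    """True if gcc reports SSE3 code generation for these flags."""
    return "__SSE3__" in gcc_defines(flags)
```

Calling `enables_sse3(["-march=nocona", "-mtune=generic"])` and then the same with only `-mtune=generic` should show whether the answer changes with `-march` or with `-mtune` on your own toolchain.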
|
|
|
loftwyr l33t
Joined: 29 Dec 2004 Posts: 970 Location: 43°38'23.62"N 79°27'8.60"W
|
Posted: Fri Dec 07, 2007 6:10 pm Post subject: |
|
|
-march=native is broken and doesn't find sse3 no matter what. I posted the bug earlier in the thread. _________________ My emerge --info
Have you run revdep-rebuild lately? It's in gentoolkit and it's worth a shot if things don't work well.
Celebrating 5 years of Gentoo-ing. |
|
|
|
Keruskerfuerst Advocate
Joined: 01 Feb 2006 Posts: 2289 Location: near Augsburg, Germany
|
Posted: Fri Dec 07, 2007 7:05 pm Post subject: |
|
|
ASFLAGS="-O" improves speed. |
|
|
|
Paapaa l33t
Joined: 14 Aug 2005 Posts: 955 Location: Finland
|
Posted: Fri Dec 07, 2007 7:18 pm Post subject: |
|
|
Keruskerfuerst wrote: | ASFLAGS="-O" improves speed. |
How about giving some more info
1. What programs were tested?
2. What were the exact settings?
3. How did you test?
4. What was the actual result? _________________ Paludis, the way packages are meant to be managed. |
|
|
|
Keruskerfuerst Advocate
Joined: 01 Feb 2006 Posts: 2289 Location: near Augsburg, Germany
|
Posted: Fri Dec 07, 2007 8:49 pm Post subject: |
|
|
ASFLAGS="-O" improves speed by parallelizing instructions. It also works great with single-core CPUs (>1 pipe, mmx/sse/sse2 execution unit, >1 vector units and FPU).
I have recompiled the base system of Gentoo with the flags written above.
I have not run an automated speed test.
Compiling and browsing are much faster than without ASFLAGS="-O".
Last edited by Keruskerfuerst on Sat Dec 08, 2007 5:33 am; edited 1 time in total |
|
|
|
Paapaa l33t
Joined: 14 Aug 2005 Posts: 955 Location: Finland
|
Posted: Fri Dec 07, 2007 9:03 pm Post subject: |
|
|
Could you test with a few CPU-intensive programs and post the actual results? Placebo makes many things "feel" faster when there is no difference at all...
BTW, what are ASFLAGS? _________________ Paludis, the way packages are meant to be managed. |
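For anyone who wants to answer the "how did you test?" question with numbers, here is a minimal timing sketch. It is not a rigorous benchmark, only a way to get repeated measurements and see how noisy they are. The function names are invented for this example, and any binary you pass to `command` is your own choice.

```python
import statistics
import subprocess
import time


def time_runs(fn, runs=10):
    """Call fn() `runs` times and return each wall-clock duration in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Median plus spread; the spread shows how noisy the measurement is."""
    return {
        "runs": len(durations),
        "median": statistics.median(durations),
        "min": min(durations),
        "max": max(durations),
        "stdev": statistics.stdev(durations) if len(durations) > 1 else 0.0,
    }


def command(argv):
    """Wrap an external command (a binary of your choice) as a callable."""
    return lambda: subprocess.run(argv, check=True, stdout=subprocess.DEVNULL)
```

Run the same workload against the build with the flag and the build without it, for example `summarize(time_runs(command([...]), runs=20))`, and compare the medians. If the difference is smaller than the spread between runs, it is probably not real.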
|
|
|
timeBandit Bodhisattva
Joined: 31 Dec 2004 Posts: 2719 Location: here, there or in transit
|
Posted: Fri Dec 07, 2007 10:08 pm Post subject: |
|
|
Paapaa wrote: | BTW, what are ASFLAGS? | Custom switches passed to the assembler, analogous to CFLAGS for the compiler and LDFLAGS for the linker.
Per the as(1) manual, the -O switch applies to Mitsubishi D10V/D30V and MIPS targets only. I smell a placebo. _________________ Plants are pithy, brooks tend to babble--I'm content to lie between them.
Super-short f.g.o checklist: Search first, strip comments, mark solved, help others. |
|
|
|
Keruskerfuerst Advocate
Joined: 01 Feb 2006 Posts: 2289 Location: near Augsburg, Germany
|
Posted: Fri Dec 07, 2007 10:49 pm Post subject: |
|
|
is not correct. |
|
|
|
Paapaa l33t
Joined: 14 Aug 2005 Posts: 955 Location: Finland
|
Posted: Fri Dec 07, 2007 11:22 pm Post subject: |
|
|
Keruskerfuerst wrote: | is not correct. |
Why would the manual say "-O" is a D10V-specific switch if it's not? I'll look at the source code to see what it does. And in the meantime: post some actual test data.
EDIT: I just went through the source and this switch is indeed present only on D10V, D30V and m32r. So if you are not running one of them, you most likely witnessed what placebo means. _________________ Paludis, the way packages are meant to be managed.
Last edited by Paapaa on Fri Dec 07, 2007 11:36 pm; edited 1 time in total |
|
|
|
timeBandit Bodhisattva
Joined: 31 Dec 2004 Posts: 2719 Location: here, there or in transit
|
Posted: Fri Dec 07, 2007 11:30 pm Post subject: |
|
|
Keruskerfuerst wrote: | is not correct. | Ah, of course. Then it must be a bug that: Code: | # as --help | grep -- '-O' | yields nothing. Interesting bug, considering the source code lacks an -O option also. _________________ Plants are pithy, brooks tend to babble--I'm content to lie between them.
Super-short f.g.o checklist: Search first, strip comments, mark solved, help others. |
|
|
|
Keruskerfuerst Advocate
Joined: 01 Feb 2006 Posts: 2289 Location: near Augsburg, Germany
|
Posted: Sat Dec 08, 2007 8:59 am Post subject: |
|
|
Reading and understanding is everything...
Just have a look at man as. Just do not use grep. |
|
|
|
Paapaa l33t
Joined: 14 Aug 2005 Posts: 955 Location: Finland
|
Posted: Sat Dec 08, 2007 9:42 am Post subject: |
|
|
Keruskerfuerst wrote: | Reading and understanding is everything...
Just have a look at man as. Just do not use grep. |
I'll just repeat myself here. This is getting a bit boring so I hope you read and understand this...
Just have a look at "man as" and the source code: "-O" is not available on all arches. It is available only on a few rare arches: D10V, D30V and m32r. The i386 assembler, which most of us use (on both 32-bit and 64-bit systems), doesn't have an "-O" switch at all. If you have a normal x86 PC, that flag brings you no benefit (or any other effect). So there is also no performance gain from "-O". Period. _________________ Paludis, the way packages are meant to be managed. |
|
|
|
Keruskerfuerst Advocate
Joined: 01 Feb 2006 Posts: 2289 Location: near Augsburg, Germany
|
Posted: Sat Dec 08, 2007 2:06 pm Post subject: |
|
|
Just try it out. Recompile the base system with ASFLAGS="-O".
Check the system speed and speed of e.g. browsing.
In fact, the test program should use the parallel processing possibilities of one core. |
|
|
|
red-wolf76 l33t
Joined: 13 Apr 2005 Posts: 714 Location: Rhein-Main Area
|
Posted: Sat Dec 08, 2007 2:41 pm Post subject: |
|
|
Look Keruskerfürst, either you've got more to back it up with than just "Hey, this works!" or we're just as well-off with waving a chicken over our computers...
Either there's a reason for it to work that can be sensibly shown, or it's a case of compile voodoo, if the resulting code works at all... _________________ 0mFg, G3nt00 r0X0r$ T3h B1g!1111
Use sane CFLAGS! If for no other reason, do it for the lulz! |
|
|
|
timeBandit Bodhisattva
Joined: 31 Dec 2004 Posts: 2719 Location: here, there or in transit
|
Posted: Sat Dec 08, 2007 3:21 pm Post subject: |
|
|
Keruskerfuerst wrote: | Reading and understanding is everything...
Just have a look at man as. Just do not use grep. | I read and understand quite well, tyvm--for example, I understand that you just contradicted yourself. Yesterday, Keruskerfuerst wrote: | is not correct. | ...and now you recommend it. I didn't just use grep (which was only to avoid posting the full help text here, anyway). Like Paapaa, I reviewed the source code, at the links I provided above. The assembler does not have that switch except when built for the architectures noted. The D10V and D30V are for embedded applications, not general-purpose computers. The MIPS is, but it's pretty rare these days.
If you have a MIPS system, good for you, the -O switch helps--but it's of no use to us in the x86/amd64 world. Otherwise, as we say in the US, "put up or shut up." Post before/after measurements with details of the test system or drop the subject. (At this point I'd prefer the latter, since I detect the distinctive odor of troll.)
Now pardon me, I have to go find a chicken to wave over my PC.... _________________ Plants are pithy, brooks tend to babble--I'm content to lie between them.
Super-short f.g.o checklist: Search first, strip comments, mark solved, help others. |
|
|
|
red-wolf76 l33t
Joined: 13 Apr 2005 Posts: 714 Location: Rhein-Main Area
|
Posted: Sat Dec 08, 2007 4:00 pm Post subject: |
|
|
Darn, I know I shouldn't have blurted out my "1337 s3kr1T m4D r1C1n9 h4X"... Serves me right... _________________ 0mFg, G3nt00 r0X0r$ T3h B1g!1111
Use sane CFLAGS! If for no other reason, do it for the lulz! |
|
|
|
Keruskerfuerst Advocate
Joined: 01 Feb 2006 Posts: 2289 Location: near Augsburg, Germany
|
Posted: Sat Dec 08, 2007 8:34 pm Post subject: |
|
|
red-wolf76 wrote: | Look Keruskerfürst, either you've got more to back it up with than just "Hey, this works!" or we're just as well-off with waving a chicken over our computers...
Either there's a reason for it to work that can be sensibly shown, or it's a case of compile voodoo, if the resulting code works at all... |
It is a matter of incorrect documentation. |
|
|
|
Paapaa l33t
Joined: 14 Aug 2005 Posts: 955 Location: Finland
|
Posted: Sat Dec 08, 2007 9:10 pm Post subject: |
|
|
Keruskerfuerst wrote: | It is a matter of incorrect documentation. |
Heh, and next you'll claim that the source code is also incorrect. You can look at the source code yourself: "-O" has no meaning with x86/x86_64 arches. You should get the exact same binaries with and without "-O". So why do you keep on talking this nonsense? If you are just trolling, I admit that you got me. _________________ Paludis, the way packages are meant to be managed. |
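The "exact same binaries" claim is easy to check rather than argue about: build the same object twice, once with the flag and once without, and compare checksums. A rough sketch follows. The function names are invented here, and the paths you pass in are whatever two object files you produced. Whether the files match on a given binutils version is exactly what the check would reveal.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Hash a file in chunks so large objects need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def identical(path_a, path_b):
    """True if both files have byte-identical contents."""
    return sha256_of(path_a) == sha256_of(path_b)
```

If `identical()` returns True for the two builds, the flag did nothing to the output, and any perceived speed difference has to come from somewhere else.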
|
|
|
energyman76b Advocate
Joined: 26 Mar 2003 Posts: 2048 Location: Germany
|
Posted: Thu Dec 13, 2007 1:26 pm Post subject: Re: sse/sse2 |
|
|
squirrelfishfrog wrote: |
Also I read that -mfpmath=sse,387 activates an additional processing units for sse (you did not comment on that one), but that had no positive effect for me.
I tried -funroll-loops: no advantage for sse2 compiled code.
|
That is wrong for Intel CPUs. With Intel, 387 and SSE use the same stages (not all stages; AFAIR the decoding stages), so you don't win anything with that flag. You lose power with that flag. In theory, AMD's CPUs could profit from that flag, but in reality they do not, thanks to gcc and other stuff.
In short, that flag is dangerous: it breaks things, and in the best case nothing gets slower. _________________ Study finds stunning lack of racial, gender, and economic diversity among middle-class white males
I identify as a dirty penismensch. |
|
|
|
i92guboj Bodhisattva
Joined: 30 Nov 2004 Posts: 10315 Location: Córdoba (Spain)
|
Posted: Thu Dec 13, 2007 6:19 pm Post subject: |
|
|
squirrelfishfrog wrote: | timeBandit wrote: | - http://funroll-loops.info/
- HOLY COW I'M TOTALLY GOING SO FAST OH F***
xoxo_davide wrote: | Please don't argue with me on those results, i'll not answer neither go in-deep anyway. If you want to know a bit more, read ahead, but then don't argue with me anyway. | If it brings you joy, by all means have fun, but it's largely wasted time.
You want optimized? : USE flags >> CFLAGS. |
Like xoxo_davide already wrote: some applications do run a long time and use a lot of floating point math: conversions (video stuff), encoding, Fourier transforms (which are used in compression algorithms) and maybe your own programs. So maybe you won't appreciate a 10% gain with kwrite, but if the program runs for an hour you would.
|
No, you would not either. If the program is running one year you might.
Still, for number crunching applications it is not always a good idea to enable -ffast-math, even if it seems to work ok, because it produces slightly incorrect results sometimes, and probably, you don't want that if you are into physics, do you?
http://lists4.opensuse.org/opensuse-commit/2006-10/msg00749.html
http://gcc.gnu.org/ml/gcc/2001-07/msg01864.html
Google a bit and you will find many more about the issue.
So, if there is a scenario where -ffast-math is not a good thing, it is precisely physics, where numbers must be as exact as possible. I will, however, concede you one point: it can be useful in media encoding scenarios, where precision is not a must (and that is why we all use lousy codecs under most circumstances). |
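To see why "slightly incorrect results" can happen at all: floating point addition is not associative, and regrouping operations is among the transformations that relaxed-math flags may permit. The snippet below does not involve gcc. It only shows in plain IEEE 754 double arithmetic that the grouping changes the answer. The values are chosen for illustration.

```python
def grouped_left(a, b, c):
    """Evaluate as written: (a + b) + c."""
    return (a + b) + c


def grouped_right(a, b, c):
    """Evaluate regrouped: a + (b + c)."""
    return a + (b + c)


a, b, c = 1e16, -1e16, 1.0

left = grouped_left(a, b, c)    # the big values cancel first, then 1.0 is added
right = grouped_right(a, b, c)  # 1.0 is absorbed into -1e16 and lost

print(left, right, left == right)
```

Here the two groupings give 1.0 and 0.0. In a long physics simulation such differences can accumulate, while in a lossy video encode nobody will notice them.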
|
|
|
MostAwesomeDude Guru
Joined: 12 Aug 2007 Posts: 373
|
Posted: Sat Dec 15, 2007 7:26 pm Post subject: |
|
|
Hey, guys. Just posting my two cents.
-ffast-math is a very dangerous flag, in that an app must, from the ground up, be designed with that flag in mind. All of the stuff I've written so far breaks with -ffast-math in various and colorful ways. OpenGL apps have all kinds of errors, as you might imagine, and even simple wx apps have slightly strange behavior sometimes. (Also you can't build a static wxGTK 2.6 with -ffast-math. Don't ask.)
It's completely up to the original author as to whether or not to use that flag, but don't say you weren't warned.
Also, -fomit-frame-pointer is fine on 32-bit x86 if you're never going to debug anything. |
|
|
|
energyman76b Advocate
Joined: 26 Mar 2003 Posts: 2048 Location: Germany
|
Posted: Sat Dec 15, 2007 7:36 pm Post subject: |
|
|
MostAwesomeDude wrote: | Hey, guys. Just posting my two cents.
-ffast-math is a very dangerous flag, in that an app must, from the ground up, be designed with that flag in mind. All of the stuff I've written so far breaks with -ffast-math in various and colorful ways. OpenGL apps have all kinds of errors, as you might imagine, and even simple wx apps have slightly strange behavior sometimes. (Also you can't build a static wxGTK 2.6 with -ffast-math. Don't ask.)
It's completely up to the original author as to whether or not to use that flag, but don't say you weren't warned.
Also, -fomit-frame-pointer is fine on 32-bit x86 if you're never going to debug anything. |
Exactly! And when some app or lib profits from this flag, it is usually set by the makefile anyway, so there is no reason to set it yourself! _________________ Study finds stunning lack of racial, gender, and economic diversity among middle-class white males
I identify as a dirty penismensch. |
|
|
|
|