Gentoo Forums
optimization flags, myths and truths for the real world ;-)
Paapaa
l33t

Joined: 14 Aug 2005
Posts: 955
Location: Finland

Posted: Thu Dec 06, 2007 8:12 pm

If there were a magical flag that gave reliable speed-ups with no significant downsides, it would already be enabled by default.

squirrelfishfrog, I found the defaults for various arches but I didn't manage to find the defaults for "generic". I'll look a bit more later. I think this stuff should be clearly documented...
_________________
Paludis, the way packages are meant to be managed.

loftwyr
l33t

Joined: 29 Dec 2004
Posts: 970
Location: 43°38'23.62"N 79°27'8.60"W

Posted: Fri Dec 07, 2007 12:45 am

-mtune=generic turns off sse3 and 3dnow extensions. It's primarily for portability between AMD- and Intel-based x86-64.
_________________
My emerge --info
Have you run revdep-rebuild lately? It's in gentoolkit and it's worth a shot if things don't work well.
Celebrating 5 years of Gentoo-ing.

darklegion
Guru

Joined: 14 Nov 2004
Posts: 468

Posted: Fri Dec 07, 2007 3:47 am

loftwyr wrote:
-mtune=generic turns off sse3 and 3dnow extensions. It's primarily for portability between AMD- and Intel-based x86-64.


-march=nocona -mtune=generic turns on sse3 support, as does -march=native on a core2 processor. It seems to me that sse3 support depends on the -march option, not on -mtune.
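
One way to check this for yourself is to ask the compiler which macros it predefines for a given set of flags. A minimal sketch, assuming GCC is installed as gcc; the function name sse_macros and the CC override are made up for this example:

```shell
# Print the SSE-related macros the compiler predefines for the given flags.
# CC is an assumption of this sketch: it defaults to gcc and can be overridden.
sse_macros() {
    # -dM -E dumps all predefined macros; the empty input comes from /dev/null.
    "${CC:-gcc}" "$@" -x c -dM -E - </dev/null | grep 'SSE' | sort
}
```

For example, compare the output of "sse_macros -march=nocona -mtune=generic" with that of "sse_macros -mtune=generic". If __SSE3__ shows up only in the first list, the instruction set comes from -march and not from -mtune.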

loftwyr
l33t

Joined: 29 Dec 2004
Posts: 970
Location: 43°38'23.62"N 79°27'8.60"W

Posted: Fri Dec 07, 2007 6:10 pm

-march=native is broken and doesn't find sse3 no matter what. I posted the bug earlier in the thread.

Keruskerfuerst
Advocate

Joined: 01 Feb 2006
Posts: 2289
Location: near Augsburg, Germany

Posted: Fri Dec 07, 2007 7:05 pm

ASFLAGS="-O" improves speed.

Paapaa
l33t

Joined: 14 Aug 2005
Posts: 955
Location: Finland

Posted: Fri Dec 07, 2007 7:18 pm

Keruskerfuerst wrote:
ASFLAGS="-O" improves speed.


How about giving some more info :lol:

1. What programs were tested?
2. What were the exact settings?
3. How did you test?
4. What was the actual result?
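
The four questions above boil down to a repeatable measurement. A rough sketch of such a test in shell, assuming GNU date (for the %N nanosecond format) and awk are available; the function name bench is invented for this example:

```shell
# Run a command several times and print the wall-clock time of each run.
# Usage: bench RUNS COMMAND [ARGS...]
# Assumes GNU date (%N gives nanoseconds) and awk.
bench() {
    runs=$1; shift
    i=1
    while [ "$i" -le "$runs" ]; do
        start=$(date +%s.%N)
        "$@" >/dev/null 2>&1          # discard the command's own output
        end=$(date +%s.%N)
        awk -v s="$start" -v e="$end" -v n="$i" \
            'BEGIN { printf "run %d: %.3f s\n", n, e - s }'
        i=$((i + 1))
    done
}
```

Running something like "bench 5 gzip -9 -c somefile" (somefile is a placeholder for any large file) once on a system built without the flag and once with it gives numbers that can be posted and compared. Several runs matter because a single timing is easily skewed by the disk cache or background load.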

Keruskerfuerst
Advocate

Joined: 01 Feb 2006
Posts: 2289
Location: near Augsburg, Germany

Posted: Fri Dec 07, 2007 8:49 pm

ASFLAGS="-O" improves speed by parallelizing instructions. It also works great with single-core CPUs (>1 pipe, mmx/sse/sse2 execution unit, >1 vector units and FPU).

I have recompiled the base system of Gentoo with the flags above.

I have not run an automated speed test.

Compiling and browsing are much faster than without ASFLAGS="-O".


Last edited by Keruskerfuerst on Sat Dec 08, 2007 5:33 am; edited 1 time in total

Paapaa
l33t

Joined: 14 Aug 2005
Posts: 955
Location: Finland

Posted: Fri Dec 07, 2007 9:03 pm

Could you test with a few CPU-intensive programs and post the actual results? Placebo makes many things "feel" faster when there is no difference at all...

BTW, what are ASFLAGS?

timeBandit
Bodhisattva

Joined: 31 Dec 2004
Posts: 2719
Location: here, there or in transit

Posted: Fri Dec 07, 2007 10:08 pm

Paapaa wrote:
BTW, what are ASFLAGS?

Custom switches passed to the assembler, analogous to CFLAGS for the compiler and LDFLAGS for the linker.

Per the as(1) manual, the -O switch applies to Mitsubishi D10V/D30V and MIPS targets only. I smell a placebo. :P
_________________
Plants are pithy, brooks tend to babble--I'm content to lie between them.
Super-short f.g.o checklist: Search first, strip comments, mark solved, help others.

Keruskerfuerst
Advocate

Joined: 01 Feb 2006
Posts: 2289
Location: near Augsburg, Germany

Posted: Fri Dec 07, 2007 10:49 pm

Code:
man as
is not correct.

Paapaa
l33t

Joined: 14 Aug 2005
Posts: 955
Location: Finland

Posted: Fri Dec 07, 2007 11:22 pm

Keruskerfuerst wrote:
Code:
man as
is not correct.


Why would the manual say "-O" is a D10V-specific switch if it's not? I'll look at the source code to see what it does. And in the meantime: post some actual test data.

EDIT: I just went through the source and this switch is indeed present only on D10V, D30V and m32r. So if you are not running one of them, you most likely witnessed what placebo means.


Last edited by Paapaa on Fri Dec 07, 2007 11:36 pm; edited 1 time in total

timeBandit
Bodhisattva

Joined: 31 Dec 2004
Posts: 2719
Location: here, there or in transit

Posted: Fri Dec 07, 2007 11:30 pm

Keruskerfuerst wrote:
Code:
man as
is not correct.
Ah, of course. Then it must be a bug that:
Code:
# as --help | grep -- '-O'
yields nothing. Interesting bug, considering the source code lacks an -O option also. 8O

Keruskerfuerst
Advocate

Joined: 01 Feb 2006
Posts: 2289
Location: near Augsburg, Germany

Posted: Sat Dec 08, 2007 8:59 am

Reading and understanding is everything...

Just have a look at man as. Just do not use grep.

Paapaa
l33t

Joined: 14 Aug 2005
Posts: 955
Location: Finland

Posted: Sat Dec 08, 2007 9:42 am

Keruskerfuerst wrote:
Reading and understanding is everything...

Just have a look at man as. Just do not use grep.


I'll just repeat myself here. This is getting a bit boring so I hope you read and understand this...

Just have a look at "man as" and the source code: "-O" is not available on all arches. It is available only on a few rare arches: D10V, D30V and m32r. The i386 assembler, which most of us use (on both 32-bit and 64-bit systems), doesn't have a "-O" switch at all. If you have a normal x86 PC, that flag gives you no benefit (or any other effect). So there is also no performance gain from "-O". Period.

Keruskerfuerst
Advocate

Joined: 01 Feb 2006
Posts: 2289
Location: near Augsburg, Germany

Posted: Sat Dec 08, 2007 2:06 pm

Just try it out: recompile the base system with ASFLAGS="-O".

Check the system speed and speed of e.g. browsing.

In fact, the test program should use the parallel processing possibilities of one core.

red-wolf76
l33t

Joined: 13 Apr 2005
Posts: 714
Location: Rhein-Main Area

Posted: Sat Dec 08, 2007 2:41 pm

Look Keruskerfürst, either you've got more to back it up with than just "Hey, this works!" or we're just as well-off with waving a chicken over our computers...

Either there's a reason for it to work that can be sensibly shown, or it's a case of compile voodoo, if the resulting code works at all...
_________________
0mFg, G3nt00 r0X0r$ T3h B1g!1111 ;)

Use sane CFLAGS! If for no other reason, do it for the lulz!

timeBandit
Bodhisattva

Joined: 31 Dec 2004
Posts: 2719
Location: here, there or in transit

Posted: Sat Dec 08, 2007 3:21 pm

Keruskerfuerst wrote:
Reading and understanding is everything...
Just have a look at man as. Just do not use grep.
I read and understand quite well, tyvm--for example, I understand that you just contradicted yourself.
Yesterday, Keruskerfuerst wrote:
Code:
man as
is not correct.
...and now you recommend it. I didn't just use grep (which was only to avoid posting the full help text here, anyway). Like Paapaa, I reviewed the source code, at the links I provided above. The assembler does not have that switch except when built for the architectures noted. The D10V and D30V are for embedded applications, not general-purpose computers. The MIPS is, but it's pretty rare these days.

If you have a MIPS system, good for you, the -O switch helps--but it's of no use to us in the x86/amd64 world. Otherwise, as we say in the US, "put up or shut up." Post before/after measurements with details of the test system or drop the subject. (At this point I'd prefer the latter, since I detect the distinctive odor of troll.)

Now pardon me, I have to go find a chicken to wave over my PC.... :wink:

red-wolf76
l33t

Joined: 13 Apr 2005
Posts: 714
Location: Rhein-Main Area

Posted: Sat Dec 08, 2007 4:00 pm

Darn, I know I shouldn't have blurted out my "1337 s3kr1T m4D r1C1n9 h4X"... Serves me right...

Keruskerfuerst
Advocate

Joined: 01 Feb 2006
Posts: 2289
Location: near Augsburg, Germany

Posted: Sat Dec 08, 2007 8:34 pm

red-wolf76 wrote:
Look Keruskerfürst, either you've got more to back it up with than just "Hey, this works!" or we're just as well-off with waving a chicken over our computers...

Either there's a reason for it to work that can be sensibly shown, or it's a case of compile voodoo, if the resulting code works at all...


It is a matter of incorrect documentation.

Paapaa
l33t

Joined: 14 Aug 2005
Posts: 955
Location: Finland

Posted: Sat Dec 08, 2007 9:10 pm

Keruskerfuerst wrote:
It is a matter of incorrect documentation.


Heh, and next you claim that the source code is also incorrect :lol: You can look at the source code yourself: "-O" has no meaning with x86/x86_64 arches. You should get the exact same binaries with and without "-O". So why do you keep on talking this nonsense? If you are just trolling, I admit that you got me.
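
That claim is easy to check on your own machine. A minimal sketch, assuming the GNU assembler is installed as as and that cmp and mktemp are available; the function name as_flag_effect and the AS override are made up here:

```shell
# Assemble SRC twice, once plainly and once with FLAG, and report whether
# the two object files are byte-identical.
# Usage: as_flag_effect SRC FLAG
# AS is an assumption of this sketch: it defaults to "as".
as_flag_effect() {
    src=$1; flag=$2
    dir=$(mktemp -d) || return 2
    if "${AS:-as}" -o "$dir/plain.o" "$src" &&
       "${AS:-as}" "$flag" -o "$dir/flag.o" "$src"; then
        if cmp -s "$dir/plain.o" "$dir/flag.o"; then
            echo "identical"
        else
            echo "different"
        fi
        status=0
    else
        echo "assembler rejected the input or the flag" >&2
        status=2
    fi
    rm -rf "$dir"
    return "$status"
}
```

Called as "as_flag_effect test.s -O" (test.s stands for any assembly source you have at hand), it prints "identical" when the flag makes no difference to the object file and "different" when it does, and it returns status 2 if the assembler refuses the flag or the input.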

energyman76b
Advocate

Joined: 26 Mar 2003
Posts: 2048
Location: Germany

Posted: Thu Dec 13, 2007 1:26 pm    Post subject: Re: sse/sse2

squirrelfishfrog wrote:


Also I read that -mfpmath=sse,387 activates additional processing units for sse (you did not comment on that one), but that had no positive effect for me.
I tried -funroll-loops: no advantage for sse2 compiled code.


That is wrong for Intel CPUs. With Intel, 387 and sse use the same stages (not all stages; AFAIR the decoding stages), so you don't win anything with that flag. You lose power with that flag. In theory, AMD's CPUs could profit from that flag, but in reality they do not, thanks to gcc and other stuff.

In short, that flag is dangerous: it breaks things, and in the best case nothing gets slower.
_________________
Study finds stunning lack of racial, gender, and economic diversity among middle-class white males

I identify as a dirty penismensch.

i92guboj
Bodhisattva

Joined: 30 Nov 2004
Posts: 10315
Location: Córdoba (Spain)

Posted: Thu Dec 13, 2007 6:19 pm

squirrelfishfrog wrote:
timeBandit wrote:
  1. http://funroll-loops.info/
  2. HOLY COW I'M TOTALLY GOING SO FAST OH F***
xoxo_davide wrote:
Please don't argue with me on those results, i'll not answer neither go in-deep anyway. If you want to know a bit more, read ahead, but then don't argue with me anyway. :P
If it brings you joy, by all means have fun, but it's largely wasted time.

You want optimized? :idea:: USE flags >> CFLAGS.


Like xoxo_davide already wrote: some applications do run a long time and use a lot of floating point math: conversions (video stuff), encoding, Fourier transforms (which are used in compression algorithms) and maybe your own programs. So maybe you won't appreciate a 10% gain with kwrite, but if the program is running an hour you would.


No, you would not either. If the program is running one year you might.

Still, for number crunching applications it is not always a good idea to enable -ffast-math, even if it seems to work ok, because it produces slightly incorrect results sometimes, and probably, you don't want that if you are into physics, do you? :P

http://lists4.opensuse.org/opensuse-commit/2006-10/msg00749.html
http://gcc.gnu.org/ml/gcc/2001-07/msg01864.html

Google a bit and you will find many more about the issue.

So, if there is a scenario where -ffast-math is not a good thing, it is precisely when you talk about physics, where numbers must be as exact as possible. I will however concede you one point: it can be useful in media encoding scenarios, where precision is not a must (and that is why we all use lousy codecs under most circumstances).

MostAwesomeDude
Guru

Joined: 12 Aug 2007
Posts: 373

Posted: Sat Dec 15, 2007 7:26 pm

Hey, guys. Just posting my two cents.

-ffast-math is a very dangerous flag, in that an app must, from the ground up, be designed with that flag in mind. All of the stuff I've written so far breaks with -ffast-math in various and colorful ways. OpenGL apps have all kinds of errors, as you might imagine, and even simple wx apps have slightly strange behavior sometimes. (Also you can't build a static wxGTK 2.6 with -ffast-math. Don't ask.)

It's completely up to the original author as to whether or not to use that flag, but don't say you weren't warned.

Also, -fomit-frame-pointer is fine on 32-bit x86 if you're never going to debug anything.

energyman76b
Advocate

Joined: 26 Mar 2003
Posts: 2048
Location: Germany

Posted: Sat Dec 15, 2007 7:36 pm

MostAwesomeDude wrote:
Hey, guys. Just posting my two cents.

-ffast-math is a very dangerous flag, in that an app must, from the ground up, be designed with that flag in mind. All of the stuff I've written so far breaks with -ffast-math in various and colorful ways. OpenGL apps have all kinds of errors, as you might imagine, and even simple wx apps have slightly strange behavior sometimes. (Also you can't build a static wxGTK 2.6 with -ffast-math. Don't ask.)

It's completely up to the original author as to whether or not to use that flag, but don't say you weren't warned.

Also, -fomit-frame-pointer is fine on 32-bit x86 if you're never going to debug anything.


Exactly! And when some app or lib profits from this flag, it is usually set by the makefile anyway, so there is no reason to set it yourself!
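
In make.conf terms that means leaving the flag out of the global settings. A conservative sketch; -march=nocona is only the example value mentioned earlier in this thread, so substitute whatever matches your CPU:

```shell
# make.conf fragment (illustrative values, not a tuned recommendation)
CFLAGS="-O2 -march=nocona -pipe"
CXXFLAGS="${CFLAGS}"
# No -ffast-math here: a package that is written for it usually enables
# it in its own build system.
```

Keeping CXXFLAGS tied to CFLAGS means there is only one line to change when you experiment with a flag.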

All times are GMT