Gentoo Forums
optimization flags, myths and truths for the real world ;-)

xoxo_davide
n00b
Joined: 24 Sep 2004
Posts: 37

Posted: Fri Nov 23, 2007 5:36 pm    Post subject: optimization flags, myths and truths for the real world ;-)

OK, I finally found the time to post this.
Everybody wants the best compiler flags (CFLAGS) to put in their make.conf for speed and optimization, so after days of tests on recent processors I have some interesting news.
Everything said here applies to the Athlon 64, Athlon 64 dual core and Opteron. No other processors were tested.
Compiler version used: gcc-4.1.2, glibc-2.6.1
Profile: amd64 no-multilib (that means a pure 64-bit system, but that doesn't affect the results).

So, for the impatient, here is the result of two days and nights of performance tests:

######################################################
Best cflags: -march=athlon64 -O3 -pipe -fomit-frame-pointer -finline-functions
######################################################

Shocked??? That looks so normal... ;-)
No -ffast-math? No -ftrace or other exotic flags?
And, just a note: -fomit-frame-pointer and -finline-functions are enabled by default with -O3, so they are actually not useful. I just kept them because ebuilds sometimes replace -O3 with -O2, and the hope is that the two flags are kept along with -O2. And -pipe is not a real optimization flag, it just speeds up compile time. So the result is -O3.

Please don't argue with me about those results; I will neither answer nor go into depth anyway. If you want to know a bit more, read on, but then don't argue with me anyway. :-P

--

For those who want to know how I came to that result.

The hardware:
The machines used are two similar Asus-motherboard-based systems, one with an Athlon 64 and one with an Athlon 64 dual core, and a Supermicro server board (dual-CPU capable) with a single Opteron.

How I performed the tests:
- I started from acovea (simply put, an evolutionary compiler-flags tester), running all tests twice on all machines and noting the results: acovea's best flags, acovea's optimistic flags, and acovea's pessimistic flags. Every test run (I said every) gave me different flags. But I had to start somewhere. (Note: acovea's tests are very specific to a narrow range of operations; they are a kind of CPU sector test.)
- I need flags for daily usage, and for daily usage I'll never sacrifice one range of CPU operations for another, so I had to cross-strip the flags, deleting from acovea's best and acovea's optimistic sets any flag appearing in any of acovea's pessimistic sets. From what remained I made 4 groups of flags built up from the common best flags and common optimistic flags, plus I added some "commonly used" groups of best flags (ahem... well... at least "thought to be best" flags). Total: 7 groups of CFLAGS to test.
- Now I needed to choose some programs to compile. Question: will I gain productivity from a 7% speed increase in processing OpenOffice documents? Not me. My typing speed is still slower than my processor ;-) and my Calc documents are not heavy enough to see differences. The same goes for Quanta or the startup time of Konqueror. Will I gain something from making my GIMP filters apply faster? Yes. Or from extracting files with Ark and compressing a DivX quicker? Yes. The real difference made by CFLAGS optimization in "daily usage" is seen only (!) in time-consuming, processor-heavy apps. I won't go into depth. Not my interest. That said, my representative programs are the following: tar, bzip2, povray, ffmpeg, and sometimes Konqueror added to the mix (yes, Konqueror, running a complex AJAX application).
- All programs were compiled on each machine with each group of flags and tested the same way every time. Yeah, that was a whole weekend of testing.
- On the fastest machine I added some tests using exactly acovea's best flags.
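
The cross-strip step in the list above can be sketched roughly as follows. This is only an illustration of the idea, not the script actually used for the tests: the function name `cross_strip` and the example flag sets are made up, and the flags listed are not acovea's real output.

```python
def cross_strip(best, optimistic, pessimistic_runs):
    """Drop from the best/optimistic lists every flag that any run rated pessimistic."""
    rejected = set().union(*pessimistic_runs)
    kept_best = [flag for flag in best if flag not in rejected]
    kept_optimistic = [flag for flag in optimistic if flag not in rejected]
    return kept_best, kept_optimistic


# Illustrative flag sets only (assumption); they are not the data from the tests.
best = ["-fomit-frame-pointer", "-funroll-loops", "-finline-functions"]
optimistic = ["-fweb", "-ffast-math", "-frename-registers"]
pessimistic_runs = [{"-ffast-math"}, {"-funroll-loops", "-fweb"}]

kept_best, kept_optimistic = cross_strip(best, optimistic, pessimistic_runs)
print("best after strip:      ", " ".join(kept_best))
print("optimistic after strip:", " ".join(kept_optimistic))
```

A flag is dropped as soon as a single run rates it pessimistic, which matches the "never sacrifice one range of operations for another" rule stated above.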

Conclusions:
Best cflags: -march=athlon64 -O3 -pipe -fomit-frame-pointer -finline-functions
So let gcc do its job. If you need more pure processor speed, don't waste your time; buy a faster one.
Not much else to say.

Some considerations (before somebody posts something already said or verified elsewhere):
- The latest versions of gcc do a very good job of optimizing. Some flags made a noticeable difference in the past 3.x series; now that's over.
- There are well over a hundred CFLAGS. Some of them are more pessimistic taken alone than when bound together with others, and some of them are more harmful in some tasks than in others where, in contrast, they can affect performance positively.
- Modern programs perform a wide range of tasks, from floating-point calculation to memory block manipulation; you cannot find a combination of flags that is best for everything.
- The -march=athlon64 flag is the first one you have to consider. Compared to this flag, nothing else is relevant.
- A well-programmed application can gain much more than the best combination of compiler flags you can find. Programmers: review slow routines.
- A difference of $20 when buying a processor can make a 30% speed increase when rendering an image with Blender. Tweaking flags, maybe 5%. You'll not see the difference when surfing with Konqueror or downloading mail.
- Acovea's flags sometimes give a 40% increase in performance on... acovea's tests. That's because they are very specific tests performing very specific tasks one at a time. On daily applications they are worse. That said, remember that acovea is a good and sometimes very useful program (see below).
- If you have a very specific program that performs a very specific, very time-consuming task (let's say you are a researcher and you wrote a C program to solve the field equations of GR on Riemannian manifolds using a subdivision approach) that can take days on your machine, then I suggest taking the heaviest functions and testing them in acovea to find the best-performing CFLAGS. You may really increase speed.

As a last topic, let's nuke some legends:
- You will not gain from 64-bit compiled programs on a modern 64-bit Athlon (or higher) system. False. You WILL gain from 64-bit compiled programs. In some cases even 20%.
- You will not gain from a dual core over a single-core processor. Well... there is some truth in this statement if you use gcc. Applications still aren't optimized for dual processors/dual cores. Some benchmarks found on the internet say you can actually lose performance in some cases. I don't have direct experience of this (the single core I used is slower than the dual core anyway, so I can't compare the results), but my guess is that the processor wastes more time dividing up threads and tasks than it would performing everything on a single core. Using icc seems to increase performance a lot. Yafray seems to me to gain much more than other programs compiled on the dual-core machine.
- -Os (or -O) is better because apps load faster. Are you serious? Modern programs are made of a bunch of libraries, which are loaded and executed as needed, and they usually don't exceed a few megabytes. The kwrite launcher is a few kilobytes. And so are its libraries. They are loaded in sequence, and the startup of an application wastes more time initializing and executing libraries than loading them. So that's just false in most cases (of course I'm not considering old PCs running short of RAM).
- -O1 -fomit-frame-pointer -finline-functions is comparable to -O2. False, and the difference is noticeable.
- -O2 is better than -O3. False, but the difference is often not noticeable.
- -fomit-frame-pointer AND -finline-functions are the first CFLAGS to consider. True. The difference between -O2 and -O3 is more or less wiped out when adding the two flags to -O2, with a preference for the first one.
- -ffast-math or -funsafe-math-optimizations or -mfpmath=387 or any combination of the three produces faster code. I wonder how many posts I have read from people claiming they are absolutely sure about this (it is very common to bundle -funsafe-math-optimizations with -mfpmath=387; there are guys out there who say they got an impressive 50% increase on some applications!)... WTF!!!! Did they really test it, or did they all read about somebody else who read about it? This is absolutely false. On any amd64 system every task that stresses the floating-point unit performs slower (and not only those...). A lot slower.

--

I'll not post the results; I really have a dozen handwritten sheets on my desk, and I don't think I'll ever find the time to reorder and post them, so please don't ask me for them.
For the most curious, I include a set of povray tests performed on the fastest machine (the other results are consistent). Same scene rendered for every test. The scene includes transparency and reflections, with radiosity calculated. Pure -O1 and -O2 are omitted; they are not of interest as they are worse anyway. I just want to highlight the interesting and most discussed differences.

Flags -> Rendering time (the faster the better)
-Acovea's best common (stripped) -> 1:13
-Acovea's best positive (stripped) -> 1:11
-Any of acovea's best set tried -> always more than 1:17 (I even got a 1:21 in one case)
-Os -> 1:16
-O1 -fomit-frame-pointer -finline-functions -> 1:14
-O2 -fomit-frame-pointer -finline-functions -> 1:10
-O3 -> 1:09 (!)
-O3 -mfpmath=387 -> 1:12
-O3 -ffast-math -> 1:16
-O3 -ffast-math -mfpmath -> 1:17
-O3 -funsafe-math-optimizations -> 1:18
-O3 -funsafe-math-optimizations -mfpmath=387 -> 1:18
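
To put the rendering times above on a common scale, here is a small sketch that converts them to seconds and expresses each one as a slowdown relative to plain -O3. The helper name `to_seconds` is made up for illustration; the timings are copied from the list above, leaving out the acovea rows and the row whose -mfpmath value is not given.

```python
def to_seconds(mmss):
    """Convert a 'minutes:seconds' string such as '1:09' to whole seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


# Timings copied from the list above.
timings = {
    "-Os": "1:16",
    "-O1 -fomit-frame-pointer -finline-functions": "1:14",
    "-O2 -fomit-frame-pointer -finline-functions": "1:10",
    "-O3": "1:09",
    "-O3 -mfpmath=387": "1:12",
    "-O3 -ffast-math": "1:16",
    "-O3 -funsafe-math-optimizations": "1:18",
    "-O3 -funsafe-math-optimizations -mfpmath=387": "1:18",
}

baseline = to_seconds(timings["-O3"])
slowdown = {
    flags: 100.0 * (to_seconds(stamp) - baseline) / baseline
    for flags, stamp in timings.items()
}

# Print the flag sets from fastest to slowest.
for flags, percent in sorted(slowdown.items(), key=lambda item: item[1]):
    print(f"{flags:<48} +{percent:.1f}%")
```

Since the times are only given to the second, a one-second step on a 69-second render is already about 1.4%, so differences of that size are at the resolution limit of the measurement.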
_________________
Think. Then think twice. Then, if you really need it, talk. But i'm sure you'll still say something stupid.

Keruskerfuerst
Advocate
Joined: 01 Feb 2006
Posts: 2289
Location: near Augsburg, Germany

Posted: Fri Nov 23, 2007 8:02 pm

Simply No!

DaggyStyle
Watchman
Joined: 22 Mar 2006
Posts: 5910

Posted: Fri Nov 23, 2007 8:20 pm

According to http://gentoo-wiki.com/Safe_Cflags, -fomit-frame-pointer disables 64-bit support. Care to answer?
_________________
Only two things are infinite, the universe and human stupidity and I'm not sure about the former - Albert Einstein

loftwyr
l33t
Joined: 29 Dec 2004
Posts: 970
Location: 43°38'23.62"N 79°27'8.60"W

Posted: Fri Nov 23, 2007 10:15 pm

Could you point out where it says it disables 64-bit support? The only thing I read is that it's enabled at -Os to -O3.
_________________
My emerge --info
Have you run revdep-rebuild lately? It's in gentoolkit and it's worth a shot if things don't work well.
Celebrating 5 years of Gentoo-ing.

s.hase
Apprentice
Joined: 19 Nov 2004
Posts: 293

Posted: Sat Nov 24, 2007 12:57 pm

DaggyStyle wrote:
According to http://gentoo-wiki.com/Safe_Cflags, -fomit-frame-pointer disables 64-bit support. Care to answer?

There is no such statement in the wiki:
Quote:

The flag -fomit-frame-pointer is enabled at -O1, -O2, -O3 and -Os on arches where it doesn't interfere with debugging, such as AMD64

DaggyStyle
Watchman
Joined: 22 Mar 2006
Posts: 5910

Posted: Sun Nov 25, 2007 2:14 pm

OK, my bad. None of the 64-bit setups have it, but the 32-bit setups for the same CPU do, so I just assumed it. Anyway, what's the effect on the system?
_________________
Only two things are infinite, the universe and human stupidity and I'm not sure about the former - Albert Einstein

Paapaa
l33t
Joined: 14 Aug 2005
Posts: 955
Location: Finland

Posted: Mon Nov 26, 2007 9:26 am    Post subject: Re: optimization flags, myths and truths for the real world

xoxo_davide wrote:
And, just a note: -fomit-frame-pointer and -finline-functions are enabled by default with -O3, so they are actually not useful. I just kept them because ebuilds sometimes replace -O3 with -O2, and the hope is that the two flags are kept along with -O2. And -pipe is not a real optimization flag, it just speeds up compile time. So the result is -O3.


-fomit-frame-pointer is always enabled at levels -O, -O2, -O3, -Os. So you can remove it safely. As said: this applies to x86_64 where it doesn't affect debuggability.

And as for -finline-functions and circumventing flag filtering:

Quote:
It's possible to circumvent -O filtering by redundantly listing the flags for a certain level, such as -O3, by doing things like:

Code:
CFLAGS="-O3 -finline-functions -funswitch-loops"


However, this is not a smart thing to do. CFLAGS are filtered for a reason! When flags are filtered, it means that it is unsafe to build a package with those flags. Clearly, it is not safe to compile your whole system with -O3 if some of the flags turned on by that level will cause problems with certain packages. Therefore, you shouldn't try to "outsmart" the developers who maintain those packages. Trust the developers. Flag filtering and replacing is done for your benefit! If an ebuild specifies alternative flags, then don't try to get around it.

You will most likely continue to run into problems when you build a package with unacceptable flags. When you report your troubles on Bugzilla, the flags you use in /etc/make.conf will be readily visible and you will be told to recompile without those flags. Save yourself the trouble of recompiling by not using redundant flags in the first place! Don't just automatically assume that you know better than the developers.


And finally about -O3:

Quote:
Compiling all your packages with -O3 will result in larger binaries that require more memory, and will significantly increase the odds of compilation failure or unexpected program behavior (including errors). Using -O3 is not recommended for gcc 4.x.


And come on, the difference between O2 and O3 was 1 sec. In practice that means they are equal. Where did I find all this information? Here:

http://www.gentoo.org/doc/en/gcc-optimization.xml
_________________
Paludis, the way packages are meant to be managed.

aTan
Tux's lil' helper
Joined: 06 Jan 2007
Posts: 134
Location: Czech Republic (Ukraine)

Posted: Tue Nov 27, 2007 8:32 pm    Post subject: Re: optimization flags, myths and truths for the real world

xoxo_davide wrote:

- You will not gain from 64-bit compiled programs on a modern 64-bit Athlon (or higher) system. False. You WILL gain from 64-bit compiled programs. In some cases even 20%.

Are there any tests? What are the arguments and facts for it on a desktop? I agree that it helps on e.g. heavily loaded DB servers or something like that, but which applications make use of 64-bit on a desktop (apart from encoding stuff)?

squirrelfishfrog
n00b
Joined: 28 Nov 2007
Posts: 15

Posted: Wed Nov 28, 2007 1:00 pm    Post subject: sse/sse2

I have some results and questions about SSE flags.

I wrote a benchmark to test the usage of SSE instructions on a Xeon processor (gcc 4.2.0), and the results are weird.
The code consists of loops with a lot of floating-point operations, so this is something SSE2 is made for.

My flags were -O{n} -march=nocona -mtune=nocona
(with n=0,1,2,3; nocona worked and does support sse and sse2, and /proc/cpuinfo also lists sse and sse2 as supported)

The Gentoo optimization guide says that -msse and -msse2 are implied by the correct -march, but I included them explicitly.

I compiled the same code without those flags (no -march flag, same -O level), but from what I see in `gcc -v -Q example.c` the -msse(2) options are still enabled, among many others.

So the result was: no matter what -O level I set, performance is equal between the generic and the correct -march/-msse2 compiled versions.

I also read that -mfpmath=sse,387 activates an additional processing unit for SSE (you did not comment on that one), but that had no positive effect for me.
I tried -funroll-loops: no advantage for SSE2-compiled code.

I would assume that those features (-msse, -msse2, -mmmx, ...) should be deactivated with generic, for compatibility with older processors, so I hope I did something wrong, because it seems they are always on...

Any comments?

By the way: do I misunderstand what gcc -v -Q lists under "options enabled:"? And by no effect I mean literally no effect, not even a millisecond on average (I did some statistical error estimation).

I decided to stick with CFLAGS="-O2 -march=appropriatecpu -mtune=appropriatecpu -pipe"

But still, for my own code, I would like to know what SSE actually does (performance-wise), and whether it really is always on if gcc thinks it should turn it on for your system, and so on.

I would be thankful for any hints on that.
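
The post above does not say how the statistical error estimation was done. One common approach, sketched here purely as an assumption, is to time each build several times and compare the means against their standard errors. The function names, the threshold `k` and the sample timings below are all hypothetical.

```python
import math
import statistics


def summarize(samples):
    """Return (mean, standard error of the mean) for a list of run times."""
    mean = statistics.mean(samples)
    sem = statistics.stdev(samples) / math.sqrt(len(samples))
    return mean, sem


def runtimes_differ(a, b, k=2.0):
    """True when the means differ by more than k combined standard errors."""
    mean_a, sem_a = summarize(a)
    mean_b, sem_b = summarize(b)
    combined = math.sqrt(sem_a ** 2 + sem_b ** 2)
    return abs(mean_a - mean_b) > k * combined


# Hypothetical wall-clock times in seconds; not measurements from this thread.
generic_build = [10.02, 9.98, 10.01, 9.99, 10.00]
tuned_build = [10.01, 9.99, 10.00, 10.02, 9.98]

print("generic:", summarize(generic_build))
print("tuned:  ", summarize(tuned_build))
print("measurable difference:", runtimes_differ(generic_build, tuned_build))
```

With only a handful of runs this is a rough screen rather than a proper significance test, but it is enough to tell "no effect" apart from "effect hidden in the noise".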

red-wolf76
l33t
Joined: 13 Apr 2005
Posts: 714
Location: Rhein-Main Area

Posted: Wed Nov 28, 2007 2:08 pm

If you're using a sufficiently advanced toolchain, you might consider -march=native. I wouldn't recommend using both -march and -mtune at the same time. Seems rather pointless.
_________________
0mFg, G3nt00 r0X0r$ T3h B1g!1111 ;)

Use sane CFLAGS! If for no other reason, do it for the lulz!

squirrelfishfrog
n00b
Joined: 28 Nov 2007
Posts: 15

Posted: Wed Nov 28, 2007 2:49 pm    Post subject: -mtune

Yes,
-march=X implies -mtune=X,
they differ only in the generic option.
But it doesn't hurt. (Or does it?)

native (thanks for that):

I recompiled with -march=native
then with -mtune=generic

Both versions had the same runtime.

At work I have access to Intel's C compiler (versions 9.1.045 and 10.0.023), so I tried icc's (9.1.045) optimization flags and there was a 10% difference (the no -march version slightly worse than the -march=pentium4 version). Since the faster one was as fast as the gcc-compiled ones, I would assume that my assumption was right and that SSE(2) is always on. Although it is possible that SSE had nothing to do with it and that some other optimization is responsible for the 10% difference.

I will try that with some other processors as soon as I can.

loftwyr
l33t
Joined: 29 Dec 2004
Posts: 970
Location: 43°38'23.62"N 79°27'8.60"W

Posted: Wed Nov 28, 2007 3:28 pm

Interestingly, I just found out that on my X2 CPU, sse3 isn't enabled using -march=native. With gcc -v -Q -march=native, it passes -march=k8 -mtune=k8 and that does not include -msse3. My CPU does support sse3 so it should be enabled.

So much for -march=native.

*EDIT*
Seems it's true, and it's fixed in 4.3
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33312
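
For checking what the CPU itself reports, independently of what gcc decides to pass, a short sketch along these lines can help. It parses text in the format of Linux's /proc/cpuinfo; note that the kernel lists SSE3 under the name "pni" there. The function names and the sample text are made up for illustration.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of feature flags from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def supports_sse3(flags):
    """Linux reports SSE3 as 'pni' (Prescott New Instructions) in /proc/cpuinfo."""
    return "pni" in flags


# Made-up sample in /proc/cpuinfo format; a real flags line is much longer.
sample = "processor\t: 0\nmodel name\t: example cpu\nflags\t\t: fpu mmx sse sse2 pni\n"

flags = cpu_flags(sample)
print("sse2:", "sse2" in flags)
print("sse3:", supports_sse3(flags))
```

On a real Linux machine the same function can be fed the contents of /proc/cpuinfo, for example `cpu_flags(open("/proc/cpuinfo").read())`.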
_________________
My emerge --info
Have you run revdep-rebuild lately? It's in gentoolkit and it's worth a shot if things don't work well.
Celebrating 5 years of Gentoo-ing.

timeBandit
Bodhisattva
Joined: 31 Dec 2004
Posts: 2719
Location: here, there or in transit

Posted: Wed Nov 28, 2007 4:11 pm

  1. http://funroll-loops.info/
  2. HOLY COW I'M TOTALLY GOING SO FAST OH F***
xoxo_davide wrote:
Please don't argue with me about those results; I will neither answer nor go into depth anyway. If you want to know a bit more, read on, but then don't argue with me anyway. :P
If it brings you joy, by all means have fun, but it's largely wasted time.

You want optimized? :idea:: USE flags >> CFLAGS.
_________________
Plants are pithy, brooks tend to babble--I'm content to lie between them.
Super-short f.g.o checklist: Search first, strip comments, mark solved, help others.

red-wolf76
l33t
Joined: 13 Apr 2005
Posts: 714
Location: Rhein-Main Area

Posted: Wed Nov 28, 2007 4:21 pm

loftwyr wrote:
Interestingly, I just found out that on my X2 CPU, sse3 isn't enabled using -march=native. With gcc -v -Q -march=native, it passes -march=k8 -mtune=k8 and that does not include -msse3. My CPU does support sse3 so it should be enabled.

So much for -march=native.

*EDIT*
Seems it's true, and it's fixed in 4.3
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33312
Yeah, it's a new flag and probably there's going to be some working out to do. I read somewhere that it uses CPUID for working its magic.

Glad to see it fixed. In such cases, you can always add your -msse3 flag as a redundancy. Myself, I don't own such fancy processors (yet).
_________________
0mFg, G3nt00 r0X0r$ T3h B1g!1111 ;)

Use sane CFLAGS! If for no other reason, do it for the lulz!

squirrelfishfrog
n00b
Joined: 28 Nov 2007
Posts: 15

Posted: Wed Nov 28, 2007 5:00 pm

timeBandit wrote:
  1. http://funroll-loops.info/
  2. HOLY COW I'M TOTALLY GOING SO FAST OH F***
xoxo_davide wrote:
Please don't argue with me about those results; I will neither answer nor go into depth anyway. If you want to know a bit more, read on, but then don't argue with me anyway. :P
If it brings you joy, by all means have fun, but it's largely wasted time.

You want optimized? :idea:: USE flags >> CFLAGS.


As xoxo_davide already wrote: some applications do run for a long time and use a lot of floating-point math: conversions (video stuff), encoding, Fourier transforms (which are used in compression algorithms) and maybe your own programs. So maybe you won't appreciate a 10% gain with kwrite, but if the program runs for an hour you would.

So obviously in that case USE !>> CFLAGS

But the quotes on 1 are still hilarious.

Besides, people who are doing physics and such have programs that run for weeks... :roll:
So knowing more about gcc stuff is never a bad idea.

agreed. (on global CFLAGS)
||
vv


Last edited by squirrelfishfrog on Wed Nov 28, 2007 6:01 pm; edited 1 time in total

red-wolf76
l33t
Joined: 13 Apr 2005
Posts: 714
Location: Rhein-Main Area

Posted: Wed Nov 28, 2007 5:20 pm

Physics number crunching may not be the poster child, but there are actually applications that benefit quite a bit from -ffast-math. I remember hearing something about video apps. But certainly, that wouldn't warrant a global setting, more likely, the ebuilds that benefit from it should be written to include it of themselves.
_________________
0mFg, G3nt00 r0X0r$ T3h B1g!1111 ;)

Use sane CFLAGS! If for no other reason, do it for the lulz!

Paapaa
l33t
Joined: 14 Aug 2005
Posts: 955
Location: Finland

Posted: Wed Nov 28, 2007 6:23 pm    Post subject: Re: sse/sse2

squirrelfishfrog wrote:
any comments?


1. There is no harm using redundant "-mtune" but it still is redundant and useless.
2. Use "diff" to see if gcc produces identical binaries or not. There is no point benchmarking the same binary.
3. "-mfpmath=sse is the default choice for x86-64 compiler." "mfpmath: Generate floating point arithmetics for selected unit."
4. "msse, msse2: These switches enable or disable the use of instructions in the MMX, SSE, SSE2 or 3DNow! extended instruction sets."
5. You can try disabling sse instruction support with "-mno-sse" or "-mno-sse2" to see how it affects. Disable sse arithmetics with "-mfpmath=387".
6. See GCC docs and especially source code for more information about defaults of various march settings.
7. Oplitizimitasions are overrated on these forums. Just use -O2 and be happy - everything that gives significant and safe speedups are already included.
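
Point 2 in the list above (check whether two builds are byte-identical before benchmarking them) can also be scripted. A minimal sketch using the standard library's `filecmp` module follows; the function name and the file names are hypothetical, and the demo writes small stand-in files instead of real gcc output.

```python
import filecmp
import os
import tempfile


def same_binary(path_a, path_b):
    """True when both files have byte-for-byte identical contents."""
    # shallow=False forces a content comparison instead of trusting size/mtime.
    return filecmp.cmp(path_a, path_b, shallow=False)


# Stand-in files for the demo (assumption); real use would point at two builds.
with tempfile.TemporaryDirectory() as tmp:
    generic = os.path.join(tmp, "bench-generic")
    generic_copy = os.path.join(tmp, "bench-generic-copy")
    tuned = os.path.join(tmp, "bench-tuned")
    payloads = (
        (generic, b"\x7fELF generic build"),
        (generic_copy, b"\x7fELF generic build"),
        (tuned, b"\x7fELF tuned build"),
    )
    for path, payload in payloads:
        with open(path, "wb") as handle:
            handle.write(payload)

    identical = same_binary(generic, generic_copy)
    tuned_matches = same_binary(generic, tuned)

print("generic vs copy identical: ", identical)
print("generic vs tuned identical:", tuned_matches)
```

If two flag sets produce identical files, timing both only measures noise, so this check is worth doing before any benchmark run.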
_________________
Paludis, the way packages are meant to be managed.

xoxo_davide
n00b
Joined: 24 Sep 2004
Posts: 37

Posted: Fri Nov 30, 2007 2:43 pm

Quote:
Physics number crunching may not be the poster child, but there are actually applications that benefit quite a bit from -ffast-math. I remember hearing something about video apps. But certainly, that wouldn't warrant a global setting, more likely, the ebuilds that benefit from it should be written to include it of themselves

Actually... that's right. I deliberately omitted one other app I tested. But I still have to point out that if you want an application to benefit from -ffast-math, you have to write the C functions with the flag in mind. And this, sorry to say, is done in very, very few applications. So, after all, the flag should not be added by ebuilds, but by developers in the application's configure script. I still agree that Gentoo ebuilds should include safe flags that give some benefit to apps, mostly depending on the architecture.

One example for all of what I call 'rightly implemented'.
Yafray (a rendering app) has -ffast-math optimized functions, and the developers add the flag during configuration. In addition, Gentoo developers added the -fsigned-char flag. And again, I like to believe in facts, so here are the results.
Machine: Athlon 64 dual core, 2 GB of RAM.
Test: yafray rendering a multi-mesh scene with textures, transparencies and reflections. OSA=8, ray tracing on. The differences are minimal in this case, so I ran the rendering twice each time, to be sure.

-O2 -ffast-math -fsigned-char -> 1.13.99 / 1.13.88
-O2 -ffast-math -fsigned-char -finline-functions -> 1.13.80 / 1.13.50
-O3 -ffast-math -> 1.15.30 / 1.15.06
-O3 -fsigned-char -> 1.15.40 / 1.15.72
-O3 -ffast-math -fsigned-char -> 1.13.16 / 1.13.47 (!)
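
Because each flag set above was rendered twice, the pairs can be averaged and the run-to-run spread compared with the gaps between flag sets. The sketch below assumes the times are written as minutes.seconds.hundredths, which the post does not state explicitly; the helper name `to_seconds` is made up.

```python
def to_seconds(stamp):
    """Convert 'm.ss.hh' (assumed: minutes.seconds.hundredths) to seconds."""
    minutes, seconds, hundredths = stamp.split(".")
    return int(minutes) * 60 + int(seconds) + int(hundredths) / 100


# Pairs of runs copied from the results above.
runs = {
    "-O2 -ffast-math -fsigned-char": ("1.13.99", "1.13.88"),
    "-O2 -ffast-math -fsigned-char -finline-functions": ("1.13.80", "1.13.50"),
    "-O3 -ffast-math": ("1.15.30", "1.15.06"),
    "-O3 -fsigned-char": ("1.15.40", "1.15.72"),
    "-O3 -ffast-math -fsigned-char": ("1.13.16", "1.13.47"),
}

means = {
    flags: sum(to_seconds(stamp) for stamp in pair) / len(pair)
    for flags, pair in runs.items()
}
spreads = {
    flags: abs(to_seconds(pair[0]) - to_seconds(pair[1]))
    for flags, pair in runs.items()
}
best = min(means, key=means.get)

for flags in sorted(means, key=means.get):
    print(f"{flags:<52} mean {means[flags]:.2f}s  spread {spreads[flags]:.2f}s")
print("fastest on average:", best)
```

Under that reading of the format, the gap between the two fastest flag sets is about the same size as the spread between two runs of the same binary, which fits the remark that the differences here are minimal.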

Some considerations:
- -ffast-math has to be bundled with -fsigned-char if we want some benefit. That's because of the amd64 architecture.
- -O3 is still better than -O2, but the difference is pretty much negligible (note: configure tries to set -O3, but the standard ebuild substitutes -O3 with your flags).
- -finline-functions still gives a slight improvement on -O2

Conclusions:
And again, once and for all: best make.conf flags: -O3 (configure will add -ffast-math, the Gentoo amd64 ebuild will add -fsigned-char, and -finline-functions and -fomit-frame-pointer are already enabled by -O3).
And again: Programmers: review slow routines.
_________________
Think. Then think twice. Then, if you really need it, talk. But i'm sure you'll still say something stupid.

Keruskerfuerst
Advocate
Joined: 01 Feb 2006
Posts: 2289
Location: near Augsburg, Germany

Posted: Fri Nov 30, 2007 4:53 pm

Acovea is not working properly.

The best result is -O2 and some additional flags:

CFLAGS:
Code:

CFLAGS="-march=k8"
CFLAGS="${CFLAGS} -O2"
CFLAGS="${CFLAGS} -combine"
CFLAGS="${CFLAGS} -falign-functions=0"
CFLAGS="${CFLAGS} -falign-jumps=0"
CFLAGS="${CFLAGS} -falign-labels=0"
CFLAGS="${CFLAGS} -falign-loops=0"
CFLAGS="${CFLAGS} -ffunction-cse"
CFLAGS="${CFLAGS} -fgcse-after-reload"
CFLAGS="${CFLAGS} -fgcse-lm"
CFLAGS="${CFLAGS} -fkeep-static-consts"
CFLAGS="${CFLAGS} -fmerge-constants"
CFLAGS="${CFLAGS} -fno-ident"
CFLAGS="${CFLAGS} -fprefetch-loop-arrays"
CFLAGS="${CFLAGS} -frename-registers"
CFLAGS="${CFLAGS} -fweb"
CFLAGS="${CFLAGS} -msse2"
CFLAGS="${CFLAGS} -m80387"
CFLAGS="${CFLAGS} -pipe"


CPPFLAGS:
Code:
CPPFLAGS="-Wall"


and LDFLAGS:
Code:
LDFLAGS="-Wl,-O4"
LDFLAGS="${LDFLAGS} -Wl,--as-needed"
LDFLAGS="${LDFLAGS} -Wl,--enable-new-dtags"
LDFLAGS="${LDFLAGS} -Wl,--hash-style=both"
LDFLAGS="${LDFLAGS} -Wl,--sort-common"
LDFLAGS="${LDFLAGS} -Wl,-S"
LDFLAGS="${LDFLAGS} -Wl,-z,now"


for an AMD Athlon64 CPU.


Last edited by Keruskerfuerst on Thu Dec 06, 2007 6:55 am; edited 3 times in total

squirrelfishfrog
n00b
Joined: 28 Nov 2007
Posts: 15

Posted: Mon Dec 03, 2007 3:22 pm    Post subject: Re: sse/sse2

Paapaa wrote:


1. There is no harm using redundant "-mtune" but it still is redundant and useless.
2. Use "diff" to see if gcc produces identical binaries or not. There is no point benchmarking the same binary.
3. "-mfpmath=sse is the default choice for x86-64 compiler." "mfpmath: Generate floating point arithmetics for selected unit."
4. "msse, msse2: These switches enable or disable the use of instructions in the MMX, SSE, SSE2 or 3DNow! extended instruction sets."
5. You can try disabling sse instruction support with "-mno-sse" or "-mno-sse2" to see how it affects. Disable sse arithmetics with "-mfpmath=387".
6. See GCC docs and especially source code for more information about defaults of various march settings.
7. Oplitizimitasions are overrated on these forums. Just use -O2 and be happy - everything that gives significant and safe speedups are already included.


1. OK.
2. I checked that already; they did differ (even in size, slightly), but not in performance.
3-4. That's what the manuals state... so I knew that, phew.
5. I read that too, but didn't try it because I was convinced that SSE(2) shouldn't be on by default. But it is, and the program doesn't even compile without SSE (some stdlib functions apparently require it). So thanks for that.
6. I don't think that I'm that desperate... :) Reading code is bad for your eyes. See 7.
7. Totally agreed.

JeliJami
Veteran
Joined: 17 Jan 2006
Posts: 1086
Location: Belgium

Posted: Mon Dec 03, 2007 3:39 pm

timeBandit wrote:
  1. http://funroll-loops.info/
  2. HOLY COW I'M TOTALLY GOING SO FAST OH F***

don't forget Howto: Gentoo Ricing 183%
_________________
Unanswered Post Initiative | Search | FAQ
Former username: davjel

Keruskerfuerst
Advocate
Joined: 01 Feb 2006
Posts: 2289
Location: near Augsburg, Germany

Posted: Wed Dec 05, 2007 8:48 pm

Quote:
-fomit-frame-pointer is always enabled at levels -O, -O2, -O3, -Os. So you can remove it safely. As said: this applies to x86_64 where it doesn't affect debuggability.


Proof:

#ifdef CAN_DEBUG_WITHOUT_FP
flag_omit_frame_pointer = 1;
#endif

MP_
n00b
Joined: 10 Nov 2003
Posts: 57
Location: Budapest, Hungary

Posted: Thu Dec 06, 2007 10:32 am

timeBandit wrote:
You want optimized? :idea:: USE flags >> CFLAGS.


WELL WRITTEN CODE >> USE flags
_________________
MP

timeBandit
Bodhisattva
Joined: 31 Dec 2004
Posts: 2719
Location: here, there or in transit

Posted: Thu Dec 06, 2007 3:43 pm

MP_ wrote:
timeBandit wrote:
You want optimized? :idea:: USE flags >> CFLAGS.

WELL WRITTEN CODE >> USE flags
I see I need to clarify. :)

For a Gentoo system, the most effective overall optimization is to build only what you need. Code that never executes is the fastest of all. :wink: You can optimaxilatize the rest to your heart's content but the biggest gains come from simply not building things you won't use. That's why Gentoo works well both on spankin' new multi-core monsters and on creaky 12-year-old Pentiums with less RAM than some mobile phones.
_________________
Plants are pithy, brooks tend to babble--I'm content to lie between them.
Super-short f.g.o checklist: Search first, strip comments, mark solved, help others.

Keruskerfuerst
Advocate
Joined: 01 Feb 2006
Posts: 2289
Location: near Augsburg, Germany

Posted: Thu Dec 06, 2007 7:35 pm

I have made some experiments with ASFLAGS, but they did not improve execution speed.
ASFLAGS="--64"
ASFLAGS="-mtune=aaa"
ASFLAGS="-march=aaa"


Last edited by Keruskerfuerst on Fri Dec 07, 2007 9:41 am; edited 1 time in total
All times are GMT