Gentoo Forums
Basic Explanation of GCC Optimization
Gentoo Forums Forum Index :: Gentoo Chat
mudrii
l33t

Joined: 26 Jun 2003
Posts: 789
Location: Singapore

Posted: Thu Apr 14, 2005 4:12 am    Post subject: More about GCC Machine-Specific Compiler Options i386 x86-64

All credit goes to the Osborne book GCC: The Complete Reference.
For the book's price and more information, please check this link.
Introduction to GCC
GCC is a product of the GNU Project. This project began in 1984 with the goal of developing a complete UNIX-like operating system as free software. Like any project of this size, the GNU Project has taken some twists and turns, but the goal has been achieved.
Compilers can be compared in terms of speed of compilation, speed of the generated code, and size of the generated code. It’s hard to measure much else. Some numbers can be produced, but it’s difficult to attach much meaning to them. For example, a count of the source files (makefiles, configuration files, header files, executable code, and so on) shows well over 15,000 files of various types. Compiling the source files into object files, libraries, and executable programs adds several thousand more. Counting the lines of code (the number of lines of text in the 15,000+ files) produces a number greater than 3,700,000. By any criterion, that’s a large program.
Machine-Specific Compiler Options: Intel 386 and AMD x86-64 Options
Some, but not all, of the existing ports have special command-line options that tell the compiler to further refine the generated code to match a specific hardware model or runtime configuration. The following sections list the -m options available for the ports that have defined them.
Most of the options begin with -m, but there are a few exceptions. Some are used to specify that code be generated for a specific CPU within a family of CPUs, while others can be used to generate code that will take advantage of some specific hardware feature to better merge the characteristics of your program with the hardware. Some of the options are needed to make the generated code fit with a particular hardware configuration.

The following options are defined for the i386 and x86-64 family of computers.
-m128bit-long-double
Specifies the size of the long double type to be 128 bits (16 bytes). The i386 application binary interface specifies the size to be 12 bytes, while newer architectures (Pentium and later) prefer long double to be aligned on an 8- or 16-byte boundary. In an array of 12-byte long doubles, that alignment cannot be kept for every element. If you specify this option, structures and arrays containing long double data will change size, and the calling convention for functions that take long double arguments will be modified.
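A quick way to see the effect is to ask the compiler what it chose. The C sketch below (the function names are invented for illustration) reports the size of long double and the distance between consecutive array elements. Building it once with and once without -m128bit-long-double on a 32-bit i386 target would be expected to show 12 versus 16; x86-64 already uses 16 by default. The option changes storage size only, not the 80-bit precision of the type.

```c
#include <stddef.h>

/* Size the compiler chose for long double: usually 12 on i386 by
   default, 16 with -m128bit-long-double, and 16 by default on x86-64. */
size_t long_double_size(void)
{
    return sizeof(long double);
}

/* Distance in bytes between consecutive elements of a long double
   array. With a 12-byte type, element 1 starts at offset 12, which is
   neither 8- nor 16-byte aligned. */
size_t long_double_stride(void)
{
    long double arr[2];
    return (size_t)((char *)&arr[1] - (char *)&arr[0]);
}
```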
-m32
On AMD x86-64 processors in a 64-bit environment, this option sets int, long, and pointer data to 32 bits and generates code that runs on any i386 system.
-m386
The same as -mcpu=i386. This form of the option is deprecated.
-m3dnow
Enables the use of built-in functions that allow direct access to the 3DNow! extensions.
The usage can be disallowed with -mno-3dnow.
-m486
The same as -mcpu=i486. This form of the option is deprecated.
-m64
On AMD x86-64 processors in a 64-bit environment, this option sets int to 32 bits, sets long and pointer data to 64 bits, and generates code specifically for AMD’s x86-64 architecture.
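The data-model difference between -m32 and -m64 can be checked from C. This minimal sketch, with hypothetical helper names, reports the widths in bits. Compiling it with gcc -m32 and with gcc -m64 should show 32/32/32 and 32/64/64 for int/long/pointer on Linux; a 32-bit build typically also needs the 32-bit C library installed.

```c
#include <limits.h>

/* Widths that -m32 and -m64 change, in bits. On Linux, -m32 gives
   int/long/pointer = 32/32/32 and -m64 gives 32/64/64. */
int int_bits(void)     { return (int)(sizeof(int) * CHAR_BIT); }
int long_bits(void)    { return (int)(sizeof(long) * CHAR_BIT); }
int pointer_bits(void) { return (int)(sizeof(void *) * CHAR_BIT); }
```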
-m96bit-long-double
Specifies that the size of long double data items be 96 bits (12 bytes), as required by the i386 application binary interface. This is the default.
-maccumulate-outgoing-args
Specifies that the maximum amount of space required for outgoing arguments will be computed in the function prologue. This is faster on most modern CPUs because of reduced dependencies, improved scheduling, and reduced stack usage when the preferred stack boundary is not equal to 2. The drawback is an increase in code size. Setting this option also sets -mno-push-args.
-malign-double
Specifies that the compiler align double, long double, and long long variables on a two-word boundary. Specifying -mno-align-double aligns them on a one-word boundary.
Aligning double variables on a two-word boundary will produce code that runs somewhat faster on a Pentium at the expense of the program being larger.
The -malign-double option causes structures containing the preceding types to be aligned differently than the published application binary interface specifications for the 386.
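The ABI change can be made visible with offsetof from <stddef.h>. In this sketch (the struct and function names are made up), a double follows a single char. Under the plain i386 ABI the double is expected at offset 4; with -malign-double, or on x86-64, it moves to offset 8, which also changes the size of the structure.

```c
#include <stddef.h>

/* A double placed after a single char. Under the plain i386 ABI the
   double is 4-byte aligned (offset 4, struct size 12); with
   -malign-double, or on x86-64, it is 8-byte aligned (offset 8, size 16). */
struct char_then_double {
    char   c;
    double d;
};

size_t double_member_offset(void)
{
    return offsetof(struct char_then_double, d);
}

size_t char_then_double_size(void)
{
    return sizeof(struct char_then_double);
}
```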
-march=architecture
Generates instructions for the machine architecture. The choices for architecture are the same as for type in the -mcpu option. Specifying -march implies -mcpu for the same type.
-masm=dialect
Outputs assembly language instructions using the specified dialect. Valid selections for dialect are intel and att. The default is att.
-mcpu=type
Tunes everything about the generated code for the specified type, except for the ABI and the set of available instructions. The valid choices for type are i386, i486, i586, i686, pentium, pentium-mmx, pentiumpro, pentium2, pentium3, pentium4, k6, k6-2, k6-3, athlon, athlon-tbird, athlon-4, athlon-xp, and athlon-mp. The type i586 is equivalent to pentium, and i686 is equivalent to pentiumpro. The k6 and athlon types are AMD chips. Although selecting a specific CPU causes code to be scheduled appropriately for that chip, the compiler will not generate any code that does not run on the i386 unless the -march option is specified.
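GCC predefines a macro for each instruction-set extension it is allowed to use, such as __MMX__, __3dNOW__, __SSE__ and __SSE2__, which makes the -march/-mcpu distinction observable. The sketch below collects them into a bitmask; the ISA_* names and the function are invented here. Adding -mcpu=athlon-xp alone should leave the mask unchanged, while -march=athlon-xp would be expected, with GCC of this era, to turn on MMX, 3DNow! and SSE.

```c
/* Hypothetical bit flags, one per instruction-set extension. */
enum {
    ISA_MMX   = 1 << 0,
    ISA_3DNOW = 1 << 1,
    ISA_SSE   = 1 << 2,
    ISA_SSE2  = 1 << 3
};

/* Reports which extensions the compiler was allowed to use for this
   translation unit, based on GCC's predefined macros. */
unsigned enabled_isa_mask(void)
{
    unsigned mask = 0;
#ifdef __MMX__
    mask |= ISA_MMX;
#endif
#ifdef __3dNOW__
    mask |= ISA_3DNOW;
#endif
#ifdef __SSE__
    mask |= ISA_SSE;
#endif
#ifdef __SSE2__
    mask |= ISA_SSE2;
#endif
    return mask;
}
```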
-mfpmath=unit
Generates floating-point instructions for the selected hardware unit. Specifying unit as 387 uses the standard 387 floating-point coprocessor, present in the majority of chips and emulated otherwise. Code compiled with this option will run almost everywhere. The temporary results are computed with 80-bit precision.
Specifying unit as sse uses scalar floating-point instructions, present in the SSE instruction set. This instruction set is supported by Pentium3 and newer chips as well as in the AMD line by the Athlon-4, Athlon-xp, and Athlon-mp chips. The earlier version of the SSE instruction set supports only single-precision operations, thus the double- and extended-precision operations are still performed using 387. The newer version, present only in Pentium4 and the AMD x86-64 chips, supports double-precision operations.
When specifying unit as sse on the i386 compiler, you must also specify a -march, -msse, or -msse2 option to enable the SSE extensions and make this option effective. For the x86-64 compiler, these extensions are enabled by default. The resulting code should be considerably faster in most cases and avoids the numerical instability problems of 387 code. However, it may break some existing code that expects temporaries to be 80-bit values.
Specifying unit as sse,387 attempts to utilize both instruction sets at once. This effectively doubles the number of available registers on chips with separate execution units for 387 and SSE. This option is experimental, because GCC's register allocator does not model separate functional units well.
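Whether temporaries are kept at 80-bit precision (387) or evaluated in their own type (SSE) is reported by the standard C99 macro FLT_EVAL_METHOD in <float.h>. This sketch (the function names are hypothetical) simply exposes it. On GCC one would typically expect 2 for -mfpmath=387 and 0 for -mfpmath=sse, though the exact value can depend on the compiler version and language mode.

```c
#include <float.h>

/* FLT_EVAL_METHOD (C99) describes how intermediate floating-point
   results are evaluated:
     0   each operation in its own type (typical of -mfpmath=sse)
     2   in long double range and precision (80-bit 387 temporaries)
    -1   indeterminable
   Other values are implementation-defined. */
int float_eval_method(void)
{
    return FLT_EVAL_METHOD;
}

const char *fpmath_guess(void)
{
    switch (FLT_EVAL_METHOD) {
    case 0:  return "SSE-style: temporaries kept in float/double";
    case 2:  return "387-style: temporaries kept in 80-bit long double";
    default: return "other or indeterminable";
    }
}
```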
-mieee-fp
Specifies that the compiler uses IEEE floating-point comparisons. These comparisons correctly handle the case in which the result of a comparison is unordered. The use of IEEE floating-point comparisons can be suppressed by specifying -mno-ieee-fp.
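What "unordered" means is easiest to see with a NaN operand: every ordered relation (<, <=, >, >=, ==) is then false. The C99 comparison macros in <math.h> handle this case explicitly. The enum and function below are a hypothetical illustration, not part of GCC.

```c
#include <math.h>

/* Hypothetical classification of a comparison between two doubles. */
enum fcmp { FCMP_LESS, FCMP_EQUAL, FCMP_GREATER, FCMP_UNORDERED };

enum fcmp compare_doubles(double a, double b)
{
    /* A NaN on either side makes the pair unordered: a < b, a == b and
       a > b are then all false, so this case is checked first. */
    if (isunordered(a, b))
        return FCMP_UNORDERED;
    if (isless(a, b))
        return FCMP_LESS;
    if (isgreater(a, b))
        return FCMP_GREATER;
    return FCMP_EQUAL;
}
```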
-minline-all-stringops
All string operations are inlined. The default is that string operations are inlined only when the destination is known to be aligned at least to a 4-byte boundary. This option enables more inlining, which increases code size, but may improve performance of the code that depends on fast memcpy(), strlen(), and memset() for short lengths.
-mmmx
Enables the use of built-in functions that allow direct access to the MMX extensions.
This usage can be disallowed with -mno-mmx.
-mno-align-stringops
Does not align the destination of inlined string operations. This option reduces code size and improves performance in cases where the destination is already aligned and the compiler doesn’t know it.
-mno-fancy-math-387
Some 387 emulators do not support the sin, cos, and sqrt instructions for the 387. This option avoids generating those instructions. It has no effect unless you also specify the -funsafe-math-optimizations option.
This option is the default on FreeBSD, OpenBSD, and NetBSD. It is ignored when -march indicates that the target CPU will always have an FPU and the instruction will not need emulation.
-mno-fp-ret-in-387
Specifies not to use the FPU registers for return values of functions. The usual calling convention has functions return values of types float and double in an FPU register, even if there is no FPU. The idea is that the operating system should emulate an FPU. Specifying this option will cause the values to be returned in ordinary CPU registers instead.
-mno-red-zone
On AMD x86-64 processors in a 64-bit environment, this option suppresses the use of the so-called red zone. The red zone is mandated by the x86-64 ABI and is a 128-byte area beyond the location of the stack pointer that will not be modified by signal or interrupt handlers, and therefore can be used for temporary data without adjusting the stack pointer.
-momit-leaf-frame-pointer
Does not retain the frame pointer in a register for leaf functions. This avoids the instructions to save, set up, and restore frame pointers and makes an extra register available inside the leaf functions.
The option -fomit-frame-pointer can be used to remove the frame pointer for all functions, but this does make debugging more difficult.
-mpentium
The same as -mcpu=pentium. This form of the option is deprecated.
-mpentiumpro
The same as -mcpu=pentiumpro. This form of the option is deprecated.
-mpreferred-stack-boundary=number
Attempts to keep the stack aligned to a boundary of 2 raised to the power number bytes. The default for number is 4 (16 bytes or 128 bits), except when optimizing for code size with -Os, in which case the default is the minimum correct alignment (4 bytes for x86 and 8 bytes for x86-64). On Pentium and Pentium Pro, double and long double values should be aligned to an 8-byte boundary to prevent the code from running slower. On the Pentium III, the Streaming SIMD Extension (SSE) data type __m128 suffers similar speed penalties if it is not aligned on a 16-byte boundary.
To ensure proper alignment of values on the stack, the stack boundary must be at least as aligned as any value stored on the stack. Also, every function must be generated so that it keeps the stack aligned. This means that calling a function compiled with a higher preferred stack boundary from a function compiled with a lower preferred stack boundary will most likely misalign the stack. It is recommended that libraries that use callbacks always use the default setting. This extra alignment does consume stack space and generally increases code size. For code that is sensitive to stack space usage, such as embedded systems and operating system kernels, you may want to reduce the preferred alignment to -mpreferred-stack-boundary=2.
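The arithmetic behind this option is a power of two. The sketch below (both helpers are hypothetical) converts number to a byte boundary and rounds a frame size up to it, roughly the padding a function needs to keep the stack aligned.

```c
/* -mpreferred-stack-boundary=number requests a 2**number byte boundary. */
unsigned long stack_boundary_bytes(unsigned number)
{
    return 1UL << number;
}

/* Rounds a frame size up to a multiple of that boundary (a power of
   two), roughly the padding needed to keep the stack aligned. */
unsigned long pad_frame(unsigned long frame_size, unsigned number)
{
    unsigned long boundary = stack_boundary_bytes(number);
    return (frame_size + boundary - 1) & ~(boundary - 1);
}
```

For example, stack_boundary_bytes(4) is 16 (the default), stack_boundary_bytes(2) is 4 (the value suggested for kernels), and pad_frame(20, 4) rounds a 20-byte frame up to 32 bytes.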
-mpush-args
Uses push operations to store outgoing parameters. This method is shorter and usually as fast as the method using sub/mov operations, and it is enabled by default. The default can be overridden with -mno-push-args; in some cases, disabling it may improve performance because of improved scheduling and reduced dependencies.
-mregparm=number
Specifies the number of registers used to pass integer arguments. By default, no registers are used to pass arguments. The largest value for number is 3. You can control this behavior for a specific function by using the function attribute regparm. When using this option with number being a nonzero value, you must build all modules, including libraries, with the same value.
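The per-function form of this option is the regparm attribute. A minimal sketch, assuming a 32-bit i386 build where the attribute applies; on other targets the macro expands to nothing so the file still compiles. REGPARM3 and add3 are names made up for this example. With regparm(3), GCC passes the first three integer arguments in EAX, EDX and ECX.

```c
/* regparm(3) passes up to three integer arguments in EAX, EDX and ECX
   instead of on the stack. It is an i386-only attribute, so it is
   applied only when compiling for 32-bit x86. */
#if defined(__i386__)
#define REGPARM3 __attribute__((regparm(3)))
#else
#define REGPARM3
#endif

REGPARM3 int add3(int a, int b, int c)
{
    return a + b + c;
}
```

Every caller must see the same declaration: a caller compiled without the attribute would push the arguments on the stack while add3 reads them from registers.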
-mrtd
Uses a different function calling convention, in which functions that take a fixed number of arguments return with the ret num instruction, which pops their arguments while returning. This saves one instruction in the caller because there is no need to pop the arguments after the return. You can specify that an individual function is called with this calling convention with the function attribute stdcall. You can also override the -mrtd option by using the function attribute cdecl. The use of this calling convention is incompatible with the one normally used on UNIX, so you cannot use it if you need to call libraries compiled with the UNIX compiler. Also, you must provide function prototypes for all functions that take variable numbers of arguments; otherwise, incorrect code will be generated for calls to those functions. Incorrect code will result if you call a function with too many arguments. Normally, extra arguments are harmlessly ignored.
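The stdcall and cdecl attributes mentioned above can be applied per function. This sketch assumes a 32-bit i386 build and guards the attributes so other targets still compile; CALLEE_POPS, CALLER_POPS, sub2 and sum_n are hypothetical names. The fixed-argument function uses stdcall (the -mrtd convention), while the variadic one keeps the caller-pops cdecl convention, because only the caller knows how many arguments it pushed.

```c
#include <stdarg.h>

/* stdcall and cdecl are i386 attributes; elsewhere they expand to
   nothing so the file still compiles. */
#if defined(__i386__)
#define CALLEE_POPS __attribute__((stdcall))
#define CALLER_POPS __attribute__((cdecl))
#else
#define CALLEE_POPS
#define CALLER_POPS
#endif

/* Fixed argument count: on i386 the callee removes its 8 bytes of
   arguments itself ("ret 8"), as every such function does under -mrtd. */
CALLEE_POPS int sub2(int a, int b)
{
    return a - b;
}

/* Variable argument count: only the caller knows how much it pushed,
   so the function keeps the caller-pops (cdecl) convention. */
CALLER_POPS int sum_n(int n, ...)
{
    va_list ap;
    int total = 0;

    va_start(ap, n);
    for (int i = 0; i < n; i++)
        total += va_arg(ap, int);
    va_end(ap);
    return total;
}
```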
-msoft-float
Generates code containing library calls for floating-point operations. The libraries are not part of GCC. The libraries of the target computer’s C compiler can be used, but this can’t be done directly in cross-compilation. It will be necessary for you to provide your own libraries for a cross compiler.
On machines where a function returns floating-point results in the 80387 register stack, some floating-point opcodes may be emitted even when -msoft-float is specified.
-msse
Enables the use of built-in functions that allow direct access to the SSE extensions.
The usage can be disallowed with -mno-sse.
-msse2
Enables the use of built-in functions that allow direct access to the SSE2 extensions.
The usage can be disallowed with -mno-sse2.
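The built-in functions enabled by -mmmx, -msse and -msse2 are normally reached through the intrinsic headers, for example <xmmintrin.h> for SSE. The sketch below (add4f is a hypothetical name) uses SSE intrinsics when the compiler defines __SSE__ and falls back to a plain loop otherwise, so it builds with or without -msse.

```c
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Adds two arrays of four floats. When SSE is enabled (-msse, a
   suitable -march, or x86-64 by default), GCC defines __SSE__ and the
   intrinsic path is compiled; otherwise a plain loop is used. */
void add4f(const float *a, const float *b, float *out)
{
#if defined(__SSE__)
    __m128 va = _mm_loadu_ps(a);   /* unaligned load of a[0..3] */
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_add_ps(va, vb));
#else
    for (int i = 0; i < 4; i++)
        out[i] = a[i] + b[i];
#endif
}
```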
-mthreads
Supports thread-safe exception handling for Mingw32. Code that relies on thread-safe exception handling must compile and link all code with the -mthreads option. When compiling, this option defines the macro _MT (as -D_MT would). When linking, it links in a special thread helper library with -lmingwthrd, which cleans up per-thread exception handling data.
-msvr3-shlib
Specifies that the compiler place uninitialized local variables into the bss segment. To place them in the data segment instead, use -mno-svr3-shlib. These options are meaningful only on System V Release 3.
For more detailed information, check the i386 and x86-64 Options section of the GCC manual.
_________________
www.gentoo.ro
mudrii

Posted: Thu Apr 14, 2005 5:02 am    Post subject: Basic Explanation of GCC Optimization

-Olevel
Specifies the level of optimization to be applied to the code generated by the compiler. There is always a trade-off between optimizing for the size of the code or for speed of execution. The default is -O0 for no optimization.
If no optimization level is specified, the compiler produces code that matches the structure of the input source. Optimization not only requires more processing, it requires much more memory. Compiling without optimization has a double advantage: compile time is shorter (optimization can take much longer), and the generated code can be followed easily in a debugger. Both are ideal for the software development process. You can use the debugger on optimized code, but some of the output code may be rearranged, making it much more difficult to follow.
This option can be written --optimize.

Level and Description
-O The compiler attempts to reduce both code size and execution time, but not to make modifications that would cause difficulties with debugging.
Turns on the options -fno_optimize_size, -fdefer-pop, -fthread-jumps, -fguess-branch-probability, -fcprop-registers, and -fdelayed-branch. The -fomit-frame-pointer flag is set only if the debugger is able to work without it on this platform.
-O0 The default. Disables all optimization. Turns off all size optimization and sets -fno-merge-constants.
-O1 The same as -O.
-O2 This level turns on all optimizations that do not involve a space-speed trade-off. In addition to the options turned on for -O, this level turns on -foptimize-sibling-calls, -fcse-follow-jumps, -fcse-skip-blocks, -fgcse, -fexpensive-optimizations, -fstrength-reduce, -frerun-cse-after-loop, -frerun-loop-opt, -fcaller-saves, -fforce-mem, -fpeephole2, -fschedule-insns, -fschedule-insns2, -fregmove, -fstrict-aliasing, -fdelete-null-pointer-checks, and -freorder-blocks. This level does not perform loop unrolling, inlining, or register renaming.
-O3 In addition to the options turned on for -O2, this level turns on -finline-functions and -frename-registers.
-Os Optimizes for size. The -O2 optimizations that do not typically increase code size are set. The -falign-loops, -falign-jumps, -falign-labels, and -falign-functions options are all set to 1, which prevents any space being inserted for alignment.
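Code can check which level it was built with: GCC predefines __OPTIMIZE__ whenever optimization is enabled and additionally __OPTIMIZE_SIZE__ for -Os. A minimal sketch with a hypothetical function name:

```c
/* GCC defines __OPTIMIZE__ for any optimizing level and additionally
   __OPTIMIZE_SIZE__ for -Os, so the size check must come first. */
const char *optimization_mode(void)
{
#if defined(__OPTIMIZE_SIZE__)
    return "-Os";
#elif defined(__OPTIMIZE__)
    return "-O1/-O2/-O3";
#else
    return "-O0";
#endif
}
```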

Explanation of Some Optional Optimizations

-fomit-frame-pointer
Don’t store the frame pointer in a register for functions that don’t need one. This omits the code to save, set up, and restore the frame pointer and makes another register available for general use. This flag is turned on automatically at every -O level, but only on platforms where the debugger can work without a frame pointer; elsewhere you have to specify it explicitly. Some platforms have no frame pointer at all, and there the flag has no effect. The default is -fno-omit-frame-pointer.
-funroll-loops
If it can be determined at compile time that the number of iterations is small enough, and if the number of instructions inside the loop is small enough, the loop is unrolled by removing the loop and duplicating the instructions so they will be executed the correct number of times. A loop is determined to be small enough if the number of insns in the loop multiplied by the number of iterations is less than a constant (currently set to 100). This option always sets both -fstrength-reduce and -frerun-cse-after-loop.
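The transformation is easiest to see written out by hand. In this sketch (the function names are illustrative), the loop has a constant trip count of 4 and a one-statement body, the kind of small loop described above; the second function is what complete unrolling amounts to, with the loop control removed and the body repeated four times.

```c
/* Constant trip count of 4 and a tiny body: a candidate for unrolling. */
int dot4_loop(const int a[4], const int b[4])
{
    int s = 0;
    for (int i = 0; i < 4; i++)
        s += a[i] * b[i];
    return s;
}

/* The same computation after complete unrolling: no counter, no
   compare, no branch, just the body duplicated once per iteration. */
int dot4_unrolled(const int a[4], const int b[4])
{
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
}
```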
-pipe
Use pipes instead of intermediate files to communicate the output from one phase of the compiler to the input of another. This could fail if the local assembler is incapable of reading input from a pipe.
This option can be written --pipe.
-fforce-addr
Forces memory addresses to be copied into registers before arithmetic is performed on them. This can improve the generated code because the needed addresses will often have been loaded into a register already and do not need to be loaded again. The default is -fno-force-addr.
-ffast-math
Certain mathematical calculations are made faster by violating some of the ISO and IEEE rules. For example, with this option set it is assumed that no negative values are passed to sqrt() and that all floating-point values are valid.
Setting this option causes the preprocessor macro __FAST_MATH__ to be defined and also sets -fno-math-errno, -funsafe-math-optimizations, and -fno-trapping-math. Setting -fno-fast-math will also set -fmath-errno.
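Because -ffast-math defines __FAST_MATH__, a program can detect it, which is worth doing when the code depends on NaN handling. In this sketch (the function names are hypothetical), is_nan_value relies on IEEE semantics; with -ffast-math the compiler may assume NaNs never occur and fold the test to false.

```c
#include <math.h>

/* Returns 1 when this file was compiled with -ffast-math. */
int fast_math_enabled(void)
{
#ifdef __FAST_MATH__
    return 1;
#else
    return 0;
#endif
}

/* An IEEE NaN test. Under -ffast-math the compiler may assume all
   values are finite and reduce this to a constant 0. */
int is_nan_value(double x)
{
    return isnan(x) != 0;
}
```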
For more detailed information, check the Optimize Options section of the GCC manual.
Last edited by mudrii on Thu Apr 14, 2005 5:18 am; edited 1 time in total
moocha
Watchman

Joined: 21 Oct 2003
Posts: 5722

Posted: Thu Apr 14, 2005 5:08 am

Sorry, but if it's still from that book you wasted your money. This is simply a copy of the GCC manual you can get via
Code:
info gcc
or online at http://gcc.gnu.org.
_________________
Military Commissions Act of 2006: http://tinyurl.com/jrcto

"Those who would give up essential liberty to purchase a little temporary safety deserve neither liberty nor safety."
-- attributed to Benjamin Franklin
inode77
Veteran

Joined: 20 Jan 2004
Posts: 1303
Location: Heart of Europe

Posted: Thu Apr 14, 2005 8:56 am

That's OK, but it's all available online from the gcc manual:
http://gcc.gnu.org/onlinedocs/gcc-3.4.3/gcc/i386-and-x86_002d64-Options.html#i386-and-x86_002d64-Options
Bob P
Advocate

Joined: 20 Oct 2004
Posts: 3355
Location: Jackass! Development Labs

Posted: Thu Apr 14, 2005 9:20 am

on the bright side, it takes less time to bookmark this thread or your hyperlink to the GCC man pages than it does to type "man gcc" and press PgDn a couple of hundred times...:?
_________________
.
Stage 1/3 | Jackass! | Rockhopper! | Thanks | Google Sucks
madmango
Guru

Joined: 15 Jul 2003
Posts: 507
Location: PA, USA

Posted: Fri Apr 15, 2005 12:24 am

You're also rather violating some copyright laws. Did you obtain written permission to reproduce sections of the book here?
_________________
word.
wjholden
l33t

Joined: 01 Mar 2004
Posts: 826
Location: Augusta, GA

Posted: Fri Apr 15, 2005 2:44 am

madmango wrote:
You're also rather violating some copyright laws. Did you obtain written permission to reproduce sections of the book here?

Yeah madmango, stop downloading manpages since that's a copyright violation, dude.

Quote:
Verbatim copying and distribution of this entire article is permitted in any medium, provided this notice is preserved.

^ http://gcc.gnu.org/onlinedocs/
Athas
Guru

Joined: 04 Sep 2003
Posts: 394
Location: Brøndby, Denmark

Posted: Fri Apr 15, 2005 7:51 am

Bob P wrote:
on the bright side, it takes less time to bookmark this thread or your hyperlink to the GCC man pages than it does to type "man gcc" and press PgDn a couple of hundred times...:?


It's easy to find in the infopages. :wink:
_________________
Emacs-optimized danish console keymap - My .emacs
Climacs - next generation Emacs.
psyqil
Advocate

Joined: 26 May 2003
Posts: 2767

Posted: Fri Apr 15, 2005 2:28 pm

And before anyone says that info pages suck: emerge pinfo!
Drooling Iguana
Tux's lil' helper

Joined: 07 Apr 2004
Posts: 94
Location: Sector ZZ9 Plural Z Alpha

Posted: Fri Apr 15, 2005 4:11 pm

destuxor wrote:
madmango wrote:
You're also rather violating some copyright laws. Did you obtain written permission to reproduce sections of the book here?

Yeah madmango, stop downloading manpages since that's a copyright violation, dude.

Quote:
Verbatim copying and distribution of this entire article is permitted in any medium, provided this notice is preserved.

^ http://gcc.gnu.org/onlinedocs/

Mudrii didn't copy and/or distribute the entire article verbatim, and he didn't preserve the notice.

On another note, would "-O3 -march=athlon-xp -mcpu=athlon-xp -pipe -m3dnow -m128bit-long-double -mfpmath=sse -mmmx -msse -msse2" be a reasonable set of CFLAGS for an Athlon-XP system?
moocha

Posted: Fri Apr 15, 2005 4:44 pm

Drooling Iguana wrote:
On another note, would "-O3 -march=athlon-xp -mcpu=athlon-xp -pipe -m3dnow -m128bit-long-double -mfpmath=sse -mmmx -msse -msse2" be a reasonable set of CFLAGS for an Athlon-XP system?
I don't think this thread is the place for CFLAGS advice... There are a ton of other threads on that, among which:
spb
Retired Dev

Joined: 02 Jan 2004
Posts: 2135
Location: Cambridge, UK

Posted: Fri Apr 15, 2005 6:00 pm

Drooling Iguana wrote:
On another note, would "-O3 -march=athlon-xp -mcpu=athlon-xp -pipe -m3dnow -m128bit-long-double -mfpmath=sse -mmmx -msse -msse2" be a reasonable set of CFLAGS for an Athlon-XP system?
No.
moocha

Posted: Fri Apr 15, 2005 6:14 pm

spb wrote:
No.
Heh :). Answered in the second thread I linked to.
lightvhawk0
Guru

Joined: 07 Nov 2003
Posts: 388

Posted: Sat Apr 16, 2005 7:17 pm

spb wrote:
Drooling Iguana wrote:
On another note, would "-O3 -march=athlon-xp -mcpu=athlon-xp -pipe -m3dnow -m128bit-long-double -mfpmath=sse -mmmx -msse -msse2" be a reasonable set of CFLAGS for an Athlon-XP system?
No.


-O2 -march=athlon-xp -pipe should be fine.

note: -m3dnow -mmmx -msse are already set with the march flag
_________________
If God has made us in his image, we have returned him the favor. - Voltaire
fuji
Tux's lil' helper

Joined: 26 Apr 2002
Posts: 111

Posted: Sun Apr 17, 2005 3:10 am

Bob P wrote:
on the bright side, it takes less time to bookmark this thread or your hyperlink to the GCC man pages than it does to type "man gcc" and press PgDn a couple of hundred times...:?

man:/gcc (and info:/gcc) shows a nicely formatted display in konqueror (with links!) if you have KDE.
_________________
Came for the hype, stayed for Portage.
superstoned
Guru

Joined: 17 Dec 2004
Posts: 432

Posted: Sun Apr 24, 2005 10:59 am

fuji wrote:
Bob P wrote:
on the bright side, it takes less time to bookmark this thread or your hyperlink to the GCC man pages than it does to type "man gcc" and press PgDn a couple of hundred times...:?

man:/gcc (and info:/gcc) shows a nicely formatted display in konqueror (with links!) if you have KDE.

you beat me...
All times are GMT