- Information contained in the HLRS Wiki is not legally binding and HLRS is not responsible for any damages that might result from its use -
Useful Compiler Options On x86
From HLRS Platforms
Revision as of 10:58, 18 May 2011
Introduction
The following options help to improve performance and to diagnose performance problems when compiling code for x86 processors with the GCC, Intel or PGI C and Fortran compilers. Some tables also list the corresponding Cray compiler options.
Optimization Switches
Description | GCC Option | ICC Option | PGCC Option |
---|---|---|---|
Enable aggressive optimization features in general | -O3 | -fast | -fast |
Enable relaxed floating-point builtin functions | -ffast-math | -fp-model fast | -Mfprelaxed |
Round denormalized FP values to zero | included in -ffast-math | -ftz | -Mdaz |
Assume associativity of floating-point operations | -fno-signed-zeros -fno-trapping-math -fassociative-math | | -Mvect=assoc |
Assume ANSI C aliasing rules are not violated | -fstrict-aliasing (included in -O2 and -O3) | -ansi-alias | |
Enable interprocedural optimization over file boundaries | -fwhole-program -combine | -ipo (included in -fast) | -Mipa=fast or -Mipa=fast,inline |
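The associativity switches in the table matter because floating-point addition is not associative, so the compiler may not reorder a sum unless it is told to. The following is a minimal sketch of the effect; the function names `sum_left` and `sum_right` are hypothetical and chosen only for illustration.

```c
/* Hypothetical illustration: the same three values summed in two orders. */

/* Evaluates (a + b) + c. */
double sum_left(double a, double b, double c)
{
    return (a + b) + c;
}

/* Evaluates a + (b + c). */
double sum_right(double a, double b, double c)
{
    return a + (b + c);
}
```

With strict IEEE double evaluation, `sum_left(1e20, -1e20, 1.0)` yields 1.0, while `sum_right(1e20, -1e20, 1.0)` yields 0.0, because 1.0 is lost when it is added to -1e20 first. Once associativity is assumed, the compiler is free to pick either order, so results may change.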
Notes
- Using -ansi-alias often helps the Intel compiler to auto-vectorize code, especially when OpenMP is enabled.
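As a rough sketch of why the aliasing assumption helps the optimizer, consider a loop whose trip count is read through a pointer. The function `scale` and its parameters are hypothetical and not taken from any HLRS code.

```c
/* Hypothetical helper: multiply the first *n elements of x by factor.
 * Under the ANSI C aliasing rules a store through a float pointer cannot
 * modify an int object, so the compiler may load *n once before the loop
 * instead of reloading it after every store to x[i]. */
void scale(float *x, const int *n, float factor)
{
    for (int i = 0; i < *n; ++i)
        x[i] *= factor;
}
```

Without the aliasing assumption the compiler has to consider that `x[i]` might overlap `*n`, which typically prevents vectorization of the loop. The code is only valid under these options if the caller really passes non-overlapping objects of the stated types.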
Input Language
Description | GCC Option | ICC Option | PGCC Option | CrayCC Option |
---|---|---|---|---|
compile C code as C99 | -std=c99 or -std=gnu99 | -std=c99 | -c99 | on by default |
enable OpenMP | -fopenmp | -openmp | -mp | on by default |
enable accelerator directives | not supported | not supported | -ta=host -ta=nvidia or -ta=cc13 | not supported |
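A small C99 test case for the OpenMP switches listed above might look like the following sketch. The function name `sum_squares` is hypothetical; the reduction clause itself is standard OpenMP.

```c
/* Hypothetical example: returns 1^2 + 2^2 + ... + n^2.
 * With the OpenMP switch enabled, the loop iterations are distributed
 * over threads and the partial sums are combined by the reduction clause.
 * Without the switch, the pragma is ignored and the loop runs serially. */
double sum_squares(int n)
{
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (int i = 1; i <= n; ++i)
        sum += (double)i * (double)i;
    return sum;
}
```

The loop variable is declared inside the `for` statement, so the file has to be compiled as C99 (or later) with the compilers that do not do so by default.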
Target Architecture
Description | GCC Option | ICC Option | PGCC Option |
---|---|---|---|
select target processor | -march=<processor name> | -x <processor name> | -tp <processor name> |
enable automatic SMP parallelization | not supported | not supported | -Mconcur |
Notes
- The list of recognized processor names differs between compilers, is very long, and keeps growing. See the manual of each compiler for its list.
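One way to check what a target-processor switch actually enabled is to inspect the predefined macros for SIMD extensions. The sketch below assumes the macro names that GCC and compatible compilers define (`__SSE2__`, `__SSE4_2__`, `__AVX__`, `__AVX2__`); other compilers may not define them, in which case the function falls back to "baseline". The function name `simd_level` is hypothetical.

```c
/* Hypothetical helper: name of the widest x86 SIMD extension that the
 * compiler was allowed to use for this translation unit, judged only by
 * predefined macros (an assumption that holds for GCC-compatible compilers). */
const char *simd_level(void)
{
#if defined(__AVX2__)
    return "AVX2";
#elif defined(__AVX__)
    return "AVX";
#elif defined(__SSE4_2__)
    return "SSE4.2";
#elif defined(__SSE2__)
    return "SSE2";
#else
    return "baseline";
#endif
}
```

Compiling the same file with different target-processor settings and printing the returned string shows which instruction set the generated code may rely on.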
Diagnostic Output
Description | GCC Option | ICC Option | PGCC Option | CrayCC Option |
---|---|---|---|---|
generate auto-vectorization information | -ftree-vectorizer-verbose=<level> | -vec-report=<level> | -Minfo=vec | -h list=m |
Notes
- the PGI compiler has several other -Minfo=<type> switches to produce diagnostic output on inlining, unrolling, IPO, auto-parallelization and accelerator usage
- the Cray compiler has several other -h list=<type> switches to produce diagnostic output on optimization
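To try out the vectorization reports, a simple loop that compilers can usually vectorize is useful. The following sketch is a hypothetical single-precision AXPY kernel; the C99 `restrict` qualifiers are an assumption about the caller, namely that `x` and `y` never overlap.

```c
/* Hypothetical test kernel: y[i] = a * x[i] + y[i] for i in [0, n).
 * The restrict qualifiers promise that x and y do not overlap, which
 * removes the dependence that would otherwise block vectorization. */
void saxpy(int n, float a, const float *restrict x, float *restrict y)
{
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```

Compiling this file with optimization enabled and one of the diagnostic switches from the table should report whether the loop was vectorized. Removing `restrict` is a quick way to see how the report changes when the compiler has to assume possible overlap.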
Assembler Output
Description | GCC Option | ICC Option | PGCC Option | CrayCC Option |
---|---|---|---|---|
generate assembler output instead of object file | -S | -S | -S | -S |
generate assembler output with source code included in asm file | -g -Wa,-a,-ad | -S -fsource-asm | -S -Manno | -h list=d (not asm but intermediate representation) |