- Information contained in the HLRS Wiki is not legally binding and HLRS is not responsible for any damages that might result from its use -
Useful Compiler Options On x86
Latest revision as of 10:44, 5 August 2011
Introduction
The following options help to improve performance and to diagnose performance problems when compiling code for x86 processors with the GCC, Intel, PGI or Cray C and Fortran compilers.
Optimization Switches
Description | GCC Option | ICC Option | PGCC Option | CrayCC Option |
---|---|---|---|---|
Enable aggressive optimization features in general | -O3 | -O3 | -O3 | -O3 |
Enable relaxed floating point builtin functions | -ffast-math | -fp-model fast | -Mfprelaxed | |
Round denormalized FP values to zero | included in -ffast-math | -ftz | -Mdaz | |
Assume associativity of floating point operations | -fno-signed-zeros -fno-trapping-math -fassociative-math | | -Mvect=assoc | |
Assume ANSI C aliasing rules are not violated | -fstrict-aliasing (included in -O2 and -O3) | -ansi-alias | | |
Enable optimization over file boundaries | -fwhole-program -combine | | | |
Enable optimization over file boundaries at link time | -flto | -ipo (included in -fast) | -Mipa=fast or -Mipa=fast,inline | -hipa5 -hwp -hpl=<tempdir> |
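To illustrate what "assume associativity of floating point operations" permits, the following sketch compares an in-order sum with a sum that uses four partial accumulators, which is roughly how a vectorizing compiler may reorder a reduction once associativity is assumed. The function names are hypothetical and the four-lane layout is only an assumption about how a vectorizer might split the loop.

```c
#include <stddef.h>

/* Sum the elements strictly in source order, as IEEE semantics require
   when no associativity options are given. */
double sum_in_order(const double *x, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; ++i)
        s += x[i];
    return s;
}

/* Emulate a 4-lane vector reduction by hand: each lane accumulates every
   fourth element and the lanes are combined at the end.
   Assumption for this sketch: n is a multiple of 4. */
double sum_four_lanes(const double *x, size_t n)
{
    double lane[4] = {0.0, 0.0, 0.0, 0.0};
    for (size_t i = 0; i + 3 < n; i += 4)
        for (size_t k = 0; k < 4; ++k)
            lane[k] += x[i + k];
    return (lane[0] + lane[1]) + (lane[2] + lane[3]);
}
```

For the input {1e20, 1.0, -1e20, 1.0} the in-order sum is 1.0, while the four-lane sum is 0.0, because the small terms are absorbed by the large ones before these cancel. Code that depends on a particular summation order should therefore be checked before the associativity options are enabled.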
Notes
- using -ansi-alias often helps the Intel compiler to auto-vectorize code, especially when OpenMP is enabled
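The effect of the aliasing options can be seen on a loop whose bound is read through a pointer. The sketch below uses a hypothetical function name. When the compiler may assume the ANSI aliasing rules, a store through a float pointer cannot modify an int object, so the loop bound can be loaded once and the trip count is known before the loop starts.

```c
/* With -fstrict-aliasing (GCC) or -ansi-alias (Intel), the compiler may
   assume that writing data[i] (type float) never changes *n (type int).
   Without that assumption, *n has to be reloaded in every iteration,
   which can get in the way of vectorization. */
void scale_all(float *data, const int *n, float factor)
{
    for (int i = 0; i < *n; ++i)
        data[i] *= factor;
}
```

The function itself behaves the same with and without these options as long as the caller does not pass pointers that violate the aliasing rules.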
Input Language
Description | GCC Option | ICC Option | PGCC Option | CrayCC Option |
---|---|---|---|---|
compile C code as C99 | -std=c99 or -std=gnu99 | -std=c99 | -c99 | on by default |
enable OpenMP | -fopenmp | -openmp | -mp | on by default |
enable accelerator directives | not supported | not supported | -ta=host -ta=nvidia or -ta=cc13 | not supported |
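As a small illustration of the first two rows, the following sketch uses a C99 loop-scoped declaration together with an OpenMP directive. The function name is hypothetical. If the OpenMP option is not given, the pragma is ignored and the loop runs serially with the same result.

```c
/* Sum of squares 0^2 + 1^2 + ... + (n-1)^2.
   "for (int i ...)" needs C99; the pragma only takes effect when the
   compiler's OpenMP option from the table above is enabled. */
long sum_squares(int n)
{
    long total = 0;
#pragma omp parallel for reduction(+:total)
    for (int i = 0; i < n; ++i)
        total += (long)i * i;
    return total;
}
```

Because the reduction is over integers, the result does not depend on the number of threads.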
Target Architecture
Description | GCC Option | ICC Option | PGCC Option |
---|---|---|---|
select target processor | -march=<processor name> | -x <processor name> | -tp <processor name> |
enable automatic SMP parallelization | not supported | not supported | -Mconcur |
Notes
- the list of recognized processor names differs between compilers, is very long and keeps growing. See the manual of each compiler for its current list.
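One way to check what a target selection actually enabled is to test the predefined feature macros at compile time. The sketch below relies on the macros __SSE2__, __AVX__, __AVX2__ and __AVX512F__ as defined by GCC and the Intel compiler; whether other compilers define the same names is an assumption that should be verified in their manuals. The function name is hypothetical.

```c
/* Report the widest SIMD extension the compiler was allowed to use for
   this translation unit, based on predefined macros. */
const char *widest_simd(void)
{
#if defined(__AVX512F__)
    return "AVX-512F";
#elif defined(__AVX2__)
    return "AVX2";
#elif defined(__AVX__)
    return "AVX";
#elif defined(__SSE2__)
    return "SSE2";
#else
    return "none detected";
#endif
}
```

Compiling the same file with different target processor names and calling the function shows which instruction set each selection switches on.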
Diagnostic Output
Description | GCC Option | ICC Option | PGCC Option | CrayCC Option |
---|---|---|---|---|
generate auto-vectorization information | -ftree-vectorizer-verbose=<level> | -vec-report=<level> | -Minfo=vec | -h list=m |
Notes
- the PGI compiler has several other -Minfo=<type> switches to produce diagnostic output on inlining, unrolling, IPO, auto-parallelization and accelerator usage
- the Cray compiler has several other -h list=<type> switches to produce diagnostic output on optimization
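A simple way to get familiar with the vectorization reports is to compile two contrasting loops and compare the messages. In the sketch below (hypothetical function names), the first loop has independent iterations and is a typical candidate for vectorization, while the second carries a dependency from one iteration to the next. The exact wording of the reports differs between compilers and versions.

```c
#include <stddef.h>

/* Independent iterations; restrict tells the compiler that out and in
   do not overlap. Typically reported as vectorized. */
void scale(float *restrict out, const float *restrict in, float factor, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = in[i] * factor;
}

/* Each iteration reads the value written by the previous one, so the
   loop has a loop-carried dependency. Typically reported as not
   vectorized. */
void running_sum(float *data, size_t n)
{
    for (size_t i = 1; i < n; ++i)
        data[i] += data[i - 1];
}
```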
Assembler Output
Description | GCC Option | ICC Option | PGCC Option | CrayCC Option |
---|---|---|---|---|
generate assembler output instead of object file | -S | -S | -S | -S |
generate assembler output with source code included in asm file | -g -Wa,-a,-ad | -S -fsource-asm | -S -Manno | -h list=d (not asm but intermediate representation) |
Notes
- Assembler code of object files or executables can be shown with:
- objdump -d <objfile>
- Source-annotated assembler code of object files or executables can be shown with:
- objdump -dS <objfile>
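A short function is usually enough to try these options out. The sketch below assumes a hypothetical file name axpy.c; the commands in the comment use GCC as an example and only combine options listed above.

```c
/* axpy.c (hypothetical file name)
 *
 * Assembler file instead of an object file:
 *     gcc -O3 -S axpy.c            (writes axpy.s)
 * Object file with debug information, then source-annotated disassembly:
 *     gcc -O3 -g -c axpy.c
 *     objdump -dS axpy.o
 */
void axpy(double a, const double *x, double *y, int n)
{
    for (int i = 0; i < n; ++i)
        y[i] += a * x[i];
}
```

objdump can only interleave source lines if the object file was compiled with debug information (-g).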