Introduction
The following options help to improve performance and to debug performance problems when compiling code for x86 processors with the GCC, Intel, PGI or Cray C and Fortran compilers.
Optimization Switches
| Description | GCC Option | ICC Option | PGCC Option | CrayCC Option |
| Enable aggressive optimization features in general | -O3 | -O3 | -O3 | -O3 |
| Enable relaxed floating point builtin functions | -ffast-math | -fp-model fast | -Mfprelaxed | |
| Round denormalized FP values to zero | included in -ffast-math | -ftz | -Mdaz | |
| Assume associativity of floating point operations | -fno-signed-zeros -fno-trapping-math -fassociative-math | | -Mvect=assoc | |
| Assume ANSI C aliasing rules are not violated | -fstrict-aliasing (included in -O2 and -O3) | -ansi-alias | | |
| Enable optimization over file boundaries | -fwhole-program -combine | | | |
| Enable optimization over file boundaries at link time | -flto | -ipo (included in -fast) | -Mipa=fast or -Mipa=fast,inline | -hipa5 -hwp -hpl=<tempdir> |
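A small sketch can show why the associativity options are not enabled by default. Floating point addition is not associative, so a vectorized reduction that keeps several partial sums may return a different result than the left-to-right loop written in the source. The function names below are illustrative, and the two-accumulator loop only mimics by hand the kind of reordering a compiler might perform once it is allowed to assume associativity.

```c
#include <stddef.h>

/* Sum strictly left to right, the order the C source specifies. */
double sum_in_order(const double *x, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; ++i)
        s += x[i];
    return s;
}

/* Sum with two independent accumulators. This imitates by hand the
 * reordering of a 2-lane vectorized reduction (illustrative only,
 * not the code any particular compiler emits). */
double sum_two_lanes(const double *x, size_t n)
{
    double even = 0.0, odd = 0.0;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        even += x[i];
        odd  += x[i + 1];
    }
    if (i < n)              /* leftover element when n is odd */
        even += x[i];
    return even + odd;
}
```

For the input {1e30, 1.0, -1e30}, sum_in_order returns 0.0 because the 1.0 is absorbed by 1e30 before the cancellation, whereas sum_two_lanes returns 1.0. Code that depends on one particular result should not be compiled with the associativity options.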
Notes
- Using -ansi-alias often helps the Intel compiler to auto-vectorize code, especially when OpenMP is enabled.
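The aliasing assumption can be illustrated with a minimal sketch; the function names are hypothetical. In the first function the compiler may assume that a store through a float pointer never changes an int object, so the scale factor can stay in a register for the whole loop. The second function shows a way to reinterpret bits that stays within the aliasing rules, which matters once -fstrict-aliasing or -ansi-alias is in effect.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Under the aliasing rules, writing out[i] (a float) cannot modify
 * *scale (an int), so *scale need not be reloaded on every iteration. */
void scale_by(float *out, const int *scale, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] *= (float)*scale;
}

/* Reinterpret the bits of a float without breaking the aliasing rules.
 * memcpy is defined for any object type; a cast such as
 * *(uint32_t *)&f would instead read a float through an incompatible
 * type, which the options above allow the compiler to assume never
 * happens. Assumes a 32-bit IEEE 754 float, as on x86. */
uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}
```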
Input Language
| Description | GCC Option | ICC Option | PGCC Option | CrayCC Option |
| compile C code as C99 | -std=c99 or -std=gnu99 | -std=c99 | -c99 | on by default |
| enable OpenMP | -fopenmp | -openmp | -mp | on by default |
| enable accelerator directives | not supported | not supported | -ta=host -ta=nvidia or -ta=cc13 | not supported |
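As a minimal sketch of what the C99 and OpenMP switches act on, the following functions use a C99 loop-scoped variable and an OpenMP reduction. The function names are hypothetical. When the OpenMP option is not given, the pragma is ignored and the loop runs serially with the same result; the standard _OPENMP macro reveals which case applies.

```c
/* Sum of the squares 1..n. Integer arithmetic keeps the result
 * independent of the order in which threads combine partial sums. */
long sum_of_squares(int n)
{
    long total = 0;
    #pragma omp parallel for reduction(+:total)
    for (int i = 1; i <= n; ++i)   /* C99: variable declared in the for statement */
        total += (long)i * i;
    return total;
}

/* Returns 1 if the translation unit was compiled with OpenMP enabled. */
int openmp_enabled(void)
{
#ifdef _OPENMP
    return 1;
#else
    return 0;
#endif
}
```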
Target Architecture
| Description | GCC Option | ICC Option | PGCC Option |
| select target processor | -march=<processor name> | -x <processor name> | -tp <processor name> |
| enable automatic SMP parallelization | -ftree-parallelize-loops=<n> | -parallel | -Mconcur |
Notes
- The list of recognized processors differs between compilers, is very long, and keeps growing. See each compiler's manual for its list.
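To check which instruction set extensions a given target processor option actually enabled, one can test predefined macros at compile time. The sketch below assumes the __SSE2__, __AVX__, __AVX2__ and __AVX512F__ macros as defined by GCC and the Intel compiler; whether the other compilers define the same names should be checked in their manuals. The function name is hypothetical.

```c
/* Report the widest SIMD extension the compiler was allowed to use
 * for this translation unit, based on predefined macros. */
const char *compiled_simd_level(void)
{
#if defined(__AVX512F__)
    return "AVX-512F";
#elif defined(__AVX2__)
    return "AVX2";
#elif defined(__AVX__)
    return "AVX";
#elif defined(__SSE2__)
    return "SSE2";
#else
    return "baseline (none of the tested macros defined)";
#endif
}
```

Printing the returned string from a test program built once without and once with the target processor option makes the effect of the option visible.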
Diagnostic Output
| Description | GCC Option | ICC Option | PGCC Option | CrayCC Option |
| generate auto-vectorization information | -ftree-vectorizer-verbose=<level> | -vec-report=<level> | -Minfo=vec | -h list=m |
Notes
- The PGI compiler has several other -Minfo=<type> switches that produce diagnostic output on inlining, unrolling, IPO, auto-parallelization and accelerator usage.
- The Cray compiler has several other -h list=<type> switches that produce diagnostic output on optimization.
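A pair of contrasting loops is a convenient input for trying out the vectorization reports. In this sketch (function names are illustrative) the first loop has independent iterations and uses the C99 restrict qualifier to promise that the arrays do not overlap, so a vectorizer would typically accept it. The second loop carries a dependence from one iteration to the next, which a report would typically flag. The exact wording of the messages differs between compilers and versions.

```c
#include <stddef.h>

/* Independent iterations; restrict promises that x and y do not overlap. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

/* Loop-carried dependence: each element needs the previous result. */
void prefix_sum(size_t n, float *v)
{
    for (size_t i = 1; i < n; ++i)
        v[i] += v[i - 1];
}
```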
Assembler Output
| Description | GCC Option | ICC Option | PGCC Option | CrayCC Option |
| generate assembler output instead of object file | -S | -S | -S | -S |
| generate assembler output with source code included in asm file | -g -Wa,-a,-ad | -S -fsource-asm | -S -Manno | -h list=d (not asm but intermediate representation) |
Notes
- Assembler code of object files or executables can be shown with:
  - objdump -d <objfile>
- Source-annotated assembler code of object files or executables can be shown with:
  - objdump -dS <objfile>