- Information contained in the HLRS Wiki is not legally binding and HLRS is not responsible for any damages that might result from its use -

Useful Compiler Options On x86

Latest revision as of 10:44, 5 August 2011

Introduction

The following options help to improve performance and to debug performance problems when compiling code for x86 processors with the GCC, Intel, PGI or Cray C or Fortran compilers.

Optimization Switches

| Description | GCC Option | ICC Option | PGCC Option | CrayCC Option |
|---|---|---|---|---|
| Enable aggressive optimization features in general | -O3 | -O3 | -O3 | -O3 |
| Enable relaxed floating-point built-in functions | -ffast-math | -fp-model fast | -Mfprelaxed | |
| Round denormalized FP values to zero | included in -ffast-math | -ftz | -Mdaz | |
| Assume associativity of floating-point operations | -fno-signed-zeros -fno-trapping-math -fassociative-math | | -Mvect=assoc | |
| Assume ANSI C aliasing rules are not violated | -fstrict-aliasing (included in -O2 and -O3) | -ansi-alias | | |
| Enable optimization over file boundaries | -fwhole-program -combine | | | |
| Enable optimization over file boundaries at link time | -flto | -ipo (included in -fast) | -Mipa=fast or -Mipa=fast,inline | -hipa5 -hwp -hpl=<tempdir> |
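The associativity row matters because floating-point addition is not associative: regrouping a sum can change its result. The following is a minimal sketch of that effect, assuming IEEE 754 double precision with round-to-nearest; the function names are hypothetical and chosen only for illustration.

```c
/* Sum three doubles as (a + b) + c.
 * The volatile temporary keeps the compiler from regrouping the sum,
 * even when an option that assumes associativity is in effect. */
double sum_left(double a, double b, double c)
{
    volatile double t = a + b;
    return t + c;
}

/* Sum the same three doubles as a + (b + c). */
double sum_right(double a, double b, double c)
{
    volatile double t = b + c;
    return a + t;
}

/* Return 1 if the two groupings give different results, 0 otherwise. */
int grouping_matters(double a, double b, double c)
{
    return sum_left(a, b, c) != sum_right(a, b, c);
}
```

For the inputs 1e16, -1e16 and 1.0, the first grouping yields 1.0 while the second yields 0.0, because -1e16 + 1.0 rounds back to -1e16. A compiler that is told to assume associativity is free to pick either grouping for an ordinary sum, so results may differ from a build without that assumption.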

Notes

  • using -ansi-alias often helps the Intel compiler to auto-vectorize code, especially when OpenMP is enabled
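As an illustration of what the aliasing assumption buys and what it demands, here is a small sketch. The function names are hypothetical, and the second function assumes a 32-bit IEEE 754 float, as is usual on x86.

```c
#include <stdint.h>
#include <string.h>

/* Under the C aliasing rules a store through a float pointer cannot modify
 * an int object, so the compiler may read *n once before the loop instead
 * of reloading it in every iteration. */
void scale_by_two(float *out, const int *n)
{
    for (int i = 0; i < *n; ++i)
        out[i] *= 2.0f;
}

/* Well-defined way to look at the bit pattern of a float: copy the bytes.
 * Assumes float and uint32_t have the same size.
 *
 * The cast below is the kind of code that violates the aliasing rules and
 * may misbehave once the compiler relies on them:
 *
 *     uint32_t u = *(uint32_t *)&f;
 */
uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}
```

Code that contains casts like the one shown in the comment should be fixed before the aliasing options are enabled for it.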

Input Language

| Description | GCC Option | ICC Option | PGCC Option | CrayCC Option |
|---|---|---|---|---|
| compile C code as C99 | -std=c99 or -std=gnu99 | -std=c99 | -c99 | on by default |
| enable OpenMP | -fopenmp | -openmp | -mp | on by default |
| enable accelerator directives | not supported | not supported | -ta=host -ta=nvidia or -ta=cc13 | not supported |
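To check whether the OpenMP switch from the table took effect, a small test kernel can help. The following sketch uses hypothetical function names; it relies only on the standard `_OPENMP` macro, which an OpenMP-enabled compiler defines, and on the fact that the pragma is ignored otherwise.

```c
/* Dot product with an OpenMP reduction.
 * Built without the OpenMP switch, the pragma is ignored and the loop
 * simply runs serially with the same result for these inputs. */
double dot(const double *x, const double *y, long n)
{
    double sum = 0.0;
    long i;
#pragma omp parallel for reduction(+:sum)
    for (i = 0; i < n; ++i)
        sum += x[i] * y[i];
    return sum;
}

/* Return 1 if the file was compiled with OpenMP enabled, 0 otherwise. */
int built_with_openmp(void)
{
#ifdef _OPENMP
    return 1;
#else
    return 0;
#endif
}
```

Note that a parallel reduction may add the partial sums in a different order than the serial loop, so for general floating-point data the last bits of the result can differ between the two builds.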

Target Architecture

| Description | GCC Option | ICC Option | PGCC Option |
|---|---|---|---|
| select target processor | -march=<processor name> | -x <processor name> | -tp <processor name> |
| enable automatic SMP parallelization | not supported | not supported | -Mconcur |

Notes

  • the list of recognized processor names differs between compilers, is very long and keeps growing; see each compiler's manual for its list

Diagnostic Output

| Description | GCC Option | ICC Option | PGCC Option | CrayCC Option |
|---|---|---|---|---|
| generate auto-vectorization information | -ftree-vectorizer-verbose=<level> | -vec-report=<level> | -Minfo=vec | -h list=m |

Notes

  • the PGI compiler has several other -Minfo=<type> switches to produce diagnostic output on inlining, unrolling, IPO, auto-parallelization and accelerator usage
  • the Cray compiler has several other -h list=<type> switches to produce diagnostic output on optimization
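A simple loop is a convenient way to try out the diagnostic switches above. The following sketch is a hypothetical test kernel, not taken from any compiler manual; whether a given compiler actually vectorizes it depends on its version and the other options in use.

```c
#include <stddef.h>

/* y[i] = a * x[i] + y[i] for i in [0, n).
 * The restrict qualifiers (C99) promise that x and y do not overlap,
 * which removes one common reason for a compiler to refuse vectorization. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```

Compiling this file with the optimization level and the vectorization report option of the respective compiler should produce a message about the loop, which makes it easy to compare what each compiler reports.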

Assembler Output

| Description | GCC Option | ICC Option | PGCC Option | CrayCC Option |
|---|---|---|---|---|
| generate assembler output instead of object file | -S | -S | -S | -S |
| generate assembler output with source code included in asm file | -g -Wa,-a,-ad | -S -fsource-asm | -S -Manno | -h list=d (not asm but intermediate representation) |

Notes

  • Assembler code of object files or executables can be shown with:
objdump -d <objfile>
  • Source-annotated assembler code of object files or executables can be shown with:
objdump -dS <objfile>