- Information contained in the HLRS Wiki is not legally binding and HLRS is not responsible for any damages that might result from its use -

Useful Compiler Options On x86

From HLRS Platforms
Revision as of 10:43, 5 August 2011 by Hpcmango (talk | contribs) (added notes section after assembler output)

Introduction

The following options help to improve performance and to debug performance problems when compiling code for x86 processors with the GCC, Intel, PGI or Cray C and Fortran compilers.

Optimization Switches

Description | GCC Option | ICC Option | PGCC Option | CrayCC Option
Enable aggressive optimization features in general | -O3 | -O3 | -O3 | -O3
Enable relaxed floating-point builtin functions | -ffast-math | -fp-model fast | -Mfprelaxed |
Round denormalized FP values to zero | included in -ffast-math | -ftz | -Mdaz |
Assume associativity of floating-point operations | -fno-signed-zeros -fno-trapping-math -fassociative-math | | -Mvect=assoc |
Assume C ANSI aliasing rules are not violated | -fstrict-aliasing (included in -O2 and -O3) | -ansi-alias | |
Enable optimization over file boundaries | -fwhole-program -combine | | |
Enable optimization over file boundaries at link time | -flto | -ipo (included in -fast) | -Mipa=fast or -Mipa=fast,inline | -hipa5 -hwp -hpl=<tempdir>
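Why the associativity switches matter can be seen in a small C sketch. The two functions below add the same numbers in a different order; the function names and the test values are illustrative assumptions, not part of any compiler documentation. Under strict IEEE evaluation the compiler has to keep the order of sum_in_order, while the reassociation switches allow it to use an order similar to sum_two_lanes, which is what a vectorized reduction does.

```c
#include <stddef.h>

/* Strict left-to-right summation: ((x[0] + x[1]) + x[2]) + ...
 * Without the reassociation switches the compiler must keep this order. */
double sum_in_order(const double *x, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; ++i)
        s += x[i];
    return s;
}

/* Two interleaved partial sums, combined at the end. This mimics the
 * order a 2-wide vectorized reduction would use (illustrative only). */
double sum_two_lanes(const double *x, size_t n)
{
    double even = 0.0, odd = 0.0;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        even += x[i];
        odd  += x[i + 1];
    }
    if (i < n)              /* leftover element for odd n */
        even += x[i];
    return even + odd;
}
```

For the input {1e20, 1.0, -1e20, 1.0} the two orders give different results in IEEE double precision, because 1e20 + 1.0 rounds back to 1e20. Code that depends on one particular result should not be compiled with the reassociation switches.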

Notes

  • Using -ansi-alias often helps the Intel compiler to auto-vectorize code, especially when OpenMP is enabled.
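The effect of the aliasing switches can be sketched in C as follows. The function names are hypothetical. The first function shows a loop where the aliasing rules give the compiler extra freedom; the second shows a way to reinterpret bits that stays within those rules, as an alternative to casting pointers between unrelated types.

```c
#include <stdint.h>
#include <string.h>

/* data points to float, n points to int. Under the ANSI aliasing rules a
 * store through a float pointer cannot modify an int object, so the
 * compiler may load *n once before the loop instead of reloading it in
 * every iteration. That makes the trip count known and eases vectorization. */
void scale(float *data, const int *n, float factor)
{
    for (int i = 0; i < *n; ++i)
        data[i] *= factor;
}

/* Reading the bits of a float through a uint32_t pointer cast would
 * violate the aliasing rules. Copying the bytes with memcpy does not.
 * Assumes float is 32 bits wide, as on x86. */
uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}
```

Code that casts pointers between incompatible types may break when these switches are active, so such casts are worth replacing before enabling them.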

Input Language

Description | GCC Option | ICC Option | PGCC Option | CrayCC Option
Compile C code as C99 | -std=c99 or -std=gnu99 | -std=c99 | -c99 | on by default
Enable OpenMP | -fopenmp | -openmp | -mp | on by default
Enable accelerator directives | not supported | not supported | -ta=host, -ta=nvidia or -ta=cc13 | not supported
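A minimal sketch of code that needs both the C99 and the OpenMP switch from the table above. The function names are assumptions for illustration. The restrict qualifier and the declaration inside the for statement are C99 features; the pragma only takes effect when OpenMP is enabled, in which case the compiler predefines the _OPENMP macro.

```c
#include <stddef.h>

/* Dot product with an OpenMP reduction. When the OpenMP switch is not
 * given, _OPENMP is undefined, the pragma is skipped and the loop runs
 * serially with the same result for these inputs. */
double dot(const double *restrict a, const double *restrict b, size_t n)
{
    double sum = 0.0;
#ifdef _OPENMP
#pragma omp parallel for reduction(+:sum)
#endif
    for (long i = 0; i < (long)n; ++i)
        sum += a[i] * b[i];
    return sum;
}

/* Returns 1 if the file was compiled with OpenMP enabled, 0 otherwise. */
int openmp_enabled(void)
{
#ifdef _OPENMP
    return 1;
#else
    return 0;
#endif
}
```

With GCC this could be built as gcc -std=c99 -fopenmp -O3 -c dot.c, where dot.c is a hypothetical file name. Note that the reduction adds the partial sums in an unspecified order, so floating-point results can differ slightly between runs for general inputs.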

Target Architecture

Description | GCC Option | ICC Option | PGCC Option
Select target processor | -march=<processor name> | -x <processor name> | -tp <processor name>
Enable automatic SMP parallelization | not supported | not supported | -Mconcur

Notes

  • The list of recognized processors differs between compilers, is very long and keeps growing. See the manual of each compiler for its list.
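One way to check what a target processor switch actually enabled is to test the instruction set macros that the compiler predefines. The sketch below uses __AVX__, __SSE4_2__ and __SSE2__, which GCC and GCC-compatible compilers define; whether another compiler defines them is an assumption to verify in its manual. The function name is hypothetical.

```c
/* Reports the widest SIMD extension (of the three checked here) that the
 * compiler was allowed to use for this translation unit. The answer is
 * fixed at compile time by the target processor switch, not by the CPU
 * the program later runs on. */
const char *widest_simd_enabled(void)
{
#if defined(__AVX__)
    return "AVX";
#elif defined(__SSE4_2__)
    return "SSE4.2";
#elif defined(__SSE2__)
    return "SSE2";
#else
    return "none of the checked extensions";
#endif
}
```

Compiling the same file with different values for the target processor switch and printing the returned string shows which extensions each setting turns on.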

Diagnostic Output

Description | GCC Option | ICC Option | PGCC Option | CrayCC Option
Generate auto-vectorization information | -ftree-vectorizer-verbose=<level> | -vec-report=<level> | -Minfo=vec | -h list=m

Notes

  • The PGI compiler has several other -Minfo=<type> switches that produce diagnostic output on inlining, unrolling, IPO, auto-parallelization and accelerator usage.
  • The Cray compiler has several other -h list=<type> switches that produce diagnostic output on optimization.
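To try the vectorization reports, it helps to have one loop that can be vectorized and one that cannot. The following C sketch provides both; the function names are illustrative assumptions. Compiling it with -O3 and one of the diagnostic switches above should typically list the first loop as vectorized and the second as not vectorized, although the exact wording depends on the compiler and its version.

```c
/* Independent iterations: restrict promises that x and y do not overlap,
 * so each y[i] can be computed without looking at any other element.
 * Compilers can usually vectorize this loop. */
void saxpy(float *restrict y, const float *restrict x, float a, int n)
{
    for (int i = 0; i < n; ++i)
        y[i] += a * x[i];
}

/* Loop-carried dependency: iteration i reads the value that iteration
 * i - 1 has just written, so the iterations cannot simply run side by
 * side in SIMD lanes. */
void prefix_sum(float *x, int n)
{
    for (int i = 1; i < n; ++i)
        x[i] += x[i - 1];
}
```

Removing the restrict qualifiers from saxpy and comparing the report is a quick way to see how much the compiler depends on aliasing information.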

Assembler Output

Description | GCC Option | ICC Option | PGCC Option | CrayCC Option
Generate assembler output instead of object file | -S | -S | -S | -S
Generate assembler output with source code included in asm file | -g -Wa,-a,-ad | -S -fsource-asm | -S -Manno | -h list=d (not asm but intermediate representation)

Notes

Assembler code of object files or executables can be shown with:

objdump -d <objfile>

Source-annotated assembler code of object files or executables can be shown with:

objdump -dS <objfile>
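A small function is enough to try these commands. The sketch below is an assumed example, with a hypothetical file name in the comments. Note that objdump -dS can only interleave source lines if the object file contains debug information, so the file has to be compiled with -g.

```c
/* Hypothetical file sumsq.c. One possible way to inspect it with GCC:
 *
 *   gcc -O3 -g -c sumsq.c      (object file with debug information)
 *   objdump -dS sumsq.o        (disassembly interleaved with source)
 *   gcc -O3 -S sumsq.c         (assembler output in sumsq.s)
 */
int sum_squares(int n)
{
    int s = 0;
    for (int i = 1; i <= n; ++i)
        s += i * i;     /* look for this loop in the disassembly */
    return s;
}
```

At high optimization levels the compiler may unroll, vectorize or restructure the loop, so the source lines shown by objdump -dS can appear out of order or more than once.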