- Information contained in the HLRS Wiki is not legally binding and HLRS is not responsible for any damages that might result from its use -

NEC Aurora quickstart

Compilers

The NEC SDK provides the compilers mpincc, mpinc++, mpinfort, ncc, nc++ and nfort. The compilers run on the host; only the generated executables run on the VE cards.

Where possible, the compilers follow the switches and behaviour of the GNU compilers.

A short overview of the most commonly used compiler options:

  • -O0,-O1,-O2,-O3,-O4: optimization level; -O4 is aggressive and might change results, -O3 includes loop-level transformations, and inlining is available at -O2 and higher but has to be enabled explicitly
  • -g: enable debug symbols
  • -p: enable profiling, for processing with ngprof
  • -report-all: creates a .L file with formatted source listing with loop markers.
  • -fdiag-vector=1-3: give detailed messages about vectorization
  • -finline: inline C++ inline functions
  • -finline-functions: inline functions (C/C++/Fortran)
  • -fdiag-inline=1-2: messages about inlining
  • -fopenmp: enable OpenMP
  • -ftrace: enable performance analysis instrumentation, for processing with ftrace
  • -proginf: print short performance summary after program execution
  • -traceback: enable traceback in case of application crash
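
As a minimal sketch of the kind of code these options act on, the following C kernel contains one simple loop that a vectorizing compiler can typically vectorize and that OpenMP can parallelize. The function name dot and the file name dot.c are hypothetical. Using only options from the list above, it could be built, for example, with ncc -O3 -fopenmp -fdiag-vector=2 -report-all -c dot.c. Without -fopenmp the OpenMP pragma is ignored and the loop runs serially.

```c
/* Hypothetical example kernel: dot product of two arrays. */
double dot(const double *x, const double *y, int n)
{
    double sum = 0.0;

    /* This pragma takes effect only when compiling with -fopenmp. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++) {
        sum += x[i] * y[i];
    }
    return sum;
}
```

The diagnostics requested with -fdiag-vector and the .L listing written by -report-all should then show how the compiler treated this loop.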


Compiler directives

Directives in the source code give the compiler hints that support optimization.

In Fortran, directives start with !$NEC, for example

!$NEC ivdep

In C/C++, they start with #pragma _NEC, for example

#pragma _NEC ivdep
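
The following sketch shows where such a directive might be placed. It assumes the usual meaning of ivdep (ignore unknown vector dependencies) and uses a hypothetical kernel, scatter_add, whose loop writes through an index array. The compiler cannot prove that the indices are distinct, so the directive carries the programmer's promise that they are. If the promise is false, the results may be wrong.

```c
#include <stddef.h>

/* Hypothetical kernel: scatter-add through an index array.
 * The compiler cannot know whether idx[] contains duplicates, so it
 * has to assume a possible dependency between loop iterations. */
void scatter_add(double *a, const double *b, const int *idx, size_t n)
{
    /* Assumption made by the programmer: all idx[i] are distinct.
     * Compilers that do not know this pragma ignore it. */
#pragma _NEC ivdep
    for (size_t i = 0; i < n; i++) {
        a[idx[i]] += b[i];
    }
}
```

Whether the loop was actually vectorized can be checked with the -fdiag-vector messages or the -report-all listing.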

Troubleshooting

If the application crashes, set

export VE_TRACEBACK=ALL

to get a traceback.

The printed addresses can be resolved into source locations using

/opt/nec/ve/bin/naddr2line

provided the application was compiled with -g2 for debug symbols.

The debugger

/opt/nec/ve/bin/gdb

can be used to attach to running processes, to start processes, or to analyse core dumps.

Execution

The simplest way to run a serial or multithreaded program on one card is

./a.out

For more control over process placement, use

/opt/nec/ve/bin/ve_exec ./a.out

The options -N and -c select the card (0-7) and the core (0-7), respectively.

MPI programs are started with

mpirun -vennp 1 -nve 8 ./a.out

to get 1 process on each of the 8 VE cards in the local VH.

To do the same on all VH nodes in the batch job, use

mpirun $(vempihelper -vennp 1 -nve 8) ./a.out

Performance analysis

Compile with -proginf to get a summary of performance information, at no extra cost, at the end of program execution. To enable the output, set the variable to one of the listed values:

export NMPI_PROGINF=YES|ALL|DETAIL|ALL_DETAIL

for MPI programs or

export VE_PROGINF=YES|DETAIL

for multithreaded programs.

For a breakdown per subroutine, compile and link with -ftrace. Execution will create one or more ftrace.out files, which can be viewed with

ftrace -f filename

or with the GUI tool

/opt/nec/ve/ftraceviewer/ftraceviewer

which is handy for MPI applications.