- Information in the HLRS Wiki is not legally binding and is provided without warranty -

NEC Aurora quickstart


mpincc, mpinc++, mpinfort, ncc, nc++ and nfort are the compilers available from the NEC SDK. The compilers run on the host; only the generated executables run on the VE cards.

The compilers follow the switches and behaviour of the GNU compilers where possible.

A short overview of the most commonly used compiler options:

  • -O0, -O1, -O2, -O3, -O4: optimization level. -O4 is aggressive and might change results; -O3 includes loop-level transformations; inlining works at -O2 and higher, but has to be enabled explicitly
  • -g: enables debug symbols
  • -p: enable profiling, for processing with ngprof
  • -report-all: creates a .L file containing a formatted source listing with loop markers
  • -fdiag-vector=1-3: gives detailed messages about vectorization
  • -finline: inline C++ inline functions
  • -finline-functions: inline functions (C/C++/Fortran)
  • -fdiag-inline=1-2: messages about inlining
  • -fopenmp: enable OpenMP
  • -ftrace: enable performance analysis instrumentation, for processing with ftrace
  • -proginf: print short performance summary after program execution
  • -traceback: enable traceback in case of application crash
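
As a small test case for the options above, the following sketch shows a loop that is a natural candidate for vectorization. The file name daxpy.c and the function name are hypothetical, and the build line in the comment only combines flags from the list above; it has not been checked against a specific SDK version.

```c
/* daxpy.c: a small kernel for trying out the compiler options listed above.
 *
 * Possible build line (hypothetical file name, flags from the list above):
 *   ncc -O3 -fdiag-vector=2 -report-all -c daxpy.c
 */
#include <assert.h>
#include <stddef.h>

/* Computes y[i] = a * x[i] + y[i] for i in [0, n).
 * "restrict" promises the compiler that x and y do not overlap, which
 * removes one common obstacle to vectorizing the loop. */
void daxpy(size_t n, double a, const double *restrict x, double *restrict y)
{
    assert(n == 0 || (x != NULL && y != NULL));
    for (size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

With -report-all, the resulting .L listing should show how the loop was treated; the exact markers depend on the compiler version.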

Compiler directives

To support the compiler in optimization, it can be given hints through directives.

In Fortran, directives start with !$NEC, for example:

!$NEC ivdep

In C/C++, directives start with #pragma _NEC, for example:

#pragma _NEC ivdep
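
A typical use of ivdep is a loop with indirect addressing, where the compiler cannot prove that the iterations are independent. The following is a minimal sketch, assuming the caller guarantees that the index array contains no duplicates; the function name scatter_add is hypothetical. Other compilers normally ignore the unknown pragma, so the code stays portable.

```c
#include <assert.h>
#include <stddef.h>

/* Adds b[i] to a[idx[i]] for i in [0, n).
 * The caller must guarantee that idx holds no duplicate entries; only then
 * are the iterations independent and the ivdep hint safe to use. */
void scatter_add(size_t n, double *a, const double *b, const size_t *idx)
{
    assert(n == 0 || (a != NULL && b != NULL && idx != NULL));
#pragma _NEC ivdep
    for (size_t i = 0; i < n; ++i) {
        a[idx[i]] += b[i];
    }
}
```

If idx may contain repeated values, the hint should be dropped, because a vectorized loop could then produce wrong results.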

Troubleshooting

When the application crashes, try


to get a traceback.

Addresses printed can be resolved into source locations using


provided the application was compiled with -g2 for debug symbols.

The debugger


can be used to attach to running processes, to start processes, or to analyse core dumps.


The simplest way to run a serial or multithreaded program on one card is


Finer control over process placement is possible with

/opt/nec/ve/bin/ve_exec ./a.out

The options -N and -c select the card (0-7) and the core (0-7).

MPI programs are run with

mpirun -vennp 1 -nve 8 ./a.out

to get 1 process on each of 8 VE cards in the local VH.

To do the same on all VH nodes in the batch job, use

mpirun $(vempihelper -vennp 1 -nve 8) ./a.out

Performance analysis

Compile with -proginf to get free summary performance information at the end of program execution with


for MPI programs or


for multithreaded programs.

For a breakdown per subroutine, compile and link with -ftrace. Execution will create one or more ftrace.out files, which can be viewed with

ftrace -f filename

or with the GUI tool


which is handy for MPI applications.