- Information contained in the HLRS Wiki is not legally binding and HLRS is not responsible for any damages that might result from its use -
NEC Aurora quickstart
Compilers
The NEC SDK provides the compilers ncc, nc++ and nfort and the MPI compile commands mpincc, mpinc++ and mpinfort. The compilers run on the vector host (VH); only the generated executables run on the VE cards.
Where possible, the compilers follow the switches and behaviour of the GNU compilers.
A short overview of the most commonly used compiler options:
- -O0, -O1, -O2, -O3, -O4: optimization level. -O4 is aggressive and might change results; -O3 includes loop-level transformations. Inlining works at -O2 and higher, but has to be enabled explicitly.
- -g: enables debug symbols
- -p: enable profiling, for processing with ngprof
- -report-all: creates a .L file with formatted source listing with loop markers.
- -fdiag-vector=0-2: gives detailed messages about vectorization
- -finline: inline C++ inline functions
- -finline-functions: inline functions (C/C++/Fortran)
- -fdiag-inline=1-2: messages about inlining
- -fopenmp: enable OpenMP
- -ftrace: enable performance analysis instrumentation, for processing with ftrace
- -proginf: print short performance summary after program execution
- -traceback: enable traceback in case of application crash
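As an illustration of how the options above combine, here is a minimal sketch of a debug build and an optimized build. The file name main.c and the output names app_debug and app_opt are placeholders and not taken from this page. If ncc or the source file is missing, the script only prints the commands.

```shell
# Placeholder names: main.c, app_debug and app_opt are assumptions for illustration.
SRC=main.c

# Debug build: no optimization, debug symbols, traceback on crash.
DEBUG_FLAGS="-O0 -g -traceback"

# Optimized build: loop transformations, inlining, vectorization messages,
# source listing (.L file) and a performance summary at program end.
OPT_FLAGS="-O3 -finline-functions -fdiag-vector=2 -report-all -proginf"

for build in "app_debug:$DEBUG_FLAGS" "app_opt:$OPT_FLAGS"; do
    out=${build%%:*}     # text before the first colon: output name
    flags=${build#*:}    # text after the first colon: compiler flags
    if command -v ncc >/dev/null 2>&1 && [ -f "$SRC" ]; then
        ncc $flags -o "$out" "$SRC"
    else
        echo "would run: ncc $flags -o $out $SRC"
    fi
done
```

The same flags should apply to nc++ and nfort, and to the mpincc, mpinc++ and mpinfort commands for MPI codes.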
Compiler directives
The compiler can be given hints through directives to support optimization.
In Fortran, directives start with !$NEC, for example
!$NEC ivdep
In C/C++, they start with #pragma _NEC, for example
#pragma _NEC ivdep
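A minimal sketch of where such a directive goes: the script below writes a small C file with an ivdep directive in front of a loop with an indirect store, and compiles it with vectorization messages if ncc is available. The file name ivdep_demo.c and the function scatter_add are made up for this example. The directive is only safe here because the index array is assumed to contain no duplicates.

```shell
# Write a small C file to a temporary directory (all names are placeholders).
WORKDIR=$(mktemp -d)
cat > "$WORKDIR/ivdep_demo.c" <<'EOF'
#include <stdio.h>

/* The indirect store a[idx[i]] looks like a possible dependency to the compiler.
 * ivdep asserts that no such dependency exists (idx[] holds no duplicates),
 * so the loop may be vectorized. */
void scatter_add(double *a, const double *b, const int *idx, int n)
{
#pragma _NEC ivdep
    for (int i = 0; i < n; i++) {
        a[idx[i]] += b[i];
    }
}

int main(void)
{
    double a[4] = {0.0, 0.0, 0.0, 0.0};
    double b[4] = {1.0, 2.0, 3.0, 4.0};
    int idx[4] = {3, 2, 1, 0};
    scatter_add(a, b, idx, 4);
    printf("%.1f %.1f %.1f %.1f\n", a[0], a[1], a[2], a[3]);
    return 0;
}
EOF

# Compile with vectorization messages if the NEC compiler is available.
if command -v ncc >/dev/null 2>&1; then
    ncc -O2 -fdiag-vector=2 -o "$WORKDIR/ivdep_demo" "$WORKDIR/ivdep_demo.c"
else
    echo "ncc not found; source written to $WORKDIR/ivdep_demo.c"
fi
```

If idx could contain repeated entries, the directive would be wrong and the vectorized loop could produce different results.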
Troubleshooting
If the application crashes, set
export VE_TRACEBACK=ALL
to get a traceback.
The printed addresses can be resolved into source locations using
/opt/nec/ve/bin/naddr2line
provided the application was compiled with -g2 for debug symbols.
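Put together, a crash investigation might look like the sketch below. The address is a made-up value standing in for one copied from the traceback output, and the -e option is assumed to work as in GNU addr2line. If the executable or naddr2line is missing, the script only prints the commands.

```shell
# Request a traceback on abnormal termination.
export VE_TRACEBACK=ALL

APP=./a.out
# Hypothetical address, standing in for one copied from the traceback output.
ADDR=0x600000001234

if [ -x "$APP" ] && [ -x /opt/nec/ve/bin/naddr2line ]; then
    # Run the program; on a crash the traceback is printed.
    "$APP" || echo "application exited with status $?"
    # -e names the executable, as in GNU addr2line (assumed to carry over).
    /opt/nec/ve/bin/naddr2line -e "$APP" "$ADDR"
else
    echo "would run: $APP, then /opt/nec/ve/bin/naddr2line -e $APP $ADDR"
fi
```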
The debugger
/opt/nec/ve/bin/gdb
can be used to attach to running processes, to start processes, or to analyse core dumps.
Execution
The simplest way to run a serial or multithreaded program on one card is
./a.out
More control over process placement is possible with
/opt/nec/ve/bin/ve_exec ./a.out
The options -N and -c select the card (0-7) and the core (0-7).
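For example, a run pinned to one card and one core could be sketched as follows. The card and core numbers are arbitrary example values. If ve_exec or the executable is missing, the script only prints the command.

```shell
# Example values within the 0-7 range; pick the card and core you need.
CARD=1
CORE=3
VE_EXEC=/opt/nec/ve/bin/ve_exec

if [ -x "$VE_EXEC" ] && [ -x ./a.out ]; then
    # -N selects the VE card, -c selects the core on that card.
    "$VE_EXEC" -N "$CARD" -c "$CORE" ./a.out
else
    echo "would run: $VE_EXEC -N $CARD -c $CORE ./a.out"
fi
```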
MPI programs are started with
mpirun -vennp 1 -nve 8 ./a.out
to get 1 process on each of 8 VE cards in the local VH.
To do the same on all VH nodes in the batch job, use
mpirun $(vempihelper -vennp 1 -nve 8) ./a.out
Performance analysis
Compile with -proginf to get a free performance summary at the end of program execution by setting
export NMPI_PROGINF=YES|ALL|DETAIL|ALL_DETAIL
for MPI programs or
export VE_PROGINF=YES|DETAIL
for multithreaded programs.
For a breakdown per subroutine, compile and link with -ftrace. Execution will create one or more ftrace.out files, which can be viewed with
ftrace -f filename
or with the GUI tool
/opt/nec/ve/ftraceviewer/ftraceviewer
which is handy for MPI applications.
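The ftrace workflow described above can be sketched as follows. The source file name main.c is a placeholder, and the output file is assumed to be named ftrace.out as for a single-process run. If ncc or the source file is missing, the script only prints the commands.

```shell
SRC=main.c   # placeholder source file name

if command -v ncc >/dev/null 2>&1 && [ -f "$SRC" ]; then
    # Compile and link with instrumentation, then run to produce ftrace.out.
    ncc -O2 -ftrace -o a.out "$SRC"
    ./a.out
    # Print the per-function breakdown.
    ftrace -f ftrace.out
else
    echo "would run: ncc -O2 -ftrace -o a.out $SRC && ./a.out && ftrace -f ftrace.out"
fi
```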