NEC Aurora quickstart
Latest revision as of 17:54, 20 February 2020
mpincc, mpinc++, mpinfort, ncc, nc++ and nfort are the compilers available from the NEC SDK. The compilers run on the host; only the generated executables run on the VE cards.
The compilers follow the switches and behaviour of the GNU compilers where possible.
A short overview of the most commonly used compiler options:
- -O0, -O1, -O2, -O3, -O4: optimization level. -O4 is aggressive and might change results, -O3 includes loop-level transformations. Inlining is available at -O2 and higher, but has to be enabled explicitly.
- -g: enables debug symbols
- -p: enable profiling, for processing with ngprof
- -report-all: creates a .L file with formatted source listing with loop markers.
- -fdiag-vector=1-3: give detailed messages about vectorization
- -finline: inline C++ inline functions
- -finline-functions: inline functions (C/C++/Fortran)
- -fdiag-inline=1-2: messages about inlining
- -fopenmp: enable OpenMP
- -ftrace: enable performance analysis instrumentation, for processing with ftrace
- -proginf: print short performance summary after program execution
- -traceback: enable traceback in case of application crash
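As a minimal sketch of how the options above combine, the following script assembles three typical command lines (debug, tuning, performance analysis) and prints them. The source file example.c, the output names and the grouping into three builds are assumptions for illustration; only the flags themselves come from the list above.

```shell
# Sketch: assemble ncc command lines from the options listed above.
# example.c and the output names are placeholders, not part of the NEC SDK.
SRC="example.c"

# Debug build: no optimization, debug symbols, traceback on crash.
DEBUG_CMD="ncc -O0 -g -traceback -o example.debug $SRC"

# Tuning build: listing file plus vectorization and inlining diagnostics.
TUNE_CMD="ncc -O3 -finline-functions -report-all -fdiag-vector=2 -fdiag-inline=1 -o example.tune $SRC"

# Performance build: summary at program end and ftrace instrumentation.
PERF_CMD="ncc -O3 -proginf -ftrace -o example.perf $SRC"

for cmd in "$DEBUG_CMD" "$TUNE_CMD" "$PERF_CMD"; do
    echo "$cmd"
    # Execute only where the NEC compiler and the source file actually exist.
    if command -v ncc >/dev/null 2>&1 && [ -f "$SRC" ]; then
        $cmd
    fi
done
```

On a machine without the NEC SDK the script only prints the three command lines, which makes it safe to try out before logging in to the Aurora nodes.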
To help the compiler optimize, hints can be given in the source code.
For Fortran they start with !NEC$, for C and C++ they are written as pragmas, for example
#pragma _NEC ivdep
When the application crashes, try
to get a traceback.
Addresses printed can be resolved into source locations using
provided the application was compiled with -g2 for debug symbols.
can be used to attach to running processes, to start processes, or to analyse core dumps.
The simplest way to run a serial or multithreaded program on one card is
More complex execution with more control of process placement is possible with
The options -N and -c select the card (0-7) and the core (0-7).
Execution of MPI programs is achieved with
mpirun -vennp 1 -nve 8 ./a.out
to start one process on each of the 8 VE cards in the local VH.
To do the same on all VH nodes in the batch job, use
mpirun $(vempihelper -vennp 1 -nve 8) ./a.out
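The two launch forms above differ only in where the placement options go. As a sketch, the helper below builds either command line from the same parameters. The function name aurora_mpirun_cmd and its argument order are hypothetical; vempihelper is taken from the text above as is.

```shell
# Sketch: build the mpirun command line for the local VH or for the whole batch job.
# aurora_mpirun_cmd is a hypothetical helper; it only prints the command.
aurora_mpirun_cmd() {
    scope=$1    # "local" = VE cards of this VH, "job" = all VH nodes of the batch job
    vennp=$2    # value passed to -vennp
    nve=$3      # value passed to -nve
    exe=$4      # executable to start
    case "$scope" in
        local) echo "mpirun -vennp $vennp -nve $nve $exe" ;;
        # Single quotes keep $(...) literal so the shell expands it at launch time.
        job)   echo 'mpirun $(vempihelper -vennp '"$vennp"' -nve '"$nve"') '"$exe" ;;
        *)     echo "unknown scope: $scope" >&2; return 1 ;;
    esac
}

aurora_mpirun_cmd local 1 8 ./a.out
aurora_mpirun_cmd job 1 8 ./a.out
```

The printed lines can be pasted into a job script, so the same parameters serve both an interactive test on one VH and the full batch run.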
Compile with -proginf to get summary performance information at no extra cost at the end of program execution with
for MPI programs or
for multithreaded programs.
For a breakdown per subroutine, compile and link with -ftrace. Execution will create one or more ftrace.out files, which can be viewed with
ftrace -f filename
or with the GUI tool
which is handy for MPI applications.
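Since a run can leave several ftrace.out files behind, a small loop saves typing. The sketch below prints the ftrace -f call for every matching file in a directory. The function name ftrace_cmds is hypothetical, and the glob ftrace.out* is an assumption about how the files are named, so adjust the pattern to what your run actually produces.

```shell
# Sketch: print one "ftrace -f" call per trace file found in a directory.
# ftrace_cmds is a hypothetical helper; the ftrace.out* pattern is an assumption.
ftrace_cmds() {
    dir=${1:-.}
    for f in "$dir"/ftrace.out*; do
        # An unmatched glob stays literal, so skip anything that is not a file.
        [ -f "$f" ] || continue
        echo "ftrace -f $f"
    done
}

ftrace_cmds .
```

The helper only prints the commands; run them by hand or redirect the output into a script once the list looks right.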