CRAY XC40 Tools
Cray provided tools
Cray provides several official tools. Below is a list of some of them; more information is available in the online manual pages (e.g. man atp).
At HLRS, Cray also provides some additional tools with limited or no support. Currently available is the Cray Profiler.
ATP : Abnormal Termination Processing
Abnormal Termination Processing (ATP) is a system that monitors Cray system user applications. If an application takes a system trap, ATP performs analysis on the dying application. All stack backtraces of the application processes are gathered into a merged stack backtrace tree and written to disk as the file atpMergedBT.dot. The stack backtrace tree for the first process to die is sent to stderr, as is the number of the signal that caused the application to fail. If Linux core dumping is enabled (see ulimit or limit in your shell documentation), a heuristically selected set of processes also dump their cores.
The atpMergedBT.dot file can be viewed with statview (the Stack Trace Analysis Tool viewer), which is included in the Cray Debugger Support Tools (module load stat), or alternatively with the file viewer dotty, which can be found on most Linux systems. The merged stack backtrace tree provides a concise yet comprehensive view of what the application was doing at the time of its termination.
At HLRS the ATP module is loaded by default. To use it, you have to set ATP_ENABLED=1 in your batch script.
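As a minimal sketch (node counts, walltime, and the executable name are placeholders), a batch script with ATP enabled could look like this:
#!/bin/bash
#PBS -l nodes=1:ppn=24
#PBS -l walltime=00:10:00
cd $PBS_O_WORKDIR
export ATP_ENABLED=1          # activate abnormal termination processing for this job
aprun -n 24 ./app.exe         # on a crash, atpMergedBT.dot is written to the working directory
The resulting atpMergedBT.dot file can then be inspected with statview (module load stat) as described above.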
STAT : Stack Trace Analysis Tool
Stack Trace Analysis Tool (STAT) is a cross-platform tool from the University of Wisconsin-Madison. It gathers and merges stack traces from a running application's parallel processes and creates call graph prefix trees, which are a compressed representation allowing scalable visualization and scalable analysis. It is very useful when an application seems to be stuck/hung. Full information, including use cases, is available on the paradyn page (http://www.paradyn.org/STAT/STAT.html). STAT scales to many thousands of concurrent processes.
To use it, you simply load the module and attach it to your running/hanging application.
$> module load stat
$> qsub job.pbs        # start the application, e.g. using a batch script
# Wait until the application reaches the suspicious state
$> STATGUI <JOBID>     # Launches the graphical interface
                       # Attaches to the job
                       # Shows the calltree
$> qdel <JOBID>        # Terminate the running application
IOBUF - I/O buffering library
IOBUF is an I/O buffering library that can reduce the I/O wait time for programs that read or write large files sequentially. IOBUF intercepts I/O system calls such as read and open and adds a layer of buffering, thus improving program performance by enabling asynchronous prefetching and caching of file data.
IOBUF can also gather runtime statistics and print a summary report of I/O activity for each file.
In general, no program source changes are needed in order to take advantage of IOBUF. Instead, IOBUF is implemented by following these steps:
Load the IOBUF module:
% module load iobuf
Relink the program. Set the IOBUF_PARAMS environment variable as needed.
% export IOBUF_PARAMS='*:verbose'
Execute the program.
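As a sketch of a typical setup (the file pattern and buffer settings are only examples; see man iobuf for the full list of parameters):
% module load iobuf
% make clean; make                                        # relink so the IOBUF wrappers are used
% export IOBUF_PARAMS='*.dat:count=4:size=16M:verbose'    # buffer all *.dat files and print per-file statistics
% aprun -n 24 ./app.exe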
If a memory allocation error occurs, buffering is reduced or disabled for that file and a diagnostic is printed to stderr. When the file is opened, a single buffer is allocated if buffering is enabled. The allocation of additional buffers is done when a buffer is needed. When a file is closed, its buffers are freed (unless asynchronous I/O is pending on the buffer and lazyclose is specified).
Please check the complete manual and all available environment variables by reading the man page (man iobuf, after loading the iobuf module).
IMPORTANT NOTICE: As IOBUF is written for serial I/O, its behavior is undefined when used for parallel I/O into a single file.
You should never use IOBUF when several parallel processes operate on a single file.
Perftools : Performance Analysis Tool Kit
Description
The Cray Performance Measurement and Analysis Tools (or CrayPAT) are a suite of optional utilities that enable you to capture and analyze performance data generated during the execution of your program on a Cray system. The information collected and the analysis produced by these tools can help you to find answers to two fundamental programming questions: How fast is my program running? and How can I make it run faster? A detailed documentation about CrayPAT can be found in document S-2376-622. Here a short summary is presented, concentrating on the usage.
Profiling mainly distinguishes between two run cases, sampling and tracing:
|  | Sampling | Tracing |
| --- | --- | --- |
| Advantages | Only need to instrument the main routine; low overhead, depending only on the sampling frequency; smaller volumes of data produced | More accurate and more detailed information; data is collected from every traced function call, not statistical averages |
| Disadvantages | Only statistical averages available; limited information from performance counters | Overhead increases with the number of function calls; huge volumes of data generated |
Using the fully adjustable CrayPAT, Automatic Profiling Analysis (APA) offers a guided tracing approach that combines the advantages of sampling and tracing. Furthermore, event tracing can be enhanced by using loop profiling.
Usage
Starting with perftools version 6.3.0, the module perftools-base should be loaded as a basis. It provides access to man pages, Reveal, Cray Apprentice2, and the new instrumentation modules, and it can be kept loaded without impact on applications. The following instrumentation modules are available:
- perftools-lite (sampling experiments)
- perftools-lite-events (tracing experiments)
- perftools-lite-loops (collect data for auto-parallelization / loop estimates in Reveal)
- perftools-lite-gpu (GPU kernels and data movements)
- perftools (fully adjustable CrayPAT, using pat_build and pat_report)
CrayPAT-lite
The CrayPAT-lite modules provide a user-friendly way to auto-instrument your application for various profiling cases. In the following examples a simple batch job script is used:
$> cat job.pbs
#!/bin/bash
#PBS -l nodes=1:ppn=24
#PBS -l walltime=00:10:00
#PBS -j oe
#PBS -o job.out
cd $PBS_O_WORKDIR
aprun -n 384 -N 24 <exe>
perftools-lite
This module provides the default CrayPAT-lite profiling. It enables sampling of the application.
$> module load perftools-base
$> module load perftools-lite
$> make clean; make
$> qsub job.pbs        # no changes needed: aprun -n 24 app.exe >& job.out
$> less job.out
As a result a *.rpt and a *.ap2 file are created and the report is additionally printed to stdout.
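The *.ap2 file can additionally be inspected interactively with Cray Apprentice2; a short sketch (the file name pattern is only indicative):
$> app2 app.exe+*.ap2     # open the collected performance data in Cray Apprentice2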
CrayPAT
Reveal
Apprentice2
Cray Profiler
The Cray profiler library is deprecated, but still available on the system. A description can be found here.
Third party tools
Gnu-Tools
The module gnu-tools provides more recent versions of basic tools, including the GNU build system (autoconf, automake, libtool, m4), as well as bash, cmake, gperf, git, gawk, swig, and bison. The currently installed versions can be listed using
% module whatis tools/gnu-tools
To use the current version of bash with full support of the module environment, you can simply call
% bash -l myScript.sh
or define the absolute path in the first line of your script
#!/opt/hlrs/tools/gnu-tools/generic/bin/bash -l
Octave
GNU Octave is a high-level interpreted language, primarily intended for numerical computations. It provides capabilities for the numerical solution of linear and nonlinear problems, and for performing other numerical experiments. It also provides extensive graphics capabilities for data visualization and manipulation. GNU Octave is normally used through its interactive interface (CLI and GUI), but it can also be used to write non-interactive programs. The GNU Octave language is quite similar to Matlab so that most programs are easily portable.
Octave is compiled to run on the compute nodes and can be launched e.g. in an interactive session:
% qsub -I [options]
% module load tools/octave
% aprun -n 1 -N 1 octave octave.script
PARPACK
With the module hlrs_PARPACK, the collection of Fortran 77 routines designed to solve large-scale eigenvalue problems (ARPACK) and its parallel version (PARPACK) are provided. To link these libraries you only have to load the module
numlib/hlrs_PARPACK
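A minimal sketch of the intended use (the source file name is a placeholder; the compiler wrapper is assumed to pick up the link flags set by the module):
module load numlib/hlrs_PARPACK
ftn my_eigensolver.f90 -o my_eigensolver.exe    # ftn links against ARPACK/PARPACK via the module's flags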
Important Features of ARPACK:
- Reverse Communication Interface.
- Single and Double Precision Real Arithmetic Versions for Symmetric, Non-symmetric, Standard or Generalized Problems.
- Single and Double Precision Complex Arithmetic Versions for Standard or Generalized Problems.
- Routines for Banded Matrices - Standard or Generalized Problems.
- Routines for The Singular Value Decomposition.
- Example driver routines that may be used as templates to implement numerous Shift-Invert strategies for all problem types, data types and precision.
Python
Current versions of Python can be used by loading the module tools/python.
SLEPc
SLEPc (Scalable Library for Eigenvalue Problem Computations) is an extension of PETSc for solving linear eigenvalue problems in either standard or generalized form. Furthermore, SLEPc can compute the partial SVD of a large, sparse, rectangular matrix and solve nonlinear eigenvalue problems (polynomial or general). Additionally, SLEPc provides solvers for computing the action of a matrix function on a vector. SLEPc can be used with real (default) and complex arithmetic, therefore two different modules are provided:
module load numlib/hlrs_SLEPc # default version
OR
module load numlib/hlrs_SLEPc/3.5.3-complex
As usual, the modules provide all compiler and linker flags; thus ex1.c (containing SLEPc calls) can simply be compiled by
cc ex1.c -o ex1.exe
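The resulting binary is launched like any other MPI program; as a sketch (process counts are examples, and -eps_nev is just one of SLEPc's runtime options, here requesting four eigenpairs):
aprun -n 24 ./ex1.exe -eps_nev 4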
SVN
Subversion is installed with the following repository access (RA) modules: ra_svn, ra_local, ra_serf. Plaintext and GPG-Agent authentication credential caches are available.
module load tools/svn
Utilities for processing netcdf files
The module tools/netcdf_utils contains the following tools:
- nco (see http://nco.sourceforge.net/)
- ncview (see http://meteora.ucsd.edu/~pierce/ncview_home_page.html)
- cdo (see https://code.zmaw.de/projects/cdo)
Third party scientific software
CP2K
CP2K is a freely available (GPL) program to perform atomistic and molecular simulations of solid state, liquid, molecular and biological systems. It provides a general framework for different methods, such as density functional theory (DFT) using a mixed Gaussian and plane waves approach (GPW), and classical pair and many-body potentials. It is written in well-structured, standards-conforming Fortran 95, parallelized with MPI and, in parts, optionally with hybrid MPI+OpenMP.
CP2K provides state-of-the-art methods for efficient and accurate atomistic simulations; its sources are freely available and actively improved. It has an active international development team, with its unofficial headquarters at the University of Zürich.
The molecular simulation package is installed, optimized for the present architecture, compiled with gfortran using optimized versions of libxc, libint and libsmm.
module load chem/cp2k
provides four binaries with different kinds of parallelization:
cp2k.ssmp - only OpenMP
cp2k.popt - only MPI
cp2k.psmp - hybrid MPI + OpenMP
cp2k.pdbg - only MPI, compiled with debug flags
After loading the related module (chem/cp2k), the binary can be directly called in the job submission script, e.g.:
aprun -n 24 -N 24 cp2k.psmp myCp2kInputFile.inp > myOutput.out
Some example CP2K input files are provided on the CP2K homepage, where the input reference can also be found.
Gromacs
GROMACS (GROningen MAchine for Chemical Simulations) is a molecular dynamics package which can be used by
module load chem/gromacs
LAMMPS
LAMMPS "LAMMPS Molecular Dynamics Simulator" is a molecular dynamics package which can be used by
module load chem/lammps
The executable is named lmp_CrayXC.
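A launch sketch analogous to the CP2K example above (input and output file names are placeholders; -in is the standard LAMMPS option for reading an input script):
module load chem/lammps
aprun -n 24 -N 24 lmp_CrayXC -in myLammpsInput.in > myOutput.out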
NAMD
NAMD (Scalable Molecular Dynamics) is a parallel molecular dynamics code designed for high-performance simulation of large biomolecular systems, based on Charm++ parallel objects. The package can be loaded using
module load chem/namd
A tutorial can be found here.
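A launch sketch, assuming the module provides the standard namd2 executable (configuration file name and process counts are placeholders):
module load chem/namd
aprun -n 24 -N 24 namd2 myConfig.namd > myOutput.out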
OpenFOAM
OpenFOAM (Open Field Operation and Manipulation) is an open source CFD software package. Multiple versions of OpenFOAM are available, compiled with the GNU and Intel compilers. Available versions can be listed using
module avail cae/openfoam
OpenFOAM can be used with PrgEnv-gnu and PrgEnv-intel, e.g.
module swap PrgEnv-cray PrgEnv-gnu
module load cae/openfoam
Furthermore, Foam-extend is available, but only for PrgEnv-gnu:
module swap PrgEnv-cray PrgEnv-gnu
module load cae/openfoam/3.0-extend
As a first example a test case of incompressible laminar flow in a cavity using blockMesh and icoFoam is provided, which can be found in the directory
/opt/hlrs/cae/fluid/OPENFOAM/ESM/CRAY-Versionen/hornet-example
To run this example you have to copy the directory and submit the prepareOF and runOF jobs.
It is also possible to use CrayPAT profiling for certain versions of OpenFOAM. For this purpose, specialized modules providing the relevant versions exist as cae/openfoam/xxx-perftools, where xxx is the version number. The related binaries still have to be instrumented using
pat_build $FOAM_APPBIN/icoFoam
As a result, a binary icoFoam+pat is generated in the current directory. Using this binary in the batch script, the profiling will be performed. To analyze the resulting profiling data, pat_report and further tools can be used (Cray Performance Tools). If during the execution of your instrumented binary you notice that MPI is not recognized, i.e. you see replicated output or several *.xf files that are not collected in a single directory in your workspace, you can export PAT_BUILD_PROG_MODELS="0x1" in your shell and run the pat_build command again after removing the instrumented binary. Please file a ticket if this does not work for you.
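As a sketch of the complete profiling workflow under these assumptions (module version and file name patterns are only indicative):
module swap PrgEnv-cray PrgEnv-gnu
module load cae/openfoam/xxx-perftools
pat_build $FOAM_APPBIN/icoFoam               # generates ./icoFoam+pat
# run ./icoFoam+pat via aprun in the batch script instead of icoFoam
pat_report icoFoam+pat+*.xf > profile.txt    # text report; also produces *.ap2 data for Cray Apprentice2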