- Information contained in the HLRS Wiki is not legally binding and HLRS is not responsible for any damages that might result from its use -

Debugging On XC40

From HLRS Platforms
Revision as of 11:17, 19 January 2016 by Hpcmscho

This article briefly lists some tools for debugging and monitoring your application.

ATP

In case of a segmentation fault, Cray's Abnormal Termination Processing (ATP) can print a stack trace of the failing process at the moment of the error, with minimal effort. Only an environment variable has to be set:

export ATP_ENABLED=1

More information can be found using

 man intro_atp 
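As a sketch of where the variable belongs: it is exported in the batch script before the application is launched, so that it is part of the environment the application starts from. The job geometry and the executable name are assumptions borrowed from the DDT example in this article, and the guard around aprun is only there so that the fragment can be tried outside a batch job.

```shell
#!/bin/bash
#PBS -l nodes=3:ppn=24

# Enable ATP before the application is launched.
export ATP_ENABLED=1

# Launch the application (assumed geometry: 3 nodes x 24 processes).
# The launch is skipped when aprun is unavailable, e.g. on a workstation.
if command -v aprun >/dev/null 2>&1; then
    aprun -n 72 -N 24 ./a.out param1
else
    echo "aprun not found, nothing launched (ATP_ENABLED=${ATP_ENABLED})"
fi
```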

STAT

In case of a hanging application, Cray's Stack Trace Analysis Tool (STAT) can help identify deadlocks by presenting a merged stack trace of all processes. The tool can simply be attached to the running application by:

module load stat
STATGUI <JOBID> 

Additional information can be found using:

 module load stat; man intro_stat 

Allinea DDT

For more complex issues, or when an application produces wrong results, a debugger can be used to monitor the application's behavior. Allinea DDT is a powerful parallel debugger. It is described in detail in the User Guide, and its command line options are listed by “ddt --help”. Allinea DDT has a user-friendly graphical user interface, which can also start a batch job or connect to a running one. Nevertheless, this article concentrates on offline debugging, which removes the need for an interactive session.

Start DDT in offline mode

DDT offline mode is started and controlled using the following command line options in the batch script, for example on three fully populated nodes:

#PBS -l nodes=3:ppn=24
... 
cd "$PBS_O_WORKDIR" # change to the directory the job was submitted from
module load ddt     # make DDT available

# The report can alternatively be written as HTML, e.g. report.html.
ddt --offline report.txt aprun -n 72 -N 24 a.out param1
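The rank count passed to aprun has to match the resources requested from PBS: 3 nodes × 24 processes per node = 72 ranks. The following minimal sketch derives both aprun arguments from two shell variables, so that they cannot drift apart when the job size changes. The names NODES, PPN and ddt_offline_cmd are illustrative and not part of DDT or PBS. The function only prints the command line, it does not start DDT.

```shell
#!/bin/sh
# Hypothetical helper: NODES, PPN and ddt_offline_cmd are illustrative names.
NODES=3     # must match "#PBS -l nodes=3:ppn=24"
PPN=24      # processes per node

# Print the DDT offline command line for a node count and processes per node.
ddt_offline_cmd() {
    nodes=$1
    ppn=$2
    ranks=$(( nodes * ppn ))   # total number of ranks
    echo "ddt --offline report.txt aprun -n ${ranks} -N ${ppn} a.out param1"
}

ddt_offline_cmd "$NODES" "$PPN"
```

The #PBS directive itself is read by the batch system and not by the shell, so it still has to be adjusted by hand to the same values.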

Sessions

Previously saved sessions can be loaded and replayed, e.g. on a larger scale. Debugging steps can thus be recorded in a small job and then transferred to a large job using the option

 ddt [...] --session=myfile.session 

Some options

Breakpoints can be defined using:

 ddt [...] --break-at="main.c:22 if rank==0"

A condition can be appended to the location, as shown here, so that the breakpoint only triggers when the variable rank equals 0.

Tracepoints can be set in a similar manner:

 ddt [...] --trace-at=main.c:22,var1,var2 

Here the values of the variables var1 and var2 are additionally logged.

Memory debugging, which also activates memory leak reports, can be enabled by:

 ddt [...] --mem-debug=(fast|balanced|thorough) 
 ddt [...] --check-bounds=(after|before) 
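The options above can be combined in one offline run. The conditional breakpoint contains spaces, so it has to reach DDT as a single argument. The sketch below assembles the argument list inside a shell function and, by default, only prints it with one argument per line, so that the quoting can be checked before the job is submitted. The function name run_ddt_offline and the DRY_RUN switch are hypothetical. The file name, line number and variable names are the placeholders used in this article, and it is an assumption that this particular combination suits your application.

```shell
#!/bin/sh
# Hypothetical wrapper: run_ddt_offline and DRY_RUN are illustrative names.
run_ddt_offline() {
    # Put the DDT options in front of the launcher command given by the caller.
    set -- --offline report.html \
           --mem-debug=fast \
           --break-at="main.c:22 if rank==0" \
           --trace-at=main.c:22,var1,var2 \
           "$@"
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '%s\n' ddt "$@"   # dry run: print one argument per line
    else
        ddt "$@"                 # real run inside the batch job
    fi
}

run_ddt_offline aprun -n 72 -N 24 a.out param1
```

In the batch script, setting DRY_RUN=0 before the call starts DDT with exactly the arguments that the dry run printed.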


More documentation can be found in the User Guide or using:

ddt --help