- Information contained in the HLRS Wiki is not legally binding and HLRS is not responsible for any damages that might result from its use -
Debugging On XC40
This article briefly lists some tools for debugging and monitoring your application.
ATP
In case of a segmentation fault, Cray's Abnormal Termination Processing (ATP) can print a stack trace of the failing process at the moment of the error, with a minimal amount of effort. Only an environment variable has to be set:
export ATP_ENABLED=1
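In a batch job the variable has to be set before the application is launched. A minimal sketch, assuming a hypothetical wrapper function run_with_atp (it is not part of ATP) that sets the variable only for the command it starts:

```shell
# run_with_atp is a hypothetical helper, not part of ATP itself.
# It sets ATP_ENABLED=1 only in the environment of the launched command.
run_with_atp() {
    env ATP_ENABLED=1 "$@"
}

# Demonstration with a harmless command; in a job script the arguments
# would be the launch line, e.g.: run_with_atp aprun -n 72 -N24 a.out param1
run_with_atp sh -c 'echo "ATP_ENABLED=$ATP_ENABLED"'
```

Scoping the variable to a single launch keeps other commands in the same job script unaffected; a plain export at the top of the script works just as well.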
More information can be found using
man intro_atp
STAT
In case of a hanging application, Cray's Stack Trace Analysis Tool (STAT) can help identify deadlocks by presenting a merged stack trace of all processes. The tool can simply be attached to the running application by:
module load stat
STATGUI <JOBID>
Additional information can be found using:
module load stat; man intro_stat
Allinea DDT
For more complex issues, or in case of wrong results, a debugger can be used to monitor the application's behavior. Allinea DDT is a powerful parallel debugger. It is described in detail in the User Guide, and its command line options are listed by “ddt --help”. Allinea DDT has a user-friendly graphical user interface, which can also start the batch job or connect to a running one. Nevertheless, this article concentrates on offline debugging, which removes the need for an interactive session.
Start DDT in offline mode
DDT offline mode is started and controlled using the following command line options in the batch script, for example on three fully populated nodes:
#PBS -l nodes=3:ppn=24
...
cd $PBS_O_WORKDIR    # change into the directory the job was submitted from
module load ddt      # make DDT available
ddt --offline report.txt aprun -n 72 -N24 a.out param1
# alternatively, the DDT output file can be written as HTML, e.g. report.html
Sessions
Previously saved sessions can be loaded and replayed, e.g. on a larger scale. Debugging steps can thus be recorded in a small-scale job and then transferred to a large-scale job using the option
ddt [...] --session=myfile.session
Some options
Breakpoints can be defined using:
ddt [...] --break-at="main.c:22 if rank==0"
A condition can be appended to the location, as in this example, where the breakpoint only triggers when the variable rank equals 0.
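Because the condition contains spaces, the quotes are needed so that the shell passes the whole breakpoint specification to ddt as a single argument. This can be checked without DDT; count_args is a hypothetical helper that stands in for ddt and only reports how many arguments it received:

```shell
# count_args is a hypothetical stand-in for ddt; it prints the number
# of arguments the shell hands over.
count_args() {
    echo "$#"
}

count_args --break-at="main.c:22 if rank==0"   # quoted: prints 1
count_args --break-at=main.c:22 if rank==0     # unquoted: prints 3
```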
Tracepoints can be set in a similar manner:
ddt [...] --trace-at=main.c:22,var1,var2
Here the values of the variables var1 and var2 are additionally logged.
Memory debugging, which activates memory leak reports, can be enabled with --mem-debug; bounds checking is requested with --check-bounds:
ddt [...] --mem-debug=(fast|balanced|thorough)
ddt [...] --check-bounds=(after|before)
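A sketch of how these options might be combined with the offline mode from the batch script example, assuming the options are placed before the aprun command. ddt_debug_cmd is a hypothetical helper that only prints the resulting command line and rejects levels other than the three listed for --mem-debug:

```shell
# ddt_debug_cmd LEVEL REPORT COMMAND...
# Hypothetical helper: prints an offline DDT command with memory debugging
# at LEVEL and the tracepoint from the example above. The option order is
# an assumption; consult "ddt --help" on the target system.
ddt_debug_cmd() {
    level=$1; report=$2
    shift 2
    case $level in
        fast|balanced|thorough) ;;    # the levels listed for --mem-debug
        *) echo "unknown --mem-debug level: $level" >&2; return 1 ;;
    esac
    echo "ddt --offline $report --mem-debug=$level --trace-at=main.c:22,var1,var2 $*"
}

ddt_debug_cmd fast report.html aprun -n 72 -N24 a.out param1
```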
More documentation can be found in the User Guide or using:
ddt --help