- Information contained in the HLRS Wiki is not legally binding and HLRS is not responsible for any damages that might result from its use -

CRAY XE6 Cray Tools


Revision as of 11:13, 4 April 2013

Cray provided tools

Cray provides several official tools. Some of them are described below; more information is available in the online manual pages (for example, man atp). This page covers ATP and STAT.

At HLRS

ATP : Abnormal Termination Processing

Abnormal Termination Processing (ATP) is a system that monitors Cray system user applications. If an application takes a system trap, ATP analyzes the dying application. The stack backtraces of all application processes are gathered into a merged stack backtrace tree and written to disk as the file atpMergedBT.dot. The stack backtrace tree of the first process to die is sent to stderr, together with the number of the signal that caused the application to fail. If Linux core dumping is enabled (see ulimit or limit in your shell documentation), a heuristically selected set of processes also dump their cores.

The atpMergedBT.dot file can be viewed with statview (the Stack Trace Analysis Tool viewer), which is included in the Cray Debugger Support Tools (module load stat), or alternatively with the file viewer dotty, which can be found on most Linux systems. The merged stack backtrace tree provides a concise yet comprehensive view of what the application was doing at the time of its termination.
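As a minimal sketch of the viewing step, the following snippet opens the backtrace file with whichever viewer is available. It assumes that atpMergedBT.dot is in the current directory and that both viewers accept the file name as their first argument; on the Cray system, statview only becomes available after module load stat.

```shell
#!/bin/bash
# File written by ATP when the application terminated abnormally.
DOT_FILE="atpMergedBT.dot"

if [ ! -f "$DOT_FILE" ]; then
    # Nothing to show: ATP was not enabled or the run did not fail.
    echo "No $DOT_FILE found in $(pwd)"
elif command -v statview >/dev/null 2>&1; then
    # Preferred viewer, part of the stat module (run 'module load stat' first).
    statview "$DOT_FILE"
elif command -v dotty >/dev/null 2>&1; then
    # Fallback viewer found on most Linux systems.
    dotty "$DOT_FILE"
else
    echo "Neither statview nor dotty is available; try 'module load stat'"
fi
```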

At HLRS, ATP is disabled by default. To use it, set ATP_ENABLED=1 in your batch script.
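A batch script fragment that enables ATP could look like the sketch below. The application name ./my_app and the process count 32 are placeholders, not HLRS defaults, and the resource request lines of the batch system are omitted. Raising the core file limit is optional and only matters if core dumps are wanted in addition to the backtrace.

```shell
#!/bin/bash
# Enable ATP for every aprun started from this script.
export ATP_ENABLED=1

# Optional: allow core dumps for the processes ATP selects.
ulimit -c unlimited 2>/dev/null || echo "core file size limit left unchanged"

# Placeholder launch line: ./my_app and -n 32 are assumptions for illustration.
if command -v aprun >/dev/null 2>&1; then
    aprun -n 32 ./my_app
else
    echo "aprun not found; this fragment is meant to run inside a batch job"
fi
```

Because the variable is exported before aprun is called, it is part of the environment in which the application is launched.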

STAT : Stack Trace Analysis Tool

STAT is a tool for collecting tracebacks of a running program. Use statview to view the output of STAT. Both tools are part of the module stat.

STAT needs the process ID of the aprun command (the apid) that runs your program. Because this apid is not available on the login nodes, we have written a wrapper called STAT_hermit.

Instead of the apid, the wrapper uses the ID of your batch job (use qstat to get it) and tries to find the corresponding aprun command. If there are several candidates, it shows you a list of possibilities and asks you to select the one you want to trace.
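The workflow described above might be sketched as follows. The exact command line of STAT_hermit is not documented here, so passing the job ID as the first argument is an assumption, and the job ID 123456 is a made-up placeholder.

```shell
#!/bin/bash
# Placeholder job ID; take the real one from the output of qstat.
JOBID="${1:-123456}"

if command -v STAT_hermit >/dev/null 2>&1; then
    # Assumption: the wrapper takes the batch job ID as its first argument.
    # Run 'module load stat' beforehand so that STAT and statview are found.
    STAT_hermit "$JOBID"
else
    echo "STAT_hermit not found; run this on a login node with the stat module loaded"
fi
```

If the job runs more than one aprun command, expect the wrapper to ask which one to trace before STAT attaches.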