- Information contained in the HLRS Wiki is not legally binding and HLRS is not responsible for any damages that might result from its use -

CRAY XE6 Cray Tools

== Cray provided tools ==
=== ATP : Abnormal Termination Processing ===
Abnormal Termination Processing (ATP) is a system that monitors user applications on Cray systems. If an application takes a system trap, ATP
analyzes the dying application. The stack backtraces of all application processes are gathered into a merged stack backtrace
tree and written to disk as the file atpMergedBT.dot. The stack backtrace of the first process to die is sent to stderr, together with the
number of the signal that caused the application to fail. If Linux core dumping is enabled (see ulimit or limit in your shell
documentation), a heuristically selected set of processes also dump their cores.
 
The atpMergedBT.dot file can be viewed with statview (the Stack Trace Analysis Tool viewer), which is included in the Cray Debugger
Support Tools (module load stat), or alternatively with the file viewer dotty, which can be found on most Linux systems. The merged stack
backtrace tree provides a concise yet comprehensive view of what the application was doing at the time of its termination.
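
As a rough sketch of the viewer choice described above, the following shell function picks whichever of the two viewers is installed. The function name pick_viewer is hypothetical and not part of any Cray tool; it only prints the name of the viewer and does not open the file itself.

```shell
# Hypothetical helper: print the first viewer for .dot files that is
# available in PATH (statview first, then dotty), or return non-zero.
pick_viewer() {
    for viewer in statview dotty; do
        if command -v "$viewer" >/dev/null 2>&1; then
            printf '%s\n' "$viewer"
            return 0
        fi
    done
    return 1
}
```

After module load stat, it could be used as viewer=$(pick_viewer) && "$viewer" atpMergedBT.dot, run in the directory where ATP wrote the file.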
 
At HLRS, ATP is disabled by default. To use it, set ATP_ENABLED=1 in your batch script.
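
A minimal sketch of the relevant lines in a batch script, assuming a bash-compatible shell. The variable is exported so that it reaches the environment of aprun. The core-limit line is optional and only matters if you also want core files; whether that limit is forwarded to the compute nodes depends on the site configuration. The launch line is a hypothetical placeholder and is left commented out.

```shell
# Enable ATP for every aprun launched from this job script.
export ATP_ENABLED=1

# Optional: enable core dumps in this shell by raising the soft
# core-size limit to the current hard limit.
ulimit -S -c "$(ulimit -H -c)"

# Hypothetical launch line; replace with your own aprun command.
# aprun -n 32 ./my_app
```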
 
=== STAT : Stack Trace Analysis Tool ===
STAT is a tool for collecting tracebacks of a running program. You use statview to view the output of STAT.
 
STAT and statview are part of the module stat.
 
STAT needs the process ID (apid) of the aprun command that runs your program. Because this apid is not available on the login nodes, we have written a wrapper for STAT.
 
Instead of the apid, the wrapper uses the ID of your batch job (use qstat to get it) and tries to find the corresponding aprun command. If several aprun commands match, it shows you a list of the candidates and asks you to select the one you want to trace.

Revision as of 13:17, 31 May 2012
