- Information in the HLRS Wiki is not legally binding and is provided without warranty -

CRAY XC40 Resource Utilization Reporting

From HLRS Platforms
 

Latest revision as of 14:03, 2 February 2017

Resource Utilization Reporting (RUR) is a tool for gathering statistics on how system resources are being used by applications.

When RUR is enabled on a Cray system running CLE, resource utilization statistics are gathered from the compute nodes. RUR runs primarily before the job starts and after it ends, which keeps its impact on performance minimal.

At HLRS, RUR is configured to write a single file, rur.out, in the user's home directory. The file contains the output of each plugin used by RUR. The plugins are "taskstats", "energy" and "timestamp".

The "taskstats" plugin prints the following information:

  • utime: User time [μsec] (accumulated over all processes)
  • stime: System time [μsec]
  • max_rss: Maximum memory used [KiB] (excluding hugepages)
  • rchar: Characters read by process
  • wchar: Characters written by process
  • exitcode: Lists all unique exit codes
  • core: Set to '1' if core dump occurred
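
Because "utime" and "stime" are reported in microseconds and "max_rss" in KiB, it can help to convert them before reading them. The following is a minimal sketch of that conversion; the function name `summarize_taskstats` is made up for illustration, and the input values are the ones from the sample "rur.out" shown on this page.

```python
# Convert raw taskstats values to more readable units.
# Units follow the list above: utime/stime in microseconds, max_rss in KiB.

def summarize_taskstats(utime_us, stime_us, max_rss_kib):
    """Return CPU times in seconds and peak memory in MiB."""
    return {
        "utime_s": utime_us / 1_000_000,
        "stime_s": stime_us / 1_000_000,
        # Total CPU time is user time plus system time.
        "cpu_s": (utime_us + stime_us) / 1_000_000,
        "max_rss_mib": max_rss_kib / 1024,
    }


# Values taken from the sample taskstats line on this page.
summary = summarize_taskstats(utime_us=380000, stime_us=684000, max_rss_kib=3020)
for key, value in summary.items():
    print(f"{key}: {value:.3f}")
```

Note that "utime" is accumulated over all processes, so the CPU time computed this way can exceed the wall-clock run time of the job.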

The "energy" plugin prints the energy used by the job in joules. The "timestamp" plugin prints the time at which the job started and the time at which it finished.

The file "rur.out" looks like this:

 hpcxmarc@eslogin007:~> cat rur.out
 uid: 28422, apid: 4451, jobid: 194328.hornet-tds-batch.hww.de, cmdname: ./xthi, plugin: taskstats ['utime', 380000, 'stime', 684000, 'max_rss', 3020, 'rchar', 1799329, 'wchar', 7722,'exitcode:signal', ['0:0'], 'core', 0]
 uid: 28422, apid: 4451, jobid: 194328.hornet-tds-batch.hww.de, cmdname: ./xthi, plugin: energy ['energy_used', 107]
 uid: 28422, apid: 4451, jobid: 194328.hornet-tds-batch.hww.de, cmdname: ./xthi, plugin: timestamp APP_START 2016-04-14T10:07:23CEST APP_STOP 2016-04-14T10:07:24CEST
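
The lines above follow a regular pattern: a comma-separated header (uid, apid, jobid, cmdname, plugin) followed by the plugin output. The sketch below parses one such line into a dictionary. It assumes that every line has exactly the layout of the sample above, which is inferred from this example only and not from a format specification; the names `LINE_RE` and `parse_rur_line` are made up for illustration.

```python
import ast
import re

# Header layout inferred from the sample output; an assumption, not a spec.
LINE_RE = re.compile(
    r"uid: (?P<uid>\d+), apid: (?P<apid>\d+), jobid: (?P<jobid>[^,\s]+), "
    r"cmdname: (?P<cmdname>.+?), plugin: (?P<plugin>\w+) (?P<payload>.*)"
)


def parse_rur_line(line):
    """Parse one rur.out line; return None if it does not match the layout."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    payload = record.pop("payload").strip()
    if payload.startswith("["):
        # taskstats and energy print a flat list: ['key', value, 'key', value, ...]
        items = ast.literal_eval(payload)
    else:
        # timestamp prints whitespace-separated pairs: APP_START <time> APP_STOP <time>
        items = payload.split()
    # Pair up alternating keys and values.
    record["data"] = dict(zip(items[0::2], items[1::2]))
    return record


taskstats_line = (
    "uid: 28422, apid: 4451, jobid: 194328.hornet-tds-batch.hww.de, "
    "cmdname: ./xthi, plugin: taskstats ['utime', 380000, 'stime', 684000, "
    "'max_rss', 3020, 'rchar', 1799329, 'wchar', 7722,'exitcode:signal', "
    "['0:0'], 'core', 0]"
)
timestamp_line = (
    "uid: 28422, apid: 4451, jobid: 194328.hornet-tds-batch.hww.de, "
    "cmdname: ./xthi, plugin: timestamp APP_START 2016-04-14T10:07:23CEST "
    "APP_STOP 2016-04-14T10:07:24CEST"
)

taskstats = parse_rur_line(taskstats_line)
timestamps = parse_rur_line(timestamp_line)
print(taskstats["jobid"], taskstats["data"]["max_rss"])
print(timestamps["data"]["APP_START"], timestamps["data"]["APP_STOP"])
```

`ast.literal_eval` is used instead of `eval` so that only plain literals (strings, numbers, lists) are accepted from the file.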

Each job appends its information to the "rur.out" file, so the file can become quite large. Every line carries the "jobid" identifier, so the entries for a particular job can be found by searching (for example with grep) for the job ID.
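
As an alternative to grep, the same selection can be sketched in Python. The helper names `lines_for_job` and `read_job_lines` are made up for illustration, and the third demo line uses an invented job ID (194329) only to show that other jobs are filtered out.

```python
from pathlib import Path


def lines_for_job(lines, jobid):
    """Return the lines whose jobid field equals the given job ID."""
    # Including the trailing comma avoids matching a prefix of a longer ID.
    needle = f"jobid: {jobid},"
    return [line.rstrip("\n") for line in lines if needle in line]


def read_job_lines(jobid, path=None):
    """Read rur.out (by default from the home directory) and filter by job ID."""
    path = Path.home() / "rur.out" if path is None else Path(path)
    with path.open() as handle:
        return lines_for_job(handle, jobid)


# In-memory demo; the first two lines come from the sample output on this page,
# the third uses an invented job ID.
demo = [
    "uid: 28422, apid: 4451, jobid: 194328.hornet-tds-batch.hww.de, "
    "cmdname: ./xthi, plugin: energy ['energy_used', 107]\n",
    "uid: 28422, apid: 4451, jobid: 194328.hornet-tds-batch.hww.de, "
    "cmdname: ./xthi, plugin: timestamp APP_START 2016-04-14T10:07:23CEST "
    "APP_STOP 2016-04-14T10:07:24CEST\n",
    "uid: 28422, apid: 4452, jobid: 194329.hornet-tds-batch.hww.de, "
    "cmdname: ./xthi, plugin: energy ['energy_used', 95]\n",
]

selected = lines_for_job(demo, "194328.hornet-tds-batch.hww.de")
for line in selected:
    print(line)
```

On the system itself, `read_job_lines("194328.hornet-tds-batch.hww.de")` would apply the same filter to the real file, assuming it is located at `~/rur.out` as described above.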

Reference: S-2393, "CLE XC System Administration Guide"