- Information contained in the HLRS Wiki is not legally binding and HLRS is not responsible for any damages that might result from its use -

CRAY XC40 Resource Utilization Reporting

Revision as of 16:42, 14 April 2016 by Hpcbk

Resource Utilization Reporting (RUR) is a tool for gathering statistics on how system resources are being used by applications.

When RUR is enabled on a Cray system running CLE, resource utilization statistics are gathered from the compute nodes. RUR runs primarily before the job starts and after it ends, so its impact on job performance is minimal.

At HLRS, RUR is configured to write a single file, rur.out, in the user's home directory. The file contains the output of each plugin used by RUR. The plugins are: "taskstats", "energy" and "timestamp".

The "taskstats" plugin prints the following information:

  • utime: User time.
  • stime: System time.
  • max_rss: Maximum memory used.
  • rchar: Characters read by the process.
  • wchar: Characters written by the process.
  • exitcode: Lists all unique exit codes.
  • core: Set to '1' if a core dump occurred.
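
In rur.out the taskstats fields appear as a flat, Python-style list of alternating keys and values (see the excerpt on this page). The following is a minimal sketch of turning such a payload into a dictionary. The function name parse_taskstats is hypothetical and not part of RUR, and the sketch assumes the payload always has the shape shown in the excerpt.

```python
import ast


def parse_taskstats(payload: str) -> dict:
    """Convert a flat ['key', value, ...] taskstats list into a dict.

    Hypothetical helper, not part of RUR; assumes the list layout
    seen in the rur.out excerpt on this page.
    """
    # literal_eval only accepts Python literals, so no code is executed.
    items = ast.literal_eval(payload)
    if len(items) % 2 != 0:
        raise ValueError("expected alternating keys and values")
    # Even positions are keys, odd positions are the matching values.
    return dict(zip(items[0::2], items[1::2]))


# Payload copied from the taskstats line of the excerpt.
sample = (
    "['utime', 380000, 'stime', 684000, 'max_rss', 3020, 'rchar', 1799329, "
    "'wchar', 7722,'exitcode:signal', ['0:0'], 'core', 0]"
)
stats = parse_taskstats(sample)
print(stats["max_rss"], stats["exitcode:signal"], stats["core"])
```

Note that the exit codes are stored under the key 'exitcode:signal', exactly as printed by the plugin.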

The "energy" plugin prints the energy used by the job in joules. The "timestamp" plugin prints the time at which the job started (APP_START) and the time at which it finished (APP_STOP).
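
The two plugins can be combined to estimate the average power draw of a run: energy in joules divided by runtime in seconds gives watts. The sketch below assumes the payload shapes shown in the rur.out excerpt on this page. The helper names are hypothetical, and the zone abbreviation (e.g. CEST) is simply dropped, so the result is only valid when both timestamps are in the same time zone.

```python
import re
from datetime import datetime

# Timestamp without the trailing zone abbreviation (e.g. "CEST").
_TS = r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})"
_PAYLOAD_RE = re.compile(rf"APP_START {_TS}\S* APP_STOP {_TS}")


def app_runtime_seconds(payload: str) -> float:
    """Return APP_STOP minus APP_START in seconds (hypothetical helper)."""
    m = _PAYLOAD_RE.search(payload)
    if m is None:
        raise ValueError(f"unrecognised timestamp payload: {payload!r}")
    fmt = "%Y-%m-%dT%H:%M:%S"
    start = datetime.strptime(m.group(1), fmt)
    stop = datetime.strptime(m.group(2), fmt)
    # Assumption: both stamps share one time zone, so the zone is ignored.
    return (stop - start).total_seconds()


def average_power_watts(energy_joules: float, runtime_s: float) -> float:
    """Average power in watts: joules divided by seconds."""
    if runtime_s <= 0:
        raise ValueError("runtime must be positive")
    return energy_joules / runtime_s


# Values copied from the excerpt: 107 J used, run from 10:07:23 to 10:07:24.
runtime = app_runtime_seconds(
    "APP_START 2016-04-14T10:07:23CEST APP_STOP 2016-04-14T10:07:24CEST"
)
power = average_power_watts(107, runtime)
print(runtime, power)  # 1.0 107.0
```

Because the timestamps have a resolution of one second, the estimate is coarse for very short runs such as this one.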

The file "rur.out" looks like this:

 hpcxmarc@eslogin007:~> cat rur.out
 uid: 28422, apid: 4451, jobid: 194328.hornet-tds-batch.hww.de, cmdname: ./xthi, plugin: taskstats ['utime', 380000, 'stime', 684000, 'max_rss', 3020, 'rchar', 1799329, 'wchar', 7722,'exitcode:signal', ['0:0'], 'core', 0]
 uid: 28422, apid: 4451, jobid: 194328.hornet-tds-batch.hww.de, cmdname: ./xthi, plugin: energy ['energy_used', 107]
 uid: 28422, apid: 4451, jobid: 194328.hornet-tds-batch.hww.de, cmdname: ./xthi, plugin: timestamp APP_START 2016-04-14T10:07:23CEST APP_STOP 2016-04-14T10:07:24CEST
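
Every line of the excerpt starts with the same header fields (uid, apid, jobid, cmdname, plugin) followed by the plugin-specific payload. A minimal sketch of splitting a line into these parts is given below. The regular expression is inferred from the three lines above only, and parse_rur_line is a hypothetical name, so other RUR configurations may need a different pattern.

```python
import re
from typing import Optional

# Pattern inferred from the excerpt; an assumption, not an official format.
LINE_RE = re.compile(
    r"uid: (?P<uid>\d+), apid: (?P<apid>\d+), jobid: (?P<jobid>[^,\s]+), "
    r"cmdname: (?P<cmdname>.*?), plugin: (?P<plugin>\S+) (?P<payload>.*)"
)


def parse_rur_line(line: str) -> Optional[dict]:
    """Split one rur.out line into header fields and the plugin payload.

    Returns None for lines that do not match the assumed layout.
    """
    m = LINE_RE.match(line.strip())
    return m.groupdict() if m else None


# Energy line copied from the excerpt.
record = parse_rur_line(
    "uid: 28422, apid: 4451, jobid: 194328.hornet-tds-batch.hww.de, "
    "cmdname: ./xthi, plugin: energy ['energy_used', 107]"
)
print(record["jobid"], record["plugin"], record["payload"])
```

All captured values are strings; numeric fields such as uid and apid can be converted with int() if needed.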

Each job appends its information to the "rur.out" file, so the file can become quite large over time. Every line carries the "jobid" identifier, however, so the entries of a particular job can be found by searching for its job ID (for example with grep).
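
The search by job ID can also be done in a script. The sketch below reads the lines one at a time, so a large rur.out does not have to be loaded into memory. The function name lines_for_job is hypothetical, and the sketch assumes that the job ID follows the text "jobid: " as in the excerpt above. It accepts either the numeric part of the job ID or the full ID.

```python
from typing import Iterable, Iterator


def lines_for_job(lines: Iterable[str], jobid: str) -> Iterator[str]:
    """Yield the rur.out lines that belong to the given job ID.

    Hypothetical helper; 'jobid' may be the numeric part ("194328")
    or the full ID ("194328.hornet-tds-batch.hww.de").
    """
    needle = f"jobid: {jobid}"
    for line in lines:
        idx = line.find(needle)
        if idx == -1:
            continue
        # The ID must end here: avoids "19432" matching job 194328.
        following = line[idx + len(needle): idx + len(needle) + 1]
        if following in (".", ",", ""):
            yield line.rstrip("\n")


# In-memory demo; with a real file, pass the open file object instead:
#   with open("rur.out") as fh:
#       for line in lines_for_job(fh, "194328"):
#           print(line)
demo = [
    "uid: 28422, apid: 4451, jobid: 194328.hornet-tds-batch.hww.de, cmdname: ./xthi, plugin: energy ['energy_used', 107]\n",
    "uid: 28422, apid: 4460, jobid: 194400.hornet-tds-batch.hww.de, cmdname: ./xthi, plugin: energy ['energy_used', 95]\n",
]
matches = list(lines_for_job(demo, "194328"))
print(len(matches))  # 1
```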

Reference: S-2393, "CLE XC System Administration Guide"