- Information in the HLRS Wiki is not legally binding and comes without guarantee -

CRAY XC40 Resource Utilization Reporting

From HLRS Platforms
Revision as of 14:03, 2 February 2017 by Hpcbern (added units)

Resource Utilization Reporting (RUR) is a tool for gathering statistics on how system resources are being used by applications.

When RUR is enabled on a Cray system running CLE (Cray Linux Environment), resource utilization statistics are gathered from the compute nodes. RUR does most of its work before the job starts and after it ends, so it has minimal impact on application performance.

At HLRS, RUR is configured to write to a single file, rur.out, in the user's home directory. The file contains the output of each RUR plugin in use: "taskstats", "energy" and "timestamp".

The "taskstats" plugin reports the following information:

  • utime: User time [μsec] (accumulated over all processes)
  • stime: System time [μsec]
  • max_rss: Maximum memory used [KiB] (excluding hugepages)
  • rchar: Characters read by the process [bytes]
  • wchar: Characters written by the process [bytes]
  • exitcode:signal: List of all unique exit code:signal pairs
  • core: Set to '1' if a core dump occurred
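The raw units above (microseconds, KiB) are easy to misread. The following minimal sketch assumes the taskstats payload is the flat list of alternating names and values shown in the sample rur.out on this page. It pairs the entries up and converts times to seconds and max_rss to MiB. The helper names `taskstats_to_dict` and `summarize` are hypothetical and not part of RUR.

```python
# Minimal sketch: pair up the flat taskstats list printed in rur.out and
# convert the raw units into more readable ones.
# Helper names are hypothetical, not part of RUR.

def taskstats_to_dict(values):
    """['utime', 380000, 'stime', ...] -> {'utime': 380000, 'stime': ...}"""
    return dict(zip(values[0::2], values[1::2]))


def summarize(stats):
    """Convert times from microseconds to seconds and max_rss from KiB to MiB."""
    return {
        "user_s": stats["utime"] / 1e6,           # μsec -> s
        "system_s": stats["stime"] / 1e6,         # μsec -> s
        "max_rss_mib": stats["max_rss"] / 1024,   # KiB -> MiB
        "exit_codes": stats["exitcode:signal"],   # e.g. ['0:0']
        "core_dumped": stats["core"] == 1,
    }


if __name__ == "__main__":
    # Values copied from the sample rur.out on this page.
    sample = ['utime', 380000, 'stime', 684000, 'max_rss', 3020,
              'rchar', 1799329, 'wchar', 7722,
              'exitcode:signal', ['0:0'], 'core', 0]
    print(summarize(taskstats_to_dict(sample)))
```

For the sample values this gives 0.38 s of user time, 0.684 s of system time and roughly 2.9 MiB of peak resident memory.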

The "energy" plugin reports the energy used by the job in joules, and the "timestamp" plugin reports when the job started and when it finished.
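Combining the two plugins gives a rough average power draw: energy divided by elapsed time. The following is a minimal sketch. It assumes the timestamp format shown in the sample rur.out on this page (e.g. `2016-04-14T10:07:23CEST`) and assumes start and stop carry the same time-zone name, so the name is simply stripped. The function names are hypothetical.

```python
from datetime import datetime


def parse_rur_timestamp(stamp):
    """Parse e.g. '2016-04-14T10:07:23CEST' as a naive local time.

    Assumption: start and stop use the same zone name (e.g. CEST), so the
    trailing name is dropped. A run spanning a DST switch would be off by 1 h.
    """
    return datetime.strptime(stamp[:19], "%Y-%m-%dT%H:%M:%S")


def average_power(energy_joules, start, stop):
    """Return (elapsed seconds, mean power in watts) for one job."""
    elapsed = (parse_rur_timestamp(stop) - parse_rur_timestamp(start)).total_seconds()
    if elapsed <= 0:
        # Timestamps have 1-second resolution; very short runs may show 0 s.
        return elapsed, None
    return elapsed, energy_joules / elapsed
```

For the sample run (107 J between 10:07:23 and 10:07:24) this gives 1 s and about 107 W. Because the timestamps only have whole-second resolution, the estimate is coarse for short runs like this one.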

The file "rur.out" looks like this:

 hpcxmarc@eslogin007:~> cat rur.out
 uid: 28422, apid: 4451, jobid: 194328.hornet-tds-batch.hww.de, cmdname: ./xthi, plugin: taskstats ['utime', 380000, 'stime', 684000, 'max_rss', 3020, 'rchar', 1799329, 'wchar', 7722,'exitcode:signal', ['0:0'], 'core', 0]
 uid: 28422, apid: 4451, jobid: 194328.hornet-tds-batch.hww.de, cmdname: ./xthi, plugin: energy ['energy_used', 107]
 uid: 28422, apid: 4451, jobid: 194328.hornet-tds-batch.hww.de, cmdname: ./xthi, plugin: timestamp APP_START 2016-04-14T10:07:23CEST APP_STOP 2016-04-14T10:07:24CEST
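Each line consists of a fixed header (uid, apid, jobid, cmdname, plugin) followed by a plugin-specific payload. Here is a minimal parsing sketch. The line layout is inferred from the sample above and is an assumption, not a documented format. The taskstats and energy payloads are read as Python list literals with `ast.literal_eval`, and the timestamp payload is split on whitespace. `LINE_RE`, `parse_payload` and `parse_line` are hypothetical names.

```python
import ast
import re

# Assumed layout, inferred from the sample rur.out:
# uid: <n>, apid: <n>, jobid: <id>, cmdname: <cmd>, plugin: <name> <payload>
LINE_RE = re.compile(
    r"uid: (?P<uid>\d+), apid: (?P<apid>\d+), jobid: (?P<jobid>[^,]+), "
    r"cmdname: (?P<cmdname>.*?), plugin: (?P<plugin>\w+) (?P<payload>.*)"
)


def parse_payload(payload):
    """Turn a plugin payload into a dict of name -> value."""
    payload = payload.strip()
    if payload.startswith("["):
        items = ast.literal_eval(payload)  # flat list: name, value, name, value, ...
    else:
        items = payload.split()            # e.g. APP_START <time> APP_STOP <time>
    return dict(zip(items[0::2], items[1::2]))


def parse_line(line):
    """Return a dict for one RUR record, or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    rec = m.groupdict()
    rec["uid"] = int(rec["uid"])
    rec["apid"] = int(rec["apid"])
    rec["data"] = parse_payload(rec.pop("payload"))
    return rec
```

With records as dicts, the output of all three plugins for one job can be merged by grouping on `rec["jobid"]`. The non-greedy `cmdname` group assumes the command name itself does not contain `, plugin: `.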

Each job appends its records to "rur.out", so the file can grow quite large. Because every line carries the "jobid" identifier, the user can filter the records of a single job by searching (grep) for its job ID.
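A plain grep for the numeric job ID also matches lines where the same digits appear in another field, for example as a utime or rchar value. The following minimal sketch matches only the jobid field instead. It accepts either the full ID (`194328.hornet-tds-batch.hww.de`) or just its numeric part. The name `lines_for_job` is hypothetical.

```python
import re

# Matches the jobid field of a RUR record, e.g. "jobid: 194328.hornet-tds-batch.hww.de,"
JOBID_RE = re.compile(r"\bjobid: ([^,]+),")


def lines_for_job(lines, job):
    """Yield the RUR lines whose jobid equals `job` or starts with `job` + '.'."""
    for line in lines:
        m = JOBID_RE.search(line)
        if m is None:
            continue
        full_id = m.group(1)
        if full_id == job or full_id.split(".", 1)[0] == job:
            yield line.rstrip("\n")
```

To use it, pass an open handle to `~/rur.out` and a job ID such as `"194328"`. For the sample file on this page it yields all three lines.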

Reference: Cray S-2393, "CLE XC System Administration Guide"