- Information contained in the HLRS Wiki is not legally binding and HLRS is not responsible for any damages that might result from its use -

User monitoring

Revision as of 12:14, 1 August 2024 by Hpcchsim

Long-term aggregations for end users

How data is aggregated (default):

  1. Data is collected over a time bucket and, depending on the metric, a calculation is performed to derive a new metric. The formulas for each metric follow those from: https://github.com/RRZE-HPC/likwid/tree/master/groups/zen2
  2. Calculate the median, min and max for each node, or for each CPU where relevant.
  3. Calculate the average and standard deviation of the medians, the 10th percentile of the minima and the 90th percentile of the maxima.
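
Steps 2 and 3 above can be sketched roughly as follows. This is only an illustration, not the code used by the monitoring system: the function names, the input layout (a dict of per-node sample lists for one time bucket), the use of the population standard deviation and the linear-interpolation percentile are all assumptions.

```python
import statistics


def percentile(values, q):
    """Return the q-th percentile (0-100) using linear interpolation.

    The interpolation method is an assumption; the wiki does not state
    which percentile definition is used.
    """
    ordered = sorted(values)
    if not ordered:
        raise ValueError("percentile of empty data")
    pos = (len(ordered) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * frac


def aggregate_bucket(samples_per_node):
    """Aggregate one time bucket.

    samples_per_node: hypothetical layout, dict of node name -> list of
    metric values collected for that node within the bucket.
    """
    # Step 2: median, min and max per node
    medians = [statistics.median(v) for v in samples_per_node.values()]
    minima = [min(v) for v in samples_per_node.values()]
    maxima = [max(v) for v in samples_per_node.values()]

    # Step 3: combine the per-node values across all nodes
    return {
        "median_avg": statistics.fmean(medians),
        # Population standard deviation is an assumption
        "median_std": statistics.pstdev(medians),
        "min_p10": percentile(minima, 10),
        "max_p90": percentile(maxima, 90),
    }
```

For example, calling `aggregate_bucket({"n1": [1, 2, 3], "n2": [3, 4, 5], "n3": [5, 6, 7]})` gives an average median of 4.0, a 10th percentile of the minima of about 1.4 and a 90th percentile of the maxima of about 6.6.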


Aggregated metrics:

  • Bandwidth: Total memory bandwidth per socket. The values of the two memory controllers of a socket are added together. The data is stored in Bytes/s in the TimescaleDB and in GBytes/s in the JSON for users.
  • Flops: Number of floating-point operations per second. The group does not differentiate between single and double precision. The data is saved in MFlops/s in
  • Energy sum: This is the total energy consumption of each node, reported as power in W. It is calculated by first adding the RAPL counters of both sockets together and then adding a constant of 220 W for the consumption of the other parts of the node.
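
The arithmetic behind the bandwidth and energy sum metrics can be illustrated with a minimal sketch. The function and constant names are hypothetical, the RAPL readings are assumed to be already converted to W, and the use of decimal gigabytes (10^9 bytes) for the GBytes/s conversion is an assumption; only the 220 W constant and the additions come from the description above.

```python
# Constant from the description above: consumption of the other node parts
NODE_BASELINE_W = 220.0

# Assumption: GBytes are decimal (10^9 bytes); the wiki does not specify this
BYTES_PER_GBYTE = 1e9


def socket_bandwidth_bytes_per_s(controller_a, controller_b):
    """Add the two memory controllers of one socket (values in Bytes/s)."""
    return controller_a + controller_b


def to_gbytes_per_s(bytes_per_s):
    """Convert the stored Bytes/s value to the GBytes/s shown to users."""
    return bytes_per_s / BYTES_PER_GBYTE


def node_energy_sum_w(rapl_socket0_w, rapl_socket1_w):
    """Add the RAPL values of both sockets plus the constant node baseline."""
    return rapl_socket0_w + rapl_socket1_w + NODE_BASELINE_W


# Usage with made-up numbers
bandwidth = socket_bandwidth_bytes_per_s(40e9, 35e9)
print(to_gbytes_per_s(bandwidth))        # 75.0
print(node_energy_sum_w(180.0, 175.0))   # 575.0
```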