- Information contained in the HLRS Wiki is not legally binding and HLRS is not responsible for any damages that might result from its use -

Workflow for Profiling with mpiP

From HLRS Platforms
Revision as of 13:55, 15 May 2023 by Hpcjgrac

This page is a work in progress.


Introduction

This page describes a basic workflow for performance analysis based on mpiP. The best practices presented here are tailored to HLRS' Hawk system.

More specifically, we describe the steps and commands necessary for

  1. setting up a suitable use-case,
  2. determining the non-instrumented performance,
  3. configuring mpiP,
  4. obtaining profiles,
  5. determining the instrumentation overhead,
  6. computing quick efficiency metrics.

If you get stuck or need further explanation, please get in touch with HLRS user support.

On Hawk, load the required module with

$ module load mpip


Setting up a suitable use-case

In contrast to full tracing, profiling generally does not produce large amounts of performance analysis data. In most cases it is therefore not necessary to tailor use-cases to keep the data volume low, and profiling can often be done on the same configuration as production runs. In many cases, however, users will still want to set up special use-cases of short duration for profiling in order to save compute resources.

Keep in mind that the performance characteristics of a code depend critically on the scale, i.e. the number of cores used, and on the problem size. Try to keep your performance analysis use-case as close as possible to the realistic use-case you are interested in. Where practical, reduce the execution time (and thus the compute resources consumed) by reducing the number of timesteps/iterations, not by reducing the problem size.

On the other hand, the number of timesteps/iterations should not be too small. For profiling in particular, it is important to make sure that the total execution time is dominated by the main computational loop, while the initialisation and shutdown phases take only a small fraction of it. As a rule of thumb, aim for less than 5% of the execution time spent in the init and shutdown phases where possible.
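
The 5% rule of thumb can be checked with a little arithmetic once rough timings are known. The following is a minimal sketch, not part of mpiP or the Hawk environment. The function names init_shutdown_percent and min_steps are hypothetical helpers, the timings are made-up example values, and the sketch assumes that the 5% refers to the init and shutdown phases combined. It further assumes that you have measured the phase durations yourself, e.g. with your application's own timers.

```shell
#!/bin/sh
# Hypothetical helper: percentage of the total run time spent in the
# init and shutdown phases (assumed here to be counted together).
# Arguments: init time, main loop time, shutdown time (all in seconds).
init_shutdown_percent() {
    awk -v i="$1" -v l="$2" -v s="$3" \
        'BEGIN { printf "%.1f\n", 100 * (i + s) / (i + l + s) }'
}

# Hypothetical helper: smallest number of timesteps/iterations for which
# init + shutdown stay at or below a given percentage of the total run time.
# Arguments: init + shutdown time (s), time per step (s), maximum percentage.
min_steps() {
    awk -v t0="$1" -v ts="$2" -v p="$3" \
        'BEGIN {
            n = t0 * (100 - p) / (p * ts)   # from t0 / (t0 + n * ts) <= p / 100
            c = int(n); if (n > c) c++      # round up to a whole step
            print c
        }'
}

# Example values (made up): 12 s init, 480 s main loop, 6 s shutdown.
init_shutdown_percent 12 480 6

# Example values (made up): 18 s init + shutdown, 0.5 s per timestep, 5% limit.
min_steps 18 0.5 5
```

If the first value comes out above 5, increase the number of timesteps/iterations rather than shrinking the problem size. The second helper gives a rough lower bound on the step count, assuming the time per step is roughly constant.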