- Information contained in the HLRS Wiki is not legally binding and HLRS is not responsible for any damages that might result from its use -

VTune: Difference between revisions

(7 intermediate revisions by the same user not shown)
Line 1: Line 1:
{{Infobox software
{{Infobox software
| description = Intel® '''VTune™ Profiler ''' is a performance analysis tool for serial and multithreaded applications. Use VTune Profiler:
| description = Intel® '''VTune™ Profiler ''' is an accurate performance analysis tool with low overhead for serial and multithreaded applications. Use VTune Profiler:
* to determine the most time-consuming (hot) functions in your application
* to determine the most time-consuming (hot) functions in your application
* to discover at a very fine-grained level which sections of code (a loop, a source code line, or even a data field) do not use the available processor time effectively
* to discover at a very fine-grained level which sections of code (a loop, a source code line, or even a data field) do not use the available processor time effectively
* to analyse communication behavior that affects threaded performance
* to analyse serial and multithreaded applications  
* for accurate analysis with low overhead
* for serial and multithreaded applications  
** MPI (MPT, OpenMPI)
** MPI (MPT, OpenMPI)
** OpenMP, Intel® oneAPI Threading Building Blocks, native threads
** OpenMP, Intel® oneAPI Threading Building Blocks, native threads
* Languages:
* for applications developed in:
** C/C++/C#, Fortran, Python and others
** C/C++/C#, Fortran, Python and others


(Formerly Intel® VTune™ Performance Analyzer with Intel® Thread Profiler)
| logo = [[Image:intel-logo.png]]
| logo = [[Image:intel-logo.png]]
| developer              = Intel
| developer              = Intel
| available on      =  
| available on      = Hawk, Vulcan
| category                  = [[:Category:Performance Analyzer | Performance Analyzer]]
| category                  = [[:Category:Performance Analyzer | Performance Analyzer]]
| license                = Commercial
| license                = Commercial
| website                = [http://software.intel.com/en-us/intel-vtune/ Intel® VTune™ Amplifier XE homepage]  
| website                = [https://software.intel.com/content/www/us/en/develop/tools/oneapi/components/vtune-profiler.html Intel® VTune™ Profiler homepage]  
}}  
}}  


=== Using Intel VTune on Nehalem cluster ===
=== Using Intel VTune ===
To analyse the performance of your application with VTune, you do not need a special compiler wrapper or special libraries. Just recompile and relink your code with the additional -g option to include debug information. VTune works well with dynamically linked binaries. [https://software.intel.com/content/www/us/en/develop/documentation/vtune-help/top/set-up-analysis-target/linux-targets/analyzing-statically-linked-binaries-on-linux-targets.html Here] you can find some tips for statically linked binaries.


Load the necessary module. For example:
Example. Modules on Hawk:
<pre>
<pre>
module load compiler/intel
module load vtune # set up VTune environment  
module load performance/vtune # set up VTune environment
module load gcc mpt
module load mpi/impi          # if MPI needed
</pre>
</pre>


Compilation example:  
Example. Modules on Vulcan:
<pre>
<pre>
ifort -g -O2 prog.f90
module load performance/vtune/2020.1 # set up VTune environment
module load mpi/openmpi/4.1.1-gnu-11.1.0
</pre>
</pre>


Analyzing MPI applications:
Compilation example:  
<pre>
<pre>
mpirun -n4 amplxe-cl -c hotspots -r my_result -- path_to_my_app
mpicxx -O2 -g -Wl,-Bdynamic main.cpp
</pre>
</pre>


=== Using Intel VTune on Cray machines ===
'''Run analysis'''
 
VTune has both a GUI and command line tool: vtune-gui and vtune.
The following types of analysis are available on Hawk:
* ''hotspots'' - Analyze application flow and identify sections of code that take a long time to execute (hotspots).
* ''threading'' - Discover how well your application is using parallelism to take advantage of all available CPUs. Identify and locate synchronization issues causing overhead or idle wait time resulting in lost performance.
* ''memory-consumption'' - Analyze memory consumption by your Linux application, its distinct memory objects and their allocation stacks.
 
'''IMPORTANT''' ''The VTune project working directory and the results directory must be located on the Lustre file system.''


Load the necessary module. For example:
Example for your job script:
<pre>
<pre>
module swap PrgEnv-cray PrgEnv-intel # set up MPI environment too
..
module load performance/vtune       # set up VTune environment
module load vtune
module load gcc mpt
WORKDIR=/your/project/dir/on/lustre
cd ${WORKDIR}
mpirun -np 128  vtune -collect hotspots -r ${WORKDIR}/results_dir -- ./a.out your_input.file
</pre>
</pre>


Compilation example:  
'''Report'''
 
When VTune completes the analysis, you can open the results in the vtune-gui tool. Alternatively, you can generate a report in text form using the VTune command line tool:
<pre>
<pre>
ftn -dynamic -g -O2 prog.f90
vtune -help report
vtune -report summary -r ${WORKDIR}/results_dir
</pre>
</pre>


Analyzing MPI applications:
For some use cases you might need to limit the amount of raw data to be collected. Define this limit in MB through the data-limit option:
<pre>
<pre>
aprun -n4 amplxe-cl -c hotspots -r myResult-@@@{at} -- path_to_my_app
mpirun -np 128 vtune -collect hotspots -data-limit=200 -- ./a.out
</pre>
</pre>
You can find more information about VTune [https://kb.hlrs.de/platforms/upload/Vtune_perf_analysis.pdf here].


== See also ==
== See also ==
Line 61: Line 76:


== External links ==
== External links ==
* [http://software.intel.com/en-us/intel-vtune/ Intel® VTune™ Amplifier XE homepage]
* [https://software.intel.com/content/www/us/en/develop/tools/oneapi/components/vtune-profiler.html Intel® VTune™ Profiler homepage]
 
* [https://software.intel.com/content/www/us/en/develop/documentation/vtune-help/top/introduction.html Introduction to Intel® VTune™ Profiler]
[[Category:Performance Analyzer]]
[[Category:Performance Analyzer]]

Latest revision as of 17:18, 4 June 2021

Intel® VTune™ Profiler is an accurate performance analysis tool with low overhead for serial and multithreaded applications. Use VTune Profiler:
  • to determine the most time-consuming (hot) functions in your application
  • to discover at a very fine-grained level which sections of code (a loop, a source code line, or even a data field) do not use the available processor time effectively
  • to analyse serial and multithreaded applications
    • MPI (MPT, OpenMPI)
    • OpenMP, Intel® oneAPI Threading Building Blocks, native threads
  • for applications developed in:
    • C/C++/C#, Fortran, Python and others
Developer: Intel
Platforms: Hawk, Vulcan
Category: Performance Analyzer
License: Commercial
Website: Intel® VTune™ Profiler homepage


Using Intel VTune

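To analyse the performance of your application with VTune, you do not need a special compiler wrapper or special libraries. Just recompile and relink your code with the additional -g option to include debug information. VTune works well with dynamically linked binaries. Tips for statically linked binaries are available in the Intel documentation: https://software.intel.com/content/www/us/en/develop/documentation/vtune-help/top/set-up-analysis-target/linux-targets/analyzing-statically-linked-binaries-on-linux-targets.html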

Example. Modules on Hawk:

module load vtune # set up VTune environment 
module load gcc mpt 

Example. Modules on Vulcan:

module load performance/vtune/2020.1 # set up VTune environment 
module load mpi/openmpi/4.1.1-gnu-11.1.0 

Compilation example:

mpicxx -O2 -g -Wl,-Bdynamic main.cpp 
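
Before profiling, it can help to confirm that the binary really carries debug information and is dynamically linked. The following is a minimal sketch, assuming readelf from GNU binutils is available on the login node. The helper names has_debug_info and is_dynamically_linked are hypothetical and not part of VTune, and ./a.out is assumed to be the output of the compilation example.

```shell
# Sketch only: both helpers are hypothetical and rely on readelf (GNU binutils).

# has_debug_info FILE: succeed if FILE contains a DWARF .debug_info section,
# which is what compiling with -g normally produces on Linux.
has_debug_info() {
    readelf -S "$1" 2>/dev/null | grep -q '\.debug_info'
}

# is_dynamically_linked FILE: succeed if FILE has an INTERP program header,
# i.e. it requests a dynamic loader at start-up.
is_dynamically_linked() {
    readelf -l "$1" 2>/dev/null | grep -q 'INTERP'
}

binary=./a.out   # assumed default output name of the compiler
if has_debug_info "$binary"; then
    echo "$binary: debug information found"
else
    echo "$binary: no debug information found, recompile with -g"
fi
if is_dynamically_linked "$binary"; then
    echo "$binary: dynamically linked"
else
    echo "$binary: statically linked, missing, or not an ELF executable"
fi
```

Both checks only inspect the ELF headers, so they do not need the VTune module to be loaded.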

Run analysis

VTune has both a GUI and command line tool: vtune-gui and vtune. The following types of analysis are available on Hawk:

  • hotspots - Analyze application flow and identify sections of code that take a long time to execute (hotspots).
  • threading - Discover how well your application is using parallelism to take advantage of all available CPUs. Identify and locate synchronization issues causing overhead or idle wait time resulting in lost performance.
  • memory-consumption - Analyze memory consumption by your Linux application, its distinct memory objects and their allocation stacks.
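
The analysis type names listed above are the values passed to vtune -collect. As a hedged sketch, a small wrapper can reject a mistyped analysis type before a job is submitted. The function vtune_collect_cmd is a hypothetical helper, not a VTune feature, and it only prints the command instead of running it.

```shell
# Sketch only: vtune_collect_cmd is a hypothetical helper, not part of VTune.
# Usage: vtune_collect_cmd ANALYSIS RESULT_DIR PROGRAM [ARGS...]
# It prints (does not run) a collection command for one of the analysis
# types listed above and fails for any other type.
vtune_collect_cmd() (
    analysis=$1
    result_dir=$2
    shift 2
    case "$analysis" in
        hotspots|threading|memory-consumption) ;;
        *) echo "unknown analysis type: $analysis" >&2; exit 1 ;;
    esac
    echo "vtune -collect $analysis -r $result_dir -- $*"
)

# Print the command for a hotspots run of ./a.out
vtune_collect_cmd hotspots results_dir ./a.out your_input.file
```

The printed line can be placed after mpirun in a job script once it looks right.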

IMPORTANT: The VTune project working directory and the results directory must be located on the Lustre file system.
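
One way to check this requirement up front is to query the file system type of the working directory. This is a sketch under the assumption that GNU coreutils stat is installed and reports Lustre mounts as "lustre". The helper is_on_lustre is hypothetical, and WORKDIR falls back to the current directory if it is not set.

```shell
# Sketch only: is_on_lustre is a hypothetical helper.
# Assumes GNU coreutils stat, where "stat -f -c %T" prints the file system type.
is_on_lustre() {
    [ "$(stat -f -c %T "$1" 2>/dev/null)" = "lustre" ]
}

WORKDIR=${WORKDIR:-$PWD}
if is_on_lustre "$WORKDIR"; then
    echo "$WORKDIR is on Lustre"
else
    echo "warning: $WORKDIR does not appear to be on Lustre" >&2
fi
```

If stat is missing or does not support these options, the helper reports failure, so a warning alone is not proof that the directory is on the wrong file system.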

Example for your job script:

..
module load vtune
module load gcc mpt
WORKDIR=/your/project/dir/on/lustre
cd ${WORKDIR}
mpirun -np 128  vtune -collect hotspots -r ${WORKDIR}/results_dir -- ./a.out your_input.file
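
If the same job script is submitted several times, it may be convenient to give each run its own result directory. The sketch below assumes a naming scheme of analysis type plus timestamp. The helper new_result_dir and the directory layout are illustrative assumptions, and the final command is only printed.

```shell
# Sketch only: new_result_dir is a hypothetical helper.
# Usage: new_result_dir BASE_DIR ANALYSIS
# Prints a result directory name that includes the analysis type and a timestamp.
new_result_dir() {
    echo "${1}/vtune_${2}_$(date +%Y%m%d_%H%M%S)"
}

WORKDIR=${WORKDIR:-/your/project/dir/on/lustre}
RESULT_DIR=$(new_result_dir "$WORKDIR" hotspots)

# Show the command that would be run in the job script.
echo "mpirun -np 128 vtune -collect hotspots -r $RESULT_DIR -- ./a.out your_input.file"
```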

Report

When VTune completes the analysis, you can open the results in the vtune-gui tool. Alternatively, you can generate a report in text form using the VTune command line tool:

vtune -help report 
vtune -report summary -r ${WORKDIR}/results_dir
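
To keep the text report for later comparison, the summary can be redirected into a file next to the result directory. This is a minimal sketch: summarize_result is a hypothetical helper, the .summary.txt suffix is an arbitrary choice, and VTUNE_CMD is an assumed override that allows a dry run without VTune installed.

```shell
# Sketch only: summarize_result is a hypothetical helper.
# Usage: summarize_result RESULT_DIR
# Writes the text summary of RESULT_DIR to RESULT_DIR.summary.txt.
# VTUNE_CMD is an assumed override for dry runs; it defaults to vtune.
summarize_result() {
    dir=${1%/}   # drop a trailing slash so the output file lands next to the directory
    ${VTUNE_CMD:-vtune} -report summary -r "$dir" > "$dir.summary.txt"
}

# Example call, commented out because it needs the vtune module and real results:
# summarize_result "${WORKDIR}/results_dir"
```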

For some use cases you might need to limit the amount of raw data to be collected. Define this limit in MB through the data-limit option:

mpirun -np 128 vtune -collect hotspots -data-limit=200 -- ./a.out

More information about VTune is available here: https://kb.hlrs.de/platforms/upload/Vtune_perf_analysis.pdf

See also

External links