- Information contained in the HLRS Wiki is not legally binding and HLRS is not responsible for any damages that might result from its use -
Intel MPI
Latest revision as of 12:14, 11 March 2015
The Intel MPI Library focuses on making applications perform better on Intel architecture-based clusters. It implements the high-performance MPI-3 specification on multiple fabrics.
Examples
Simple example
This example shows the basic steps when using Intel MPI.
Load the necessary module
Compile your application using the MPI wrapper compilers mpicc, mpicxx and mpif90.
Run your application
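The three steps above can be sketched as a small dry-run script. This is only an illustration under stated assumptions: the module name `mpi/impi`, the source file `hello_mpi.c`, the executable name and the process count are hypothetical placeholders that do not come from this page, and the script only prints the commands unless `DRY_RUN=0` is set.

```shell
#!/bin/bash
# Dry-run sketch of the three steps; commands are only printed unless DRY_RUN=0.
DRY_RUN=${DRY_RUN:-1}

# Hypothetical names, not taken from this page: adjust to your site and project.
MPI_MODULE=${MPI_MODULE:-mpi/impi}   # check `module avail` for the real module name
SRC=${SRC:-hello_mpi.c}
EXE=${EXE:-./hello_mpi}
NPROCS=${NPROCS:-4}

# Print a command and, outside dry-run mode, execute it.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

simple_example() {
    # 1. Load the necessary module.
    run module load "$MPI_MODULE" || return 1
    # 2. Compile with the MPI wrapper compiler (mpicxx or mpif90 for C++/Fortran).
    run mpicc -o "$EXE" "$SRC" || return 1
    # 3. Run the application.
    run mpirun -np "$NPROCS" "$EXE" || return 1
}

simple_example
```

Printing the commands first makes it easy to check the module and file names before anything is compiled or launched.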
Thread pinning
This example shows how to run an application on 16 nodes, using 32 processes that spawn 128 threads in total, with each set of 4 threads pinned to a single CPU socket. This gives you optimal NUMA placement of processes and memory, e.g. on the Nehalem nodes of the Laki system.
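The numbers in this layout follow from simple arithmetic, sketched below. The sketch assumes dual-socket nodes with one MPI process per socket and four threads per process; the variable names are illustrative only.

```shell
#!/bin/bash
# Layout arithmetic for the thread pinning example.
NODES=16
SOCKETS_PER_NODE=2       # assumption: dual-socket nodes, one MPI process per socket
THREADS_PER_PROCESS=4    # assumption: one thread per core of a quad-core socket

PROCS_PER_NODE=$SOCKETS_PER_NODE
TOTAL_PROCS=$((NODES * PROCS_PER_NODE))
TOTAL_THREADS=$((TOTAL_PROCS * THREADS_PER_PROCESS))

echo "processes per node:   $PROCS_PER_NODE"
echo "total MPI processes:  $TOTAL_PROCS"
echo "total OpenMP threads: $TOTAL_THREADS"
```

The per-node and total process counts are the values one would pass to mpirun as `-perhost` and `-np`, and the threads per process correspond to `OMP_NUM_THREADS`.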
Intel MPI is best used in combination with the Intel compiler.
Compile your application as shown in the simple example above.
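Run the application through the thread_pin_wrapper.sh script (listed after the command):
mpirun -f $PBS_NODEFILE -np 32 -perhost 2 -genv I_MPI_PIN_DOMAIN=auto -genv KMP_AFFINITY=verbose,scatter,granularity=thread ./thread_pin_wrapper.sh /absolute/path/to/your_app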
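#!/bin/bash
export KMP_AFFINITY=verbose,scatter   # Intel specific environment variable
export OMP_NUM_THREADS=4              # 4 threads per MPI process

# Rank of this process: use OMPI_COMM_WORLD_RANK if set, otherwise PMI_RANK.
RANK=${OMPI_COMM_WORLD_RANK:=$PMI_RANK}

if [ $((RANK % 2)) -eq 0 ]
then
    # Even ranks: cores 0-3, CPUs and preferred memory on NUMA node 0.
    export GOMP_CPU_AFFINITY=0-3
    numactl --preferred=0 --cpunodebind=0 "$@"
else
    # Odd ranks: cores 4-7, CPUs and preferred memory on NUMA node 1.
    export GOMP_CPU_AFFINITY=4-7
    numactl --preferred=1 --cpunodebind=1 "$@"
fi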
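To see which socket the wrapper's parity rule would pick for each rank without launching an MPI job, the rule can be reproduced in isolation. This is a minimal sketch, assuming two quad-core sockets per node with cores 0-3 on socket 0 and cores 4-7 on socket 1, and assuming consecutive ranks are placed on the same node, two per node. The function name `placement_for_rank` is hypothetical.

```shell
#!/bin/bash
# Reproduce the wrapper's rule: even ranks go to socket 0, odd ranks to socket 1.
# Assumes cores 0-3 belong to socket 0 and cores 4-7 to socket 1.
placement_for_rank() {
    local rank=$1
    if [ $((rank % 2)) -eq 0 ]; then
        echo "socket=0 cores=0-3"
    else
        echo "socket=1 cores=4-7"
    fi
}

# Show the placement for the first two nodes (ranks 0-3 with two ranks per node).
for rank in 0 1 2 3; do
    echo "rank $rank -> $(placement_for_rank "$rank")"
done
```

With two ranks per node, each node receives one even and one odd rank, so both sockets are used and each process keeps its threads and memory on its own socket.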