- Information contained in the HLRS Wiki is not legally binding and is provided without warranty; HLRS is not responsible for any damages that might result from its use -

Intel MPI

Latest revision as of 12:14, 11 March 2015

Intel MPI Library focuses on making applications perform better on Intel architecture-based clusters—implementing the high performance MPI-3 specification on multiple fabrics.
Developer: Intel
Platforms: NEC Nehalem Cluster
Category: MPI
License: Commercial
Website: Intel MPI homepage (http://software.intel.com/en-us/intel-mpi-library/)


Examples

simple example

This example shows the basic steps when using Intel MPI.

Load the necessary module

module load mpi/impi


Compile your application using the MPI wrapper compilers mpicc, mpicxx and mpif90.

Note: To use the Intel compiler in combination with Intel MPI, load both the mpi/impi and the compiler/intel modules. For compilation, call the Intel-specific wrapper compilers mpiicc, mpiicpc and mpiifort.
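
The choice between the two wrapper families can be sketched as a small lookup. This is only an illustration: the function name mpi_wrapper and the labels "generic" and "intel" are made up here, and the source file name in the usage comment is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: map a source language and a compiler family to the
# wrapper compiler named in the text.
# Usage: mpi_wrapper <c|cxx|fortran> <generic|intel>
mpi_wrapper() {
    case "$2:$1" in
        generic:c)       echo mpicc ;;
        generic:cxx)     echo mpicxx ;;
        generic:fortran) echo mpif90 ;;
        intel:c)         echo mpiicc ;;
        intel:cxx)       echo mpiicpc ;;
        intel:fortran)   echo mpiifort ;;
        *) echo "unknown combination: $2/$1" >&2; return 1 ;;
    esac
}

# Example (your_app.c is a placeholder file name):
#   $(mpi_wrapper c intel) -o your_app your_app.c
```

With only mpi/impi loaded you would pick the "generic" row; with compiler/intel loaded as well, the "intel" row applies.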


Run your application, here with 8 processes:

mpirun -np 8 /path/to/your_app/your_app


thread pinning

This example shows how to run an application on 16 nodes, using 32 processes that spawn 128 threads in total (2 processes per node, 4 threads per process), with each set of 4 threads pinned to a single CPU socket. This gives you optimum NUMA placement of processes and memory, e.g. on the Nehalem nodes of the Laki system.
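
The numbers in this example follow from simple arithmetic, which the sketch below spells out. The function name job_layout and its output format are illustrative only; the inputs are the node count, the number of MPI processes and the threads per process.

```shell
#!/bin/bash
# Sketch: derive the per-node layout from the job geometry.
# $1 = nodes, $2 = MPI processes, $3 = OpenMP threads per process
job_layout() {
    local perhost=$(( $2 / $1 ))            # MPI processes per node
    local total_threads=$(( $2 * $3 ))      # threads across the whole job
    local cores_per_node=$(( perhost * $3 ))  # cores needed on each node
    echo "perhost=$perhost total_threads=$total_threads cores_per_node=$cores_per_node"
}

job_layout 16 32 4   # prints: perhost=2 total_threads=128 cores_per_node=8
```

The resulting values correspond to -perhost 2 on the mpirun command line, OMP_NUM_THREADS=4 in the wrapper script, and ppn=8 in the qsub request.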

For best results, use Intel MPI in combination with the Intel compiler.

module load compiler/intel
module load mpi/impi


Compile your application as shown in the simple example above.

qsub -l nodes=16:ppn=8,walltime=6:00:00 -I    # get 16 nodes for interactive usage
sort -u $PBS_NODEFILE > m                     # generate a hostlist
mpdboot -n 16 -f m -r ssh                     # build a process ring to be used by MPI later
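
The hostlist step can be tried without a cluster. The sketch below assumes the usual PBS/Torque behaviour of listing each node once per requested slot in $PBS_NODEFILE; the sample host names and the function name unique_hosts are made up.

```shell
#!/bin/bash
# Sketch: reduce a PBS-style node file to one line per host.
unique_hosts() {
    sort -u "$1"
}

# Fake node file standing in for $PBS_NODEFILE (2 hosts, 3 slots each).
sample=$(mktemp)
printf '%s\n' n02 n02 n01 n01 n02 n01 > "$sample"

unique_hosts "$sample"           # prints n01 and n02, one per line
unique_hosts "$sample" | wc -l   # host count, i.e. the value given to mpdboot -n
rm -f "$sample"
```

In the real job the host count should come out as 16, matching the -n 16 argument of mpdboot.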


Run the application using the thread_pin_wrapper.sh script shown below.

mpirun -f $PBS_NODEFILE -np 32 -perhost 2 -genv I_MPI_PIN_DOMAIN=auto -genv KMP_AFFINITY=verbose,scatter,granularity=thread ./thread_pin_wrapper.sh /absolute/path/to/your_app


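File: thread_pin_wrapper.sh
#!/bin/bash
# Pin each MPI process and its 4 OpenMP threads to one CPU socket.
export KMP_AFFINITY=verbose,scatter           # Intel specific environment variable
export OMP_NUM_THREADS=4

# Rank of this process: OMPI_COMM_WORLD_RANK if set, otherwise PMI_RANK
RANK=${OMPI_COMM_WORLD_RANK:-$PMI_RANK}
if [ $((RANK % 2)) -eq 0 ]
then
     # even ranks: socket 0, cores 0-3
     export GOMP_CPU_AFFINITY=0-3             # read by the GNU OpenMP runtime
     numactl --preferred=0 --cpunodebind=0 "$@"
else
     # odd ranks: socket 1, cores 4-7
     export GOMP_CPU_AFFINITY=4-7
     numactl --preferred=1 --cpunodebind=1 "$@"
fi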
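
The placement rule used by the wrapper can be checked in isolation with a dry run. This sketch assumes, like the script, two sockets with four cores each, numbered 0-3 and 4-7; the function name placement_for_rank is illustrative and nothing is actually pinned.

```shell
#!/bin/bash
# Sketch: compute the socket and core range the wrapper would choose for a rank.
placement_for_rank() {
    local socket=$(( $1 % 2 ))     # even ranks -> socket 0, odd ranks -> socket 1
    local first=$(( socket * 4 ))  # first core of that socket
    local last=$(( first + 3 ))    # last core of that socket
    echo "socket=$socket cpus=$first-$last"
}

for r in 0 1 2 3; do
    echo "rank $r -> $(placement_for_rank "$r")"
done
# rank 0 -> socket=0 cpus=0-3
# rank 1 -> socket=1 cpus=4-7
# rank 2 -> socket=0 cpus=0-3
# rank 3 -> socket=1 cpus=4-7
```

The rule only fills both sockets of a node if consecutive ranks share a node, which is what -perhost 2 is expected to arrange; the verbose setting of KMP_AFFINITY lets you confirm the actual binding at run time.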


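External links

* Intel MPI homepage: http://software.intel.com/en-us/intel-mpi-library/
* Intel MPI documentation: http://software.intel.com/en-us/articles/intel-mpi-library-documentation/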