Intel MPI
| license = Commercial
| website = [http://software.intel.com/en-us/intel-mpi-library/ Intel MPI homepage]
}}

Intel MPI Library 3.2 focuses on making applications perform better on Intel architecture-based clusters by implementing the high-performance MPI-2 specification on multiple fabrics.
== Examples ==
==== simple example ====
This example shows the basic steps when using Intel MPI.
Load the necessary modules
{{Command
| command = module load mpi/impi
}}
Compile your application using the MPI wrapper compilers <tt>mpicc</tt>, <tt>mpicxx</tt> and <tt>mpif90</tt>.
{{Note|text =
You will not find an Intel Compiler version of Intel MPI on most systems. To use Intel MPI in combination with the Intel Compiler, load the <tt>compiler/intel</tt> module and call the Intel-specific wrapper compilers <tt>mpiicc</tt>, <tt>mpiicpc</tt> and <tt>mpiifort</tt>.
}}
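If you just want to verify the toolchain, a minimal MPI program is sufficient. The program and file name below are only an illustrative sketch, not part of the Intel MPI distribution.
{{File | filename = hello_mpi.c| content =<pre>
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;

    MPI_Init(&argc, &argv);                /* initialize the MPI library   */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* rank of this process         */
    MPI_Comm_size(MPI_COMM_WORLD, &size);  /* total number of processes    */

    printf("Hello from rank %d of %d\n", rank, size);

    MPI_Finalize();                        /* shut down MPI before exiting */
    return 0;
}
</pre>
}}
{{Command | command =
mpicc -o your_app hello_mpi.c    # or mpiicc -o your_app hello_mpi.c with the Intel compiler
}}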
Run your application
{{Command | command =
mpirun -r ssh -np 8 your_app
}}
==== thread pinning ====
This example shows how to run an application on 16 nodes, using 32 processes that spawn 128 threads in total, with each set of 4 threads pinned to a single CPU socket. This gives optimal NUMA placement of processes and memory, e.g. on the [[NEC Nehalem Cluster]].
It is best to use Intel MPI in combination with the Intel compiler.
{{Command| command =
module load compiler/intel
module load mpi/impi
}}
Compile your application as shown in the simple example above; for this hybrid setup it also needs OpenMP support (see the sketch below).
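The thread pinning example assumes a hybrid MPI/OpenMP application in which every MPI process opens an OpenMP parallel region. The following program is only a minimal sketch of such an application (the file name is hypothetical); the OpenMP compiler flag depends on your compiler, e.g. <tt>-openmp</tt> for older Intel compilers, <tt>-qopenmp</tt> for newer ones, or <tt>-fopenmp</tt> for GCC.
{{File | filename = hybrid_hello.c| content =<pre>
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, provided;

    /* request funneled threading: only the master thread calls MPI */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* OMP_NUM_THREADS (set to 4 in thread_pin_wrapper.sh below) determines the team size */
    #pragma omp parallel
    printf("rank %d, thread %d of %d\n",
           rank, omp_get_thread_num(), omp_get_num_threads());

    MPI_Finalize();
    return 0;
}
</pre>
}}
{{Command | command =
mpiicc -openmp -o your_app hybrid_hello.c
}}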
{{Command
| command =
qsub -l nodes=16:ppn=8,walltime=6:00:00 -I   # get 16 nodes for interactive usage
sort -u $PBS_NODEFILE > m                    # generate a hostlist
mpdboot -n 16 -f m -r ssh                    # build a process ring to be used by MPI later
}}
Run the application using the thread_pin_wrapper.sh script shown below. The option <tt>-perhost 2</tt> starts two MPI processes per node (32 processes on 16 nodes), and <tt>-genv I_MPI_PIN 0</tt> disables the pinning done by Intel MPI itself so that the affinity and <tt>numactl</tt> settings of the wrapper script take effect.
{{Command
| command = mpiexec -perhost 2 -genv I_MPI_PIN 0 -np 32 ./thread_pin_wrapper.sh /absolute/path/to/your_app
}}
{{File | filename = thread_pin_wrapper.sh| content =<pre>
#!/bin/bash
export KMP_AFFINITY=verbose,scatter      # Intel specific environment variable
export OMP_NUM_THREADS=4                 # 4 OpenMP threads per MPI process

RANK=${OMPI_COMM_WORLD_RANK:=$PMI_RANK}  # MPI rank of this process (PMI_RANK is set by Intel MPI)

if [ $(expr $RANK % 2) = 0 ]             # even ranks use socket 0, odd ranks socket 1
then
    export GOMP_CPU_AFFINITY=0-3         # thread affinity when compiled with GCC
    numactl --preferred=0 --cpunodebind=0 "$@"
else
    export GOMP_CPU_AFFINITY=4-7
    numactl --preferred=1 --cpunodebind=1 "$@"
fi
</pre>
}}
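When all runs are finished you can shut down the MPD ring again. This is an optional cleanup step; <tt>mpdallexit</tt> belongs to the same MPD tool set as the <tt>mpdboot</tt> command used above.
{{Command | command =
mpdallexit
}}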