- Information contained in the HLRS Wiki is not legally binding and HLRS is not responsible for any damages that might result from its use -

NEC Cluster Using MPI

From HLRS Platforms
Revision as of 14:17, 26 February 2010 by Hpcchris (Moved description for Open MPI to the corresponding page)

Open MPI example

See the Open MPI page.

Intel MPI example

simple example

Load the necessary module

module load mpi/impi


Run your application with

mpirun -r ssh -np 8 your_app


more complex example

As the Nehalem system is a two-socket system with locally attached ccNUMA memory, the placement of processes and memory can be crucial for performance.

The following example shows a 16-node job that uses one MPI process per socket with 4 threads per process, and optimal NUMA placement of processes and memory.

Prerequisite: use Intel MPI together with the Intel compiler. To set up the environment for this, put the following .modulerc file in your home directory:

File: .modulerc
#%Module1.0#
set version 1.0
module load compiler/intel/11.0
module load mpi/impi/intel-11.0.074-impi-3.2.0.011


Then compile your application using mpicc/mpicxx/mpif90 (GNU compiler) or mpiicc/mpiicpc/mpiifort (Intel compiler).
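Because the application in this example is a hybrid MPI/OpenMP code, OpenMP also has to be enabled at compile time. The following is a minimal sketch that assumes a C source file and the OpenMP flags -openmp (Intel compiler 11.x) and -fopenmp (GNU). The helper name compile_cmd is hypothetical. It only prints the command, so you can check it before running it.

```shell
#!/bin/bash
# Hypothetical helper: print the compile command for a hybrid MPI/OpenMP C code.
# $1 = compiler family (intel or gnu), $2 = source file, $3 = output binary
compile_cmd() {
    case "$1" in
        intel) echo "mpiicc -openmp -O2 -o $3 $2" ;;   # Intel MPI wrapper around icc
        gnu)   echo "mpicc -fopenmp -O2 -o $3 $2" ;;   # Intel MPI wrapper around gcc
        *)     echo "unknown compiler family: $1" >&2; return 1 ;;
    esac
}
```

For example, eval "$(compile_cmd intel yourGloriousApp.c yourGloriousApp)" builds the binary that is used in the fourth step.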

First step: submit an interactive batch job to get the nodes

qsub -l nodes=16:nehalem:ppn=8,walltime=6:00:00 -I # get the 16 nodes


Second step: create a host list with one entry per node

sort -u $PBS_NODEFILE > m
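$PBS_NODEFILE contains one line per allocated slot, so with ppn=8 every node appears eight times. sort -u reduces this to one line per node, which is the form mpdboot expects. The sketch below lets you check both counts. The function names are hypothetical.

```shell
#!/bin/bash
# Count the lines (slots) and the distinct nodes in a PBS node file.
# $1 = path to the node file, e.g. "$PBS_NODEFILE"
count_slots() { wc -l < "$1" | tr -d ' '; }
count_nodes() { sort -u "$1" | wc -l | tr -d ' '; }
```

For the job from the first step, count_slots "$PBS_NODEFILE" should print 128 (16 nodes × 8) and count_nodes "$PBS_NODEFILE" should print 16.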


Third step: start an MPD ring on these nodes, to be used by MPI later

mpdboot -n 16 -f m -r ssh


Fourth step: start the MPI application

mpiexec -perhost 2 -genv I_MPI_PIN 0 -np 32 ./wrapper.sh ./yourGloriousApp


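where wrapper.sh looks like this. Setting I_MPI_PIN to 0 disables the pinning done by Intel MPI itself, so that the wrapper controls process and memory placement: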

File: wrapper.sh
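#!/bin/bash
# thread placement for Intel OpenMP (KMP_AFFINITY) and GNU OpenMP (GOMP_CPU_AFFINITY)
export KMP_AFFINITY=verbose,scatter
export OMP_NUM_THREADS=4
# MPI rank: Open MPI sets OMPI_COMM_WORLD_RANK, Intel MPI sets PMI_RANK
RANK=${OMPI_COMM_WORLD_RANK:-$PMI_RANK}
if [ $(( RANK % 2 )) -eq 0 ]
then
     # even ranks: cores 0-3 and memory of socket (NUMA node) 0
     export GOMP_CPU_AFFINITY=0-3
     exec numactl --preferred=0 --cpunodebind=0 "$@"
else
     # odd ranks: cores 4-7 and memory of socket (NUMA node) 1
     export GOMP_CPU_AFFINITY=4-7
     exec numactl --preferred=1 --cpunodebind=1 "$@"
fi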
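You may want to see which socket each rank is placed on without starting the application. To do that, you can run the wrapper's decision logic in a dry-run form. This sketch assumes the same two-socket layout as above, with cores 0-3 on socket 0 and cores 4-7 on socket 1, as the GOMP_CPU_AFFINITY values imply. The name placement_for is hypothetical.

```shell
#!/bin/bash
# Print the placement wrapper.sh would use for a given MPI rank.
# $1 = MPI rank; even ranks go to NUMA node 0, odd ranks to NUMA node 1.
placement_for() {
    local node=$(( $1 % 2 ))
    local cores
    if [ "$node" -eq 0 ]; then cores=0-3; else cores=4-7; fi
    echo "GOMP_CPU_AFFINITY=$cores numactl --preferred=$node --cpunodebind=$node"
}
```

For example, for r in 0 1 2 3; do echo "rank $r: $(placement_for $r)"; done shows the alternation between the two sockets. This works because -perhost 2 places consecutive ranks on the same node.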


The result is an application running on 16 nodes with 32 processes that spawn 128 threads in total. On each node, one set of 4 threads is pinned to one socket and the other set of 4 threads to the other socket.
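The counts follow from simple arithmetic:

- -np = nodes × processes per node = 16 × 2 = 32
- threads = 32 × OMP_NUM_THREADS = 32 × 4 = 128

Rather than hard-coding -np, you can derive it from the host list m. This is a sketch with hypothetical function names. It prints the command instead of running it.

```shell
#!/bin/bash
# Print the mpiexec line from the fourth step, deriving -np from the host list.
# $1 = host list (one line per node, e.g. the file m)
# $2 = MPI processes per node (-perhost); remaining args = program and its arguments
mpiexec_cmd() {
    local hostfile=$1 perhost=$2
    shift 2
    local nodes
    nodes=$(wc -l < "$hostfile")
    echo "mpiexec -perhost $perhost -genv I_MPI_PIN 0 -np $(( nodes * perhost )) $*"
}

# Total OpenMP threads: nodes * processes per node * OMP_NUM_THREADS
total_threads() { echo $(( $1 * $2 * $3 )); }
```

With the 16-line host list m, mpiexec_cmd m 2 ./wrapper.sh ./yourGloriousApp prints the command with -np 32. Once it looks right, you can run it with eval.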

MVAPICH2 example

simple example

Load the necessary module

module load mpi/mvapich2


Run your application with

mpirun_rsh -np 8 -hostfile $PBS_NODEFILE your_app