- Information contained in the HLRS Wiki is not legally binding and HLRS is not responsible for any damages that might result from its use -

NEC Cluster Using MPI: Difference between revisions

From HLRS Platforms
(Moved description for Intel MPI to the corresponding page)
=== OpenMPI example ===
see [[Open MPI]]


=== Intel MPI example ===
see [[Intel MPI]]
 
==== simple example ====
 
Load the necessary module:
{{Command
| command = module load mpi/impi
}}
 
Then run your application with:
{{Command
| command = mpirun -r ssh -np 8 your_app
}}
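Instead of hard-coding 8 processes, the process count can also be taken from the batch allocation. A minimal bash sketch, assuming it runs inside a PBS job where $PBS_NODEFILE contains one line per allocated slot. It only prints the resulting mpirun command, so it can also be tried outside a job, where it falls back to 8 processes:

```shell
#!/bin/bash
# Assumption: inside a PBS job, PBS_NODEFILE lists one line per allocated slot.
if [ -n "$PBS_NODEFILE" ] && [ -r "$PBS_NODEFILE" ]; then
    NP=$(wc -l < "$PBS_NODEFILE")   # total number of slots = number of MPI processes
else
    NP=8                            # outside a job: same value as in the example above
fi

# Dry run: print the command. Inside a job, call mpirun -r ssh -np "$NP" your_app directly.
echo "mpirun -r ssh -np $NP your_app"
```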
 
==== more complex example ====
 
As the Nehalem nodes are two-socket systems with locally attached ccNUMA memory,
memory and process placement can be crucial for performance.
 
The following example shows a 16-node job using one MPI process per socket with 4 threads each,
and optimal NUMA placement of processes and memory.
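The process and thread counts used in the steps below follow directly from this layout. A small bash sketch of the arithmetic, assuming each Nehalem node has 2 sockets with 4 cores each (which matches ppn=8 in the qsub request below):

```shell
#!/bin/bash
# Layout of the hybrid MPI/OpenMP job described above.
# Assumption: each Nehalem node has 2 sockets with 4 cores each.
NODES=16             # nodes requested from the batch system
SOCKETS_PER_NODE=2   # one MPI process per socket
CORES_PER_SOCKET=4   # one OpenMP thread per core

PROCS_PER_NODE=$SOCKETS_PER_NODE
THREADS_PER_PROC=$CORES_PER_SOCKET
NP=$((NODES * PROCS_PER_NODE))              # value for mpiexec -np
PPN=$((PROCS_PER_NODE * THREADS_PER_PROC))  # value for qsub ppn=
TOTAL_THREADS=$((NP * THREADS_PER_PROC))

# prints: np=32 perhost=2 ppn=8 OMP_NUM_THREADS=4 threads=128
echo "np=$NP perhost=$PROCS_PER_NODE ppn=$PPN OMP_NUM_THREADS=$THREADS_PER_PROC threads=$TOTAL_THREADS"
```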
 
Prerequisite: use Intel MPI, preferably together with the Intel compiler.
To set up the environment for this, put the following .modulerc file in your home directory:
 
{{File
| filename = .modulerc
| content = <pre>
#%Module1.0#
set version 1.0
module load compiler/intel/11.0
module load mpi/impi/intel-11.0.074-impi-3.2.0.011
</pre>
}}
 
Then compile your application using mpicc/mpicxx/mpif90 (GNU compilers) or mpiicc/mpiicpc/mpiifort (Intel compilers).
 
First step: submit an interactive batch job to get the nodes:
 
{{Command
| command = qsub -l nodes=16:nehalem:ppn=8,walltime=6:00:00 -I          # get the 16 nodes
}}
 
Second step: create a host list with one line per node:
{{Command
| command = sort -u  $PBS_NODEFILE  > m
}}
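$PBS_NODEFILE lists each node once per requested slot (eight times per node with ppn=8), which is why sort -u is used to reduce it to one line per node. A bash sketch that also derives the node count for mpdboot -n from the host list instead of hard-coding 16; the node names in the demo file are hypothetical and only used when no PBS job is active:

```shell
#!/bin/bash
# Demo only: fake a PBS node file with 2 hypothetical nodes and ppn=8.
# Inside a real job, PBS sets PBS_NODEFILE and this part is skipped.
if [ -z "$PBS_NODEFILE" ]; then
    PBS_NODEFILE=$(mktemp)
    for node in n010101 n010102; do
        for _ in {1..8}; do
            echo "$node"
        done
    done > "$PBS_NODEFILE"
fi

# One line per node, as in the step above
sort -u "$PBS_NODEFILE" > m

# Number of distinct nodes, to be passed to mpdboot -n
NNODES=$(wc -l < m)
echo "nodes in ring: $NNODES"
cat m
```

The third step can then be written as mpdboot -n $NNODES -f m -r ssh.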
 
Third step: start the MPD ring that MPI will use later:
{{Command
| command = mpdboot  -n 16 -f m -r ssh 
}}
 
Fourth step: start the MPI application:
{{Command
| command = mpiexec -perhost 2 -genv I_MPI_PIN 0  -np 32 ./wrapper.sh ./yourGloriousApp
}}
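With -perhost 2, two consecutive ranks are started on each host before moving on to the next one, and -genv I_MPI_PIN 0 switches off Intel MPI's own process pinning so that it does not interfere with the placement done by wrapper.sh. A bash sketch that prints the expected placement, assuming the hosts are filled in the order of the host list m; the function name placement is hypothetical:

```shell
#!/bin/bash
# Expected placement for: mpiexec -perhost 2 -np 32 ./wrapper.sh ...
# Assumption: ranks fill the hosts in host-list order, PERHOST consecutive ranks per host.
NP=32
PERHOST=2

# Print "<node index> <socket>" for a given rank
placement() {
    local rank=$1
    # node: line of the host list (0-based); socket: even rank -> 0, odd rank -> 1 (as in wrapper.sh)
    echo "$((rank / PERHOST)) $((rank % 2))"
}

for ((rank = 0; rank < NP; rank++)); do
    read -r node socket <<< "$(placement "$rank")"
    printf 'rank %2d -> node %2d, socket %d\n' "$rank" "$node" "$socket"
done
```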
 
The script wrapper.sh looks like this:
 
{{File
| filename = wrapper.sh
| content =<pre>
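#!/bin/bash
# Intel OpenMP runtime: report the binding and spread the threads over the allowed cores
export KMP_AFFINITY=verbose,scatter
# 4 OpenMP threads per MPI process, i.e. one thread per core of a socket
export OMP_NUM_THREADS=4
# MPI rank: set by Open MPI (OMPI_COMM_WORLD_RANK) or by Intel MPI (PMI_RANK)
RANK=${OMPI_COMM_WORLD_RANK:-$PMI_RANK}
if [ $((RANK % 2)) -eq 0 ]
then
    # even ranks: socket 0; GOMP_CPU_AFFINITY pins GNU OpenMP threads to cores 0-3
    export GOMP_CPU_AFFINITY=0-3
    numactl --preferred=0 --cpunodebind=0 "$@"
else
    # odd ranks: socket 1; GOMP_CPU_AFFINITY pins GNU OpenMP threads to cores 4-7
    export GOMP_CPU_AFFINITY=4-7
    numactl --preferred=1 --cpunodebind=1 "$@"
fi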
</pre>
}}
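Before starting a long run, it can help to check which binding each rank would get. A sketch of a dry-run variant of the wrapper logic; the helper binding_for_rank is hypothetical and only prints the numactl call instead of executing it:

```shell
#!/bin/bash
# Dry run of the wrapper.sh logic: print the numactl call for a given rank.
# binding_for_rank is a hypothetical helper, not part of the original script.
binding_for_rank() {
    local rank=$1
    shift
    if [ $((rank % 2)) -eq 0 ]; then
        echo "GOMP_CPU_AFFINITY=0-3 numactl --preferred=0 --cpunodebind=0 $*"
    else
        echo "GOMP_CPU_AFFINITY=4-7 numactl --preferred=1 --cpunodebind=1 $*"
    fi
}

# Same rank detection as wrapper.sh (Open MPI first, then PMI_RANK), defaulting to 0
RANK=${OMPI_COMM_WORLD_RANK:-${PMI_RANK:-0}}
binding_for_rank "$RANK" ./yourGloriousApp
```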
 
The result is an application running on 16 nodes with 32 MPI processes, which spawn
128 threads in total. On each node, one set of 4 threads is pinned to one socket and the other set of 4 threads to the other socket.


=== MVAPICH2 example ===


see [[MVAPICH2]]

Revision as of 18:05, 26 February 2010
