- Information contained in the HLRS Wiki is not legally binding and HLRS is not responsible for any damages that might result from its use -

NEC Cluster Using MPI

=== OpenMPI ===

see [[Open MPI]]

=== Intel MPI ===

see [[Intel MPI]]

=== MVAPICH2 ===

see [[MVAPICH2]]

=== MPI I/O ===

see [[MPI-IO]]

== Intel MPI example ==


Since the Nehalem nodes are two-socket systems with locally attached ccNUMA memory,
memory and process placement can be crucial.
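
How the cores and the memory are distributed over the two sockets can be inspected directly on a compute node with numactl, for example:

 numactl --hardware   # list the NUMA nodes with their CPUs and memory sizes
 numactl --show       # show the NUMA policy and allowed CPUs of the current shell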


Here is an example that shows a 16-node job, using 1 process per socket and 4 threads
per socket, with optimal NUMA placement of processes and memory.


First step: submit an interactive batch job to get the nodes


  qsub -l nodes=16:nehalem:ppn=8,walltime=6:00:00 -I          # get the 16 nodes


Second step: make a hostlist


  sort -u  $PBS_NODEFILE  > m
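
As a quick sanity check, the hostlist should now contain one line per node, i.e. 16 lines:

 wc -l m    # should report 16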


Third step: make a process ring (MPD) to be used by MPI later
 
mpdboot  -n 16 -f m -r ssh 
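
Whether the ring really spans all 16 nodes can be checked by listing the MPD daemons:

 mpdtrace   # prints the hostname of every node in the ring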
 
Fourth step: start the MPI application
 
mpiexec -perhost 2 -genv I_MPI_PIN 0  -np 32 ./wrapper.sh ./yourGloriousApp
 
The option -perhost 2 starts two MPI processes on each node (one per socket), and I_MPI_PIN is set to 0 to disable Intel MPI's own pinning, so that the wrapper script can place processes and memory itself. wrapper.sh looks like this:
 
#!/bin/bash
# Started once per MPI rank: even ranks go to socket/NUMA node 0, odd ranks to socket/NUMA node 1.
export KMP_AFFINITY=verbose,scatter   # thread binding for the Intel OpenMP runtime (verbose prints the mapping)
export OMP_NUM_THREADS=4              # 4 OpenMP threads per MPI process, i.e. one per core of a socket
if [ $(expr $PMI_RANK % 2) = 0 ]
then
        export GOMP_CPU_AFFINITY=0-3  # thread binding for the GNU OpenMP runtime
        numactl --preferred=0 --cpunodebind=0 "$@"
else
        export GOMP_CPU_AFFINITY=4-7
        numactl --preferred=1 --cpunodebind=1 "$@"
fi
 
 
The result is an application running on 16 nodes with 32 MPI processes that spawn
128 threads in total. Each process pins its set of 4 threads to one socket: even ranks to socket 0, odd ranks to socket 1.
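
One way to check that the pinning worked as intended is to log in to one of the allocated nodes while the job is running and look at which core each thread last ran on (the psr column); yourGloriousApp again stands for the real executable name:

 # on one of the compute nodes, while the application is running
 ps -eLo pid,lwp,psr,comm | grep yourGloriousApp

When the run is complete, the MPD ring started in the third step can be shut down again:

 mpdallexit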
