- Infos im HLRS Wiki sind nicht rechtsverbindlich und ohne Gewähr -
- Information contained in the HLRS Wiki is not legally binding and HLRS is not responsible for any damages that might result from its use -

NEC Cluster Using MPI: Difference between revisions

From HLRS Platforms
Jump to navigationJump to search
No edit summary
 
No edit summary
Line 19: Line 19:
  mpdboot  -n 16 -f m -r ssh   
  mpdboot  -n 16 -f m -r ssh   


Fouth step: start MPI application
Fourth step: start MPI application


  mpiexec -perhost 2 -genv I_MPI_PIN 0  -np 32 ./wrapper.sh ./yourGloriousApp
  mpiexec -perhost 2 -genv I_MPI_PIN 0  -np 32 ./wrapper.sh ./yourGloriousApp

Revision as of 19:37, 17 July 2009

Intel MPI example

As Nehalem system is a twosocket system with local attached ccNUMA memory, memory and process placmeent can be crucial.

Here is an example that shows a 16 node Job, using 1 process per socket and 4 threads per socket and optimum NUMA placement of processes and memory.

First step: Batch submit to get the nodes

 qsub -l nodes=16:nehalem:ppn=8,walltime=6:00:00 -I           # get the 16 nodes

Second step: make a hostlist

 sort -u  $PBS_NODEFILE  > m

Third step: make a process ring to be used by MPI later

mpdboot  -n 16 -f m -r ssh  

Fourth step: start MPI application

mpiexec -perhost 2 -genv I_MPI_PIN 0  -np 32 ./wrapper.sh ./yourGloriousApp

With wrapper.sh looking like this

#!/bin/bash
export KMP_AFFINITY=verbose,scatter
export OMP_NUM_THREADS=4
if [ $(expr $PMI_RANK % 2) = 0  ]
then
       export GOMP_CPU_AFFINITY=0-3
       numactl --preferred=0 --cpunodebind=0 $@
else
       export GOMP_CPU_AFFINITY=4-7
       numactl --preferred=1 --cpunodebind=1 $@
fi