== Intel MPI example ==

As the Nehalem system is a two-socket system with locally attached ccNUMA memory, memory and process placement can be crucial.

Here is an example that shows a 16-node job, using 1 process per socket and 4 threads per socket, with optimal NUMA placement of processes and memory.

First step: batch submit to get the nodes

 qsub -l nodes=16:nehalem:ppn=8,walltime=6:00:00 -I    # get the 16 nodes interactively

Second step: make a hostlist with one entry per node

 sort -u $PBS_NODEFILE > m
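
<code>$PBS_NODEFILE</code> lists one line per allocated core, so with <code>ppn=8</code> every node appears eight times; <code>sort -u</code> collapses this to one line per node. A small sketch of the effect (the node names are invented for illustration, not real hostnames from the cluster):

```shell
# Simulate a $PBS_NODEFILE for 2 nodes with ppn=8: 16 lines, 8 per node.
# n010101 and n010102 are made-up node names.
for i in $(seq 1 8); do echo n010101; echo n010102; done > nodefile.example

# Collapse to one line per unique node, as done for the real hostlist.
sort -u nodefile.example > m.example

wc -l < nodefile.example   # reports 16 (one line per core)
wc -l < m.example          # reports 2  (one line per node)
```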
Third step: make a process ring (MPD daemons) to be used by MPI later

 mpdboot -n 16 -f m -r ssh

Fourth step: start the MPI application

 mpiexec -perhost 2 -genv I_MPI_PIN 0 -np 32 ./wrapper.sh ./yourGloriousApp

Setting I_MPI_PIN to 0 disables Intel MPI's own process pinning, so that the wrapper below can control placement itself via numactl.
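
The numbers fit together as follows: 16 nodes with <code>-perhost 2</code> (one process per socket) give the 32 MPI ranks requested with <code>-np 32</code>, and 4 OpenMP threads per rank give 128 threads overall. A quick check of the arithmetic:

```shell
# Job geometry from the commands above.
nodes=16
perhost=2          # MPI processes per node (one per socket)
threads_per_rank=4 # OMP_NUM_THREADS set in the wrapper

ranks=$((nodes * perhost))                   # matches -np 32
total_threads=$((ranks * threads_per_rank))  # threads across the whole job

echo "$ranks $total_threads"   # -> 32 128
```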
with wrapper.sh looking like this:

 #!/bin/bash
 # Report the affinity mapping and use 4 OpenMP threads per process (Intel OpenMP)
 export KMP_AFFINITY=verbose,scatter
 export OMP_NUM_THREADS=4
 # Even MPI ranks run on socket 0, odd ranks on socket 1
 if [ $(expr $PMI_RANK % 2) = 0 ]
 then
     # GNU OpenMP: pin the 4 threads to cores 0-3, memory preferred on node 0
     export GOMP_CPU_AFFINITY=0-3
     numactl --preferred=0 --cpunodebind=0 "$@"
 else
     # pin the 4 threads to cores 4-7, memory preferred on node 1
     export GOMP_CPU_AFFINITY=4-7
     numactl --preferred=1 --cpunodebind=1 "$@"
 fi

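
The socket choice in the wrapper is simply the parity of PMI_RANK. A sketch of that decision in isolation (the rank values are set by hand here for illustration; in the real job Intel MPI exports PMI_RANK into each process's environment):

```shell
# Reproduce the wrapper's even/odd test for the first four ranks.
for PMI_RANK in 0 1 2 3
do
    if [ $(expr $PMI_RANK % 2) = 0 ]
    then
        echo "rank $PMI_RANK -> socket 0 (cores 0-3)"
    else
        echo "rank $PMI_RANK -> socket 1 (cores 4-7)"
    fi
done
```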
The result is an application running on 16 nodes, using 32 processes spawning 128 threads. One set of 4 threads is pinned to one socket, the other set of 4 threads to the other socket.

=== OpenMPI ===

see [[Open MPI]]

=== Intel MPI ===

see [[Intel MPI]]

=== MVAPICH2 ===

see [[MVAPICH2]]

=== MPI I/O ===

see [[MPI-IO]]