NEC Cluster Using MPI

== OpenMPI example ==
  
=== simple example ===
  
To use OpenMPI with the Intel compiler, create a .modulerc file in your home directory with the following contents:
 
  
  #%Module1.0#
  set version 1.0
  module load compiler/intel/11.0
  module load mpi/openmpi/1.3-intel-11.0
 
  
For compilation, use the MPI wrapper scripts like mpicc/mpic++/mpif90.
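For example, a C source file (app.c is just a placeholder name here) could be compiled like this:

  mpicc -O2 -o app app.c          # the wrapper adds the OpenMPI include and library paths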
  
The following example is for a pure MPI job, using 16 nodes (128 processes).
For illustration, this is done using an interactive session (-I option).
 
  
First step: Batch submit to get the nodes
  
  qsub -l nodes=16:nehalem:ppn=8,walltime=6:00:00 -I            # get the 16 nodes
 
 
In the session, which you will get after some time, the application is started with:
 
 
mpirun -np 128 PathToYourApp
 
 
 
=== more complex examples ===
 
 
 
OpenMPI manages the resources in units called 'slots'. By specifying 'ppn:X' to the batch system, the number of slots per node is specified. So for a simple MPI job with 8 processes per node (= 1 process per core), ppn:8 is the best choice, as in the above example. Details can be specified on the mpirun command line.

The PBS setup is adjusted for ppn:8; please do not use other values.
 
 
 
If you want to use fewer processes per node, e.g. because you are restricted by the memory per process, or because you have a hybrid parallel application using OpenMP and MPI, note that MPI would by default put the first 8 processes on the first node, the second 8 on the second, and so on. To avoid this, you can do
 
 
 
mpirun -np X -npernode 2 /path/to/app
 
 
 
This would start 2 processes per node. This way, you can use a larger number of nodes with a smaller number of processes, or you can e.g. start threads out of the processes.
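A minimal sketch of the hybrid case, assuming an OpenMP-enabled executable (hybrid_app is a placeholder name): set the thread count per process and export it to all ranks with the -x option of mpirun:

  export OMP_NUM_THREADS=4
  mpirun -np 32 -npernode 2 -x OMP_NUM_THREADS ./hybrid_app     # 16 nodes x 2 processes x 4 threads = 128 cores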
 
 
 
If you want to pin your processes to a CPU (and enable NUMA memory affinity), use
 
 
 
 
 
mpirun -np X --mca mpi_paffinity_alone 1  /path/to/app
 
 
 
Warning: This will not behave as expected for hybrid multithreaded applications, as the threads will be pinned to a single CPU as well! Use this only in the case of one process per core, with no extra threads.

== Intel MPI example ==
 
 
 
As the Nehalem system is a two-socket system with locally attached ccNUMA memory, memory and process placement can be crucial.
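To see the two sockets and their local memory on a node, the numactl tool (also used in the wrapper script below) can print the topology; this is just an optional check:

  numactl --hardware              # lists the NUMA nodes with their CPUs and memory sizes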
 
 
 
Here is an example that shows a 16-node job, using 1 process per socket and 4 threads per socket, with optimum NUMA placement of processes and memory.
 
 
 
Prerequisite: use Intel MPI and the best Intel compiler. To set up the environment for this, use this .modulerc file in your home directory:
 
 
 
  #%Module1.0#
  set version 1.0
  module load compiler/intel/11.0
  module load mpi/impi/intel-11.0.074-impi-3.2.0.011
 
 
 
And compile your application using mpicc/mpif90.
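As a sketch, analogous to the OpenMPI case above (app.f90 is a placeholder name):

  mpif90 -O2 -o app app.f90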
 
 
 
First step: Batch submit to get the nodes
 
 
 
  qsub -l nodes=16:nehalem:ppn=8,walltime=6:00:00 -I          # get the 16 nodes
 
 
 
Second step: make a hostlist
 
 
 
  sort -u  $PBS_NODEFILE  > m                                   # one hostname per node, written to file 'm'
 
 
 
Third step: make a process ring to be used by MPI later
 
 
 
mpdboot  -n 16 -f m -r ssh 
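As an optional check, mpdtrace lists the hosts that have joined the ring, so you can verify that all 16 nodes are present:

  mpdtrace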
 
 
 
Fourth step: start the MPI application
 
 
 
mpiexec -perhost 2 -genv I_MPI_PIN 0  -np 32 ./wrapper.sh ./yourGloriousApp
 
 
 
With wrapper.sh looking like this
 
 
 
#!/bin/bash
export KMP_AFFINITY=verbose,scatter
export OMP_NUM_THREADS=4
# even MPI ranks run their threads on socket 0, odd ranks on socket 1
if [ $(expr $PMI_RANK % 2) = 0 ]
then
        export GOMP_CPU_AFFINITY=0-3
        numactl --preferred=0 --cpunodebind=0 "$@"
else
        export GOMP_CPU_AFFINITY=4-7
        numactl --preferred=1 --cpunodebind=1 "$@"
fi
 
 
 
 
 
The result is an application running on 16 nodes, using 32 processes spawning 128 threads. On each node, one set of 4 threads is pinned to one socket and the other set of 4 threads to the other socket.
 

== OpenMPI ==

see [[Open MPI]]

== Intel MPI ==

see [[Intel MPI]]

== MVAPICH2 ==

see [[MVAPICH2]]

== MPI I/O ==

see [[MPI-IO]]