- Information contained in the HLRS Wiki is not legally binding and HLRS is not responsible for any damages that might result from its use -

Open MPI

Open MPI is a Message Passing Interface (MPI) library project combining technologies and resources from several other projects (FT-MPI, LA-MPI, LAM/MPI, and PACX-MPI).

* Developer: Open MPI Development Team
* Platforms: NEC Nehalem Cluster
* Category: MPI
* License: New BSD license
* Website: [http://www.open-mpi.org/ Open MPI homepage]

== Examples ==
=== simple example ===
This example shows the basic steps for using Open MPI.
First, load the necessary module:
{{Command|command =
module load mpi/openmpi
}}
Compile your application using one of the MPI wrapper compilers <code>mpicc</code>, <code>mpic++</code> or <code>mpif90</code>:
{{Command|command =
mpicc your_app.c -o your_app
}}
Now we run the application with 128 processes spread across 16 nodes in an interactive job (<code>-I</code> option):
{{Command | command =
qsub -l nodes=16:ppn=8,walltime=6:00:00 -I            # get 16 nodes for 6 hours
mpirun -np 128 your_app                              # run your_app using 128 processes
}}
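For production runs, the same job can also be submitted non-interactively with a job script. The following is a minimal sketch (the resource requests and module name are taken from the example above; the file name <code>run_your_app.pbs</code> is only an illustration):
{{File|filename=run_your_app.pbs|content=<pre>
#!/bin/bash
#PBS -l nodes=16:ppn=8               # 16 nodes with 8 slots each
#PBS -l walltime=6:00:00             # 6 hours wall time

module load mpi/openmpi              # same module as in the interactive example

cd $PBS_O_WORKDIR                    # directory the job was submitted from
mpirun -np 128 your_app              # 16 nodes * 8 slots = 128 processes
</pre>
}}
Submit it with <code>qsub run_your_app.pbs</code>.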
=== specifying the number of processes per node ===
Open MPI divides resources into units called 'slots'. By specifying <code>ppn=X</code> to the batch system, you set the number of slots per node.
So for a simple MPI job with 8 processes per node (= 1 process per core), <code>ppn=8</code> is the best choice, as in the example above. Details can be specified on the <code>mpirun</code> command line. The PBS setup is adjusted for <code>ppn=8</code>; please do not use other values.
If you want to run fewer processes per node, e.g. because you are restricted by memory requirements or because you have a hybrid parallel application using MPI and OpenMP, MPI would by default place the first 8 processes on the first node, the second 8 on the second node, and so on. To avoid this, you can use the <code>-npernode</code> option.
{{Command
| command = mpirun -np X -npernode 2 your_app
}}
This starts 2 processes per node. This way, you can use a larger number of nodes
with a smaller number of processes, or you can, for example, start additional threads from each process.
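As a worked example (the thread count of 4 is only an assumption for illustration): with 16 nodes requested as <code>ppn=8</code>, starting 2 processes per node means <code>-np 32</code>, and with 4 OpenMP threads per process all 128 cores are in use. The <code>-x</code> option forwards the environment variable to the remote nodes. See the thread pinning section below for how to additionally pin the threads.
{{Command | command =
qsub -l nodes=16:ppn=8,walltime=6:00:00 -I              # 16 nodes, 8 slots each
export OMP_NUM_THREADS=4                                # threads per process (assumption)
mpirun -np 32 -npernode 2 -x OMP_NUM_THREADS your_app   # 2 processes per node, 32 in total
}}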
=== process pinning ===
If you want to pin each process to a CPU core (and enable NUMA memory affinity), use
{{Command
| command = mpirun -np X --mca mpi_paffinity_alone 1 your_app
}}
{{Warning
| text = This will not behave as expected for hybrid multi-threaded applications (MPI + OpenMP), as all threads of a process will be pinned to a single CPU core as well! Use this only if you want to pin one process per core with no extra threads!
}}
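For example, to run one pinned process on each of the 32 cores of 4 nodes (the walltime is just a placeholder):
{{Command | command =
qsub -l nodes=4:ppn=8,walltime=1:00:00 -I             # 4 nodes, 8 cores each
mpirun -np 32 --mca mpi_paffinity_alone 1 your_app    # one pinned process per core
}}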
=== thread pinning ===
For pinning of hybrid MPI/OpenMP applications, use the following wrapper script:
{{File|filename=thread_pin_wrapper.sh|content=<pre>
#!/bin/bash
export KMP_AFFINITY=verbose,scatter          # Intel compiler specific environment variable
export OMP_NUM_THREADS=4                     # number of OpenMP threads per MPI process

# Determine the MPI rank of this process (Open MPI or PMI launcher)
RANK=${OMPI_COMM_WORLD_RANK:=$PMI_RANK}

# Bind even ranks to NUMA node 0 (cores 0-3) and odd ranks to NUMA node 1 (cores 4-7)
if [ $(expr $RANK % 2) = 0 ]
then
    export GOMP_CPU_AFFINITY=0-3             # GNU OpenMP thread affinity
    numactl --preferred=0 --cpunodebind=0 "$@"
else
    export GOMP_CPU_AFFINITY=4-7             # GNU OpenMP thread affinity
    numactl --preferred=1 --cpunodebind=1 "$@"
fi
</pre>
}}
Run your application with the following command:
{{Command
| command = mpirun -np X -npernode 2 thread_pin_wrapper.sh your_app
}}
{{Warning| text =
Do not use the <code>mpi_paffinity_alone</code> option in this case!
}}
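Putting it all together (assuming the 2 x 4 core NUMA layout hard-coded in the wrapper script above): on 16 nodes this starts 32 MPI processes with 4 OpenMP threads each, i.e. all 128 cores are used. Make sure the wrapper script is executable (<code>chmod +x thread_pin_wrapper.sh</code>) and reachable on all nodes, e.g. by placing it in the working directory on a shared filesystem.
{{Command | command =
qsub -l nodes=16:ppn=8,walltime=6:00:00 -I                  # 16 nodes, 8 cores each
mpirun -np 32 -npernode 2 thread_pin_wrapper.sh your_app    # 2 ranks per node, 4 threads per rank
}}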