Open MPI

| license = New BSD license
| website = [http://www.open-mpi.org/ Open MPI homepage]
}}

Open MPI is a Message Passing Interface (MPI) library project combining technologies and resources from several other projects (FT-MPI, LA-MPI, LAM/MPI, and PACX-MPI).
== Examples ==
=== Simple example ===
This example shows the basic steps when using Open MPI.
Load the necessary module:
{{Command|command =
module load mpi/openmpi
}}
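If you are unsure which Open MPI version the module provides (module names and versions are system specific, so treat the following only as an illustration), you can check with the module tools:
{{Command|command =
module avail mpi/openmpi   # list the Open MPI versions installed on the system
module list                # show the modules currently loaded in your session
}}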
Compile your application using the MPI wrapper compilers mpicc, mpic++ or mpif90:
{{Command|command =
mpicc your_app.c -o your_app
}}
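If you want to see which compiler and flags the wrapper actually invokes (useful for debugging build problems), the Open MPI wrapper compilers can print the underlying command line instead of executing it; <code>your_app.c</code> is just a placeholder here:
{{Command|command =
mpicc --showme your_app.c -o your_app   # print the underlying compiler command line, do not compile
}}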
Now we run our application using 128 processes spread across 16 nodes in an interactive job (-I option):
{{Command | command =
qsub -l nodes=16:ppn=8,walltime=6:00:00 -I # get 16 nodes for 6 hours
mpirun -np 128 your_app # run your_app using 128 processes
}}
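The same run can also be submitted as a regular batch job instead of an interactive one. The following job script is only a sketch; the file name and the resource limits are placeholders taken from the example above:
{{File|filename=run_your_app.pbs|content=<pre>
#!/bin/bash
#PBS -l nodes=16:ppn=8           # request 16 nodes with 8 cores each
#PBS -l walltime=6:00:00         # maximum run time of 6 hours

cd $PBS_O_WORKDIR                # change to the directory the job was submitted from
module load mpi/openmpi          # load the same MPI module used for compiling

mpirun -np 128 your_app          # run your_app using 128 processes
</pre>
}}
Submit it with <code>qsub run_your_app.pbs</code>.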
=== Specifying the number of processes per node ===
Open MPI divides resources into units called 'slots'. Specifying <code>ppn:X</code> to the batch system sets the number of slots per node.
For a simple MPI job with 8 processes per node (= 1 process per core), <code>ppn:8</code> is the best choice, as in the example above. Further details can be specified on the <code>mpirun</code> command line. The PBS setup is adjusted for <code>ppn:8</code>; please do not use other values.
If you want to use fewer processes per node, e.g. because you are restricted by memory requirements, or because you have a hybrid parallel application using MPI and OpenMP, note that Open MPI would by default place the first 8 processes on the first node, the next 8 on the second node, and so on. To avoid this, you can use the <code>-npernode</code> option.
{{Command
| command = mpirun -np X -npernode 2 your_app
}}
This would start 2 processes per node. This way, you can use a larger number of nodes
with a smaller number of processes per node, or you can e.g. start threads out of the processes.
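As a concrete illustration (the numbers are only an example), the 16 nodes from the interactive job above could be used with 2 processes per node instead of 8:
{{Command
| command = mpirun -np 32 -npernode 2 your_app   # 32 processes in total, i.e. 2 per node on 16 nodes
}}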
=== Process pinning ===
If you want to pin your processes to a CPU (and enable NUMA memory affinity), use
{{Command
| command = mpirun -np X --mca mpi_paffinity_alone 1 your_app
}}
{{Warning
| text = This will not behave as expected for hybrid multi-threaded applications (MPI + OpenMP), as the threads will be pinned to a single CPU as well! Use this only if you want to pin one process per core - no extra threads!
}}
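MCA parameters such as <code>mpi_paffinity_alone</code> can also be set through the environment instead of the <code>mpirun</code> command line, which can be convenient in job scripts; the following is equivalent to the <code>--mca</code> option above:
{{Command | command =
export OMPI_MCA_mpi_paffinity_alone=1   # same effect as "--mca mpi_paffinity_alone 1"
mpirun -np X your_app
}}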
=== Thread pinning ===
For pinning of hybrid MPI/OpenMP applications, use the following wrapper script:
{{File|filename=thread_pin_wrapper.sh|content=<pre>
#!/bin/bash
export KMP_AFFINITY=verbose,scatter   # Intel compiler specific environment variable
export OMP_NUM_THREADS=4              # 4 OpenMP threads per MPI process

# Determine the MPI rank of this process (Open MPI or PMI launcher)
RANK=${OMPI_COMM_WORLD_RANK:=$PMI_RANK}

# Bind even ranks to the first NUMA node, odd ranks to the second
if [ $(expr $RANK % 2) = 0 ]
then
    export GOMP_CPU_AFFINITY=0-3      # GCC: pin OpenMP threads to cores 0-3
    numactl --preferred=0 --cpunodebind=0 "$@"
else
    export GOMP_CPU_AFFINITY=4-7      # GCC: pin OpenMP threads to cores 4-7
    numactl --preferred=1 --cpunodebind=1 "$@"
fi
</pre>
}}
Run your application with the following command:
{{Command
| command = mpirun -np X -npernode 2 thread_pin_wrapper.sh your_app
}}
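For example (again with placeholder numbers), on the 16 nodes from the job above with 2 MPI processes per node and <code>OMP_NUM_THREADS=4</code> as set in the wrapper script, 16 nodes x 2 processes x 4 threads fill all 128 cores of the job:
{{Command
| command = mpirun -np 32 -npernode 2 thread_pin_wrapper.sh your_app
}}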
{{Warning| text =
Do not use the mpi_paffinity_alone option in this case!
}}