- Information contained in the HLRS Wiki is not legally binding and HLRS is not responsible for any damages that might result from its use -
NEC Cluster Using MPI
OpenMPI example
see Open MPI
Intel MPI example
simple example
Load the necessary modules
Run your application with
more complex example
Because a Nehalem node is a two-socket system with locally attached ccNUMA memory, memory and process placement can be crucial for performance.
Here is an example of a 16-node job that uses one process per socket and four threads per process, with optimal NUMA placement of processes and memory.
Prerequisite: use Intel MPI together with the Intel compiler. To set up the environment for this, put the following .modulerc file in your home directory:
    #%Module1.0#
    set version 1.0
    module load compiler/intel/11.0
    module load mpi/impi/intel-11.0.074-impi-3.2.0.011
Then compile your application with mpicc/mpicxx/mpif90 (GNU compilers) or mpiicc/mpiicpc/mpiifort (Intel compilers).
First step: Batch submit to get the nodes
Second step: make a hostlist
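A hostlist normally names each allocated node exactly once. The following is a minimal sketch, assuming the batch system provides a node file with one hostname per line, possibly repeated once per core (as Torque/PBS does through $PBS_NODEFILE). The function name make_hostlist and the output file name are hypothetical and not taken from this page.

```shell
#!/bin/bash
# make_hostlist NODEFILE OUTFILE
# Hypothetical helper: writes each host from NODEFILE exactly once to OUTFILE.
# Assumption: NODEFILE has one hostname per line, possibly repeated per core.
make_hostlist() {
    local nodefile=$1 outfile=$2
    # sort -u sorts the hostnames and drops duplicate lines
    sort -u "$nodefile" > "$outfile"
}
```

Inside a batch job this could be called as `make_hostlist "$PBS_NODEFILE" hostlist`. For the 16-node job described here, `wc -l < hostlist` should then report 16.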
Third step: make a process ring to be used by MPI later
Fourth step: start MPI application
With wrapper.sh looking like this
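    #!/bin/bash
    # Intel OpenMP runtime: report the thread binding and spread threads out
    export KMP_AFFINITY=verbose,scatter
    export OMP_NUM_THREADS=4
    # MPI rank as set by Open MPI, falling back to the PMI rank (Intel MPI)
    RANK=${OMPI_COMM_WORLD_RANK:-$PMI_RANK}
    if [ $(( RANK % 2 )) -eq 0 ]
    then
        # even ranks: cores 0-3, memory and CPUs of NUMA node 0
        export GOMP_CPU_AFFINITY=0-3
        numactl --preferred=0 --cpunodebind=0 "$@"
    else
        # odd ranks: cores 4-7, memory and CPUs of NUMA node 1
        export GOMP_CPU_AFFINITY=4-7
        numactl --preferred=1 --cpunodebind=1 "$@"
    fi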
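The even/odd decision in wrapper.sh can be checked without starting an MPI job. The sketch below isolates that mapping in a hypothetical function socket_for_rank, which is not part of the original script. It assumes, as the wrapper does, that two ranks are placed on each node so that consecutive ranks alternate between sockets, and that cores 0-3 belong to socket 0 and cores 4-7 to socket 1.

```shell
#!/bin/bash
# Hypothetical helper mirroring the even/odd logic of wrapper.sh.
# Prints "<socket> <core range>" for a given MPI rank.
socket_for_rank() {
    local rank=$1
    if (( rank % 2 == 0 )); then
        echo "0 0-3"   # even ranks: socket 0, cores 0-3
    else
        echo "1 4-7"   # odd ranks: socket 1, cores 4-7
    fi
}

# Show the placement of the first four ranks.
for r in 0 1 2 3; do
    echo "rank $r -> $(socket_for_rank "$r")"
done
```

The core numbering is an assumption about the node layout; `numactl --hardware` lists the cores that actually belong to each NUMA node.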
The result is an application running on 16 nodes with 32 processes that spawn 128 threads in total. On each node, one set of 4 threads is pinned to one socket and the other set of 4 threads to the other socket.
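The process and thread counts follow directly from the job geometry. The sketch below recomputes them and prints a launch line that would match these numbers. The launch line is an assumption and not taken from this page: it uses the Intel MPI mpiexec options -perhost and -np, and ./your_app is a placeholder for the actual binary.

```shell
#!/bin/bash
# Job geometry of the 16-node example.
nodes=16
sockets_per_node=2      # one MPI process per socket
threads_per_process=4   # matches OMP_NUM_THREADS in wrapper.sh

processes=$(( nodes * sockets_per_node ))
threads=$(( processes * threads_per_process ))

echo "processes: $processes"
echo "threads:   $threads"

# Assumed launch line (not from the original page); ./your_app is a placeholder.
launch_cmd="mpiexec -perhost $sockets_per_node -np $processes ./wrapper.sh ./your_app"
echo "$launch_cmd"
```

Keeping the geometry in variables makes it easy to scale the job: changing only `nodes` updates both the process count and the printed launch line.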
MVAPICH2 example
see MVAPICH2