- Information contained in the HLRS Wiki is not legally binding and HLRS is not responsible for any damages that might result from its use -
GPI-2
Revision as of 13:55, 22 November 2018
GPI-2 (Global address space Programming Interface) is a thread-safe PGAS API for InfiniBand, RoCE, Ethernet, Gemini and Aries networks.
GPI-2 aims at high performance, delivering wire speed from the interconnect. It relies on one-sided, asynchronous communication, which allows computation and communication to overlap. All GPI-2 methods provide a timeout for fault-tolerant operation. With that in place, GPI-2 offers mechanisms that allow an application to react to failures and continue its execution.
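The timeout behaviour described above is typically used through a bounded retry loop: call an operation with a finite timeout, and if it reports a timeout, decide whether to try again or to start recovery. The following is a minimal sketch of that pattern in plain C, not GPI-2 code. The names `op_status`, `timed_op`, `retry_on_timeout` and `flaky_op` are hypothetical stand-ins introduced here for illustration; in a real GPI-2 program the operation would wrap a call such as `gaspi_barrier` with a finite timeout instead of `GASPI_BLOCK`.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in status codes (hypothetical names, not the GASPI constants). */
typedef enum { OP_ERROR = -1, OP_SUCCESS = 0, OP_TIMEOUT = 1 } op_status;

/* An operation that completes, times out, or fails. */
typedef op_status (*timed_op)(void *ctx);

/* Call op until it stops timing out or max_attempts calls were made.
 * Returns the status of the last call (OP_TIMEOUT if max_attempts is 0). */
static op_status retry_on_timeout(timed_op op, void *ctx, unsigned max_attempts)
{
    assert(op != NULL);
    op_status st = OP_TIMEOUT;
    for (unsigned i = 0; i < max_attempts && st == OP_TIMEOUT; ++i) {
        st = op(ctx);
    }
    return st;
}

/* Demo operation: times out while the counter in ctx is positive,
 * then succeeds. It stands in for a communication call with a timeout. */
static op_status flaky_op(void *ctx)
{
    int *remaining = ctx;
    if (*remaining > 0) {
        --*remaining;
        return OP_TIMEOUT;
    }
    return OP_SUCCESS;
}
```

A caller that still sees `OP_TIMEOUT` after the last attempt knows the operation never completed and can switch to its own recovery path, which is the kind of reaction to failures that the timeout mechanism is meant to enable.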
Using GPI-2 on Cray XC30
Load the necessary module. For example:
module swap PrgEnv-cray PrgEnv-gnu
module load gpi2/1.3.2
Example
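#include <stdio.h>
#include <stdlib.h>
#include <GASPI.h>

int main(int argc, char *argv[])
{
  gaspi_rank_t rank, tnc;
  gaspi_float vers;
  /* Never queried in this example; zeroed so that printf does not
     read an indeterminate value. */
  gaspi_float vers_gni = 0.0f;

  /* Start the GASPI runtime; GASPI_BLOCK waits without a timeout. */
  if (gaspi_proc_init(GASPI_BLOCK) != GASPI_SUCCESS) {
    printf("gaspi_init failed !\n");
    exit(-1);
  }

  gaspi_version(&vers);    /* GPI-2 library version */
  gaspi_proc_rank(&rank);  /* rank of this process */
  gaspi_proc_num(&tnc);    /* total number of processes */

  printf("rank: %d tnc: %d (gpi2: %.2f ugni: %.2f)\n", rank, tnc, vers, vers_gni);

  /* Synchronize all ranks before shutting down. */
  if (gaspi_barrier(GASPI_GROUP_ALL, GASPI_BLOCK) != GASPI_SUCCESS) {
    printf("gaspi_barrier failed !\n");
    exit(-1);
  }

  gaspi_proc_term(GASPI_BLOCK);

  return 0;
}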
Compilation example
cc -O2 hello_gpi2.c -D_GNU_SOURCE -lpmi -lugni -lrca -lGPI2 -o hello_gpi2.bin
Example to run the program
GPI-2 applications should be started with one process per NUMA socket. Use threads to exploit SMP parallelism within each NUMA socket (e.g. mctp3 for best performance). For example:
# 2 nodes, 24 cores (ht is enabled), interactive
qsub -I -l nodes=2:ppn=24 -l walltime=00:05:00
# 4 procs on 2 nodes, socket affinity mask, ht on
aprun -q -n 4 -N 2 -cc numa_node -d 12 -ss -j 2 ./hello_gpi2.bin
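The numbers in the `aprun` line have to fit the `qsub` reservation, and the arithmetic is easy to get wrong when changing one of them. The sketch below works through that check in C. It assumes that `-n` is the total process count, `-N` the processes per node and `-d` the number of CPUs reserved per process; the struct and function names are hypothetical helpers, not part of any Cray or GPI-2 interface.

```c
#include <assert.h>
#include <stdbool.h>

/* The aprun options that determine how many nodes and CPUs are used. */
typedef struct {
    int total_procs;    /* aprun -n: total number of processes */
    int procs_per_node; /* aprun -N: processes placed on each node */
    int depth;          /* aprun -d: CPUs reserved per process */
} launch_layout;

/* Number of nodes the launch occupies, rounded up. */
static int nodes_needed(launch_layout l)
{
    assert(l.procs_per_node > 0);
    return (l.total_procs + l.procs_per_node - 1) / l.procs_per_node;
}

/* CPUs claimed on each fully populated node. */
static int cpus_per_node(launch_layout l)
{
    return l.procs_per_node * l.depth;
}

/* True if the launch fits into a reservation of nodes x ppn. */
static bool fits_reservation(launch_layout l, int nodes, int ppn)
{
    return nodes_needed(l) <= nodes && cpus_per_node(l) <= ppn;
}
```

For the example above, `-n 4 -N 2 -d 12` gives 4 / 2 = 2 nodes and 2 × 12 = 24 CPUs per node, which matches `nodes=2:ppn=24`.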