- Information contained in the HLRS Wiki is not legally binding and HLRS is not responsible for any damages that might result from its use -

CRAY XE6 Programming Hints: Difference between revisions

From HLRS Platforms

Revision as of 09:17, 12 January 2012

General Notice

Please first check the general Programming How-To's, Tips & Trips page for further information (compiler options, MPI-IO optimization) that is beneficial on the Cray XE6 as well.

On the login servers of the Cray XE6 system, users get by default the setup for the highly optimizing programming environment provided by Cray:

  • Compilers
  • Efficient communications library

All of these software components have been tested and proven to work on the system. If you encounter problems with this environment, please file a bug report using the HLRS trouble ticket system. Other compilers and environments (e.g. Open MPI) should be regarded as academic and may be used as a workaround. Please be aware that you will get the most efficient support (and performance) by using the software provided by the vendor.

Please be aware that the AMD Interlagos processor architecture is brand new, which will cause frequent software updates. To always get the best performance, users should recompile their programs.

Programming Hints

This page provides hints and how-tos for common programming tasks and problems that are specific to the Cray XE6. For general programming information, we refer to our Wiki pages such as usage of Open MPI or Best practices for MPI-IO.

Cray compiler

Cray compiler and inline assembler

The Cray compiler as of now (v. 7.4.4) does not support inline assembly with the __asm__ extension. A common example of code that fails to compile is reading the Intel/AMD Time Stamp Counter (rdtsc), as in the following code:

 static inline unsigned long long getrdtsc(void)
 {
   unsigned long long x;
 #if defined (__i386__)
   __asm__ __volatile__ ("rdtsc" : "=A" (x));
 #elif defined (__x86_64__)
   unsigned int tickl, tickh;
   __asm__ __volatile__ ("rdtsc" : "=a" (tickl), "=d" (tickh));
   x = ((unsigned long long)tickh << 32) | tickl;
 #else
 #warning "Architecture not yet supported in ASM"
 #endif
   return x;
 }

You can easily work around this by using gcc to compile only the relevant functions into assembler, and then building your program together with the resulting assembly file:

  • Copy the function into its own file.
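  • Remove the static keyword from the function definition so that the symbol is visible to the linker (if your gcc uses C99 inline semantics, remove inline as well, otherwise no external definition is emitted).
  • Compile using gcc into an assembler output file called rdtsc.S:
 gcc -O2 -S -o rdtsc.S rdtsc.c
  • Remove the # APP and # NOAPP markers that gcc places around the __asm__ section, and strip all other unnecessary directives: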
        .text
        .p2align 4,,15
  .globl getrdtsc
        .type   getrdtsc, @function
  getrdtsc:
  .LFB2:
        rdtsc
        movl    %eax, %ecx
        movq    %rdx, %rax
        salq    $32, %rax
        mov     %ecx, %edx
        orq     %rdx, %rax
        ret
  .LFE2:
        .size   getrdtsc, .-getrdtsc
        .section        .eh_frame,"a",@progbits

  • Finally, compile the files that call the assembly function together with the assembly file:
   craycc main.c rdtsc.S
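
On the calling side, main.c only needs a prototype for the assembly routine. The following is a minimal sketch of how the counter might be used to time a piece of code. The names tick_fn, elapsed_ticks, sum_to and fake_ticks are illustrative assumptions and do not appear on the original page. The tick source is passed in as a function pointer, so the same helper can be tried out with a stand-in counter on machines where rdtsc.S is not available.

```c
#include <stddef.h>

/* Type of a tick source, e.g. getrdtsc() from rdtsc.S (hypothetical typedef). */
typedef unsigned long long (*tick_fn)(void);

/* Stand-in tick source for trying the helper without rdtsc.S:
 * advances by 10 "ticks" on every call (illustration only). */
static unsigned long long fake_ticks(void)
{
    static unsigned long long t = 0;
    t += 10;
    return t;
}

/* Dummy workload: sums the integers 1..n (illustration only). */
static unsigned long long sum_to(size_t n)
{
    volatile unsigned long long s = 0;
    for (size_t i = 1; i <= n; i++)
        s += i;
    return s;
}

/* Runs sum_to(n), stores its result in *result and returns the number
 * of ticks that elapsed according to read_ticks. */
static unsigned long long elapsed_ticks(tick_fn read_ticks, size_t n,
                                        unsigned long long *result)
{
    unsigned long long start = read_ticks();
    *result = sum_to(n);
    unsigned long long stop = read_ticks();
    return stop - start;
}
```

In main.c one would declare `unsigned long long getrdtsc(void);` and call `elapsed_ticks(getrdtsc, n, &result)`, so that the symbol is resolved from rdtsc.S at link time.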



ALPS usage

Usually, the programmer does not need to access the ALPS launcher on Cray (MPI does everything necessary to set up the processes). However, to write parallel applications that access uGNI, or to get information on the allocation (e.g. to detect the number of threads that have been allocated in this qsub job), one can easily access ALPS information.

The main header files are libalps.h for ALPS access, libalpslli.h for the low-level interface, and libalpsutil.h for access to placement information on the compute nodes (alps_get_placement_info).

The usage is quite easy: to request information (which requires a response), such as the application ID (apid), send the request with alps_app_lli_put_request and wait for the response with alps_app_lli_get_response. If the status is OK (the example here only tests the return value), receive the actual message with alps_app_lli_get_response_bytes.

The following example gets the application info (app_info, which contains information such as the GNI pTag and cookie):

 ret = alps_app_lli_put_request(ALPS_APP_LLI_ALPS_REQ_APID, NULL, 0);
 if (ALPS_APP_LLI_ALPS_STAT_OK != ret)
     ERROR (ret, "alps_app_lli_put_request()");
 
 ret = alps_app_lli_get_response (&alps_status, &alps_count);
 if (ALPS_APP_LLI_ALPS_STAT_OK != ret)
     ERROR (ret, "alps_app_lli_get_response()");
 
 ret = alps_app_lli_get_response_bytes (&apid, sizeof(apid));
 if (ALPS_APP_LLI_ALPS_STAT_OK != ret)
     ERROR (ret, "alps_app_lli_get_response_bytes()");
 
 ret = alps_get_appinfo(apid, &app_info, &app_cmddetail, &app_places);
 if (-1 == ret)
     ERROR (ret, "alps_get_appinfo()");
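
The ERROR macro used in the example is not defined on this page. A minimal sketch of one possible definition follows. Both format_error and this ERROR macro are hypothetical helpers and not part of the ALPS headers: the macro reports the failing call together with its return value and the source location, then terminates the program.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: writes "<call> failed with return value <ret>"
 * into buf and returns the snprintf() result. */
static int format_error(char *buf, size_t len, int ret, const char *call)
{
    return snprintf(buf, len, "%s failed with return value %d", call, ret);
}

/* Hypothetical ERROR macro matching the usage in the example:
 * prints the failing call with file and line, then exits.
 * The do { } while (0) wrapper keeps it safe in unbraced if statements. */
#define ERROR(ret, call)                                            \
    do {                                                            \
        char msg_[256];                                             \
        format_error(msg_, sizeof(msg_), (int)(ret), (call));       \
        fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__, msg_);   \
        exit(EXIT_FAILURE);                                         \
    } while (0)
```

Exiting on the first failure is a simplification. A real application might prefer to return the error code to the caller and clean up first.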