- Information in the HLRS Wiki is not legally binding and is provided without warranty -

CRAY XE6 Hardware and Architecture

From HLRS Platforms

Revision as of 11:50, 21 October 2014

Hardware of Installation step 1

Summary Phase 1 Step 1

  • Cray XE6 supercomputer named Hermit
  • Peak performance for the whole system: about 1 PFLOP/s (3552*294.4=1045708.8 GFLOP/s)
  • total compute node RAM: 126 TB (3072*32+480*64=129024 GB)
  • 3552 dual-socket G34 compute nodes / 113,664 cores
    • AMD Opteron(tm) 6276 (Interlagos) processors (2 per node)
      • an Interlagos CPU is composed of 2 Orochi dies;
        an Orochi die consists of 4 Bulldozer modules
        (see also sketch below)
      • 2*4*2 = 16 cores/CPU @ 2.3 GHz (up to 3.2 GHz with Turbo Core)
      • 32 MB L2+L3 cache in total (16 MB L2, 16 MB L3)
      • 2*2*2 = 8 DDR3 PC3-12800 memory channels per node, serving 8 DIMMs (4 GB each)
        (2*12.8 = 25.6 GB/s dual-channel data rate per Orochi die, 2*25.6 = 51.2 GB/s quad-channel data rate per CPU, 2*51.2 = 102.4 GB/s per node)
      • Direct Connect Architecture 2.0 with HyperTransport HT3: 6.4 GT/s*16 Byte/Transfer= 102.4 GB/s
      • supports ISA extensions SSE4.1, SSE4.2, SSSE3, AVX, AES, PCLMULQDQ, FMA4 and XOP
      • Flex FP: each Bulldozer module (2 cores) shares a single 2*128 = 256-bit floating-point unit
      • Peak performance per socket: 2.3*4*16= 147.2 GFLOP/s
    • Peak performance per node: 2*147.2= 294.4 GFLOP/s
    • 3072 standard nodes (86.5%) equipped with 32GB RAM/node (2GB/Core for 98304 Cores);
      480 nodes (13.5%) equipped with 64GB memory (4GB/Core for 15360 Cores)
    • the STREAM benchmark shows about 65 GB/s memory bandwidth per node
  • 96 service nodes (Network nodes, mom nodes, router nodes, DVS nodes, boot, database, syslog)
  • High Speed Network CRAY Gemini
  • users' HOME file system:
    • ~60TB (BLUEARC mercury 55)
  • workspace filesystem:
    • Lustre parallel filesystem
    • capacity 2.7 PB realized with 16 DDN SFA10K controllers
    • I/O bandwidth ~150 GB/s
  • special user nodes:
    • external login servers
    • pre-post processing and visualization nodes
      • 4x Intel Xeon X7550 (Nehalem-EX, 8 cores), 2.00 GHz (4*8 = 32 cores, 32*2 = 64 hyper-threads)
      • 128GB RAM
      • one node comes with 1TB memory (shared usage)
      • local disks
      • Quadro 6000 rev 2.0 (GF100 Fermi) GPU, 14 SM, 448 CUDA cores, 6 GB GDDR5 RAM (384-bit interface with 144 GB/s)
      • direct access to parallel file system
  • infrastructure servers
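The headline figures in the summary are simple products of the per-component values listed above. The Python sketch below recomputes them as a consistency check. The constant names and the functions `peak_gflops_per_socket`, `peak_gflops_per_node` and `system_summary` are hypothetical and exist only for illustration. The factor of 4 FLOP per cycle per core is the one used in the per-socket peak formula. Turbo Core clocks are ignored.

```python
"""Recompute the headline figures of the Hermit summary (sketch).

All inputs are copied from the hardware summary; this is arithmetic only.
"""

CLOCK_GHZ = 2.3                 # Opteron 6276 base clock, Turbo Core ignored
DP_FLOP_PER_CYCLE_PER_CORE = 4  # factor used in the per-socket peak formula
CORES_PER_SOCKET = 2 * 4 * 2    # 2 Orochi dies x 4 Bulldozer modules x 2 cores
SOCKETS_PER_NODE = 2

STANDARD_NODES, STANDARD_MEM_GB = 3072, 32
LARGE_NODES, LARGE_MEM_GB = 480, 64
NODES = STANDARD_NODES + LARGE_NODES  # 3552 compute nodes

DDR3_CHANNEL_GBS = 12.8         # PC3-12800: 12.8 GB/s per channel
CHANNELS_PER_NODE = 2 * 2 * 2   # sockets x dies x channels per die
STREAM_NODE_GBS = 65.0          # measured STREAM value quoted in the summary


def peak_gflops_per_socket() -> float:
    """Clock x FLOP/cycle/core x cores per socket."""
    return CLOCK_GHZ * DP_FLOP_PER_CYCLE_PER_CORE * CORES_PER_SOCKET


def peak_gflops_per_node() -> float:
    """Two sockets per node."""
    return SOCKETS_PER_NODE * peak_gflops_per_socket()


def system_summary() -> dict:
    """Whole-system totals derived from the per-node figures."""
    total_mem_gb = STANDARD_NODES * STANDARD_MEM_GB + LARGE_NODES * LARGE_MEM_GB
    node_bw_gbs = DDR3_CHANNEL_GBS * CHANNELS_PER_NODE
    return {
        "cores": NODES * SOCKETS_PER_NODE * CORES_PER_SOCKET,
        "peak_gflops": NODES * peak_gflops_per_node(),
        "total_mem_gb": total_mem_gb,
        "total_mem_tb": total_mem_gb / 1024,  # binary TB, as in "126 TB"
        "node_mem_bw_gbs": node_bw_gbs,
        # fraction of the theoretical node bandwidth reached by STREAM
        "stream_efficiency": STREAM_NODE_GBS / node_bw_gbs,
    }


if __name__ == "__main__":
    for key, value in system_summary().items():
        print(f"{key}: {value}")
```

Running it reproduces the following figures:
* 113,664 cores
* about 1,045,708.8 GFLOP/s peak
* 129,024 GB (126 TB) of compute node memory
* 102.4 GB/s theoretical memory bandwidth per node

Against that per-node figure, the quoted STREAM result of about 65 GB/s corresponds to roughly 63% of the theoretical bandwidth.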


  • System Management Workstation (SMW)
    • the system administrator's console for managing the Cray system: monitoring, installing/upgrading software, controlling the hardware, and starting and stopping the XE6 system.
  • service nodes are classified as:
    • login nodes for users to access the system
    • boot nodes, which provide the OS for all other nodes, licenses, ...
    • network nodes, which provide e.g. external network connections for the compute nodes
    • DVS nodes: the Cray Data Virtualization Service (DVS) is an I/O forwarding service that can parallelize the I/O transactions of an underlying POSIX-compliant file system.
    • SDB node for services like ALPS, Torque, Moab, Cray management services, ...
    • I/O nodes, e.g. for Lustre
    • MOM (Torque) nodes, which place user jobs of the batch system into execution
  • compute nodes and pre-post processing nodes
    • are available to users only through the batch system and the Application Level Placement Scheduler (ALPS); see Running applications.
      • There are compute nodes with 32 GB memory and 64 GB memory available each with fast interconnect (CRAY Gemini)
      • The pre- and postprocessing/visualization infrastructure aims to support users with
        • complex workflows and advanced access methods
        • remote graphics rendering and simulation steering, in order to minimize data movement.
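Hermit offers two compute node types: 32 GB nodes with 2 GB per core and 64 GB nodes with 4 GB per core. Memory per process therefore determines how densely a job can populate a node. The Python sketch below makes that arithmetic explicit. The helper `plan` and the `NodeType` tuple are hypothetical. The sketch assumes a pure MPI job with at most one rank per core and treats the full node memory as usable. In practice the OS reserves part of it, and actual placement is done by the batch system and ALPS, whose request syntax is not shown here.

```python
"""Hypothetical job-sizing sketch for Hermit's two compute node types.

Assumes a pure MPI job with at most one rank per core and that the full
node memory is usable (in practice the OS reserves part of it).
"""

import math
from typing import NamedTuple, Tuple

CORES_PER_NODE = 32  # 2 sockets x 16 cores


class NodeType(NamedTuple):
    name: str
    mem_gb: int
    count: int


# Preferred order: the plentiful 32 GB nodes first, then the 64 GB nodes.
NODE_TYPES = (
    NodeType("standard", 32, 3072),     # 2 GB per core
    NodeType("large-memory", 64, 480),  # 4 GB per core
)


def plan(ranks: int, mem_per_rank_gb: float) -> Tuple[str, int, int]:
    """Return (node type, ranks per node, nodes needed) for a job."""
    if ranks <= 0 or mem_per_rank_gb <= 0:
        raise ValueError("ranks and memory per rank must be positive")
    for node in NODE_TYPES:
        # Ranks per node are limited by cores and by memory, whichever is smaller.
        per_node = min(CORES_PER_NODE, int(node.mem_gb // mem_per_rank_gb))
        if per_node == 0:
            continue  # a single rank does not fit into this node's memory
        nodes = math.ceil(ranks / per_node)
        if nodes <= node.count:
            return node.name, per_node, nodes
    raise ValueError("job does not fit on a single node type")
```

For example, `plan(64, 1.0)` gives `("standard", 32, 2)` and `plan(64, 3.0)` gives `("standard", 10, 7)`. `plan(64, 40.0)` falls through to `("large-memory", 1, 64)`. Preferring the standard nodes can leave cores idle when ranks need more than 2 GB. An alternative policy would switch to the 64 GB nodes as soon as they allow a denser packing.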

Conceptual Architecture

[Figure: Conceptual architecture of Hermit]

AMD Opteron 6200 Series Processor (Interlagos)


AMD Turbo Core technology

[Figure: AMD Turbo Core technology]

Storage Solution for Hermit installation step 1

[Figure: Hermit installation step 1 storage solution]

Pre-Postprocessing Visualization Server

[Figure: Hermit pre-/postprocessing and visualization server]

Software Features

  • Cray Linux Environment (CLE) 4 operating system
  • Operating System is based on SUSE Linux Enterprise Server (SLES) 11
  • Cray Gemini interconnection network
  • Cluster Compatibility Mode (CCM) functionality enables cluster-based independent software vendor (ISV) applications to run without modification on Cray systems.
  • Batch system: Torque, Moab
  • many development tools available:

Pictures and video of installation step 1

Video of the installation

[Figures: Hermit installation step 1, slides 1-9]