CRAY XE6 Hardware and Architecture: Difference between revisions

Latest revision as of 11:18, 18 November 2016

Hardware of Installation step 1

Summary Phase 1 Step 1

Cray XE6 supercomputer named Hermit
Peak performance for the whole system: about 1 PFLOP/s (3552*294.4=1045708.8 GFLOP/s)
total compute node RAM: 126 TB (3072*32+480*64=129024 GB)
3552 dual socket G34 compute nodes / 113,664 cores
- AMD Opteron(tm) 6276 (Interlagos) processors (2 per node)
  - an Interlagos CPU is composed of 2 Orochi dies;
    an Orochi die consists of 4 Bulldozer modules
    (see also sketch below)
  - 2*4*2= 16 Cores/CPU @ 2.3 GHz (up to 3.2 GHz with TurboCore)
  - 32 MB L2+L3 cache in total, of which 16 MB is L3 cache
  - 2*2*2 channels of DDR3 PC3-12800 bandwidth to 8 DIMMs (4GB each)
    (2*12.8=25.6 GB/s dual channel Orochi data rate, 51.2 GB/s CPU quad channel data rate, 2*51.2=102.4 GB/s per node)
  - Direct Connect Architecture 2.0 with HyperTransport HT3: 6.4 GT/s*16 Byte/Transfer= 102.4 GB/s
  - supports ISA extensions SSE4.1, SSE4.2, SSSE3, AVX, AES, PCLMULQDQ, FMA4 and XOP
  - Flex FP: Bulldozer modules (2 cores) share a single 2*128= 256bit floating point unit
  - Peak performance per socket: 2.3*4*16= 147.2 GFLOP/s
- Peak performance per node: 2*147.2= 294.4 GFLOP/s
- 3072 standard nodes (86.5%) equipped with 32GB RAM/node (2GB/Core for 98304 Cores);
  480 nodes (13.5%) equipped with 64GB memory (4GB/Core for 15360 Cores)
- stream benchmark shows about 65 GB/s data transfer for a node
96 service nodes (Network nodes, mom nodes, router nodes, DVS nodes, boot, database, syslog)
High Speed Network CRAY Gemini
users HOME filesystem:
- ~60TB (BLUEARC mercury 55)
workspace filesystem:
- Lustre parallel filesystem
- capacity 2.7 PB realized with 16 DDN SFA10K controllers
- I/O bandwidth ~150 GB/s
special user nodes:
- external login servers
- pre-post processing and visualization nodes
  - 4x Intel Xeon X7550 (Nehalem EX OctCore), 2.00GHz (4*8=32 Cores for 32*2=64 HyperThreads)
  - 128GB RAM
  - one node comes with 1TB memory (shared usage)
  - local disks
  - Quadro 6000 rev 2.0 (GF100 Fermi) GPU, 14 SM, 448 CUDA cores, 6 GB GDDR5 RAM (384-bit interface with 144 GB/s)
  - direct access to the parallel filesystem
infrastructure servers
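
The arithmetic in the summary list can be checked with a short script. The following is a minimal sketch, not an official tool: the constant and function names are made up for illustration, and the factor of 4 floating point operations per cycle per core is simply the one used in the list's own formula (2.3*4*16). It also relates the measured STREAM figure to the nominal per-node memory bandwidth.

```python
# Figures copied from the summary list; all names here are illustrative.
STANDARD_NODES = 3072          # nodes with 32 GB RAM
LARGE_NODES = 480              # nodes with 64 GB RAM
SOCKETS_PER_NODE = 2
CORES_PER_SOCKET = 2 * 4 * 2   # dies * Bulldozer modules * cores per module
CLOCK_GHZ = 2.3
FLOPS_PER_CYCLE = 4            # per core, as in the list's formula 2.3*4*16
CHANNEL_GBS = 12.8             # one DDR3 PC3-12800 channel
CHANNELS_PER_NODE = 2 * 2 * 2  # sockets * dies * channels per die
STREAM_GBS = 65.0              # measured value quoted in the list


def system_summary():
    """Recompute the headline numbers of the summary list."""
    nodes = STANDARD_NODES + LARGE_NODES
    socket_gflops = CLOCK_GHZ * FLOPS_PER_CYCLE * CORES_PER_SOCKET
    node_gflops = SOCKETS_PER_NODE * socket_gflops
    ram_gb = STANDARD_NODES * 32 + LARGE_NODES * 64
    node_bw_gbs = CHANNEL_GBS * CHANNELS_PER_NODE
    return {
        "nodes": nodes,
        "cores": nodes * SOCKETS_PER_NODE * CORES_PER_SOCKET,
        "socket_gflops": socket_gflops,
        "node_gflops": node_gflops,
        "system_gflops": nodes * node_gflops,
        "ram_gb": ram_gb,
        "ram_tb": ram_gb / 1024,   # 1 TB counted as 1024 GB, as in the list
        "node_bw_gbs": node_bw_gbs,
        "stream_fraction": STREAM_GBS / node_bw_gbs,
    }


if __name__ == "__main__":
    for key, value in system_summary().items():
        print(f"{key:16s} {value:,.2f}")
```

Under these assumptions the STREAM result corresponds to roughly 63% of the nominal 102.4 GB/s per node.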

Architecture

System Management Workstation (SMW)
- the system administrator's console for managing a Cray system: monitoring, installing and upgrading software, controlling the hardware, and starting and stopping the XE6 system.

service nodes are classified in:
- login nodes for users to access the system
- boot nodes which provide the OS for all other nodes, licenses,...
- network nodes which provide e.g. external network connections for the compute nodes
- Cray Data Virtualization Service (DVS) nodes: DVS is an I/O forwarding service that can parallelize the I/O transactions of an underlying POSIX-compliant file system.
- sdb node for services like ALPS, torque, moab, cray management services,...
- I/O nodes for e.g. lustre
- MOM (torque) nodes for placing user jobs of the batch system into execution

compute nodes and pre-post processing nodes
- are only available to users through the batch system and the Application Level Placement Scheduler (ALPS), see running applications.
  - There are compute nodes with 32 GB memory and with 64 GB memory available, each with the fast interconnect (CRAY Gemini)
  - The pre- and postprocessing/visualization infrastructure aims to support users with
    - complex workflows and advanced access methods
    - remote graphics rendering and simulation steering, in order to minimize data movement.
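
Because compute nodes come with either 32 GB or 64 GB of memory for their 32 cores, the memory needed per process limits how many processes fit on a node. The following is a rough sketch under simplifying assumptions: the function names are hypothetical, the memory reserved for the operating system is ignored, and the actual placement is done by the batch system and ALPS, not by code like this.

```python
CORES_PER_NODE = 32           # 2 sockets * 16 cores, from the hardware summary
NODE_MEMORY_GB = (32, 64)     # the two compute node variants


def max_ranks_per_node(mem_per_rank_gb, node_mem_gb, cores_per_node=CORES_PER_NODE):
    """Largest number of processes that fit on one node by memory and core count."""
    if mem_per_rank_gb <= 0:
        raise ValueError("mem_per_rank_gb must be positive")
    fit_by_memory = int(node_mem_gb // mem_per_rank_gb)
    return min(cores_per_node, fit_by_memory)


def nodes_needed(total_ranks, mem_per_rank_gb, node_mem_gb):
    """Number of nodes required to host total_ranks processes."""
    per_node = max_ranks_per_node(mem_per_rank_gb, node_mem_gb)
    if per_node == 0:
        raise ValueError("a single process does not fit into node memory")
    return -(-total_ranks // per_node)   # ceiling division


if __name__ == "__main__":
    # Hypothetical job: 1024 processes needing 1.5 GB each.
    for mem in NODE_MEMORY_GB:
        print(mem, "GB nodes:",
              max_ranks_per_node(1.5, mem), "ranks/node,",
              nodes_needed(1024, 1.5, mem), "nodes")
```

In this hypothetical case the 32 GB nodes cannot use all 32 cores, while the 64 GB nodes can, which is the kind of trade-off that decides which node type to request.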

Conceptual Architecture

AMD Opteron 6200 Series Processor (Interlagos)

AMD Turbo Core technology

Storage Solution for Hermit installation step 1

Pre-Postprocessing Visualization Server

Software Features

Cray Linux Environment (CLE) 4 operating system
Operating System is based on SUSE Linux Enterprise Server (SLES) 11
Cray Gemini interconnection network
Cluster Compatibility Mode (CCM) functionality enables cluster-based independent software vendor (ISV) applications to run without modification on Cray systems.
Batch System: torque, moab
many development tools available:
- Compiler: Cray, PGI, GNU
- Debugging: DDT, ATP, ...
- Optimizing Code: CrayPat, Cray Apprentice, PAPI
- Libraries: BLAS, LAPACK, FFTW, PETSc, MPT, ...

Pictures and video of installation step 1

Video of the installation

Cooling Liquid: R134a (Tetrafluoroethane)

@@ Line 1: / Line 1: @@
-== Hardware of Installation step 0 ==
+== Hardware of Installation step 1 ==
-* 84 [https://fs.hlrs.de/projects/craydoc/docs/books/S-2496-31/html-S-2496-31/appendix.3.IcvG1LiI.html#figure-8zm3e3cz '''compute nodes'''] / 1344 cores
+=== Summary Phase 1 Step 1 ===
-** dual socket G34
+* Cray [http://www.cray.com/Products/Computing/XE/XE6.aspx XE6] supercomputer named [http://en.wikipedia.org/wiki/Osmoderma_eremita Hermit]
-*** Magny-Cours @ 2 GHz (Opteron(tm) Processor 6128)
+* Peak performance for the whole system: about 1 PFLOP/s (3552*294.4=1045708.8 GFLOP/s)
-*** 512KB L2 Cache, 12MB L3 Cache
+* total compute node RAM: 126 TB (3072*32+480*64=129024 GB)
-*** HyperTransport HT3
+* 3552 dual socket G34 [http://docs.cray.com/cgi-bin/craydoc.cgi?mode=View;id=S-2496-4001;right=/books/S-2496-4001/html-S-2496-4001//appendix.3.Yp4mYMuW.html '''compute nodes'''] / 113,664 cores
-*** 8 cores
+** [http://www.amd.com/de/products/server/processors/6000-series-platform/Pages/6000-series-platform.aspx AMD Opteron(tm)] 6276 (Interlagos) processors (2 per node)
-** 32GB memory
+*** an Interlagos CPU is composed of 2 Orochi dies;<br />an Orochi die consists of 4 Bulldozer modules<br />(see also sketch [[#InterlagosSchematics|below]])
-* 12 '''service nodes'''
+*** 2*4*2= 16 Cores/CPU @ 2.3 GHz (up to 3.2 GHz with TurboCore)
-** 6-Core AMD Opteron(tm) Processor 23 (D0)
+*** 32 MB L2+L3 cache in total, of which 16 MB is L3 cache
-*** 2.2GHz
+*** 2*2*2 channels of DDR3 PC3-12800 bandwidth to 8 DIMMs (4GB each)<br />(2*12.8=25.6 GB/s dual channel Orochi data rate, 51.2 GB/s CPU quad channel data rate, 2*51.2=102.4 GB/s per node)
-*** 16GB memory
+*** Direct Connect Architecture 2.0 with HyperTransport HT3: 6.4 GT/s*16 Byte/Transfer= 102.4 GB/s
+*** supports ISA extensions SSE4.1, SSE4.2, SSSE3, AVX, AES, PCLMULQDQ, FMA4 and XOP
+*** Flex FP: Bulldozer modules (2 cores) share a single 2*128= 256bit floating point unit
+*** Peak performance per socket: 2.3*4*16= 147.2 GFLOP/s
+** Peak performance per node: 2*147.2= 294.4 GFLOP/s
+** 3072 standard nodes (86.5%) equipped with 32GB RAM/node (2GB/Core for 98304 Cores);<br /> 480 nodes (13.5%) equipped with 64GB memory (4GB/Core for 15360 Cores)
+** stream benchmark shows about 65 GB/s data transfer for a node
+* 96 '''service nodes''' (Network nodes, mom nodes, router nodes, DVS nodes, boot, database, syslog)
 * High Speed Network '''CRAY Gemini'''
-* LSI FibreChannel RAID System (SATA)
+* users HOME filesystem:
-** Parallel Filesystem
+** ~60TB (BLUEARC mercury 55)
-***  Lustre version 1.8.2
+* workspace filesystem:
-*** capacity 11 TB
+**  Lustre parallel filesystem
-*** 3 OST's, 1 MDS, Network Gemini
+** capacity 2.7 PB realized with 16 DDN SFA10K controllers
-* System Management Workstation (SMW)
+** I/O bandwidth ~150 GB/s
+* special user nodes:
+** external login servers
+** pre-post processing and visualization nodes
+*** 4x [http://ark.intel.com/products/46498/Intel-Xeon-Processor-X7550-18M-Cache-2_00-GHz-6_40-GTs-Intel-QPI Intel Xeon X7550] (Nehalem EX OctCore), 2.00GHz (4*8=32 Cores for 32*2=64 HyperThreads)
+*** 128GB RAM
+*** one node comes with 1TB memory (shared usage)
+*** local disks
+*** [http://www.nvidia.de/object/product-quadro-6000-de.html Quadro 6000] rev 2.0 (GF100 Fermi) GPU, 14 SM, 448 CUDA cores, 6 GB GDDR5 RAM (384-bit interface with 144 GB/s)
+*** direct access to the parallel filesystem
+* infrastructure servers
 == Architecture ==
@@ Line 24: / Line 41: @@
 * service nodes are classified in:
-** login nodes (xe601.hww.de) for users to [[CRAY_XE6_access| access]] the system
+** login nodes for users to [[CRAY_XE6_access| access]] the system
 ** boot nodes which provide the OS for all other nodes, licenses,...
 ** network nodes which provide e.g. external network connections for the compute nodes
@@ Line 32: / Line 49: @@
 ** MOM (torque) nodes for placing user jobs of the batch system into execution
-* compute nodes
+* compute nodes and pre-post processing nodes
-** are only available for user using the [[CRAY_XE6_Using_the_Batch_System| batch system]] and the Application Level Placement Scheduler (ALPS), see [https://fs.hlrs.de/projects/craydoc/docs/books/S-2496-31/html-S-2496-31/cnl_apps.html#section-5gdyxf87-oswald running applications].
+** are only available to users through the [[CRAY_XE6_Using_the_Batch_System| batch system]] and the Application Level Placement Scheduler (ALPS), see [http://docs.cray.com/cgi-bin/craydoc.cgi?mode=View;id=S-2496-4001;right=/books/S-2496-4001/html-S-2496-4001//cnl_apps.html running applications].
+*** There are compute nodes with 32 GB memory and with 64 GB memory available, each with the fast interconnect (CRAY Gemini)
+*** The pre- and postprocessing/visualization infrastructure aims to support users with
+****complex workflows and advanced access methods
+****remote graphics rendering and simulation steering, in order to minimize data movement.
+=== Conceptual Architecture ===
+[[Image:Conceptual_Architecture_of_Hermit.jpg]]
+=== AMD Opteron 6200 Series Processor (Interlagos) ===
+<div id="InterlagosSchematics">[[Image:Interlagos.jpg]]</div>
+=== AMD Turbo Core technology ===
+[[Image:AMD_Turbo_Core_technology.jpg]]
+=== Storage Solution for Hermit installation step 1 ===
+[[Image:Hermit1_storage_solution.jpg]]
+=== Pre-Postprocessing Visualization Server ===
+[[Image:Hermit_PrePostprocessingVisualization.jpg]]
+=== Software Features ===
+* Cray Linux Environment (CLE) 4 operating system
+* Operating System is based on SUSE Linux Enterprise Server (SLES) 11
+* Cray Gemini interconnection network
+* [http://docs.cray.com/cgi-bin/craydoc.cgi?mode=View;id=S-2496-4001;right=/books/S-2496-4001/html-S-2496-4001//chapter-9b6qil6d-craigf.html Cluster Compatibility Mode (CCM)] functionality enables cluster-based independent software vendor (ISV) applications to run without modification on Cray systems.
+* Batch System: torque, moab
+* many development tools available:
+** [https://fs.hlrs.de/projects/craydoc/docs_merged/books/S-2396-601/html-S-2396-601/chapter-xpb5m1kn-brbethke.html Compiler]: Cray, PGI, GNU,
+** [https://fs.hlrs.de/projects/craydoc/docs_merged/books/S-2396-601/html-S-2396-601/chapter-acu9eycn-brbethke.html Debugging]: DDT, ATP,...
+** [https://fs.hlrs.de/projects/craydoc/docs_merged/books/S-2396-601/html-S-2396-601/chapter-o8nktdbi-brbethke.html Optimizing Code]: CrayPat, Cray Apprentice, PAPI
+** [https://fs.hlrs.de/projects/craydoc/docs_merged/books/S-2396-601/html-S-2396-601/chapter-ck3x6qu8-brbethke.html Libraries]: BLAS, LAPACK, FFTW, PETSc, MPT,....
+== Pictures and video of installation step 1 ==
+[http://www.youtube.com/watch?v=3qirlkHRKR0 Video of the installation]
+[[Image:Hermit1-Folie1.jpg]]
+[[Image:Hermit1-Folie2.jpg]]
+[[Image:Hermit1-Folie3.jpg]]
+[[Image:Hermit1-Folie4.jpg]]
+[[Image:Hermit1-Folie5.jpg]]
+Cooling Liquid: R134a ([https://en.wikipedia.org/wiki/1,1,1,2-Tetrafluoroethane Tetrafluoroethane])
+[[Image:Hermit1-Folie6.jpg]]
+[[Image:Hermit1-Folie7.jpg]]
+[[Image:Hermit1-Folie8.jpg]]
+[[Image:Hermit1-Folie9.jpg]]
