- Information contained in the HLRS Wiki is not legally binding and HLRS is not responsible for any damages that might result from its use -
CRAY XE6 Hardware and Architecture
Revision as of 13:47, 5 December 2013
Hardware of Installation step 1
Summary Phase 1 Step 1
- Cray XE6 supercomputer
- Peak performance for the whole system: about 1 PFLOP/s (3552*294.4=1045708.8 GFLOP/s)
- total compute node RAM: 126 TB (3072*32+480*64=129024 GB)
- 3552 dual-socket G34 compute nodes / 113,664 cores
- AMD Opteron(tm) 6276 (Interlagos) processors (2 per node)
- an Interlagos CPU is composed of 2 Orochi dies; an Orochi die consists of 4 Bulldozer modules (see also sketch below)
- 2*4*2 = 16 cores/CPU @ 2.3 GHz (up to 3.2 GHz with TurboCore)
- 32 MB L2+L3 cache, 16 MB L3 cache
- 2*2*2 channels of DDR3 PC3-12800 to 8 DIMMs (4 GB each) (2*12.8 = 25.6 GB/s dual-channel data rate per Orochi die, 51.2 GB/s quad-channel data rate per CPU, 2*51.2 = 102.4 GB/s per node)
- Direct Connect Architecture 2.0 with HyperTransport HT3: 6.4 GT/s * 16 Byte/Transfer = 102.4 GB/s
- supports ISA extensions SSE4.1, SSE4.2, SSSE3, AVX, AES, PCLMULQDQ, FMA4 and XOP
- Flex FP: each Bulldozer module (2 cores) shares a single 2*128 = 256-bit floating point unit
- Peak performance per socket: 2.3*4*16= 147.2 GFLOP/s
- Peak performance per node: 2*147.2= 294.4 GFLOP/s
- 3072 standard nodes (86.5%) equipped with 32 GB RAM/node (2 GB/core for 98304 cores); 480 nodes (13.5%) equipped with 64 GB memory (4 GB/core for 15360 cores)
- STREAM benchmark shows about 65 GB/s data transfer for a node
- 96 service nodes (Network nodes, mom nodes, router nodes, DVS nodes, boot, database, syslog)
- High Speed Network CRAY Gemini
- users HOME filesystem:
- ~60TB (BLUEARC mercury 55)
- workspace filesystem:
- Lustre parallel filesystem
- capacity 2.7 PB realized with 16 DDN SFA10K controllers
- I/O bandwidth ~150 GB/s
- special user nodes:
- external login servers
- pre-post processing and visualization nodes
- 4x Intel Xeon X7550 (Nehalem EX, octa-core), 2.00 GHz (4*8 = 32 cores, 32*2 = 64 HyperThreads)
- 128GB RAM
- one node comes with 1TB memory (shared usage)
- local disks
- Quadro 6000 rev 2.0 (GF100 Fermi) GPU, 14 SM, 448 CUDA cores, 6 GB GDDR5 RAM (384-bit interface with 144 GB/s)
- direct access to parallel filesystem
- infrastructure servers
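The headline figures in the summary above follow from a handful of per-node numbers. The sketch below simply redoes that arithmetic in Python so the derivation is easy to check; the variable names are illustrative and not taken from any HLRS tool, and the factor of 4 FLOP per cycle per core is the one used in the page's own formula (2.3*4*16).

```python
# Recompute the summary figures from the per-node numbers quoted on this page.
# All names are illustrative; the inputs are the values stated in the summary.

CLOCK_GHZ = 2.3                # base clock of the Opteron 6276
FLOP_PER_CYCLE_PER_CORE = 4    # factor used in the page's formula 2.3*4*16
CORES_PER_SOCKET = 16
SOCKETS_PER_NODE = 2
NODES_32GB = 3072
NODES_64GB = 480
DDR3_CHANNEL_GBS = 12.8        # one DDR3 PC3-12800 channel
STREAM_GBS = 65.0              # measured STREAM rate per node, as quoted

nodes = NODES_32GB + NODES_64GB
cores = nodes * SOCKETS_PER_NODE * CORES_PER_SOCKET

# Peak floating point performance in GFLOP/s
peak_socket = CLOCK_GHZ * FLOP_PER_CYCLE_PER_CORE * CORES_PER_SOCKET
peak_node = peak_socket * SOCKETS_PER_NODE
peak_system = peak_node * nodes

# Total compute node memory (1 TB = 1024 GB, as in the summary)
ram_gb = NODES_32GB * 32 + NODES_64GB * 64
ram_tb = ram_gb / 1024

# Nominal memory bandwidth: 2 channels per die * 2 dies per CPU * 2 CPUs
mem_bw_node = DDR3_CHANNEL_GBS * 2 * 2 * 2
stream_fraction = STREAM_GBS / mem_bw_node

print(f"nodes: {nodes}, cores: {cores}")
print(f"peak: {peak_socket:.1f} / {peak_node:.1f} / {peak_system:.1f} GFLOP/s "
      "(socket / node / system)")
print(f"RAM: {ram_gb} GB = {ram_tb:.0f} TB")
print(f"memory bandwidth per node: {mem_bw_node:.1f} GB/s nominal, "
      f"STREAM reaches {stream_fraction:.0%} of that")
```

Comparing the quoted STREAM rate with the nominal 102.4 GB/s gives a rough idea of how much of the theoretical memory bandwidth an application can expect to sustain on one node.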
Architecture
- System Management Workstation (SMW)
- system administrator's console for managing a Cray system: monitoring, installing/upgrading software, controlling the hardware, and starting and stopping the XE6 system.
- service nodes are classified in:
- login nodes for users to access the system
- boot nodes, which provide the OS for all other nodes, licenses, ...
- network nodes, which provide e.g. external network connections for the compute nodes
- Cray Data Virtualization Service (DVS) nodes: DVS is an I/O forwarding service that can parallelize the I/O transactions of an underlying POSIX-compliant file system.
- sdb node for services like ALPS, torque, moab, cray management services,...
- I/O nodes for e.g. lustre
- MOM (torque) nodes, which place user jobs from the batch system into execution
- compute nodes and pre-post processing nodes
- are only available to users through the batch system and the Application Level Placement Scheduler (ALPS), see running applications.
- Compute nodes are available with 32 GB or 64 GB of memory, each with the fast interconnect (CRAY Gemini)
- The pre- and postprocessing/visualization infrastructure aims to support users with
- complex workflows and advanced access methods
- remote graphics rendering and simulation steering, in order to minimize data movement.
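Since compute nodes are reached only through the batch system and ALPS, a job has to be sized in whole nodes. The following Python sketch shows one way to work out the node count and node type for a given number of ranks. It assumes 32 cores per node and the 32 GB / 64 GB node sizes from the hardware summary, it ignores the memory the operating system itself needs, and the helper names `plan_job` and `aprun_command` are hypothetical. The `-n` (total processing elements) and `-N` (processing elements per node) options are standard `aprun` options; the actual submission procedure is described on the "running applications" page.

```python
import math

CORES_PER_NODE = 32  # 2 sockets * 16 cores, from the hardware summary


def plan_job(ranks, ranks_per_node=CORES_PER_NODE, mem_per_rank_gb=1.0):
    """Return (number of nodes, node type) for a job.

    Hypothetical helper: node type is chosen purely from the physical
    memory sizes quoted on this page (32 GB and 64 GB per node).
    """
    if ranks < 1:
        raise ValueError("ranks must be positive")
    if not 1 <= ranks_per_node <= CORES_PER_NODE:
        raise ValueError("ranks_per_node must be between 1 and 32")

    nodes = math.ceil(ranks / ranks_per_node)
    mem_per_node_gb = ranks_per_node * mem_per_rank_gb

    if mem_per_node_gb <= 32:
        node_type = "32GB"
    elif mem_per_node_gb <= 64:
        node_type = "64GB"
    else:
        raise ValueError("memory demand exceeds the largest compute node")
    return nodes, node_type


def aprun_command(ranks, ranks_per_node, executable):
    """Build an aprun launch line (assumed layout, one rank per core)."""
    return f"aprun -n {ranks} -N {ranks_per_node} {executable}"


if __name__ == "__main__":
    # 1000 ranks, fully packed nodes, 1 GB per rank
    print(plan_job(1000))
    # 64 ranks, half-populated nodes, 3 GB per rank
    print(plan_job(64, ranks_per_node=16, mem_per_rank_gb=3.0))
    print(aprun_command(64, 16, "./a.out"))
```

Running fewer ranks per node than there are cores is a common way to give each rank more memory or memory bandwidth, at the cost of allocating more nodes.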
Conceptual Architecture
AMD Opteron 6200 Series Processor (Interlagos)
AMD Turbo Core technology
Storage Solution for Hermit installation step 1
Pre-Postprocessing Visualization Server
Software Features
- Cray Linux Environment (CLE) 4 operating system
- Operating System is based on SUSE Linux Enterprise Server (SLES) 11
- Cray Gemini interconnection network
- Cluster Compatibility Mode (CCM) functionality enables cluster-based independent software vendor (ISV) applications to run without modification on Cray systems.
- Batch System: torque, moab
- many development tools available:
- Compilers: Cray, PGI, GNU
- Debugging: DDT, ATP, ...
- Optimizing Code: CrayPat, Cray Apprentice, PAPI
- Libraries: BLAS, LAPACK, FFTW, PETSc, MPT, ...