- Information contained in the HLRS Wiki is not legally binding, and HLRS is not responsible for any damages that might result from its use -

NEC Cluster Hardware and Architecture (vulcan): Difference between revisions

From HLRS Platforms
(corrected SB node numbers)
No edit summary
 
(18 intermediate revisions by 4 users not shown)
=== Hardware ===


 
The list of currently available hardware can be found [https://kb.hlrs.de/platforms/index.php/Batch_System_PBSPro_(vulcan)#Node_types here].
*''' Pre- & Postprocessing node''' (''smp'' node)
** 8x Intel Xeon [http://ark.intel.com/products/46497/Intel-Xeon-Processor-X7542-(18M-Cache-2_66-GHz-5_86-GTs-Intel-QPI) X7542] 6-core CPUs with 2.67GHz (8*6=48 Cores)
** 1TB RAM
** shared access
 
*'''Visualisation node''' (''vis'')
** ??? nodes each with 8 cores Intel [http://ark.intel.com/de/products/39719/Intel-Xeon-Processor-W3540-8M-Cache-2_93-GHz-4_80-GTs-Intel-QPI W3540] and 24GB memory
*** Nvidia Quadro FX5800
 
* '''SandyBridge compute nodes'''
** 80 nodes Dual Intel [[Sb|'Sandy Bridge']] [http://ark.intel.com/de/products/64595/Intel-Xeon-Processor-E5-2670-20M-Cache-2_60-GHz-8_00-GTs-Intel-QPI E5-2670]
*** 2.6 GHz, 8 cores per processor, 16 threads
*** 4 memory channels per processor, DDR3 1600 MHz memory
*** 50 nodes with 32GB RAM (''sb''/''mem32gb'')
*** 30 nodes with 64GB RAM (''mem64gb'')
*** QDR Mellanox ConnectX-3 IB HCAs (40 Gbit/s)
 
* '''Haswell 20 Cores compute nodes'''
** 80 nodes Dual Intel [[hsw|'Haswell']] [http://ark.intel.com/de/products/81706/Intel-Xeon-Processor-E5-2660-v3-25M-Cache-2_60-GHz E5-2660v3]
*** 2.6 GHz, 10 cores per processor, 20 threads
*** 4 memory channels per processor, DDR4 2133 MHz memory
*** 76 nodes with 128GB RAM (''hsw128gb10c'')
*** 4 nodes with 256GB RAM (''hsw256gb10c'')
*** QDR Mellanox ConnectX-3 IB HCAs (40 Gbit/s)
 
* '''Haswell 24 Cores compute nodes'''
** 168 nodes Dual Intel [[hsw|'Haswell']] [http://ark.intel.com/de/products/81908/Intel-Xeon-Processor-E5-2680-v3-30M-Cache-2_50-GHz E5-2680v3]
*** 2.5 GHz, 12 cores per processor, 24 threads
*** 4 memory channels per processor, DDR4 2133 MHz memory
*** 152 nodes with 128GB RAM (''hsw128gb12c'')
*** 16 nodes with 256GB RAM (''hsw256gb12c'')
*** QDR Mellanox ConnectX-3 IB HCAs (40 Gbit/s); 144 of the 128GB nodes have FDR IB (''fdr'')
 
*'''Skylake 40 Cores compute nodes'''
** 100 nodes Dual Intel(R) Xeon(R) Gold 6138 CPU @ 2.00GHz [https://www.intel.com/content/www/us/en/products/processors/xeon/scalable/gold-processors/gold-6138.html]
*** 2.0 GHz, 20 cores per processor, 40 threads
*** 6 memory channels per processor, DDR4 2666 MHz memory
*** 192 GB RAM ("skl192gb20c")
*** EDR Mellanox ConnectX-5 IB HCAs (100 Gbit/s)
 
* '''10 Visualisation/GPU graphic nodes with'''
** Nvidia Tesla P100 12GB
** 2 sockets, each with 8 cores (Intel E5-2667v4 @ 3.2GHz)
** 256GB memory
** 3.7TB /localscratch, 400GB SSD /tmp
 
 
* '''network''': [http://de.wikipedia.org/wiki/Infiniband InfiniBand] Double Data Rate
** switches for interconnect: [http://www.voltaire.com/Products/Grid_Backbone_Switches Voltaire Grid Director] [http://www.voltaire.com/Products/InfiniBand/Grid_Director_Switches/Voltaire_Grid_Director_4036 4036] with 36 QDR (40Gbps) ports (6 backbone switches)
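
The channel counts and memory speeds listed above allow a rough estimate of the theoretical peak memory bandwidth per processor. The following is a minimal sketch, not an HLRS tool: it assumes that the listed "MHz" figures are the DDR transfer rate in MT/s and that each channel is 64 bits (8 bytes) wide. The names <code>NODE_TYPES</code> and <code>peak_bandwidth_gb_s</code> are made up for illustration. Sustained bandwidth in real applications is lower.

```python
# Theoretical peak memory bandwidth per processor (socket):
#   channels * transfer rate (MT/s) * 8 bytes per transfer
# Assumption: the "MHz" values in the hardware list are DDR transfer rates in MT/s.

BYTES_PER_TRANSFER = 8  # assumption: one 64-bit DDR channel moves 8 bytes per transfer

# (memory channels per processor, transfer rate in MT/s), taken from the list above
NODE_TYPES = {
    "SandyBridge E5-2670": (4, 1600),
    "Haswell E5-2660v3": (4, 2133),
    "Haswell E5-2680v3": (4, 2133),
    "Skylake Gold 6138": (6, 2666),
}


def peak_bandwidth_gb_s(channels, mega_transfers_per_s):
    """Return the theoretical peak bandwidth of one processor in GB/s (10^9 bytes/s)."""
    return channels * mega_transfers_per_s * BYTES_PER_TRANSFER / 1000


if __name__ == "__main__":
    for name, (channels, rate) in NODE_TYPES.items():
        print(f"{name}: {peak_bandwidth_gb_s(channels, rate):.1f} GB/s per processor")
```

For a dual-socket node, the per-processor value doubles, provided that every thread accesses memory attached to its own socket.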


=== Architecture ===


The NEC Cluster platform (vulcan) consists of several '''frontend nodes''' for interactive access (for access details see [[NEC_Cluster_access_(vulcan)| Access]]) and several compute nodes of different types for execution of parallel programs. Some of the compute nodes come from the old NEC Cluster laki.


'''Compute node types installed''' (previous revision):
* Sandybridge, Haswell, Skylake
* different Memory nodes (32GB, 64GB, 128GB, 256GB, 384GB)
* Pre-Postprocessing node with very large memory (1TB)
* Visualisation/GPU nodes with Nvidia Quadro FX5800 or Nvidia Tesla P100

'''Compute node types installed''' (latest revision):
* Intel Xeon Broadwell, Skylake, CascadeLake
* AMD Epyc Rome, Genoa
* different Memory sizes (256GB, 384GB, 512GB, 768GB)
* Pre-Postprocessing node with very large memory (1.5TB, 3TB)
* Visualisation/GPU nodes with AMD Radeon Pro WX8200, Nvidia Quadro RTX4000 or Nvidia A30
* Vector nodes with NEC Aurora TSUBASA CPUs

'''Features''' (previous revision):
* Operating System: CentOS 7
* Batchsystem: PBSPro
* node-node interconnect: Infiniband + GigE
* Global Disk 500 TB (lustre) for vulcan + 500TB (lustre) for vulcan2
* Many Software Packages for Development

'''Features''' (latest revision):
* Operating System: Rocky Linux 8
* Batchsystem: PBSPro
* node-node interconnect: Infiniband + 10G Ethernet
* Global Disk 2.2 PB (lustre) for vulcan + 500TB (lustre) for vulcan2
* Many Software Packages for Development

=== History ===
{{Warning
| text = Hardware Upgrade took place on 2024-05-24<br>
Some of the compute nodes and network infrastructure of vulcan have been replaced by up-to-date hardware.
}}
{| class="wikitable" border="1" cellpadding="2"
|+'''Replacement Overview:'''
|-
|'''node_type'''||'''historical node number'''||'''current node number'''
|-
|''aurora''|| 8 || 8
|-
|''clx-21''|| 8 || 8
|-
|''clx-25''        || 96 ||        96
|-
|<font color=red>''clx-ai''</font>        ||  4 ||          <font color=red>0</font>
|-
|<font color=red>''hsw128gb20c''</font>  || 84 ||          <font color=red>0</font>
|-
|<font color=red>''hsw128gb24c''</font>  || 152 ||          <font color=red>0</font>
|-
|<font color=red>''hsw256gb20c''</font>  || 4 ||          <font color=red>0</font>
|-
|<font color=red>''hsw256gb24c''</font>  || 16 ||          <font color=red>0</font>
|-
|<font color=red>''k20xm''</font>        ||  3 ||          <font color=red>0</font>
|-
|''p100''          ||  3 ||          3
|-
|''skl''          || 68 ||        72
|-
|''smp''          ||  2 ||          1
|-
|''visamd''        ||  6 ||          6
|-
|''visnv''        ||  2 ||          2
|-
|<font color=red>''visp100''</font>      || 10 ||          <font color=red>0</font>
|-
|''rome256gb32c''  ||  3 ||          3 <sup>(1)(2)</sup>
|-
|''rome512gb96c-ai'' || 10 ||        10 <sup>(1)(3)</sup>
|-
|<font color=green>''genoa''</font>          || 0 ||        <font color=green>60</font> <sup>(4)(5)</sup>
|-
|<font color=green>''genoa-a30''</font>      || 0 ||        <font color=green>24</font> <sup>(4)(6)</sup>
|-
|<font color=green>''genoa-smp''</font>      || 0 ||          <font color=green>2</font> <sup>(4)(7)</sup>
|-
|}
<sup>
(1) academic usage only<br>
(2) 2x AMD Epyc 7302 Rome, 3.0GHz base, 32 cores total, 256GB DDR4, 3.5TB NVMe<br>
(3) 2x AMD Epyc 7642 Rome, 2.3GHz base, 96 cores total, 512GB DDR4, 1.8TB NVMe, 8x AMD Instinct MI50 with 32GB<br>
(4) new nodes, node_type not yet fixed<br>
(5) 2x AMD Epyc 9334 Genoa, 2.7GHz base, 64 cores total, 768GB DDR5<br>
(6) 2x AMD Epyc 9124 Genoa, 3.0GHz base, 32 cores total, 768GB DDR5, 3.8TB NVMe, 1x Nvidia A30 with 24GB HBM2e<br>
(7) 2x AMD Epyc 9334 Genoa, 2.7GHz base, 64 cores total, 3072GB DDR5<br>
</sup>
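
The aggregate capacity added by the new Genoa nodes follows from the node counts in the table and the per-node figures in footnotes (5) to (7). A short sketch of that arithmetic is shown below. The dictionary layout and the names <code>GENOA_NODES</code> and <code>aggregate</code> are illustrative only, and the node_type labels may still change, as footnote (4) states.

```python
# Aggregate cores and memory of the new Genoa nodes.
# Node counts: "current node number" column of the replacement overview.
# Cores and memory per node: footnotes (5), (6) and (7).

GENOA_NODES = {
    "genoa": {"nodes": 60, "cores": 64, "mem_gb": 768},
    "genoa-a30": {"nodes": 24, "cores": 32, "mem_gb": 768},
    "genoa-smp": {"nodes": 2, "cores": 64, "mem_gb": 3072},
}


def aggregate(node_types):
    """Return (total cores, total memory in GB) summed over all node types."""
    total_cores = sum(t["nodes"] * t["cores"] for t in node_types.values())
    total_mem_gb = sum(t["nodes"] * t["mem_gb"] for t in node_types.values())
    return total_cores, total_mem_gb


if __name__ == "__main__":
    cores, mem_gb = aggregate(GENOA_NODES)
    print(f"Genoa nodes: {cores} cores, {mem_gb} GB DDR5 in total")
```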

Latest revision as of 08:41, 25 October 2024

Hardware

The list of currently available hardware can be found here.

Architecture

The NEC Cluster platform (vulcan) consists of several frontend nodes for interactive access (for access details see Access) and several compute nodes of different types for execution of parallel programs. Some of the compute nodes come from the old NEC Cluster laki.

Compute node types installed:

  • Intel Xeon Broadwell, Skylake, CascadeLake
  • AMD Epyc Rome, Genoa
  • different Memory sizes (256GB, 384GB, 512GB, 768GB)
  • Pre-Postprocessing node with very large memory (1.5TB, 3TB)
  • Visualisation/GPU nodes with AMD Radeon Pro WX8200, Nvidia Quadro RTX4000 or Nvidia A30
  • Vector nodes with NEC Aurora TSUBASA CPUs

Features

  • Operating System: Rocky Linux 8
  • Batchsystem: PBSPro
  • node-node interconnect: Infiniband + 10G Ethernet
  • Global Disk 2.2 PB (lustre) for vulcan + 500TB (lustre) for vulcan2
  • Many Software Packages for Development


History

Warning: Hardware Upgrade took place on 2024-05-24
Some of the compute nodes and network infrastructure of vulcan have been replaced by up-to-date hardware.


Replacement Overview:

node_type          historical node number   current node number
aurora             8                        8
clx-21             8                        8
clx-25             96                       96
clx-ai             4                        0
hsw128gb20c        84                       0
hsw128gb24c        152                      0
hsw256gb20c        4                        0
hsw256gb24c        16                       0
k20xm              3                        0
p100               3                        3
skl                68                       72
smp                2                        1
visamd             6                        6
visnv              2                        2
visp100            10                       0
rome256gb32c       3                        3  (1)(2)
rome512gb96c-ai    10                       10 (1)(3)
genoa              0                        60 (4)(5)
genoa-a30          0                        24 (4)(6)
genoa-smp          0                        2  (4)(7)

(1) academic usage only
(2) 2x AMD Epyc 7302 Rome, 3.0GHz base, 32 cores total, 256GB DDR4, 3.5TB NVMe
(3) 2x AMD Epyc 7642 Rome, 2.3GHz base, 96 cores total, 512GB DDR4, 1.8TB NVMe, 8x AMD Instinct MI50 with 32GB
(4) new nodes, node_type not yet fixed
(5) 2x AMD Epyc 9334 Genoa, 2.7GHz base, 64 cores total, 768GB DDR5
(6) 2x AMD Epyc 9124 Genoa, 3.0GHz base, 32 cores total, 768GB DDR5, 3.8TB NVMe, 1x Nvidia A30 with 24GB HBM2e
(7) 2x AMD Epyc 9334 Genoa, 2.7GHz base, 64 cores total, 3072GB DDR5
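
The effect of the upgrade on the overall node count can be summarised from the replacement overview. The sketch below only re-adds the numbers from the table. The names NODE_COUNTS and summarise are illustrative, and the node_type labels of the new nodes may still change (see footnote 4).

```python
# Node counts from the replacement overview: node_type -> (historical, current)
NODE_COUNTS = {
    "aurora": (8, 8),
    "clx-21": (8, 8),
    "clx-25": (96, 96),
    "clx-ai": (4, 0),
    "hsw128gb20c": (84, 0),
    "hsw128gb24c": (152, 0),
    "hsw256gb20c": (4, 0),
    "hsw256gb24c": (16, 0),
    "k20xm": (3, 0),
    "p100": (3, 3),
    "skl": (68, 72),
    "smp": (2, 1),
    "visamd": (6, 6),
    "visnv": (2, 2),
    "visp100": (10, 0),
    "rome256gb32c": (3, 3),
    "rome512gb96c-ai": (10, 10),
    "genoa": (0, 60),
    "genoa-a30": (0, 24),
    "genoa-smp": (0, 2),
}


def summarise(counts):
    """Return node totals before and after, plus the retired and newly added node types."""
    before = sum(old for old, _ in counts.values())
    after = sum(new for _, new in counts.values())
    retired = [name for name, (old, new) in counts.items() if old > 0 and new == 0]
    added = [name for name, (old, new) in counts.items() if old == 0 and new > 0]
    return before, after, retired, added


if __name__ == "__main__":
    before, after, retired, added = summarise(NODE_COUNTS)
    print(f"nodes before upgrade: {before}, after upgrade: {after}")
    print("retired node types:", ", ".join(retired))
    print("new node types:", ", ".join(added))
```

The totals count nodes only. They say nothing about capacity, because the new Genoa nodes have more cores and memory per node than the retired Haswell nodes.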