- Information contained in the HLRS Wiki is not legally binding and HLRS is not responsible for any damages that might result from its use -

NEC Aurora HW

From HLRS Platforms

Latest revision as of 16:28, 8 December 2022

Hardware

  • 64 NEC Aurora TSUBASA cards, each with a single vector processor (8 A300-8 systems with 8 VE type 10B each)
    • 8 cores per processor/VE, 2.1 TFlops peak
    • 1.4 GHz frequency
    • 32 vector pipes per core, each doing 3 FMA per clock, resulting in 192 flops/cycle per core
    • 64 vector registers per core
    • little-endian data formats
    • out-of-order execution
    • 256 KB L2 cache per core
    • 16 MB shared LLC (last-level cache)
    • 48 GB HBM per VE
    • 6 HBM channels, for 1.2 TB/s memory bandwidth
    • Two EDR InfiniBand links (100 Gbit/s each) per VH (vector host)
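The peak figures in the list are consistent with each other. The following back-of-the-envelope Python sketch uses only the numbers listed above and counts an FMA as two flops. It also derives the machine balance (peak flops per byte of HBM bandwidth), a common roofline-style indicator: kernels with a lower arithmetic intensity than this are likely to be memory-bound.

```python
# Back-of-the-envelope peak numbers for one VE type 10B, using only the
# figures from the hardware list.
PIPES_PER_CORE = 32
FMA_PER_PIPE_PER_CYCLE = 3
FLOPS_PER_FMA = 2            # an FMA counts as one multiply plus one add
CORES_PER_VE = 8
CLOCK_HZ = 1.4e9
HBM_BYTES_PER_S = 1.2e12
HBM_GB_PER_VE = 48
NUM_VES = 64


def flops_per_cycle_per_core():
    return PIPES_PER_CORE * FMA_PER_PIPE_PER_CYCLE * FLOPS_PER_FMA


def peak_flops_per_ve():
    return CORES_PER_VE * flops_per_cycle_per_core() * CLOCK_HZ


def machine_balance():
    """Peak flops per byte moved from HBM; kernels with a lower arithmetic
    intensity than this are limited by memory bandwidth."""
    return peak_flops_per_ve() / HBM_BYTES_PER_S


if __name__ == "__main__":
    print(f"{flops_per_cycle_per_core()} flop/cycle per core")
    print(f"{peak_flops_per_ve() / 1e12:.2f} TFlop/s per VE")
    print(f"{NUM_VES * peak_flops_per_ve() / 1e12:.1f} TFlop/s for {NUM_VES} VEs")
    print(f"{NUM_VES * HBM_GB_PER_VE} GB HBM in total")
    print(f"{machine_balance():.2f} flop/byte machine balance")
```

The result is 192 flops/cycle per core and about 2.15 TFlop/s per VE, which matches the 2.1 TFlops quoted above. The machine balance is roughly 1.8 flop/byte, which is about 14 flops per 8-byte double loaded from memory.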

Software Architecture

The VE card does not run an operating system. It only executes the user application from the moment the application is started; before it starts and after it stops, the VE does nothing.

All system calls are forwarded to the VH, the x86-64 host the VE is attached to, so filesystem access is transparent to the application.
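The forwarding idea can be illustrated with a toy Python model. This is not VEOS code, and the class and function names are hypothetical. The "VE side" cannot perform system calls itself, so it hands each request to a "VH side" proxy thread. That thread executes the call with the host OS and sends back the result or the error. The real mechanism is considerably more involved; the model only shows why the application sees an ordinary host filesystem.

```python
import os
import queue
import tempfile
import threading


class HostProxy:
    """Toy stand-in for the host side: executes system calls on behalf of
    a 'VE' program that has no operating system of its own."""

    def __init__(self):
        self._requests = queue.Queue()
        self._thread = threading.Thread(target=self._serve, daemon=True)
        self._thread.start()

    def _serve(self):
        # Runs on the "VH": the real system calls happen here, on the host OS.
        handlers = {"open": os.open, "write": os.write, "close": os.close}
        while True:
            item = self._requests.get()
            if item is None:            # shutdown request
                break
            name, args, reply = item
            try:
                reply.put((True, handlers[name](*args)))
            except Exception as exc:    # hand errors back to the caller
                reply.put((False, exc))

    def syscall(self, name, *args):
        # Called on the "VE" side: forward the request, block for the answer.
        reply = queue.Queue(maxsize=1)
        self._requests.put((name, args, reply))
        ok, value = reply.get()
        if not ok:
            raise value
        return value

    def shutdown(self):
        self._requests.put(None)
        self._thread.join()


def ve_application(proxy, path, text):
    """The 'VE' program never touches the filesystem directly."""
    fd = proxy.syscall("open", path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        return proxy.syscall("write", fd, text.encode())
    finally:
        proxy.syscall("close", fd)


def demo():
    proxy = HostProxy()
    try:
        path = os.path.join(tempfile.mkdtemp(), "result.txt")
        written = ve_application(proxy, path, "hello from the VE\n")
    finally:
        proxy.shutdown()
    # On the host the result is an ordinary file.
    with open(path) as f:
        return written, f.read()
```

Every call blocks until the host answers. This also hints at why system-call-heavy code (many small I/O operations) is comparatively expensive on the VE.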

The execution model supports both processes and threads, but it is not designed for oversubscription: running more than one process or thread per core is bad practice and yields poor performance.
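For a job that mixes processes and threads on one VE, the rule amounts to keeping processes × threads per process at or below the 8 cores of a VE. The following small Python sketch uses hypothetical helper names. It enumerates the layouts that put exactly one process or thread on every core and flags layouts that would oversubscribe.

```python
CORES_PER_VE = 8  # from the hardware list


def full_layouts(cores=CORES_PER_VE):
    """(processes, threads_per_process) pairs that put exactly one
    process or thread on every core of a VE."""
    return [(p, cores // p) for p in range(1, cores + 1) if cores % p == 0]


def check_layout(processes, threads_per_process, cores=CORES_PER_VE):
    """Classify a layout; 'oversubscribed' means some core must be
    time-shared between several processes or threads."""
    used = processes * threads_per_process
    if used > cores:
        return "oversubscribed"
    if used < cores:
        return "undersubscribed"
    return "full"


if __name__ == "__main__":
    print(full_layouts())        # [(1, 8), (2, 4), (4, 2), (8, 1)]
    print(check_layout(4, 4))    # oversubscribed
```

Undersubscribing leaves cores idle. Unlike oversubscribing, however, it does not force cores to be time-shared.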

Scheduling is time-shared, so several processes (even from different users) can share a core, but this rarely makes sense: time slices are long and context switches are expensive.
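A toy Python model of round-robin scheduling on a single core shows why time-sharing rarely pays off. The time slice and context-switch cost below are hypothetical parameters, not measured VE values. In the model, sharing a core does not increase throughput. It only delays the job that would otherwise finish early and adds the switch overhead.

```python
from collections import deque


def round_robin(work, time_slice, switch_cost):
    """Finish times of jobs time-sharing one core in round-robin order.
    A context switch is paid whenever the core changes to another job."""
    remaining = list(work)
    finish = [0.0] * len(work)
    ready = deque(i for i, w in enumerate(work) if w > 0)
    clock, last = 0.0, None
    while ready:
        job = ready.popleft()
        if last is not None and last != job:
            clock += switch_cost
        run = min(time_slice, remaining[job])
        clock += run
        remaining[job] -= run
        last = job
        if remaining[job] > 0:
            ready.append(job)
        else:
            finish[job] = clock
    return finish


def run_to_completion(work, switch_cost):
    """Finish times when each job gets the core exclusively, one after another."""
    finish, clock = [], 0.0
    for i, w in enumerate(work):
        if i > 0:
            clock += switch_cost
        clock += w
        finish.append(clock)
    return finish


if __name__ == "__main__":
    jobs = [10, 10]  # hypothetical work per job, in seconds of core time
    print(round_robin(jobs, time_slice=1, switch_cost=0.5))  # [28.0, 29.5]
    print(run_to_completion(jobs, switch_cost=0.5))          # [10.0, 20.5]
```

With these numbers, the first job finishes 18 s later under time-sharing and the last job finishes 9 s later. The extra time comes purely from the 19 context switches.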

In contrast to previous SX architectures, the data format is little endian, and the compilers mimic the gcc memory layout as closely as possible, so binary files should be interchangeable with files written by gcc-compiled programs on x86-64.
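Because both sides are little endian and follow the gcc layout, raw binary records should survive the trip between x86-64 and the VE unchanged. The Python sketch below uses a hypothetical record type. It encodes records the way gcc lays out `struct { int32_t n; double x; }` on x86-64: 4 bytes of padding before the 8-byte-aligned double, 16 bytes in total. The explicit `<` byte order and `4x` padding make the layout independent of the machine running the script.

```python
import struct

# Hypothetical record matching this C struct as gcc lays it out on x86-64:
#   struct record { int32_t n; double x; };   /* sizeof == 16 */
RECORD = struct.Struct("<i4xd")


def encode_records(records):
    """Serialize (n, x) pairs into the byte layout described above."""
    return b"".join(RECORD.pack(n, x) for n, x in records)


def decode_records(data):
    """Parse bytes written by a gcc-compiled program (or by encode_records)."""
    if len(data) % RECORD.size:
        raise ValueError("truncated record data")
    return [RECORD.unpack_from(data, off) for off in range(0, len(data), RECORD.size)]
```

The record (1, 1.0) encodes to `01000000 00000000 000000000000f03f`. Those are the same bytes on either architecture, which is what makes the files exchangeable.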

The caches, including the LLC, are coherent. The LLC is write-back and all memory transfers pass through it, but it supports two priorities; look for the keyword retain in the compiler manuals.
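A toy Python cache model shows why a second LLC priority helps. The replacement policy is an assumption for illustration, not the documented behavior of the VE LLC: LRU among normal lines, with retained lines evicted only when no normal line is left. The actual retain directives and their semantics are described in the compiler manuals. In the model, a small table that is reused every iteration gets evicted by a streaming array under plain LRU, but survives when it is marked as retained.

```python
from collections import OrderedDict


class TwoPriorityCache:
    """Toy fully associative cache: LRU order, but lines marked 'retain'
    are only evicted when no normal line is left (assumed policy)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()   # address -> retain flag, oldest first
        self.hits = 0
        self.misses = 0

    def access(self, addr, retain=False):
        if addr in self.lines:
            self.hits += 1
            self.lines.move_to_end(addr)
            self.lines[addr] = self.lines[addr] or retain
            return
        self.misses += 1
        if len(self.lines) >= self.capacity:
            self._evict()
        self.lines[addr] = retain

    def _evict(self):
        # Oldest normal line first; fall back to the oldest line overall.
        victim = next((a for a, r in self.lines.items() if not r), None)
        if victim is None:
            self.lines.popitem(last=False)
        else:
            del self.lines[victim]


def table_hits(retain_table, iterations=10, capacity=8, table=4, stream=8):
    """Each iteration reads a small reused table, then streams new lines."""
    cache = TwoPriorityCache(capacity)
    next_addr = 1_000_000          # streaming addresses never repeat
    hits = 0
    for _ in range(iterations):
        for addr in range(table):
            before = cache.hits
            cache.access(addr, retain=retain_table)
            hits += cache.hits - before
        for _ in range(stream):
            cache.access(next_addr)
            next_addr += 1
    return hits


if __name__ == "__main__":
    print(table_hits(retain_table=False))  # 0: the stream evicts the table
    print(table_hits(retain_table=True))   # 36: hits in every iteration after the first
```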