- Information contained in the HLRS Wiki is not legally binding and HLRS is not responsible for any damages that might result from its use -

NEC Aurora HW

Revision as of 12:25, 8 January 2020

Hardware

  • 64 NEC Aurora TSUBASA nodes, each with a single vector processor
    • 8 cores per processor/VE, 2.1 TFlops peak
    • 1.4 GHz frequency
    • 32 vector pipes per core, each doing 3 FMA per clock, resulting in 192 flops/cycle per core
    • 64 vector registers
    • little-endian data formats
    • out-of-order execution
    • 256 KB L2 cache per core
    • 16 MB shared LLC
    • 48 GB HBM memory per node
    • 6 HBM channels, for 1.2 TB/s memory bandwidth
    • 2x EDR 100 Gbit/s InfiniBand links per VH
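
The peak figures in the list follow from each other by simple arithmetic. The short Python sketch below reproduces them; it assumes that one FMA counts as two floating-point operations (a multiply and an add), and the constant names are chosen here for illustration only.

```python
# Figures taken from the hardware list above.
VECTOR_PIPES_PER_CORE = 32
FMA_PER_PIPE_PER_CYCLE = 3
FLOPS_PER_FMA = 2        # assumption: one multiply plus one add per FMA
CORES_PER_VE = 8
CLOCK_HZ = 1.4e9

# Floating-point operations one core can complete per clock cycle.
flops_per_cycle_per_core = (
    VECTOR_PIPES_PER_CORE * FMA_PER_PIPE_PER_CYCLE * FLOPS_PER_FMA
)

# Peak rate of a single core and of the whole VE, in flops per second.
peak_flops_per_core = flops_per_cycle_per_core * CLOCK_HZ
peak_flops_per_ve = peak_flops_per_core * CORES_PER_VE

print(flops_per_cycle_per_core)                  # 192
print(f"{peak_flops_per_ve / 1e12:.2f} TFlops")  # 2.15 TFlops
```

The result of roughly 2.15 TFlops per VE matches the 2.1 TFlops peak quoted in the list.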

Software Architecture

The VE card does not run an OS; it only runs the user application. Before the application is started and after it has stopped, the card does nothing.

All system calls are forwarded to the VH, so filesystem accesses are transparent to the application.

The execution model supports processes and threads, but it does not favor oversubscription: running more than one process or thread on a core is bad practice and will yield poor performance.

It is a timesharing model, so several processes (even of different users) can share a core, but this rarely makes sense: timeslices are long and context switches are expensive.
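
As a rough illustration of the one-thread-per-core rule, the following Python sketch checks a planned process/thread layout against the 8 cores of a single VE. The helper names `is_oversubscribed` and `max_threads_per_process` are hypothetical and not part of any NEC or HLRS tool; the sketch also assumes the job has one VE to itself.

```python
VE_CORES = 8  # cores per VE, from the hardware list


def is_oversubscribed(num_processes, threads_per_process, cores=VE_CORES):
    """Return True if the layout puts more than one thread on some core."""
    return num_processes * threads_per_process > cores


def max_threads_per_process(num_processes, cores=VE_CORES):
    """Largest thread count per process that still keeps one thread per core."""
    if num_processes < 1:
        raise ValueError("need at least one process")
    if num_processes > cores:
        raise ValueError("more processes than cores would oversubscribe the VE")
    return cores // num_processes


print(is_oversubscribed(8, 1))     # False: one process per core
print(is_oversubscribed(4, 4))     # True: 16 threads on 8 cores
print(max_threads_per_process(2))  # 4
```

A check like this could run in a job script before launch, so that a layout that would force timesharing on a core is caught early.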

In contrast to the previous SX architecture, the data format is little endian, and the compiler mimics the gcc memory layout as closely as possible, so binary files should be exchangeable with files written by gcc-compiled programs on x86-64.
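
To make the layout point concrete, here is a minimal Python sketch that writes and reads a binary file in the form gcc would use on x86-64 for a small C struct. The record type (`int32_t id; double value;`) and the function names are made up for illustration, and the sketch assumes that the VE compiler pads this struct the same way gcc does on x86-64.

```python
import os
import struct
import tempfile

# Hypothetical record, laid out like the C struct
#   struct sample { int32_t id; double value; };
# "<" selects little endian; "4x" stands for the 4 padding bytes that gcc
# on x86-64 inserts so that the double is aligned to 8 bytes.
RECORD = struct.Struct("<i4xd")


def write_records(path, records):
    """Write (id, value) pairs as fixed-size binary records."""
    with open(path, "wb") as f:
        for rec_id, value in records:
            f.write(RECORD.pack(rec_id, value))


def read_records(path):
    """Read all fixed-size records back as a list of (id, value) tuples."""
    with open(path, "rb") as f:
        data = f.read()
    return [RECORD.unpack_from(data, off)
            for off in range(0, len(data), RECORD.size)]


samples = [(1, 0.5), (2, -3.25)]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "samples.bin")
    write_records(path, samples)
    restored = read_records(path)

print(RECORD.size)          # 16 bytes per record, padding included
print(restored == samples)  # True
```

If both sides really do use this layout, a file written by a VE program with a plain `fwrite` of such structs could be read by the same code on the x86-64 host without byte swapping.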

All caches, including the LLC, are coherent. The LLC is write-back, and all memory transfers pass through it.