NEC Aurora HW

This documentation is deprecated as of 2025-01-02.

Hardware

64 NEC Aurora TSUBASA cards, each with a single vector processor (8 A300-8 systems, each with 8 VEs of type 10B)
- 8 cores per processor/VE, 2.1 TFlops peak
- 1.4 GHz frequency
- 32 vector pipes per core, each doing 3 FMA per clock, resulting in 192 flops/cycle per core
- 64 vector registers
- little endian data formats
- Out of Order execution
- 256 KB L2 cache per core
- 16 MB shared LLC
- 48 GB HBM memory per node
- 6 HBM channels, for 1.2 TB/s memory bandwidth
- 2x EDR 100 Gbit/s InfiniBand links per VH
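
The peak figures in the list above follow from simple arithmetic. The short Python sketch below reproduces them. It assumes that one FMA counts as two floating-point operations and that the memory bandwidth is spread evenly over the six HBM channels; neither is stated above, and all variable names are illustrative.

```python
# Back-of-the-envelope check of the peak numbers in the hardware list.
# Assumption: one FMA (fused multiply-add) counts as 2 flops.
FLOPS_PER_FMA = 2

VECTOR_PIPES_PER_CORE = 32
FMA_PER_PIPE_PER_CYCLE = 3
CORES_PER_VE = 8
FREQUENCY_MHZ = 1400          # 1.4 GHz
MEMORY_BANDWIDTH_GB_S = 1200  # 1.2 TB/s
HBM_CHANNELS = 6

# Flops per clock cycle on a single core.
flops_per_cycle = VECTOR_PIPES_PER_CORE * FMA_PER_PIPE_PER_CYCLE * FLOPS_PER_FMA

# Peak rates in GFlops (MHz * flops/cycle gives MFlops, hence / 1000).
peak_gflops_per_core = flops_per_cycle * FREQUENCY_MHZ / 1000
peak_gflops_per_ve = peak_gflops_per_core * CORES_PER_VE

# Assumption: the bandwidth is shared evenly by all channels.
bandwidth_per_channel_gb_s = MEMORY_BANDWIDTH_GB_S / HBM_CHANNELS

# Memory bytes available per peak flop on one VE.
bytes_per_flop = MEMORY_BANDWIDTH_GB_S / peak_gflops_per_ve

print(f"flops per cycle and core: {flops_per_cycle}")
print(f"peak per core: {peak_gflops_per_core:.1f} GFlops")
print(f"peak per VE: {peak_gflops_per_ve / 1000:.2f} TFlops")
print(f"bandwidth per HBM channel: {bandwidth_per_channel_gb_s:.0f} GB/s")
print(f"bytes per flop: {bytes_per_flop:.2f}")
```

Under these assumptions the per-VE peak comes out at roughly 2.15 TFlops, consistent with the 2.1 TFlops quoted in the list.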

Software Architecture

The VE card does not run an OS. It only runs the user application from the moment the application is started; before and after that, the card is stopped and does nothing.

All system calls are forwarded to the VH, so filesystem accesses are transparent.

The execution model allows processes and threads, but it does not favor thread oversubscription. Running more than one process or thread on a core is not good practice and will yield bad performance.

It is a timesharing model, so it is possible to have several processes (even of different users) on a core, but this does not really make sense: timeslices are long and context switches are expensive.
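
The rule of thumb above (at most one process or thread per core) can be checked before a job is launched. The following Python sketch is only an illustration: the helper `is_oversubscribed` is hypothetical and not part of any NEC tool, and it assumes 8 cores per VE as listed in the hardware section and an even spread of threads over the cores.

```python
CORES_PER_VE = 8  # taken from the hardware list


def is_oversubscribed(processes: int, threads_per_process: int,
                      cores: int = CORES_PER_VE) -> bool:
    """Return True if the layout needs more threads than there are cores.

    Hypothetical helper: it only compares counts and assumes the threads
    are spread evenly over the cores of one VE.
    """
    if processes < 1 or threads_per_process < 1:
        raise ValueError("processes and threads_per_process must be >= 1")
    return processes * threads_per_process > cores


if __name__ == "__main__":
    # 2 processes with 4 threads each fill the 8 cores exactly.
    print(is_oversubscribed(2, 4))
    # 4 processes with 4 threads each would put two threads on every core.
    print(is_oversubscribed(4, 4))
```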

In contrast to the previous SX architecture, the data format is little endian, and the compiler mimics the gcc memory layout as closely as possible, so binary files should be exchangeable with files written by gcc-compiled programs on x86-64.
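
To illustrate what a shared byte order and memory layout mean for binary files, the Python sketch below writes and reads a small record file. The record is hypothetical (an `int32_t` followed by a `double`), and Python stands in for both the writing and the reading program; nothing here uses the NEC compiler. The 4 padding bytes reflect the usual gcc layout of such a struct on x86-64, where the `double` is aligned to 8 bytes.

```python
import os
import struct
import tempfile

# Hypothetical record, equivalent to the C struct
#   struct record { int32_t id; double value; };
# With gcc on x86-64 the double is 8-byte aligned, so 4 padding bytes
# follow the int32_t and sizeof(struct record) is 16.
# "<" selects little endian without implicit padding, "4x" adds the
# padding bytes explicitly.
RECORD = struct.Struct("<i4xd")


def write_records(path, records):
    """Write (id, value) pairs as raw little-endian records."""
    with open(path, "wb") as f:
        for rec_id, value in records:
            f.write(RECORD.pack(rec_id, value))


def read_records(path):
    """Read all records back as a list of (id, value) tuples."""
    with open(path, "rb") as f:
        data = f.read()
    return list(RECORD.iter_unpack(data))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "records.bin")
        write_records(path, [(1, 0.5), (2, -3.25)])
        print(RECORD.size, read_records(path))
```

A file with this layout can be read unchanged by any program that assumes the same little-endian, naturally aligned struct, which is the property the paragraph above describes.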

Caches, including the LLC, are coherent. The LLC is writeback, and all memory transfers pass through it. The LLC knows two priorities; look for the keyword retain in the compiler manuals.
