- Information in the HLRS Wiki is not legally binding and is provided without warranty -

Mem256gb


large memory nodes

The 10 new large memory nodes are based on the 4-socket AMD Opteron 6238 "Interlagos" in its 12-core, 2.6 GHz variant.

main hardware features

  • 256GB of DDR3 1600MHz memory, >100GB/s bandwidth
  • 48 cores total

remarks

  • This 4-socket platform uses multi-chip module (MCM) CPUs: each CPU consists of 2 chips with 2 memory channels each, which results in 2 NUMA nodes per socket.
  • With 8 NUMA nodes in total, this system is very sensitive to NUMA placement.
  • With 48 cores and 256GB it is a good system for shared memory applications, but task placement and NUMA placement are important; see NUMA Tuning for an introduction.
  • Thread pinning helps to improve performance. As the Intel OpenMP runtime option KMP_AFFINITY is not (yet?) supported on AMD CPUs, use likwid-pin instead.
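
The NUMA layout described in the remarks above can be checked on a node before placing tasks. The following is a minimal sketch, assuming a Linux node that exposes the standard sysfs directory /sys/devices/system/node; the function name count_numa_nodes is made up for illustration, and whether numactl is installed on these nodes is an assumption.

```shell
# Sketch: count the NUMA nodes visible to the kernel via sysfs (Linux only).
# On the nodes described here the expected answer is 8 (2 per socket, 4 sockets).
count_numa_nodes() {
    # $1 = sysfs node directory; defaults to the standard Linux location
    dir=${1:-/sys/devices/system/node}
    ls -d "$dir"/node[0-9]* 2>/dev/null | wc -l | tr -d ' '
}

echo "NUMA nodes visible: $(count_numa_nodes)"

# If numactl happens to be installed (an assumption), also print the
# per-node memory sizes and core lists.
if command -v numactl >/dev/null 2>&1; then
    numactl --hardware || true
fi
```

A count other than 8 on one of these nodes would suggest that the job does not see the whole machine.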

Without pinning:

$./stream_avx
[...]
Function     Rate (MB/s)  Avg time   Min time  Max time
Copy:        22724.1176        0.0149        0.0141        0.0158
Scale:       29406.6272        0.0132        0.0109        0.0142
Add:         22833.1340        0.0236        0.0210        0.0260
Triad:       22749.7957        0.0241        0.0211        0.0270

Pinning the threads with likwid-pin can improve performance by a factor of 2-3 or more (roughly 4-5 in this STREAM run), assuming an OpenMP program compiled with the Intel compiler:

$module load performance/likwid
$likwid-pin -t intel stream_avx
[...]
Function     Rate (MB/s)  Avg time   Min time  Max time
Copy:        89282.0648        0.0040        0.0036        0.0041
Scale:      125390.2541        0.0026        0.0026        0.0026
Add:        102237.7575        0.0049        0.0047        0.0051
Triad:      120663.2257        0.0040        0.0040        0.0040
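
The gain from pinning can be read off the two tables directly. The sketch below divides the pinned rate by the unpinned rate for each kernel; the helper name speedup is made up for illustration, and the rates are copied from the two runs above.

```shell
# Sketch: ratio of pinned to unpinned STREAM rate, rounded to one decimal.
speedup() {
    # $1 = unpinned rate in MB/s, $2 = pinned rate in MB/s
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", b / a }'
}

echo "Copy:  $(speedup 22724.1176 89282.0648)"    # 3.9
echo "Scale: $(speedup 29406.6272 125390.2541)"   # 4.3
echo "Add:   $(speedup 22833.1340 102237.7575)"   # 4.5
echo "Triad: $(speedup 22749.7957 120663.2257)"   # 5.3
```

These ratios apply to this bandwidth-bound benchmark only; other applications may gain less.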

Setting KMP_AFFINITY, by contrast, does not work:

$KMP_AFFINITY=scatter ./stream_avx
[...]
OMP: Warning #72: KMP_AFFINITY: affinity only supported for Intel(R) processors. 
OMP: Warning #71: KMP_AFFINITY: affinity not supported, using "disabled".
[...]
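
Since the Intel runtime disables KMP_AFFINITY on these nodes, explicit pinning through likwid-pin remains the working route. The following is a minimal sketch of a launcher, assuming one thread per core and that the cores are numbered 0 to 47; the function name core_list is hypothetical, while -c (core list) and -t intel (thread implementation) are likwid-pin options.

```shell
# Sketch: build an explicit core list for likwid-pin from a thread count.
core_list() {
    # $1 = number of threads; assumes cores are numbered from 0 (an assumption)
    echo "0-$(( $1 - 1 ))"
}

threads=48
cores=$(core_list "$threads")

# Run only if likwid-pin and the benchmark binary are actually present;
# otherwise just print the command that would be used.
if command -v likwid-pin >/dev/null 2>&1 && [ -x ./stream_avx ]; then
    OMP_NUM_THREADS=$threads likwid-pin -c "$cores" -t intel ./stream_avx
else
    echo "would run: OMP_NUM_THREADS=$threads likwid-pin -c $cores -t intel ./stream_avx"
fi
```

Passing a smaller thread count, for example 24, would restrict the run to cores 0-23 under the same numbering assumption.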