- Information contained in the HLRS Wiki is not legally binding and HLRS is not responsible for any damages that might result from its use -

Sb


Sandy Bridge nodes

The cluster was upgraded to contain 192 nodes with dual-socket Intel Xeon E5-2670, 2.6 GHz "Sandy Bridge" processors.

  • 16 cores per node in 2 sockets, 32 threads, AVX support
  • 32 GB of DDR3 1600 MHz memory
  • 4 memory channels per CPU, total of >70 GB/s memory bandwidth (see the estimate below)
  • Mellanox QDR ConnectX-3 InfiniBand HCA, connected via a PCIe Gen3 bus, 2:1 overcommitted in the switch fabric
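
For orientation, the nominal numbers behind the bandwidth figure (assuming standard 64-bit DDR3-1600 channels; the quoted >70 GB/s is presumably a sustained, measured value and therefore below the theoretical peak):

  4 channels × 12.8 GB/s per channel (DDR3-1600) = 51.2 GB/s per CPU, i.e. 102.4 GB/s theoretical peak per node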


Main user benefits of the new hardware compared to the Nehalem nodes

  • twice the number of cores per node
  • twice the memory bandwidth per node
  • higher InfiniBand bandwidth and throughput
  • 2.66 times the memory per node, 30% more memory per core
  • roughly the same CPU throughput per core, despite 2.6 GHz instead of 2.8 GHz

The new OS version offers

  • AVX support (vector instructions operating on 4 double-precision elements)
  • transparent huge pages (fewer TLB misses for large memory blocks, without any need to change code); a quick check for both features is sketched below
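
A quick way to check both features on a compute node (a sketch only; the sysfs location of the transparent huge page switch differs between kernel versions, which is why a wildcard is used below):

  # AVX shows up as a CPU flag in /proc/cpuinfo
  grep -m1 -o avx /proc/cpuinfo
  # transparent huge pages: the active setting is shown in brackets
  cat /sys/kernel/mm/*transparent_hugepage/enabled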

Remarks

  • use the -mAVX type switches to generate the best code; AVX is supported by the Intel, GCC, and Portland compilers, see the manuals for details and the build example below this list
  • the frontends have Nehalem-type CPUs; if the compiler autodetects the target architecture because no architecture switch is specified, you will get non-optimal code!
  • the Red Hat 6.2 based Scientific Linux 6.2 used on the new nodes shows a performance degradation if SMT (hyper-threading) is enabled - which is the case - and MPI is used without using the additional threads. To get the best performance, use CPU pinning (a combined example follows below this list):
    • for Open MPI use mpirun -bind-to-core
    • for HP-MPI/Platform MPI use mpirun -cpu_bind=rank
    • for Intel MPI use mpirun //TODO
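
A minimal build/run sketch combining the remarks above (the source file name mycode.c and the rank count are placeholders; check the exact flag spellings against the compiler and MPI manuals installed on the system):

  # build with explicit AVX code generation, since autodetection on the Nehalem frontends gives non-optimal code
  icc -xAVX -O3 -o mycode mycode.c       # Intel compiler
  gcc -mavx -O3 -o mycode mycode.c       # GCC
  # launch one rank per physical core and pin the ranks to cores
  mpirun -np 16 -bind-to-core ./mycode        # Open MPI
  mpirun -np 16 -cpu_bind=rank ./mycode       # HP-MPI / Platform MPI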