

Sandy Bridge nodes

The cluster was upgraded to contain 192 nodes with dual-socket Intel Xeon E5-2670, 2.6 GHz "Sandy Bridge" processors.

  • 16 cores per node in 2 sockets, 32 threads, AVX support
  • 32 GB of DDR3-1600 memory
  • 4 memory channels per CPU, for a total of >70 GB/s memory bandwidth (a rough estimate of this figure follows below the list)
  • Mellanox QDR ConnectX-3 Infiniband HCA, connected with PCIe-gen3 bus, 2:1 overcommitted in the switch fabric
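
As a rough plausibility check (an estimate, not an official figure): one DDR3-1600 channel transfers 1600 MT/s × 8 bytes = 12.8 GB/s, so 4 channels per CPU give 51.2 GB/s per socket and 102.4 GB/s theoretical peak per node; the >70 GB/s quoted above is therefore presumably the sustained bandwidth achievable in practice (e.g. in STREAM-like measurements).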


Main user benefits of the new hardware compared to the Nehalem nodes

  • twice the number of cores per node
  • twice the memory bandwidth per node
  • higher IB bandwidth and throughput
  • 2.66 times the memory per node, 30% more memory per core
  • approximately the same CPU throughput per core, despite the lower clock of 2.6 GHz instead of 2.8 GHz

The new OS version offers

  • AVX support (vector instructions operating on 4 double-precision elements)
  • transparent huge pages (fewer TLB misses for large memory blocks, without any need to change code; see the quick check after this list)
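
To check that transparent huge pages are actually in use on a node, the following quick checks should work (the first path is the mainline kernel location; on RHEL 6 based systems the setting may instead be found under /sys/kernel/mm/redhat_transparent_hugepage):

  cat /sys/kernel/mm/transparent_hugepage/enabled     # prints e.g. "[always] madvise never"
  grep AnonHugePages /proc/meminfo                    # shows how much anonymous memory is currently backed by huge pages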

Remarks

  • use the AVX switch of your compiler (e.g. -mavx for GCC, -xAVX for the Intel compiler) to generate the best code; AVX is supported by the Intel, GCC and Portland compilers, see the compiler manuals for details (a compile sketch follows after this list)
  • the frontends have Nehalem-type CPUs; if the compiler auto-detects the host architecture because no architecture switch is specified, you will get non-optimal code!
  • Scientific Linux 6.2 (based on Red Hat 6.2), which is used on the new nodes, shows performance degradation if SMT (hyperthreading) is enabled - which is the case - and MPI is used without using the additional threads. To get the best performance, use CPU pinning (example launch lines follow after this list):
    • for Open MPI use mpirun -bind-to-core
    • for HP-MPI/Platform MPI use mpirun -cpu_bind=rank
    • for Intel MPI, pinning is enabled by default
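
A minimal compile sketch for the AVX remark above, assuming GCC or the Intel compiler and a source file stream.c (the file name is only an example):

  gcc -O3 -mavx -o stream stream.c    # GCC: request AVX code generation explicitly
  icc -O3 -xAVX -o stream stream.c    # Intel compiler: target AVX instead of autodetecting the (Nehalem) frontend CPU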
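
Example launch lines with CPU pinning, assuming 16 ranks on one node and an executable ./a.out (both placeholders; batch system options are omitted):

  mpirun -np 16 -bind-to-core ./a.out     # Open MPI: bind one rank per physical core
  mpirun -np 16 -cpu_bind=rank ./a.out    # HP-MPI / Platform MPI: bind ranks to cores in rank order
  mpirun -np 16 ./a.out                   # Intel MPI: process pinning is already enabled by default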