- Information contained in the HLRS Wiki is not legally binding and HLRS is not responsible for any damages that might result from its use -

HPE Hawk

'''Hawk''' is the flagship HPC system at HLRS. It replaced the previous [[Cray XC40|HazelHen]] system; for details about its installation see the [[Hawk installation schedule]].
{{Note
| text = Please be sure to read at least the [[10_minutes_before_the_first_job]] document and consult the [[General HWW Documentation]] before you start to work with any of our systems.
}}


{{Warning
| text = In preparation of the next generation supercomputer [[ Hunter_(HPE) | Hunter ]], the hardware configuration has been reduced (from 5632 compute nodes to 4096 compute nodes). The workspace filesystem ws10 has been removed.
}}


{| style="border:0; margin: 0;" width="100%" cellspacing="10"
| valign="top" style="padding: 0; border: 1px solid #aaaaaa; margin-bottom: 0;" |
<div style="font-size: 105%; padding: 0.4em; background-color: #eeeeee; border-bottom: 1px solid #aaaaaa; text-align: center;">'''Introduction'''</div>
<div style="background: #ffffff; padding:0.2em 0.4em;">
{| style="border: 0; margin: 0;" cellpadding="3"
| valign="top" | 
<!-- * [[Hawk_installation_schedule#Terms_of_Use | Terms of use ]] -->
* [[HPE_Hawk_access|Access]]
* [[HPE_Hawk_Hardware_and_Architecture|Hardware and Architecture]]
|}
</div>
| valign="top" style="padding: 0; border: 1px solid #aaaaaa; margin-bottom: 0;" |
<div style="font-size: 105%; padding: 0.4em; background-color: #eeeeee; border-bottom: 1px solid #aaaaaa; text-align: center;">'''Troubleshooting'''</div>
<div style="background: #ffffff; padding:0.2em 0.4em;">
{| style="border: 0; margin: 0;" cellpadding="3"
| valign="top" | 
* [[HPE_Hawk_Support|Support (contact/staff)]]
* [[HPE_Hawk_FAQ|FAQ]]
* [http://websrv.hlrs.de/cgi-bin/hwwweather?task=viewmachine&machine=hawk Status, Maintenance for Hawk]
* [[HPE_Hawk_News|News]]
|}
</div>
|}

{| style="border:0; margin: 0;" width="100%" cellspacing="10"
| valign="top" style="padding: 0; border: 1px solid #aaaaaa; margin-bottom: 0;" |
<div style="font-size: 105%; padding: 0.4em; background-color: #eeeeee; border-bottom: 1px solid #aaaaaa; text-align: center;">'''Documentation'''</div>
<div style="background: #ffffff; padding:0.2em 0.4em;">
{| style="border: 0; margin: 0;" cellpadding="3"
| valign="top" | 
* [[Batch_System_PBSPro_(Hawk)|Batch System]]
* [[Module environment(Hawk)|Module Environment]]
* [[Storage_(Hawk)| Storage Description ]]
* [[Compiler(Hawk)|Compiler]]
* [[MPI(Hawk)|MPI]]
* [[Libraries(Hawk)|Libraries]]
* [[Manuals(Hawk)|Manuals]]
* [[Optimization|Optimization]]
* [[Hawk_PrePostProcessing|Pre- and Post-Processing]]
* [[Big_Data,_AI_Aplications_and_Frameworks|Big Data, AI Applications and Frameworks]]
* [[Performance Analysis Tools]]
* [[CPE|Cray Programming Environment (CPE)]]
|}
</div>
| valign="top" style="padding: 0; border: 1px solid #aaaaaa; margin-bottom: 0;" |
<div style="font-size: 105%; padding: 0.4em; background-color: #eeeeee; border-bottom: 1px solid #aaaaaa; text-align: center;">'''Utilities'''</div>
<div style="background: #ffffff; padding:0.2em 0.4em;">
{| style="border: 0; margin: 0;" cellpadding="3"
| valign="top" | 
* [[CAE_utilities|CAE Utilities]]
* [[CAE_howtos|CAE HOWTOs]]
* [[MKL | MKL Fortran Interfaces ]]
|}
</div>
|}


== Best Practices for Software Installation ==
Best practices for software installation on Hawk are described on a separate wiki page: [[Hawk software installation]].


== Access ==


Login node: hawk-tds-login2.hww.hlrs.de
----
{{note|text=Access to the Hawk TDS is limited to support staff at the moment. Please check the [[Hawk installation schedule]] for details about the start of user access.}}


== Batch System ==


[[Batch_System_PBSPro_(Hawk)|Batch System PBSPro (Hawk)]]


== MPI ==


In order to use the MPI implementation provided by HPE, please load the Message Passing Toolkit (MPT) module ''mpt'' (not ABI-compatible with other MPI implementations) or ''hmpt'' (ABI-compatible with MPICH derivatives).
For detailed information see the [http://www.hpe.com/support/mpi-ug-036 HPE Message Passing Interface (MPI) User Guide].
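The sketch below is a minimal, hypothetical example (not an official HLRS test) of building and running a small MPI program against MPT; it assumes the ''mpt'' (or ''hmpt'') module provides the usual ''mpicc'' wrapper and ''mpirun'' launcher as described in the HPE MPI User Guide linked above.

<pre>
/*
 * hello_mpt.c -- minimal MPI sanity check (illustrative sketch only).
 *
 * Assumed build/run steps (verify against the HPE MPI User Guide):
 *   module load mpt          # or: module load hmpt  for the MPICH ABI
 *   mpicc -o hello_mpt hello_mpt.c
 *   mpirun -np 4 ./hello_mpt
 */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, len;
    char name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Get_processor_name(name, &len);

    /* Each rank reports where it runs -- handy to verify the MPT setup. */
    printf("rank %d of %d on %s\n", rank, size, name);

    MPI_Finalize();
    return 0;
}
</pre>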




== Test cases best practices ==


Test cases help to identify and characterize the scaling and performance behavior of the new system.
Ideally, those test cases can also be compared with other systems to get a full picture.


To do:
* Define a best-practice guideline on how to set up a correct test case:
** Measure only the time-stepping loop (or equivalent), excluding the initialization and cleanup phases (see the sketch below)
** Use a well-defined measure of computational progress (e.g. LUPS, DoF-UPS, iterations/s or Flop/s)
** Ideally, the test case is largely automated with scripts and also performs the evaluation on top, producing a meaningful result file
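As an illustration of the first two points above, the following hypothetical sketch times only the time-stepping loop with MPI_Wtime and reports progress as iterations per second; the solver kernel is a placeholder and the structure is only a suggestion, not a prescribed harness.

<pre>
/*
 * Sketch of a test-case timing harness (hypothetical example):
 * initialization and cleanup are excluded from the measurement,
 * and progress is reported as a well-defined rate (iterations/s).
 */
#include <mpi.h>
#include <stdio.h>

/* Placeholder for the real per-iteration work of the application. */
static void do_time_step(void) { /* ... solver kernel ... */ }

int main(int argc, char **argv)
{
    const int n_steps = 1000;
    int rank;

    MPI_Init(&argc, &argv);                /* initialization: not timed */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Barrier(MPI_COMM_WORLD);           /* start all ranks together  */
    double t0 = MPI_Wtime();

    for (int step = 0; step < n_steps; ++step)
        do_time_step();

    MPI_Barrier(MPI_COMM_WORLD);
    double t1 = MPI_Wtime();

    if (rank == 0)                         /* report computational progress */
        printf("%d steps in %.3f s -> %.1f iterations/s\n",
               n_steps, t1 - t0, n_steps / (t1 - t0));

    MPI_Finalize();                        /* cleanup: not timed */
    return 0;
}
</pre>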




== TODO ==


* 2019-08-22, niethammer@hlrs.de: missing pbs headers (tm.h, ...)
* 2019-08-18, dick@hlrs.de: (exuberant) ctags missing on frontend, probably available from RHEL repository
* 2019-08-18, dick@hlrs.de: manpages are missing on the frontend
* 2019-08-15, niethammer@hlrs.de: need more explanation on how ''omplace'' works for pinning in the context of SMT (numbering of cores?)
* 2019-08-15, niethammer@hlrs.de: how to correctly run scripts/wrappers with mpirun? (the script is executed only once per node, but an MPI application called inside it runs multiple times)


* 2019-08-15, niethammer@hlrs.de: missing commands:
** ''resize'' (likely coming with xterm)


* 2019-08-15, niethammer@hlrs.de:
** MPT = Message Passing Toolkit
** MPI from the mpt module uses the SGI ABI, MPI from the hmpt module uses the MPICH ABI
** for the MPI compiler wrappers to detect the correct compiler, please set MPICC_CC, MPICXX_CXX, MPIF90_F90 and MPIF08_F08 to the corresponding compiler commands (2019-08-15, dick@hlrs.de: done)
** Should e.g. applications using cae/platform_mpi use perfboost?


* 2019-08-07, dick@hlrs: hmpt is ABI-compatible with MPICH derivatives, but mpt is not
** users should know about this!
** @HPE: is hmpt an MPICH derivative, but not mpt?


* 2019-08-07, dick@hlrs: unclear that (h)mpt provides the MPI library -> call the modules "mpi/hmpt" and "mpi/mpt" instead?


* 2019-08-07, dick@hlrs: remove the MPI delivered with RHEL


* 2019-08-07, dick@hlrs: Intel loads the gcc module -> use (LD_)LIBRARY_PATH instead


* 2019-08-07, dick@hlrs: be careful: cc points to /usr/bin/gcc!


* 2019-08-14, khabi/offenhaeuser@hlrs.de: How to pin OpenMP-threads in hybrid jobs (naive approach pins 2 threads to _same_ core instead of two different), i.e.: how to do aprun -d?


<br>
----
 
[[Help | Help for Wiki Usage]]
== Training ==
 
There will be internal (i.e. HLRS staff only) training sessions on the following topics (tentative):
 
* HPE Performance MPI
** [https://terminplaner4.dfn.de/qTagd9VodYiy89mj survey to set date]
** topics available via the above link
** target audience: user support staff, internal users of the system
 
* Processor
** [https://terminplaner4.dfn.de/COB6iw5DAFFyDtwe survey to set date]
** (tentative) schedule available via the above link
** target audience: user support staff, internal users of the system
 
* Workload Management PBSpro for end users
** probably one day in the week of 2019-11-11 to 2019-11-15
** target audience: user support staff, internal users of the system
 
* Cluster and System Administration using HPCM
** target audience: admin staff
 
* Infiniband-Administration and Tuning
** target audience: admin staff
 
* Lustre and Storage Administration
** target audience: admin staff
 
* Workload Management PBSpro Administration and Tuning
** target audience: admin staff
