- Information contained in the HLRS Wiki is not legally binding and HLRS is not responsible for any damages that might result from its use -

HPE Hawk: Difference between revisions

From HLRS Platforms
(143 intermediate revisions by 9 users not shown)
'''Hawk''' is the next-generation HPC system at HLRS. It will replace the existing [[Cray XC40|HazelHen]] system.
The installation is planned to take place in Q4 2019. For more detailed information see the [[Hawk installation schedule]].


This Page is under construction!
{{Note
| text = Please be sure to read at least the [[10_minutes_before_the_first_job]] document and consult the [[General HWW Documentation]] before you start to work with any of our systems.
}}


{{Warning
| text = In preparation for the next-generation supercomputer [[ Hunter_(HPE) | Hunter ]], the hardware configuration has been reduced (from 5632 compute nodes to 4096 compute nodes). Workspace filesystem ws10 has been removed.
}}


== Topics for Software Installation Meeting ==
* Installation directory: /opt/hlrs; platform-independent (exception): /sw/general; '''links between platforms under /sw/* must no longer be used'''
* Documentation of the installation: for users: platforms wiki; for installers: staff wiki
* how to inform users about software updates in the future: no link from the platforms wiki into the staff wiki
MOTD?; keep the mailing list lean; must be automatic (see module-changes email); who loads which module when -> send relevant emails
* hands-on sit (software installation tool)
* template for modulefiles; see [[#Modulefile best practices]]
* hierarchy of modulefiles [[#Hierarchy of modules]]
* default compiler / MPI / others
* how to deal with software from the base Linux installation which should not be visible on the cluster, e.g. the system compiler (gcc 4.8.5)
* install location of python bindings


== Access ==
----


Login-Node: hawk-tds-login2.hww.hlrs.de
{{note|text=Access to the Hawk TDS is limited to support staff at the moment. Please check the [[Hawk installation schedule]] for details about the start of user access.}}


== Batch System ==
{| style="border:0; margin: 0;" width="100%" cellspacing="10"


[[Batch_System_PBSPro_(Hawk)|Batch System PBSPro (Hawk)]]
| valign="top" style="padding: 0; border: 1px solid #aaaaaa; margin-bottom: 0;" |
<div style="font-size: 105%; padding: 0.4em; background-color: #eeeeee; border-bottom: 1px solid #aaaaaa; text-align: center;">'''Introduction'''</div>
<div style="background: #ffffff; padding:0.2em 0.4em;">
{| style="border: 0; margin: 0;" cellpadding="3"
| valign="top" | 
<!-- * [[Hawk_installation_schedule#Terms_of_Use | Terms of use ]] -->
* [[HPE_Hawk_access|Access]]
* [[HPE_Hawk_Hardware_and_Architecture|Hardware and Architecture]]
|}
</div>




== MPI ==


To use the MPI implementation provided by HPE, load the Message Passing Toolkit (MPT) module ''mpt'' (not ABI-compatible with other MPI implementations) or ''hmpt'' (ABI-compatible with MPICH derivatives).
| valign="top" style="padding: 0; border: 1px solid #aaaaaa; margin-bottom: 0;" |
For detailed information see the [http://www.hpe.com/support/mpi-ug-036 HPE Message Passing Interface (MPI) User Guide].
<div style="font-size: 105%; padding: 0.4em; background-color: #eeeeee; border-bottom: 1px solid #aaaaaa; text-align: center;">'''Troubleshooting'''</div>
<div style="background: #ffffff; padding:0.2em 0.4em;">
{| style="border: 0; margin: 0;" cellpadding="3"
| valign="top" | 
* [[HPE_Hawk_Support|Support (contact/staff)]]
* [[HPE_Hawk_FAQ|FAQ]]
* [http://websrv.hlrs.de/cgi-bin/hwwweather?task=viewmachine&machine=hawk Status,Maintenance for hawk]
* [[HPE_Hawk_News|News]]
|}
</div>


<br>
== Modulefile best practices ==


* Set an environment variable to the root path of your installation (cf. e.g. MPI_ROOT in /usr/share/Modules/modulefiles/hmpt/2.19).
|}
* Set not only CPATH but also the respective variables used by PGI / Intel / etc. -> someone has to figure out the list of those variables; the same applies to (LD_)LIBRARY_PATH.
* Include your name (finger does not work anymore), e-mail address and the date of installation in the modulefile.
* It is possible to keep the modulefile(s) together with the actual installation in the respective directory and just create symlinks in /opt/hlrs/modulefiles/.
* The directory structure of /opt/hlrs/ shall be replicated in /opt/hlrs/unsupported-modulefiles/.
* In case of dependencies, load explicit versions instead of the default one!
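
The symlink approach mentioned above could be scripted roughly as follows. This is only a sketch: the function name <code>link_modulefile</code>, the variable <code>MODULEFILES_DIR</code> and the assumed layout <code>category/package/version</code> below /opt/hlrs/modulefiles/ are illustrative assumptions, not existing HLRS tooling.

```shell
#!/usr/bin/env bash
# Sketch only: link_modulefile and MODULEFILES_DIR are hypothetical names.
# Assumed layout: $MODULEFILES_DIR/<category>/<package>/<version>

# link_modulefile CATEGORY PACKAGE VERSION MODULEFILE
# Creates a symlink to MODULEFILE, which is kept next to the installation.
link_modulefile() {
    local category=$1 package=$2 version=$3 modulefile=$4
    local target_dir=${MODULEFILES_DIR:-/opt/hlrs/modulefiles}/$category/$package

    # Refuse to create a dangling link.
    if [ ! -f "$modulefile" ]; then
        echo "no such modulefile: $modulefile" >&2
        return 1
    fi

    mkdir -p "$target_dir" || return 1
    # -f replaces an existing link of the same name.
    ln -sf "$modulefile" "$target_dir/$version"
}
```

For example, <code>link_modulefile performance Foo 1.23 /opt/hlrs/performance/Foo/1.23/modulefile</code> would make the module appear as <code>performance/Foo/1.23</code>, provided the modulefile directory is on the module search path.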


As an example, a modulefile for package "foo" version 1.23 (in the category "performance") should look like this:
#%Module1.0
#
# Change log:
#  Updated  12 Jul 2019, Christoph Niethammer <niethammer@hlrs.de>
#  Installed 08 Aug 2018, Jose Gracia <gracia@hlrs.de>
set base_dir /opt/hlrs
set cat      performance
set package  Foo
set version  1.23
set foo_root $base_dir/$cat/$package/$version
setenv FOO_ROOT    $foo_root
setenv FOO_VERSION $version
prepend-path PATH                $foo_root/bin
prepend-path LD_LIBRARY_PATH     $foo_root/lib        ;# library search path at time of execution (i.e. in case of _dynamic_ linking)
prepend-path LIBRARY_PATH        $foo_root/lib        ;# equivalent of "-L" for C, C++, and Fortran
prepend-path CPATH               $foo_root/include    ;# equivalent to "-I" for C, C++ and Fortran
prepend-path CPLUS_INCLUDE_PATH  $foo_root/include    ;# equivalent to "-I" only for C++ compiler; usually not needed as CPATH will do
prepend-path C_INCLUDE_PATH      $foo_root/include    ;# equivalent to "-I" only for C compiler; usually not needed as CPATH will do
prepend-path MANPATH             $foo_root/share/man  ;# manpages


module-whatis "_brief_ description of what is provided by this module"
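
Some of the best practices above can be checked mechanically. The following is a minimal sketch of such a check. The function name <code>check_modulefile</code> and the exact patterns are assumptions made for illustration; it only looks for the <code>#%Module</code> header, a <code>setenv ..._ROOT</code> line, an e-mail address in a comment and a <code>module-whatis</code> line.

```shell
#!/usr/bin/env bash
# Sketch only: check_modulefile is a hypothetical helper, not an existing tool.

# check_modulefile FILE
# Prints one line per missing item; returns non-zero if anything is missing.
check_modulefile() {
    local file=$1 missing=0

    # The magic cookie must be on the first line.
    head -n 1 "$file" | grep -q '^#%Module' \
        || { echo "missing #%Module header"; missing=1; }

    # Root path of the installation exported as <NAME>_ROOT.
    grep -Eq '^[[:space:]]*setenv[[:space:]]+[A-Za-z0-9_]+_ROOT[[:space:]]' "$file" \
        || { echo "missing setenv <NAME>_ROOT"; missing=1; }

    # E-mail address of the installer in a comment line.
    grep -Eq '^#.*[[:alnum:]._-]+@[[:alnum:].-]+' "$file" \
        || { echo "missing installer e-mail"; missing=1; }

    # Brief description of the module.
    grep -Eq '^[[:space:]]*module-whatis[[:space:]]' "$file" \
        || { echo "missing module-whatis"; missing=1; }

    return "$missing"
}
```

Such a check could be run over all modulefiles before they are linked into /opt/hlrs/modulefiles/; it does not replace loading the module once to verify that it actually works.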


== Hierarchy of modules ==
On Vulcan and Hazel Hen we have various module directories like ''tools'', ''utils'', ''misc''. However, it is not very clear what should go where; in the end, everything is a tool. I would therefore propose to be more specific.


Proposal for module hierarchy (please extend)


development/    # development tools
{| style="border:0; margin: 0;" width="100%" cellspacing="10"
    # svn, git, binutils, cmake
mpi/            # nobody will suspect MPI under mpt/
    # mpt, hmpt, ....
compiler/
    # gcc, oacc, intel, pgi, ...
numlib/        # numerical libraries
    # mkl, trilinos, ...
debugger/      # debugging tools
    # forge, ...
performance/    # performance analysis tools
    # vampir, extrae, scalasca, inspector, advisor, darshan, ...
visualization/  # data visualization tools
    # paraview, ...
python/
    # 3.X; do we need/want 2.7?


What to do with libraries which are not necessarily "numlib", e.g.
| valign="top" style="padding: 0; border: 1px solid #aaaaaa; margin-bottom: 0;" |
boost/ -> libraries/boost
<div style="font-size: 105%; padding: 0.4em; background-color: #eeeeee; border-bottom: 1px solid #aaaaaa; text-align: center;">'''Documentation'''</div>
  hdf5/  -> libraries/hdf5? io/hdf5?
<div style="background: #ffffff; padding:0.2em 0.4em;">
{| style="border: 0; margin: 0;" cellpadding="3"
| valign="top" |  
* [[Batch_System_PBSPro_(Hawk)|Batch System]]
* [[Module environment(Hawk)|Module Environment]]
* [[Storage_(Hawk)| Storage Description ]]
* [[Compiler(Hawk)|Compiler]]
* [[MPI(Hawk)|MPI]]
* [[Libraries(Hawk)|Libraries]]
* [[Manuals(Hawk)|Manuals]]
* [[Optimization|Optimization]]
* [[Hawk_PrePostProcessing|Pre- and Post-Processing]]
* [[Big_Data,_AI_Aplications_and_Frameworks|Big Data, AI Applications and Frameworks]]
* [[Performance Analysis Tools]]
* [[CPE|Cray Programming Environment (CPE)]]


Libraries which are actually some kind of programming model:
|}
gpi2    -> libraries/gpi2
</div>
tbb    -> libraries/tbb


What to do with software for projects? Such as
tools/hidalgo/fenics_hpc/3dairq
prace/prace
Put them in directories which are readable only for a certain group?


== Test cases best practices ==


Test cases will help to identify and determine the scaling/performance behavior of the new system.
| valign="top" style="padding: 0; border: 1px solid #aaaaaa; margin-bottom: 0;" |
Ideally, those test cases can be compared to other systems as well to get a full picture.
<div style="font-size: 105%; padding: 0.4em; background-color: #eeeeee; border-bottom: 1px solid #aaaaaa; text-align: center;">'''Utilities'''</div>
<div style="background: #ffffff; padding:0.2em 0.4em;">
{| style="border: 0; margin: 0;" cellpadding="3"
| valign="top" | 
* [[CAE_utilities|CAE Utilities]]
* [[CAE_howtos|CAE HOWTOs]]
* [[MKL | MKL Fortran Interfaces ]]
|}
</div>


To do:
|}
* Definition of a best practice guideline on how to set up a correct test case
** Measure only the time-stepping loop (or equivalent), excluding the initialization phase and cleanup
** Use a well-defined measure of computational progress (e.g. LUPS, DoF-UPS, iterations/s or Flop/s)
** Ideally, the test case is mostly automated with scripts and also performs the evaluation, producing a meaningful result file
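
As an illustration of a progress measure, an update rate such as LUPS can be derived from the number of updates per step, the number of steps and the time spent in the time-stepping loop. The sketch below assumes that the application itself reports the loop time (excluding initialization and cleanup); the function name <code>updates_per_second</code> is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: updates_per_second is a hypothetical helper.

# updates_per_second UPDATES_PER_STEP STEPS LOOP_SECONDS
# Example: lattice updates per second (LUPS) = cells * time steps / loop time.
# LOOP_SECONDS must cover the time-stepping loop only.
updates_per_second() {
    awk -v n="$1" -v steps="$2" -v t="$3" 'BEGIN {
        if (t <= 0) exit 1          # reject missing or invalid timings
        printf "%.0f\n", n * steps / t
    }'
}
```

A driver script could call <code>updates_per_second 1000000 100 2.0</code> for a run with 10<sup>6</sup> cells, 100 time steps and 2.0 s loop time, and append the result together with the node count to a result file.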




== TODO ==
----
 
[[Help | Help for Wiki Usage]]
* 2019-08-22, niethammer@hlrs.de: missing pbs headers (tm.h, ...)
* 2019-08-18, dick@hlrs.de: (exuberant) ctags missing on frontend, probably available from RHEL repository
* 2019-08-18, dick@hlrs.de: manpages are missing on the frontend
* 2019-08-15, niethammer@hlrs.de: need more explanation on how ''omplace'' works for pinning in the context of SMT (numbering of cores?)
* 2019-08-15, niethammer@hlrs.de: how to correctly run scripts/wrappers with mpirun? (the script is executed only once per node, but an MPI application called inside it is started multiple times)
 
* 2019-08-15, niethammer@hlrs.de: missing commands:
**''resize'' (likely coming with xterm)
 
* 2019-08-15, niethammer@hlrs.de:
** MPT = Message Passing Toolkit
** MPI from the mpt module uses the SGI ABI, MPI from the hmpt module uses the MPICH ABI
** for the MPI compiler wrappers to detect the correct compiler please set MPICC_CC, MPICXX_CXX, MPIF90_F90, MPIF08_F08 to the corresponding compiler commands (2019-08-15, dick@hlrs.de: done)
** Should e.g. applications using cae/platform_mpi use perfboost?
 
* 2019-08-07, dick@hlrs: hmpt is ABI-compatible with MPICH derivatives, but mpt is not
** users should know about this!
** @HPE: is hmpt an MPICH derivative, but mpt not?
 
* 2019-08-07, dick@hlrs: unclear that (h)mpt provides MPI lib -> call it "mpi/hmpt" and "mpi/mpt" instead?
 
* 2019-08-07, dick@hlrs: remove MPI delivered with RHEL
 
* 2019-08-07, dick@hlrs: Intel loads gcc module -> use (LD_)LIBRARY_PATH instead
 
* 2019-08-07, dick@hlrs: be careful: cc points to /usr/bin/gcc!
 
* 2019-08-14, khabi/offenhaeuser@hlrs.de: How to pin OpenMP-threads in hybrid jobs (naive approach pins 2 threads to _same_ core instead of two different), i.e.: how to do aprun -d?
 
<br>
 
== Training ==
 
There will be internal (i.e. HLRS staff only) trainings on the following topics (tentative):
 
* HPE Performance MPI
** [https://terminplaner4.dfn.de/qTagd9VodYiy89mj survey to set date]
** topics available via the above link
** target audience: user support staff, internal users of the system
 
* Processor
** [https://terminplaner4.dfn.de/COB6iw5DAFFyDtwe survey to set date]
** (tentative) schedule available via the above link
** target audience: user support staff, internal users of the system
 
* Workload Management PBSpro for end users
** probably one day in the week from 2019-11-11 to 2019-11-15
** target audience: user support staff, internal users of the system
 
* Cluster and System Administration using HPCM
** target audience: admin staff
 
* Infiniband-Administration and Tuning
** target audience: admin staff
 
* Lustre and Storage Administration
** target audience: admin staff
 
* Workload Management PBSpro Administration and Tuning
** target audience: admin staff

Latest revision as of 08:44, 25 October 2024
