'''Hawk''' is the next-generation HPC system at HLRS. It will replace the existing [[Cray XC40|Hazel Hen]] system.
 
The installation is planned to take place in Q4 2019. For more detailed information, see the [[Hawk installation schedule]].
 
  
This page is under construction!

{{Note
| text = Please be sure to read at least the [[10_minutes_before_the_first_job]] document and consult the [[General HWW Documentation]] before you start to work with any of our systems.
}}

{| style="border:0; margin: 0;" width="100%" cellspacing="10"
| valign="top" style="padding: 0; border: 1px solid #aaaaaa; margin-bottom: 0;" |
<div style="font-size: 105%; padding: 0.4em; background-color: #eeeeee; border-bottom: 1px solid #aaaaaa; text-align: center;">'''Introduction'''</div>
<div style="background: #ffffff; padding:0.2em 0.4em;">
{| style="border: 0; margin: 0;" cellpadding="3"
| valign="top" |
* [[Hawk_installation_schedule#Terms_of_Use|Terms of use]]
* [[HPE_Hawk_access|Access]]
* [[HPE_Hawk_Hardware_and_Architecture|Hardware and Architecture]]
|}
</div>
| valign="top" style="padding: 0; border: 1px solid #aaaaaa; margin-bottom: 0;" |
<div style="font-size: 105%; padding: 0.4em; background-color: #eeeeee; border-bottom: 1px solid #aaaaaa; text-align: center;">'''Troubleshooting'''</div>
<div style="background: #ffffff; padding:0.2em 0.4em;">
{| style="border: 0; margin: 0;" cellpadding="3"
| valign="top" |
* [[HPE_Hawk_Support|Support (contact/staff)]]
* [[HPE_Hawk_FAQ|FAQ]]
* [http://websrv.hlrs.de/cgi-bin/hwwweather?task=viewmachine&machine=hawk Status/Maintenance for Hawk]
* [[HPE_Hawk_News|News]]
|}
</div>
|}

{| style="border:0; margin: 0;" width="100%" cellspacing="10"
| valign="top" style="padding: 0; border: 1px solid #aaaaaa; margin-bottom: 0;" |
<div style="font-size: 105%; padding: 0.4em; background-color: #eeeeee; border-bottom: 1px solid #aaaaaa; text-align: center;">'''Documentation'''</div>
<div style="background: #ffffff; padding:0.2em 0.4em;">
{| style="border: 0; margin: 0;" cellpadding="3"
| valign="top" |
* [[Batch_System_PBSPro_(Hawk)|Batch System]]
* [[Module environment(Hawk)|Module Environment]]
* [[Storage_(Hawk)|Storage Description]]
* [[Compiler(Hawk)|Compiler]]
* [[MPI(Hawk)|MPI]]
* [[Libraries(Hawk)|Libraries]]
* [[Manuals(Hawk)|Manuals]]
* [[Optimization|Optimization]]
|}
</div>
| valign="top" style="padding: 0; border: 1px solid #aaaaaa; margin-bottom: 0;" |
<div style="font-size: 105%; padding: 0.4em; background-color: #eeeeee; border-bottom: 1px solid #aaaaaa; text-align: center;">'''Utilities'''</div>
<div style="background: #ffffff; padding:0.2em 0.4em;">
{| style="border: 0; margin: 0;" cellpadding="3"
| valign="top" |
* [[CAE_utilities|CAE Utilities]]
* [[MKL|MKL Fortran Interfaces]]
|}
</div>
|}

----
== Topics for Software Installation Meeting ==
 
  
* hands-on sit (software installation tool)
* template for modulefiles; see [[#Modulefile best practices]]
* hierarchy of modulefiles; see [[#Hierarchy of modules]]
* default compiler / MPI / others
* how to deal with software from the base Linux installation which should not be visible on the cluster, e.g. the system compiler (gcc 4.8.5)
* how to inform users about software updates in the future
 
  
== Access ==
  
Login node: hawk-tds-login2.hww.hlrs.de
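
A minimal login sketch, assuming you already have an HWW account with access permission ("myaccount" is a placeholder for your user name):

<pre>
ssh myaccount@hawk-tds-login2.hww.hlrs.de
</pre>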
 
{{note|text=Access to the Hawk TDS is limited to support staff at the moment. Please check the [[Hawk installation schedule]] for details about the start of user access.}}
 
  
== Batch System ==
 
  
[[Batch_System_PBSPro_(Hawk)|Batch System PBSPro (Hawk)]]
  
  
== MPI ==
  
In order to use the MPI implementation provided by HPE, please load the Message Passing Toolkit (MPT) module ''mpt'' (not ABI-compatible with other MPI implementations) or ''hmpt'' (ABI-compatible with MPICH derivatives).
 
For detailed information see the [http://www.hpe.com/support/mpi-ug-036 HPE Message Passing Interface (MPI) User Guide].
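
A minimal usage sketch, assuming the MPT compiler wrappers and mpirun as described in the User Guide (the compiler selection variables are the ones listed in the [[#TODO]] section below):

<pre>
module load mpt            # or "hmpt" for the MPICH ABI

# tell the MPI compiler wrappers which compilers to use
export MPICC_CC=gcc
export MPICXX_CXX=g++
export MPIF90_F90=gfortran

mpicc -o hello hello.c     # build an MPI program
mpirun -np 4 ./hello       # run it with 4 ranks
</pre>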
 
  
<br>
 
== Modulefile best practices ==
 
  
* Set an environment variable to the root path of your installation (cf. e.g. MPI_ROOT in /usr/share/Modules/modulefiles/hmpt/2.19).
* Set not only CPATH but also the respective variables used by PGI / Intel / etc. -> someone has to figure out the list of those variables; the same holds for (LD_)LIBRARY_PATH.
* Include your name (finger does not work anymore), e-mail address and date of installation in the modulefile.
* It is possible to keep the modulefile(s) together with the actual installation in the respective directory and just create symlinks in /opt/hlrs/modulefiles/ (see the sketch below).
* The directory structure of /opt/hlrs/ shall be replicated in /opt/hlrs/unsupported-modulefiles/.
* In case of dependencies, load explicit versions instead of the default one!
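
A minimal sketch of the symlink approach mentioned above (paths follow the example modulefile below; the file name "modulefile" is hypothetical):

<pre>
# the modulefile lives next to the actual installation ...
/opt/hlrs/performance/foo/1.23/modulefile

# ... and is made visible via a symlink named after the version:
ln -s /opt/hlrs/performance/foo/1.23/modulefile /opt/hlrs/modulefiles/performance/foo/1.23
</pre>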
 
  
As an example, a modulefile for package "foo", version 1.23 (within the category "performance"), should look like this:
 
 
<pre>
#%Module1.0
#
# Change log:
#   Updated   12 Jul 2019, Christoph Niethammer <niethammer@hlrs.de>
#   Installed 08 Aug 2018, Jose Gracia <gracia@hlrs.de>

set BASE_DIR /opt/hlrs
set CAT      performance
set PACKAGE  foo
set VERSION  1.23

set FOO_ROOT $BASE_DIR/$CAT/$PACKAGE/$VERSION

setenv FOO_ROOT    $FOO_ROOT
setenv FOO_VERSION $VERSION

prepend-path PATH               $FOO_ROOT/bin
prepend-path LD_LIBRARY_PATH    $FOO_ROOT/lib        ;# library search path at execution time (i.e. for _dynamic_ linking)
prepend-path LIBRARY_PATH       $FOO_ROOT/lib        ;# equivalent of "-L" for C, C++, and Fortran
prepend-path CPATH              $FOO_ROOT/include    ;# equivalent of "-I" for C, C++, and Fortran
prepend-path CPLUS_INCLUDE_PATH $FOO_ROOT/include    ;# "-I" for the C++ compiler only; usually not needed as CPATH will do
prepend-path C_INCLUDE_PATH     $FOO_ROOT/include    ;# "-I" for the C compiler only; usually not needed as CPATH will do
prepend-path MANPATH            $FOO_ROOT/share/man  ;# man pages
</pre>
 
  
== Hierarchy of modules ==
+
{| style="border:0; margin: 0;" width="100%" cellspacing="10"
On Vulcan and Hazel Hen we have various module directories like ''tools'', ''utils'', ''misc''. However, it is not very clear what should go where; in the end, anything is a tool. I would therefore propose to be more specific.
 
  
Proposal for the module hierarchy (please extend):
  
 development/    # development tools: svn, git, binutils, cmake, ...
 mpi/            # mpt, hmpt, ... (nobody will suspect MPI under mpt/)
 compiler/       # gcc, aocc, intel, pgi, ...
 numlib/         # numerical libraries: mkl, trilinos, ...
 debugger/       # debugging tools: forge, ...
 performance/    # performance analysis tools: vampir, extrae, scalasca, inspector, advisor, darshan, ...
 visualization/  # data visualization tools: paraview, ...
 python/         # 3.x; do we need/want 2.7?
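
A sketch of how the proposed hierarchy would look to users (category names from the proposal above; the location of the modulefile tree and the version numbers are assumptions):

<pre>
module use /opt/hlrs/modulefiles     # assumed location of the category tree
module avail                         # lists e.g. compiler/..., mpi/..., performance/...
module load compiler/gcc/9.2.0       # hypothetical version
module load mpi/mpt/2.19
</pre>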
 
  
What to do with libraries which are not necessarily "numlib"? E.g.:

 boost/ -> libraries/boost
 hdf5/  -> libraries/hdf5
 
  
Libraries which are actually some kind of programming model:

 gpi2 -> libraries/gpi2
 tbb  -> libraries/tbb


What to do with software for projects, such as

 tools/hidalgo/fenics_hpc/3dairq
 prace/prace

Put them in directories which are readable only for a certain group?
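
One possible approach, sketched with the project path from above (the group name is hypothetical):

<pre>
# make a project tree accessible for members of group "hidalgo" only
chgrp -R hidalgo /opt/hlrs/tools/hidalgo
chmod -R o-rx    /opt/hlrs/tools/hidalgo
</pre>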
 
  
== Test cases best practices ==
 
  
Test cases will help to identify and determine the scaling and performance behavior of the new system.
Ideally, these test cases can also be compared across other systems in order to get a full picture.
 
 
To do:

* Definition of a best-practice guideline on how to set up a correct test case
** Measure only the time-stepping loop or equivalent, excluding the initialization phase and cleanup
** Use a well-defined measure of computational progress (e.g. LUPS, DoF-UPS, iterations/s or Flop/s)
** Ideally, the test case is largely automated with scripts and also performs the evaluation on top, producing a meaningful result file (see the sketch below)
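
A minimal sketch of such an automated wrapper, assuming a solver that itself times only the time-stepping loop and reports step count and loop time in its log (all names and the log format are hypothetical):

<pre>
#!/bin/bash
# run the case
module load mpt
mpirun -np 128 ./solver input.cfg > run.log

# evaluate: compute iterations/s from the solver-reported loop time
steps=$(awk '/^step / {n++} END {print n}' run.log)   # hypothetical log format
loop_s=$(awk '/^loop time/ {print $3}' run.log)       # solver-reported loop time in seconds
awk -v s="$steps" -v t="$loop_s" 'BEGIN {printf "iterations/s: %.2f\n", s/t}' > result.txt
</pre>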
 
 
 
 
 
== TODO ==
 
 
 
* 2019-08-22, niethammer@hlrs.de: missing pbs headers (tm.h, ...)
 
* 2019-08-18, dick@hlrs.de: (exuberant) ctags missing on frontend, probably available from RHEL repository
 
* 2019-08-18, dick@hlrs.de: manpages are missing on the frontend
 
* 2019-08-15, niethammer@hlrs.de: need more explanation on how ''omplace'' works for pinning in the context of SMT (numbering of cores?)
 
* 2019-08-15, niethammer@hlrs.de: how to correctly run scripts/wrappers with mpirun? (the script is executed only once per node, but an MPI application called inside it runs multiple times)
 
 
 
* 2019-08-15, niethammer@hlrs.de: missing commands:
 
**''resize'' (likely coming with xterm)
 
 
 
* 2019-08-15, niethammer@hlrs.de:
 
** MPT = Message Passing Toolkit
 
** MPI from the mpt module uses the SGI ABI, MPI from the hmpt module uses the MPICH ABI
 
** for the MPI compiler wrappers to detect the correct compiler, please set MPICC_CC, MPICXX_CXX, MPIF90_F90 and MPIF08_F08 to the corresponding compiler commands (2019-08-15, dick@hlrs.de: done)
 
** Should e.g. applications using cae/platform_mpi use perfboost?
 
 
 
* 2019-08-07, dick@hlrs: hmpt is ABI-compatible with MPICH derivatives, but mpt is not

** users should know about this!

** @HPE: is hmpt an MPICH derivative, but mpt not?
 
 
 
* 2019-08-07, dick@hlrs: it is unclear that (h)mpt provides the MPI library -> call the modules "mpi/hmpt" and "mpi/mpt" instead?
 
 
 
* 2019-08-07, dick@hlrs: remove MPI delivered with RHEL
 
 
 
* 2019-08-07, dick@hlrs: Intel loads the gcc module -> use (LD_)LIBRARY_PATH instead
 
 
 
* 2019-08-07, dick@hlrs: be careful: cc points to /usr/bin/gcc!
 
 
 
* 2019-08-14, khabi/offenhaeuser@hlrs.de: How to pin OpenMP threads in hybrid jobs (the naive approach pins two threads to the _same_ core instead of two different ones), i.e. what is the equivalent of aprun -d?
 
 
 
<br>
 
 
 
== Training ==
 
 
 
There will be internal (i.e. HLRS staff only) training sessions on the following topics (tentative):
 
 
 
* HPE Performance MPI
 
** [https://terminplaner4.dfn.de/qTagd9VodYiy89mj survey to set date]
 
** topics available via the above link
 
** target audience: user support staff, internal users of the system
 
 
 
* Processor
 
** [https://terminplaner4.dfn.de/COB6iw5DAFFyDtwe survey to set date]
 
** (tentative) schedule available via the above link
 
** target audience: user support staff, internal users of the system
 
 
 
* Workload Management PBSpro for end users
 
** survey coming soon
 
** target audience: user support staff, internal users of the system
 
 
 
* Cluster and System Administration using HPCM
 
** target audience: admin staff
 
 
 
* Infiniband-Administration and Tuning
 
** target audience: admin staff
 
 
 
* Lustre and Storage Administration
 
** target audience: admin staff
 
 
 
* Workload Management PBSpro Administration and Tuning
 
** target audience: admin staff
 

----
[[Help | Help for Wiki Usage]]