| '''Hawk''' is the next generation HPC system at HLRS. It will replace the existing [[Cray XC40|HazelHen]] system.
| |
| The installation is planned to take place in Q4 2019. For more detailed information see the [[Hawk installation schedule]].
| |
|
| |
|
| This Page is under construction!
| | {{Note |
| | | text = Please be sure to read at least the [[10_minutes_before_the_first_job]] document and consult the [[General HWW Documentation]] before you start to work with any of our systems. |
| | }} |
|
| |
|
| | {{Warning |
| | | text = In preparation for the next generation supercomputer [[ Hunter_(HPE) | Hunter ]], the hardware configuration has been reduced (from 5632 compute nodes to 4096 compute nodes). Workspace filesystem ws10 has been removed. |
| | }} |
|
| |
|
| == Topics for Software Installation Meeting ==
| |
| * Installation directory: /opt/hlrs; platform-independent software (exception): /sw/general; '''links between platforms under /sw/* must no longer be used'''
| |
| * Documentation of the installation: for users: platforms wiki; for installers: staff wiki
| |
| * how to inform users about software updates in the future: no link from the platforms wiki into the staff wiki
| |
| MOTD?; keep the mailing list lean; must be automated (see the module-changes email); determine who loads which modules and when -> send the relevant emails
| |
| * hands-on sit (software installation tool)
| |
| * template for modulefiles; see [[#Modulefile best practices]]
| |
| * hierarchy of modulefiles [[#Hierarchy of modules]]
| |
| * default compiler / MPI / others
| |
| * how to deal with software from the base Linux installation that should not be visible on the cluster, e.g. the system compiler (gcc 4.8.5)
| |
| * install location of python bindings
| |
|
| |
|
| == Access ==
| | ---- |
|
| |
|
| Login-Node: hawk-tds-login2.hww.hlrs.de
| |
| {{note|text=Access to the Hawk TDS is limited to support staff at the moment. Please check the [[Hawk installation schedule]] for details about the start of user access.}}
| |
|
| |
|
| == Batch System == | | {| style="border:0; margin: 0;" width="100%" cellspacing="10" |
|
| |
|
| [[Batch_System_PBSPro_(Hawk)|Batch System PBSPro (Hawk)]] | | | valign="top" style="padding: 0; border: 1px solid #aaaaaa; margin-bottom: 0;" | |
| | <div style="font-size: 105%; padding: 0.4em; background-color: #eeeeee; border-bottom: 1px solid #aaaaaa; text-align: center;">'''Introduction'''</div> |
| | <div style="background: #ffffff; padding:0.2em 0.4em;"> |
| | {| style="border: 0; margin: 0;" cellpadding="3" |
| | | valign="top" | |
| | <!-- * [[Hawk_installation_schedule#Terms_of_Use | Terms of use ]] --> |
| | * [[HPE_Hawk_access|Access]] |
| | * [[HPE_Hawk_Hardware_and_Architecture|Hardware and Architecture]] |
| | |} |
| | </div> |
|
| |
|
|
| |
|
| == MPI ==
| |
|
| |
|
| In order to use the MPI implementation provided by HPE, please load the Message Passing Toolkit (MPT) module ''mpt'' (not ABI-compatible with other MPI implementations) or ''hmpt'' (ABI-compatible with MPICH derivatives).
| | | valign="top" style="padding: 0; border: 1px solid #aaaaaa; margin-bottom: 0;" | |
| For detailed information see the [http://www.hpe.com/support/mpi-ug-036 HPE Message Passing Interface (MPI) User Guide].
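A minimal sketch of how the module selection described above might look in a job script or login shell. The module names ''mpt'' and ''hmpt'' and the wrapper variables MPICC_CC, MPICXX_CXX, MPIF90_F90 and MPIF08_F08 are taken from this page; the GNU compiler commands used as values are only an assumption for illustration, and the module itself may already set these variables.

```shell
#!/usr/bin/env bash
# Choose ONE of the two HPE MPI flavours named on this page.
MPI_MODULE=hmpt          # MPICH-ABI-compatible; use "mpt" for the other flavour

# "module" is a shell function provided by the module environment.
# The guard keeps this sketch harmless on machines without it.
if type module >/dev/null 2>&1; then
    module load "$MPI_MODULE"
fi

# Tell the MPI compiler wrappers which compilers to call.
# The values below are assumptions (GNU compilers), not site defaults.
export MPICC_CC=gcc
export MPICXX_CXX=g++
export MPIF90_F90=gfortran
export MPIF08_F08=gfortran

echo "MPI module: $MPI_MODULE, C compiler used by mpicc: $MPICC_CC"
```

Setting the variables explicitly makes the compiler choice visible in the script instead of depending on whichever compiler module happens to be loaded.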
| | <div style="font-size: 105%; padding: 0.4em; background-color: #eeeeee; border-bottom: 1px solid #aaaaaa; text-align: center;">'''Troubleshooting'''</div> |
| | <div style="background: #ffffff; padding:0.2em 0.4em;"> |
| | {| style="border: 0; margin: 0;" cellpadding="3" |
| | | valign="top" | |
| | * [[HPE_Hawk_Support|Support (contact/staff)]] |
| | * [[HPE_Hawk_FAQ|FAQ]] |
| | * [http://websrv.hlrs.de/cgi-bin/hwwweather?task=viewmachine&machine=hawk Status,Maintenance for hawk] |
| | * [[HPE_Hawk_News|News]] |
| | |} |
| | </div> |
|
| |
|
| <br>
| |
| == Modulefile best practices ==
| |
|
| |
|
| * Set an environment variable to the root path of your installation (cf. e.g. MPI_ROOT in /usr/share/Modules/modulefiles/hmpt/2.19).
| | |} |
| * Set not only CPATH but also the corresponding variables used by PGI, Intel, etc. (someone has to compile the list of those variables); the same applies to (LD_)LIBRARY_PATH.
| |
| * Include your name (finger does not work anymore), e-mail address, and the date of installation in the modulefile.
| |
| * It's possible to keep the modulefile(s) together with the actual installation in the respective directory and just create symlinks in /opt/hlrs/modulefiles/.
| |
| * Directory structure of /opt/hlrs/ shall be replicated in /opt/hlrs/unsupported-modulefiles/.
| |
| * In case of dependencies, load explicit versions instead of the default one!
| |
|
| |
|
| As an example, a modulefile for package "foo", version 1.23 (in the category "performance"), should look like this:
| |
|
| |
 #%Module1.0
 #
 # Change log:
 # Updated   12 Jul 2019, Christoph Niethammer <niethammer@hlrs.de>
 # Installed 08 Aug 2018, Jose Gracia <gracia@hlrs.de>
 
 # modulefiles are Tcl scripts, so variables are assigned with "set"
 set BASE_DIR /opt/hlrs
 set CAT      performance
 set PACKAGE  foo
 set VERSION  1.23
 
 set FOO_ROOT $BASE_DIR/$CAT/$PACKAGE/$VERSION
 
 setenv FOO_ROOT    $FOO_ROOT
 setenv FOO_VERSION $VERSION
 
 # trailing comments need ";#" in Tcl
 prepend-path PATH               $FOO_ROOT/bin
 prepend-path LD_LIBRARY_PATH    $FOO_ROOT/lib       ;# library search path at time of execution (i.e. in case of _dynamic_ linking)
 prepend-path LIBRARY_PATH       $FOO_ROOT/lib       ;# equivalent of "-L" for C, C++, and Fortran
 prepend-path CPATH              $FOO_ROOT/include   ;# equivalent to "-I" for C, C++ and Fortran
 prepend-path CPLUS_INCLUDE_PATH $FOO_ROOT/include   ;# equivalent to "-I" only for C++ compiler; usually not needed as CPATH will do
 prepend-path C_INCLUDE_PATH     $FOO_ROOT/include   ;# equivalent to "-I" only for C compiler; usually not needed as CPATH will do
 prepend-path MANPATH            $FOO_ROOT/share/man ;# manpages
 
 module-whatis "_brief_ description of what is provided by this module"
 
 proc ModulesHelp {} {
     puts stderr "First line of detailed description\n"
     puts stderr "Second line of detailed description\n"
     puts stderr "Third line of detailed description\n"
 }
| |
|
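The symlink approach from the best practices above (modulefile stored next to the installation, symlink in the modulefiles tree) can be sketched as follows. This is only an illustration: a temporary directory stands in for /opt/hlrs, the file name ''modulefile'' is hypothetical, and the layout modulefiles/<category>/<package>/<version> is an assumption rather than an agreed convention.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Stand-in for /opt/hlrs so the sketch can be tried without root access (assumption).
BASE_DIR="$(mktemp -d)"
CAT=performance
PACKAGE=foo
VERSION=1.23

install_dir="$BASE_DIR/$CAT/$PACKAGE/$VERSION"
module_dir="$BASE_DIR/modulefiles/$CAT/$PACKAGE"   # assumed layout

mkdir -p "$install_dir" "$module_dir"

# The modulefile lives together with the installation
# (the file name "modulefile" is hypothetical) ...
printf '#%%Module1.0\n' > "$install_dir/modulefile"

# ... and only a symlink named after the version goes into the modulefiles tree.
ln -s "$install_dir/modulefile" "$module_dir/$VERSION"

ls -l "$module_dir"
```

Keeping the file next to the installation means that removing the installation directory also removes the modulefile; only the dangling symlink has to be cleaned up.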
| |
|
| == Hierarchy of modules ==
| |
| On Vulcan and Hazel Hen we have various module directories like ''tools'', ''utils'', ''misc''. However, it is not clear what belongs where; in the end, everything is a tool. I would therefore propose more specific categories.
| |
|
| |
|
| Proposal for module hierarchy (please extend)
| |
|
| |
|
| development/ # development tools
| |
| # svn, git, binutils, cmake
| |
|
| |
| mpi/ # nobody will suspect MPI under mpt/
| |
| # mpt, hmpt, ....
| |
|
| |
| compiler/
| |
| # gcc, oacc, intel, pgi, ...
| |
|
| |
| numlib/ # numerical libraries
| |
| # mkl, trilinos, ...
| |
|
| |
| debugger/ # debugging tools
| |
| # forge, ...
| |
|
| |
| performance/ # performance analysis tools
| |
| # vampir, extrae, scalasca, inspector, advisor, darshan, ...
| |
|
| |
| visualization/ # data visualization tools
| |
| # paraview, ...
| |
|
| |
| python/
| |
| # 3.X; do we need/want 2.7?
| |
|
| |
|
| What to do with libraries which are not necessarily "numlib", e.g.
| | {| style="border:0; margin: 0;" width="100%" cellspacing="10" |
| boost/ -> libraries/boost
| |
| hdf5/ -> libraries/hdf5? io/hdf5?
| |
|
| |
|
| Libraries which are actually some kind of programming model:
| | | valign="top" style="padding: 0; border: 1px solid #aaaaaa; margin-bottom: 0;" | |
| gpi2 -> libraries/gpi2
| | <div style="font-size: 105%; padding: 0.4em; background-color: #eeeeee; border-bottom: 1px solid #aaaaaa; text-align: center;">'''Documentation'''</div> |
| tbb -> libraries/tbb | | <div style="background: #ffffff; padding:0.2em 0.4em;"> |
| | {| style="border: 0; margin: 0;" cellpadding="3" |
| | | valign="top" | |
| | * [[Batch_System_PBSPro_(Hawk)|Batch System]] |
| | * [[Module environment(Hawk)|Module Environment]] |
| | * [[Storage_(Hawk)| Storage Description ]] |
| | * [[Compiler(Hawk)|Compiler]] |
| | * [[MPI(Hawk)|MPI]] |
| | * [[Libraries(Hawk)|Libraries]] |
| | * [[Manuals(Hawk)|Manuals]] |
| | * [[Optimization|Optimization]] |
| | * [[Hawk_PrePostProcessing|Pre- and Post-Processing]] |
| | * [[Big_Data,_AI_Aplications_and_Frameworks|Big Data, AI Applications and Frameworks]] |
| | * [[Performance Analysis Tools]] |
| | * [[CPE|Cray Programming Environment (CPE)]] |
|
| |
|
| What to do with software for projects? Such as
| | |} |
| tools/hidalgo/fenics_hpc/3dairq
| | </div> |
| prace/prace
| |
| Put them in directories which are readable only for a certain group?
| |
|
| |
|
| == Test cases best practices ==
| |
|
| |
|
| Test cases will help to identify and determine the scaling and performance behavior of the new system.
| |
| Ideally, the same test cases are also run on other systems for comparison, to get a full picture.
| |
|
| |
|
| To do:
| | | valign="top" style="padding: 0; border: 1px solid #aaaaaa; margin-bottom: 0;" | |
| * Definition of a best practice guideline on how to set up a correct test case
| | <div style="font-size: 105%; padding: 0.4em; background-color: #eeeeee; border-bottom: 1px solid #aaaaaa; text-align: center;">'''Utilities'''</div> |
| ** Measure only the time-stepping loop or its equivalent, excluding the initialization and cleanup phases
| | <div style="background: #ffffff; padding:0.2em 0.4em;"> |
| ** Well-defined measure of computational progress (e.g. LUPS, DoF-UPS, Iterations/s or Flop/s)
| | {| style="border: 0; margin: 0;" cellpadding="3" |
| ** Ideally, the test case is mostly automated with scripts and also performs the evaluation, producing a meaningful result file | | | valign="top" | |
| | * [[CAE_utilities|CAE Utilities]] |
| | * [[CAE_howtos|CAE HOWTOs]] |
| | * [[MKL | MKL Fortran Interfaces ]] |
| | |} |
| | </div> |
|
| |
|
| | |} |
|
| |
|
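The guideline items above (time only the time-stepping loop, report a well-defined progress measure, write a result file) could be sketched as follows. This is a hedged illustration, not a prescribed harness: the arithmetic loop is a stand-in for a real solver, CELLS and STEPS are hypothetical problem parameters, and nanosecond timing via `date +%s%N` assumes GNU coreutils.

```shell
#!/usr/bin/env bash
set -euo pipefail

CELLS=1000     # lattice sites updated per time step (hypothetical problem size)
STEPS=200      # number of time steps (hypothetical)

# --- initialization phase: deliberately NOT timed ---
result_file="$(mktemp)"
acc=0

# --- timed region: the time-stepping loop only ---
t_start=$(date +%s%N)
for ((step = 0; step < STEPS; step++)); do
    acc=$(( (acc + step * CELLS) % 1000003 ))   # stand-in for the real update
done
t_end=$(date +%s%N)

# --- evaluation: lattice updates per second (LUPS) ---
elapsed_ns=$(( t_end - t_start ))
lups=$(awk -v c="$CELLS" -v s="$STEPS" -v ns="$elapsed_ns" \
    'BEGIN { if (ns <= 0) ns = 1; printf "%.0f", c * s / (ns / 1e9) }')

# --- cleanup/reporting phase: also outside the timed region ---
printf 'cells=%s steps=%s elapsed_ns=%s lups=%s\n' \
    "$CELLS" "$STEPS" "$elapsed_ns" "$lups" > "$result_file"
cat "$result_file"
```

Writing one line of key=value pairs per run keeps results from different systems easy to collect and compare with standard tools.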
| == TODO ==
| |
|
| |
|
| * 2019-08-22, niethammer@hlrs.de: missing pbs headers (tm.h, ...)
| | ---- |
| * 2019-08-18, dick@hlrs.de: (exuberant) ctags missing on frontend, probably available from RHEL repository
| | [[Help | Help for Wiki Usage]] |
| * 2019-08-18, dick@hlrs.de: manpages are missing on the frontend
| |
| * 2019-08-15, niethammer@hlrs.de: need more explanation on how ''omplace'' works for pinning in the context of SMT (numbering of cores?)
| |
| * 2019-08-15, niethammer@hlrs.de: how to correctly run scripts/wrappers with mpirun? (mpirun executes a script only once per node, but an MPI application called inside the script is started multiple times)
| |
| | |
| * 2019-08-15, niethammer@hlrs.de: missing commands:
| |
| **''resize'' (likely coming with xterm)
| |
| | |
| * 2019-08-15, niethammer@hlrs.de:
| |
| ** MPT = Message Passing Toolkit
| |
| ** MPI from the mpt module uses the SGI ABI, MPI from the hmpt module uses the MPICH ABI
| |
| ** for the MPI compiler wrappers to detect the correct compiler please set MPICC_CC, MPICXX_CXX, MPIF90_F90, MPIF08_F08 to the corresponding compiler commands (2019-08-15, dick@hlrs.de: done)
| |
| ** Should e.g. applications using cae/platform_mpi use perfboost?
| |
| | |
| * 2019-08-07, dick@hlrs: hmpt is ABI-compatible with MPICH derivatives, whereas mpt is not
| |
| ** user should know about this!
| |
| ** @HPE: is hmpt an MPICH derivative, whereas mpt is not?
| |
| | |
| * 2019-08-07, dick@hlrs: unclear that (h)mpt provides MPI lib -> call it "mpi/hmpt" and "mpi/mpt" instead?
| |
| | |
| * 2019-08-07, dick@hlrs: remove MPI delivered with RHEL
| |
| | |
| * 2019-08-07, dick@hlrs: Intel loads gcc module -> use (LD_)LIBRARY_PATH instead
| |
| | |
| * 2019-08-07, dick@hlrs: be careful: cc points to /usr/bin/gcc!
| |
| | |
| * 2019-08-14, khabi/offenhaeuser@hlrs.de: How to pin OpenMP threads in hybrid jobs (the naive approach pins 2 threads to the _same_ core instead of two different cores), i.e.: how to achieve the equivalent of aprun -d?
| |
| | |
| <br>
| |
| | |
| == Training ==
| |
| | |
| There will be internal (i.e. HLRS staff only) training sessions on the following topics (tentative):
| |
| | |
| * HPE Performance MPI
| |
| ** [https://terminplaner4.dfn.de/qTagd9VodYiy89mj survey to set date]
| |
| ** topics available via the above link
| |
| ** target audience: user support staff, internal users of the system
| |
| | |
| * Processor
| |
| ** [https://terminplaner4.dfn.de/COB6iw5DAFFyDtwe survey to set date]
| |
| ** (tentative) schedule available via the above link
| |
| ** target audience: user support staff, internal users of the system
| |
| | |
| * Workload Management PBSpro for end users
| |
| ** probably one day in the week from 2019-11-11 to 2019-11-15
| |
| ** target audience: user support staff, internal users of the system
| |
| | |
| * Cluster and System Administration using HPCM
| |
| ** target audience: admin staff
| |
| | |
| * Infiniband-Administration and Tuning
| |
| ** target audience: admin staff
| |
| | |
| * Lustre and Storage Administration
| |
| ** target audience: admin staff
| |
| | |
| * Workload Management PBSpro Administration and Tuning
| |
| ** target audience: admin staff
| |