- Information contained in the HLRS Wiki is not legally binding and HLRS is not responsible for any damages that might result from its use -
HPE Hawk News: Difference between revisions
Revision as of 13:29, 4 October 2023
- 2020/11/19: If your job breaks with
MPT ERROR: MPI_COMM_WORLD rank <x> has terminated without calling MPI_Finalize() aborting job MPT: Received signal 9
chances are good that it was caused by running out of memory (i.e. your job tried to access more memory than is available on one of the nodes). We are currently working towards providing you with a more specific error message in this case.
- 2020/11/27: If your job breaks with "IB timeout" messages, this might be due to crashed nodes, even though the message suggests a problem with the InfiniBand network. There are still various reasons why nodes might crash.
- 2021/03/05: Please keep in mind that the nodefiles (available via $PBS_NODEFILE) may currently list nodes in an _unsorted_ manner! To improve communication performance, it is probably beneficial to sort the node file (using sort with the -V flag) and provide the _sorted_ version to mpirun!
- 2021 October: 24 additional AI nodes available. See [[]]
- 2023 Summer: Additional fast Lustre workspace filesystem (ws11) available. See Storage_(Hawk)#SCRATCH_directories_/_workspace_mechanism
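
The node file sorting mentioned in the 2021/03/05 entry can be sketched in Python as follows. This is only an illustration of a version-style sort (similar in spirit to what sort -V does), not an HLRS-provided tool. The function names, the output file name nodefile.sorted, and the host names in the usage note are hypothetical. How the sorted file is then passed to mpirun depends on the MPI in use and is not shown here.

```python
import os
import re


def version_key(name):
    """Split a host name into text and integer chunks.

    Numeric chunks compare as integers, so "n10" sorts after "n2".
    Each chunk becomes a tuple of the same shape, which avoids
    comparing strings with integers.
    """
    chunks = [part for part in re.split(r"(\d+)", name) if part]
    return [(1, int(c), "") if c.isdigit() else (0, 0, c) for c in chunks]


def sort_nodefile_lines(lines):
    """Return the non-empty host names from `lines` in version-sorted order.

    Repeated entries are kept, because a node file may list the same
    host several times.
    """
    hosts = [line.strip() for line in lines if line.strip()]
    return sorted(hosts, key=version_key)


if __name__ == "__main__":
    # Assumption: we are inside a PBS job, so PBS_NODEFILE is set.
    nodefile = os.environ.get("PBS_NODEFILE")
    if nodefile:
        with open(nodefile) as src:
            sorted_hosts = sort_nodefile_lines(src)
        # Hypothetical output name; choose any writable location.
        with open("nodefile.sorted", "w") as dst:
            dst.write("\n".join(sorted_hosts) + "\n")
```

For example, with the hypothetical host names n10, n2, n1, n2, the call sort_nodefile_lines(["n10", "n2", "n1", "n2"]) returns the hosts in the order n1, n2, n2, n10, whereas a plain lexical sort would place n10 before n2.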