- Information contained in the HLRS Wiki is not legally binding and HLRS is not responsible for any damages that might result from its use -

Storage (Hawk)
=== HOME Directories ===
Users' HOME directories are located on a shared RAID system and are mounted via NFS on all login (frontend) and compute nodes. The path to the HOME directories is consistent across all nodes. The filesystem space on HOME is limited by a small quota (50 GB per user, 200 GB per group)! The quota usage for your account and your groups can be listed with the <tt>na_quota</tt> command on the login nodes.
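For example, on a login node (the command name is taken from above; the exact output format may differ):
<pre>
# show the quota usage for your account and your groups
na_quota
</pre>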


{{Warning|text=Due to the limited network performance, the HOME filesystem is not intended for use in any compute job! Do not read or write files in HOME within a compute job, as this will cause trouble for all users. For compute jobs, please use the [[workspace mechanism]], which uses the [http://en.wikipedia.org/wiki/Lustre_%28file_system%29 Lustre]-based scratch directories.}}


In addition, a '''project''' volume can be made available on request (see [[Project_filesystem]]).

=== SCRATCH directories / workspace mechanism ===
For large files and fast I/O, [http://en.wikipedia.org/wiki/Lustre_%28file_system%29 Lustre]-based scratch directories are available which make use of the high-speed network infrastructure. Scratch directories are available on all compute and login (frontend) nodes via the [[workspace mechanism|'''WORKSPACE MECHANISM''']].
{{Note|text=To get the best performance using [[MPI-IO]], it may be necessary to tune the file distribution or the MPI I/O info arguments.}}
{{Warning|text=Workspaces have some restrictions: First, there is a '''maximum time limit''' for each workspace (60 days on ws10, 10 days on ws11), after which it will be '''deleted automatically'''. Second, on ws10 a group capacity quota limit and a file count limit apply by default.}}
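As a hedged illustration of "tuning the file distribution" mentioned in the note above, the Lustre stripe count of a directory inside a workspace can be adjusted with the standard <tt>lfs</tt> tool before writing large shared files. The path variable and the stripe count of 8 below are placeholders, not an HLRS recommendation:
<pre>
# $MY_WORKSPACE is a placeholder for a path obtained via the workspace mechanism
lfs getstripe $MY_WORKSPACE

# create an output directory whose new files are striped across 8 OSTs
mkdir $MY_WORKSPACE/output
lfs setstripe -c 8 $MY_WORKSPACE/output
</pre>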
 
On Hawk, two different workspace filesystems are available, each with different properties:
* ws10:
** Basepath: /lustre/hpe/ws10
** available storage capacity: 22 PB
** project quota limits: <font color=red>enabled for blocks and files</font>
** max. workspace duration: 60 days
** max. workspace extensions: 3
** Lustre devices: 2 MDS, 4 MDT, 8 OSS, 48 OST
** performance: < 100 GiB/s
* ws11:
** Basepath: /lustre/hpe/ws11
** available storage capacity: 15 PB
** project quota limits: disabled
** max. workspace duration: 10 days
** max. workspace extensions: 1
** Lustre devices: 2 MDS, 2 MDT, 20 OSS, 40 OST
** performance: ~200 GiB/s
 
 
Each user has access to both filesystems via the [[workspace mechanism|'''WORKSPACE MECHANISM''']].
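A minimal sketch of the typical workflow, assuming the standard hpc-workspace commands (<tt>ws_allocate</tt>, <tt>ws_find</tt>, <tt>ws_list</tt>, <tt>ws_release</tt>) described on the [[workspace mechanism]] page; the workspace name, the duration and the <tt>-F</tt> filesystem option are illustrative:
<pre>
# allocate a workspace named "mywork" for 10 days on ws11
ws_allocate -F ws11 mywork 10

# list your workspaces and their remaining lifetime
ws_list

# change into the workspace, e.g. inside a job script
cd $(ws_find mywork)

# release the workspace once the data has been copied away or is no longer needed
ws_release -F ws11 mywork
</pre>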
 
 
=== localscratch ===
Some special node types (AI nodes, Pre-Post nodes, login nodes; see also [[Batch_System_PBSPro_(Hawk)#Node_types | node_types]]) have a local disk installed and mounted on /localscratch (see the job script sketch after this list). On these nodes:
* each batch job creates a /localscratch/$PBS_JOBID directory owned by the job owner.
* each ssh login session creates a /localscratch/$UID directory owned by $UID.
* at the end of a user's batch job, the $PBS_JOBID directory in /localscratch on the node is removed!
* after the last login session of a user on a node has ended, the $UID directory in /localscratch is removed!
* the individual /localscratch filesystem on a node is not shared with other nodes.
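A minimal PBS job script sketch for staging data through the per-job directory (the resource request, file names and application are placeholders; only the <tt>/localscratch/$PBS_JOBID</tt> pattern is taken from the list above):
<pre>
#!/bin/bash
#PBS -l select=1:node_type=...   # placeholder: request one of the node types with a local disk
#PBS -l walltime=01:00:00

# per-job local scratch directory, created automatically for this job
LOCAL=/localscratch/$PBS_JOBID

# stage input from the submission directory (e.g. a workspace) to the local disk
cp $PBS_O_WORKDIR/input.dat $LOCAL/
cd $LOCAL

./my_app input.dat               # placeholder application

# copy results back before the job ends; $LOCAL is removed when the job finishes
cp results.dat $PBS_O_WORKDIR/
</pre>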


=== Filesystem Policy ===
IMPORTANT! NO BACKUP!! There is NO backup of any user data located on HWW cluster systems. The only protection of your data is the redundant disk subsystem. This RAID system is able to handle the failure of one component. There is NO way to recover inadvertently removed data. Users have to back up critical data at their local site!
 
Please see also [[Storage_usage_policy#Usage_guidelines_for_workspace_filesystems]]


== Long term storage ==
For data which should be available longer than the workspace time limit allows, and for very important data, please use the [[High Performance Storage System (HPSS)]].
