- Information contained in the HLRS Wiki is not legally binding and HLRS is not responsible for any damages that might result from its use -
CRAY XE6 Disk Storage
HOME Directories
All user HOME directories for every compute node of the cluster are located on a shared RAID system. The compute nodes and the login node (frontend) have the HOME directories mounted via NFS. On every node of the cluster the path to your HOME is the same. The filesystem space on HOME is limited by a quota! Due to the limited network performance, the HOME filesystem is not intended for fast I/O or for large files! Reading or writing even small files from many nodes (> 200) will cause trouble for all users. Applications should designate a single process to do the read and use a broadcast mechanism (e.g. MPI_Bcast) to distribute the data to all other nodes, or use a parallel I/O mechanism like MPI-IO.
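As an illustration of this single-reader pattern, the following minimal C/MPI sketch lets rank 0 read a small input file from HOME and broadcast its contents to all other ranks; the file name input.dat is only a placeholder.

```c
/* Minimal sketch: only rank 0 reads a small input file from HOME,
 * then the contents are broadcast to all ranks with MPI_Bcast.
 * The file name "input.dat" is only a placeholder. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank;
    long size = 0;
    char *buf = NULL;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {                       /* only one rank touches HOME (NFS) */
        FILE *f = fopen("input.dat", "rb");
        if (f != NULL) {
            fseek(f, 0, SEEK_END);
            size = ftell(f);
            rewind(f);
            buf = malloc(size);
            if (fread(buf, 1, size, f) != (size_t)size)
                size = 0;                  /* treat a short read as an error */
            fclose(f);
        }
    }

    /* distribute the size and the file contents to all ranks */
    MPI_Bcast(&size, 1, MPI_LONG, 0, MPI_COMM_WORLD);
    if (rank != 0)
        buf = malloc(size);
    MPI_Bcast(buf, (int)size, MPI_CHAR, 0, MPI_COMM_WORLD);

    /* ... every rank can now work with buf ... */

    free(buf);
    MPI_Finalize();
    return 0;
}
```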
SCRATCH directories
For large files and fast I/O, please use the Lustre scratch filesystem.
It is a fast, distributed cluster filesystem that uses the high-speed network infrastructure (Gemini). This filesystem is available on all nodes and on the frontend/login nodes.
You are responsible for obtaining scratch space from the system yourself. To get access to this global scratch filesystem you have to use the workspace mechanism (https://kb.hlrs.de/platforms/index.php/Workspace_mechanism). Please note that there is a maximum lifetime for each workspace (30 days). After a workspace has exceeded this time limit, the workspace directory will be deleted.
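For fast, large-volume output on such a workspace, a parallel I/O mechanism like MPI-IO (mentioned above) is a good fit. The following C sketch has all ranks write one shared file collectively; the workspace path used in it is a hypothetical example, in practice use the directory that the workspace mechanism returns.

```c
/* Minimal sketch of parallel output to the Lustre scratch with MPI-IO:
 * all ranks write their own contiguous block of one shared file.
 * The path /lustre/ws/my_workspace is hypothetical; use the directory
 * that the workspace mechanism reported for your workspace. */
#include <mpi.h>

#define BLOCK (1 << 20)                     /* 1 MiB per rank, example size */

int main(int argc, char **argv)
{
    static char data[BLOCK];
    int rank;
    MPI_File fh;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    for (int i = 0; i < BLOCK; i++)         /* fill the buffer with example data */
        data[i] = (char)(rank & 0xff);

    /* all ranks collectively open one file on the scratch filesystem */
    MPI_File_open(MPI_COMM_WORLD, "/lustre/ws/my_workspace/result.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    /* collective write: each rank writes at its own offset */
    MPI_File_write_at_all(fh, (MPI_Offset)rank * BLOCK, data, BLOCK,
                          MPI_CHAR, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}
```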
The available storage capacity of about 2.5 PB has been split into five areas:
Name | Usage | Restricted
---|---|---
Univ_1 | general use | no
Univ_2 | only a few files per user (~1000) | no
Res_1 | reserved for demanding projects | yes
Res_2 | reserved for demanding projects | yes
Ind_2 | shared access with the Viz cluster @ HLRS | yes
The reason for splitting the Univ_1/Univ_2 filesystems is the time needed to recover from a failure. In case of a failure that requires a filesystem check, the time needed for this task depends on the number of files stored on the filesystem. To minimize the downtime after such a failure, we separate the users with only a few (and probably large) files, so that production on the Cray system can resume quickly, while the group using the filesystem with about 100 million files may have to wait a couple of days (or weeks) before it comes online again. Users/groups may have access to either Univ_1 or Univ_2, but not both.
Filesystem Policy
IMPORTANT! NO BACKUP!! There is NO backup of any user data located on the HWW cluster systems. The only protection of your data is the redundant disk subsystem. This RAID system is able to handle the failure of one component. There is NO way to recover inadvertently removed data. Users have to back up critical data at their local site!
For data that should be available longer than the workspace time limit allows, and for very important data, please use the High Performance Storage System (HPSS).