- Information contained in the HLRS Wiki is not legally binding and is provided without warranty; HLRS is not responsible for any damages that might result from its use -

CRAY XE6 Disk Storage: Difference between revisions

From HLRS Platforms

Latest revision as of 16:01, 11 March 2013

HOME Directories

Users' HOME directories are located on a shared RAID system and are mounted via NFS on all login (frontend) and compute nodes. The path to the HOME directories is consistent across all nodes. The filesystem space on HOME is limited by a quota!

Warning: Because of the limited network performance, the HOME filesystem is not intended for large files or fast I/O! Do not read or write files from many nodes (>200) at once, as this will cause trouble for all users. Instead, let a single process read the data and distribute it with a broadcast (Bcast), or use MPI-IO.
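
The single-reader approach named in the warning can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses the mpi4py binding (not mentioned on this page), and the function name `read_and_broadcast` and the serial fallback are hypothetical choices made for the example.

```python
"""Sketch: one process reads a small input file, all others receive it by broadcast."""

try:
    # Assumption: mpi4py is installed; this page does not prescribe an MPI binding.
    from mpi4py import MPI
    COMM = MPI.COMM_WORLD
except ImportError:
    COMM = None  # serial fallback so the sketch also runs without MPI


def read_and_broadcast(path, comm=COMM, root=0):
    """Return the file contents on every rank while only `root` touches the filesystem."""
    if comm is None:
        # No MPI available: a single process simply reads the file.
        with open(path, "rb") as handle:
            return handle.read()

    data = None
    if comm.Get_rank() == root:
        # Only the root rank opens the file on HOME.
        with open(path, "rb") as handle:
            data = handle.read()
    # bcast sends the object held by root to every other rank.
    return comm.bcast(data, root=root)
```

Every rank would call `read_and_broadcast("input.cfg")` (a hypothetical file name) and get the same bytes back, while HOME sees a single reader instead of one per node.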


SCRATCH directories

For large files and fast I/O, Lustre-based (http://en.wikipedia.org/wiki/Lustre_%28file_system%29) scratch directories are available, which use the high-speed network infrastructure (Gemini). Scratch directories are available on all compute and login (frontend) nodes via the workspace mechanism.

Note: To get the best performance with MPI-IO, it may be necessary to tune the file distribution.
Warning: Workspaces have some restrictions. First, there is a maximum time limit of 30 days for each workspace, after which it will be deleted automatically. Second, workspaces have a group quota of 20TB and 1 million files by default. If a project requests less than 20TB and 1 million files, its group quota is set correspondingly lower.
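
To keep an eye on these restrictions, a small script along the following lines could be used. It is a sketch under assumptions: the function names are hypothetical, TB is taken to mean 10**12 bytes, and the walk only measures one directory tree, whereas the quota applies to the whole group. The creation time of the workspace has to be supplied by the user.

```python
"""Sketch: compare a workspace directory against the default limits quoted above."""
import os
import time

# Defaults from the warning above; TB interpreted as 10**12 bytes (assumption).
MAX_BYTES = 20 * 10**12
MAX_FILES = 1_000_000
MAX_AGE_DAYS = 30


def workspace_usage(root):
    """Walk `root` and return (file_count, total_bytes) for regular directory entries."""
    file_count = 0
    total_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                total_bytes += os.lstat(os.path.join(dirpath, name)).st_size
            except OSError:
                continue  # entry vanished while scanning
            file_count += 1
    return file_count, total_bytes


def within_limits(file_count, total_bytes, max_files=MAX_FILES, max_bytes=MAX_BYTES):
    """True if this tree alone stays below both the file-count and the size limit."""
    return file_count <= max_files and total_bytes <= max_bytes


def days_remaining(created_epoch, now=None, limit_days=MAX_AGE_DAYS):
    """Days left before a workspace created at `created_epoch` reaches the time limit."""
    now = time.time() if now is None else now
    return limit_days - (now - created_epoch) / 86400.0
```

Walking a tree with very many files is itself a metadata-heavy operation, so a check like this should be run sparingly.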


The available storage capacity of about 2.5PB is divided into five areas:

Name      Usage                                  Restricted
Univ_1    general use                            no
Univ_2    only few files per user (~1000)        no
Res_1     reserved for demanding projects        yes
Res_2     reserved for demanding projects        yes
Ind_2     shared access with Viz cluster at HLRS yes

The Univ_1 and Univ_2 filesystems are kept separate because of the time needed to recover from a failure. If a failure requires a filesystem check, the time needed for the check depends on the number of files stored on the filesystem. To minimize downtime after such a failure, users with only a few (and probably large) files are placed on a separate filesystem so that they can resume production on the Cray system soon, while the group using the filesystem with about 100 million files will have to wait a couple of days (or weeks) for it to come back online. Users and groups may have access to either Univ_1 or Univ_2, but not both.

Filesystem Policy

IMPORTANT! NO BACKUP!! No backup is made of any user data located on HWW Cluster systems. The only protection of your data is the redundant disk subsystem. This RAID system can tolerate the failure of a single component. There is NO way to recover inadvertently removed data. Users have to back up critical data at their local site!
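
One possible way to prepare critical results for transfer to a local site is to pack them into a single archive and record a checksum that can be verified after the copy. The following is a sketch only: the function names are hypothetical, and the transfer itself is left to whatever tool is used at the local site.

```python
"""Sketch: pack a result directory into one archive and record its SHA-256 checksum."""
import hashlib
import os
import tarfile


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_archive(source_dir, archive_path):
    """Create a gzip-compressed tar archive of `source_dir` and return its checksum.

    `archive_path` should lie outside `source_dir`, otherwise the archive
    would try to include itself.
    """
    top_level = os.path.basename(os.path.normpath(source_dir))
    with tarfile.open(archive_path, "w:gz") as archive:
        archive.add(source_dir, arcname=top_level)
    return sha256_of(archive_path)


def verify_archive(archive_path, expected_digest):
    """True if the archive at `archive_path` still matches the recorded checksum."""
    return sha256_of(archive_path) == expected_digest
```

After copying the archive to the local site, calling `verify_archive` there with the recorded digest confirms that the copy is intact.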


Long term storage

For data that must remain available longer than the workspace time limit allows, and for very important data, please use the High Performance Storage System (HPSS): http://www.hlrs.de/systems/platforms/hpss-datamanagement