- Information contained in the HLRS Wiki is not legally binding and HLRS is not responsible for any damages that might result from its use -

CRAY XE6 Disk Storage

From HLRS Platforms
Revision as of 16:01, 11 March 2013 by Hpcbsche (talk | contribs) (→‎SCRATCH directories)

HOME Directories

Users' HOME directories are located on a shared RAID system and are mounted via NFS on all login (frontend) and compute nodes. The path to the HOME directories is consistent across all nodes. The filesystem space on HOME is limited by a quota!

Warning: Due to the limited network performance, the HOME filesystem is not intended for large files or fast I/O! Do not read or write files on HOME from many nodes (>200) at once, as this will cause trouble for all users. Instead, let a single process read the file and distribute its contents with a broadcast (Bcast), or use MPI-IO.
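
The single-reader approach recommended in the warning above can be sketched as follows. This is a minimal illustration, not an HLRS-provided recipe: it assumes that Python with the third-party mpi4py package is available (the same pattern applies to MPI_Bcast in C or Fortran), and the function name load_shared_input is hypothetical. Only rank 0 opens the file on HOME; all other ranks receive the contents through MPI.

```python
import sys


def load_shared_input(path, comm):
    """Read `path` on rank 0 only and broadcast the bytes to every rank.

    `comm` is expected to behave like an mpi4py communicator, i.e. to
    provide Get_rank() and bcast(obj, root=...).
    """
    data = None
    if comm.Get_rank() == 0:
        # Only one process opens the file, so the NFS server sees a single
        # reader regardless of how many nodes the job uses.
        with open(path, "rb") as handle:
            data = handle.read()
    # Every rank must call bcast; non-root ranks pass None and get the data.
    return comm.bcast(data, root=0)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Assumption: mpi4py is installed. It is imported here so that the
    # helper above stays importable on machines without MPI.
    from mpi4py import MPI

    payload = load_shared_input(sys.argv[1], MPI.COMM_WORLD)
    print(f"rank {MPI.COMM_WORLD.Get_rank()}: received {len(payload)} bytes")
```

The script expects the input file name as its only argument and has to be started through the system's MPI job launcher. The lowercase bcast serializes a Python object; for large numeric arrays the buffer-based Bcast of mpi4py is usually the better choice.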


SCRATCH directories

For large files and fast I/O, Lustre-based scratch directories are available. They make use of the high-speed network infrastructure (Gemini) and are accessible on all compute and login (frontend) nodes via the workspace mechanism.

Note: To get the best performance with MPI-IO, it may be necessary to tune the file distribution.
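
One way to influence the file distribution from within MPI-IO is to pass striping hints when a file is created. The sketch below is an illustration under assumptions, not a tuned recommendation: striping_factor and striping_unit are hint names reserved by the MPI standard, but whether and how they are honored depends on the MPI library and the filesystem. It also assumes that the third-party mpi4py package is available; the helper names and the example values (8 stripes of 1 MiB) are hypothetical.

```python
import sys


def striping_hints(stripe_count, stripe_size_bytes):
    """Build MPI-IO hint key/value pairs (as strings) for file striping."""
    if stripe_count < 1 or stripe_size_bytes < 1:
        raise ValueError("stripe count and stripe size must be positive")
    return {
        "striping_factor": str(stripe_count),     # number of stripes
        "striping_unit": str(stripe_size_bytes),  # stripe size in bytes
    }


def open_for_striped_write(comm, path, hints):
    """Collectively create `path` for writing and pass `hints` to MPI-IO."""
    # Assumption: mpi4py is installed; imported lazily so that
    # striping_hints() can be used without MPI.
    from mpi4py import MPI

    info = MPI.Info.Create()
    for key, value in hints.items():
        info.Set(key, value)
    try:
        return MPI.File.Open(comm, path, MPI.MODE_CREATE | MPI.MODE_WRONLY, info)
    finally:
        # The file keeps its own copy of the hints, so the Info object
        # can be released right away.
        info.Free()


if __name__ == "__main__" and len(sys.argv) > 1:
    from mpi4py import MPI

    # Illustrative values only: 8 stripes of 1 MiB each.
    handle = open_for_striped_write(
        MPI.COMM_WORLD, sys.argv[1], striping_hints(8, 1024 * 1024)
    )
    handle.Close()
```

Striping is generally fixed when a file is created, so such hints are not expected to change the layout of a file that already exists. Suitable values depend on file size and access pattern and are best determined by measurement.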
Warning: Workspaces have some restrictions. First, each workspace has a maximum lifetime of 30 days, after which it is deleted automatically. Second, workspaces are subject to a group quota of 20TB and 1 million files by default. If a project requests less than 20TB and 1 million files, its group quota is set correspondingly lower.


The available storage capacity of about 2.5PB is divided into five areas:

Name   | Usage                                   | Restricted
Univ_1 | general use                             | no
Univ_2 | only few files per user (~1000)         | no
Res_1  | reserved for demanding projects         | yes
Res_2  | reserved for demanding projects         | yes
Ind_2  | shared access with Viz cluster at HLRS  | yes

The Univ_1 and Univ_2 filesystems are kept separate because of the time needed to recover from a failure. If a failure requires a filesystem check, the time needed for the check depends on the number of files stored on the filesystem. To minimize downtime after such a failure, users with only a few (and probably large) files are placed on a separate filesystem so that they can resume production on the Cray system soon, while the groups using the filesystem with about 100 million files will have to wait a couple of days (or weeks) until it is online again. Users and groups may have access to either Univ_1 or Univ_2, but not both.

Filesystem Policy

IMPORTANT! NO BACKUP!! There is NO backup of any user data located on HWW Cluster systems. The only protection of your data is the redundant disk subsystem. This RAID system is able to handle the failure of one component. There is NO way to recover inadvertently removed data. Users have to back up critical data at their local site!


Long term storage

For data that must remain available longer than the workspace time limit allows, and for very important data, please use the High Performance Storage System (HPSS).