- Information contained in the HLRS Wiki is not legally binding and HLRS is not responsible for any damages that might result from its use -

CRAY XE6 Disk Storage

HOME Directories

All user HOME directories for every compute node of the cluster are located on a shared RAID system. The compute nodes and the login node (frontend) mount the HOME directories via NFS, so the path to your HOME is the same on every node of the cluster. The filesystem space on HOME is limited by a quota! Because of the limited network performance, the HOME filesystem is not intended for fast I/O or for large files! Reading or writing even small files from many nodes (> 200) will cause trouble for all users. Applications should instead designate a single process to do the read and distribute the data to all nodes with a broadcast mechanism (e.g. MPI_Bcast), or use a parallel I/O mechanism such as MPI-IO.
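
The single-reader pattern described above can be sketched as follows. This is a minimal illustration, not code provided by HLRS: it assumes Python with the mpi4py package, and the function name `read_on_root_and_broadcast` is hypothetical. Only one rank opens the file on HOME; all other ranks receive the contents through a broadcast.

```python
def read_on_root_and_broadcast(path, comm, root=0):
    """Read `path` on one rank only and broadcast its contents to all ranks.

    `comm` is assumed to be an mpi4py communicator (e.g. MPI.COMM_WORLD);
    only its Get_rank() and bcast() methods are used here.
    """
    data = None
    if comm.Get_rank() == root:
        # Only the root rank opens the file, so the NFS server sees one reader.
        with open(path, "rb") as f:
            data = f.read()
    # Every rank must call bcast; non-root ranks receive the root's data.
    return comm.bcast(data, root=root)
```

With mpi4py installed, a call could look like `read_on_root_and_broadcast("input.cfg", MPI.COMM_WORLD)` after `from mpi4py import MPI`; the file name is a placeholder.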

SCRATCH Directories

For large files and fast I/O, please use:

  • Lustre
    A fast distributed cluster filesystem that uses the high-speed network infrastructure (Gemini). It is available on all compute nodes and on the frontend/login nodes.

You are responsible for obtaining scratch space from the system yourself. To get access to this global scratch filesystem, you have to use the [workspace mechanism]. Please note that each workspace has a maximum time limit (30 days). Once a workspace has exceeded its time limit, the workspace directory will be deleted.
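
Because an expired workspace is deleted, it can help to keep track of how much time is left. The sketch below only does the date arithmetic for the 30-day limit stated above, assuming the workspace was allocated for the full 30 days. It does not query the workspace mechanism itself, and the function name and the example dates are made up for illustration.

```python
from datetime import date, timedelta

MAX_WORKSPACE_DAYS = 30  # maximum workspace lifetime stated above


def workspace_days_left(created, today, max_days=MAX_WORKSPACE_DAYS):
    """Return the number of days until a workspace created on `created` expires.

    A result of 0 or less means the workspace has reached its time limit.
    """
    expires = created + timedelta(days=max_days)
    return (expires - today).days


# Example with made-up dates: created on 1 March, checked on 21 March.
print(workspace_days_left(date(2013, 3, 1), date(2013, 3, 21)))  # prints 10
```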

The available storage capacity of about 2.5 PB has been divided into five areas:

Name     Usage                                       Restricted
Univ_1   general use                                 no
Univ_2   only few files per user (~1000)             no
Res_1    reserved for demanding projects             yes
Res_2    reserved for demanding projects             yes
Ind_2    shared access with the Viz cluster at HLRS  yes

The Univ_1 and Univ_2 filesystems are split because of the time needed to recover from a failure. If a failure requires a filesystem check, the time needed for that check depends on the number of files stored on the filesystem. To minimize downtime after such a failure, users with only a few (and probably large) files are kept on a separate filesystem so that production on the Cray system can resume quickly for them, while the group using the filesystem with about 100 million files will have to wait a couple of days (or weeks) until it is online again. Users/groups may have access to either Univ_1 or Univ_2, but not both.
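
Since recovery time grows with the number of files, it can be useful to check how many files a directory tree holds, for example against the guideline of roughly 1000 files per user on Univ_2. The following is a minimal sketch; the function name `count_files` and any path passed to it are hypothetical.

```python
import os


def count_files(top):
    """Count all non-directory entries below `top`.

    os.walk does not descend into symlinked directories by default,
    so files behind such links are not counted.
    """
    total = 0
    for _dirpath, _dirnames, filenames in os.walk(top):
        total += len(filenames)
    return total
```

A call such as `count_files("/path/to/workspace")` (placeholder path) returns the total as an integer, which can then be compared with the limit that applies to the filesystem in question.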

Filesystem Policy

IMPORTANT! NO BACKUP!! There is NO backup of any user data located on HWW Cluster systems. The only protection of your data is the redundant disk subsystem. This RAID system is able to handle the failure of one component. There is NO way to recover inadvertently removed data. Users have to back up critical data at their local site!

For data that must remain available longer than the workspace time limit allows, and for very important data, please use the High Performance Storage System (HPSS).